An ML technique that divides a model into multiple “expert” models.
Main components:
- The experts (the sub-learners/networks)
- A gating function
The output of the gating function determines how the experts' outputs are combined — typically as weights in a weighted sum, or to select a subset of experts to run.
Given a learned gating network $G$ and the experts $E_1, \dots, E_n$, one typical example is:

$$y = \sum_{i=1}^{n} G(x)_i \, E_i(x)$$

Where the gating network with weights $W_g$ is defined as:

$$G(x) = \mathrm{softmax}(x \, W_g)$$
In plain English, the outputs of the experts are combined in a weighted sum, with the gating scores as the weights.
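As a concrete illustration, here is a minimal sketch of a dense (softmax-gated) MoE layer in PyTorch. The class and parameter names (`SimpleMoE`, `num_experts`, the feed-forward expert shape) are illustrative assumptions, not details from this note.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Dense mixture of experts: every expert runs, and the outputs are
    combined with softmax gating weights, i.e. y = sum_i G(x)_i * E_i(x)."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # Gating network: a single linear layer W_g followed by softmax.
        self.gate = nn.Linear(dim, num_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gating scores G(x): shape (batch, num_experts).
        scores = F.softmax(self.gate(x), dim=-1)
        # Expert outputs E_i(x): shape (batch, num_experts, dim).
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)
        # Weighted sum over experts.
        return torch.einsum("be,bed->bd", scores, expert_out)


# Example: 8 experts over 16-dimensional inputs.
moe = SimpleMoE(dim=16, num_experts=8)
y = moe(torch.randn(4, 16))   # shape (4, 16)
```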
A more modern way of using MoE is sparsely-gated mixture of experts, where only the top-k experts (by gating score) are evaluated for each input, so compute stays roughly constant as the number of experts grows.
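A minimal sketch of top-k (sparse) gating, following the common sparsely-gated MoE recipe; the `top_k` value and the renormalization over only the selected logits are assumptions, not details from this note.

```python
import torch
import torch.nn.functional as F

def sparse_gate(logits: torch.Tensor, top_k: int = 2) -> torch.Tensor:
    """Keep only the top-k gating scores per example; zero out the rest.

    logits: (batch, num_experts) raw gate outputs.
    Returns a (batch, num_experts) weight matrix whose rows sum to 1
    and have at most top_k non-zero entries.
    """
    # Select the k largest logits per row.
    top_vals, top_idx = logits.topk(top_k, dim=-1)
    # Softmax over the selected logits only, then scatter back into place.
    weights = torch.zeros_like(logits)
    weights.scatter_(-1, top_idx, F.softmax(top_vals, dim=-1))
    return weights


# Example: with 8 experts and top_k=2, only 2 weights per input are
# non-zero, so only those experts would need to be evaluated.
w = sparse_gate(torch.randn(4, 8), top_k=2)
print((w > 0).sum(dim=-1))   # tensor([2, 2, 2, 2])
```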