An ML technique that divides a model into multiple “expert” models.

Main components:

- A set of expert networks $E_1, \dots, E_n$, each a sub-model (e.g., a small feed-forward network).
- A gating network (the gate) $G$ that produces a score for each expert given the input.

The output of the gate is used to combine the outputs of the experts in some way.

Given a learned gating network $G$ and the experts $E_1, \dots, E_n$, one typical example is:

$$y = \sum_{i=1}^{n} G(x)_i \, E_i(x)$$

where the gating network with weights $W_g$ is defined as:

$$G(x) = \mathrm{softmax}(x \, W_g)$$

In plain English, the experts' outputs are combined in a weighted sum, with the weights given by the gating scores.
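As a concrete illustration, here is a minimal dense MoE layer sketch in PyTorch (the class, parameter names, and MLP expert shape are illustrative choices, not from this note): every expert runs on every input, and the softmax gate weights their outputs as in the formula above.

```python
import torch
import torch.nn as nn


class DenseMoE(nn.Module):
    """Dense mixture of experts: every expert runs, outputs are softmax-weighted."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        # The experts E_1, ..., E_n: small two-layer MLPs here.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        ])
        # The gating network G(x) = softmax(x W_g): one linear map to n scores.
        self.gate = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model)
        scores = torch.softmax(self.gate(x), dim=-1)             # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, n_experts, d_model)
        # y = sum_i G(x)_i * E_i(x): weighted sum over the expert dimension.
        return (scores.unsqueeze(-1) * outs).sum(dim=1)


# Usage: 8 experts over 16-dimensional inputs.
layer = DenseMoE(d_model=16, d_hidden=32, n_experts=8)
y = layer(torch.randn(4, 16))  # shape (4, 16)
```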

A more modern way of using MoE is sparsely-gated mixture of experts: the gate selects only the top-$k$ experts for each input, so most experts are never evaluated and compute stays roughly constant as the number of experts grows.
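A rough sketch of top-$k$ (sparse) gating, under the same assumed setup as the dense layer above (again, names and hyperparameters are illustrative): only the $k$ selected experts are run per input, and their gate scores are renormalised with a softmax.

```python
import torch
import torch.nn as nn


class SparseMoE(nn.Module):
    """Sparsely-gated MoE: each input is routed to its top-k experts only."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model)
        logits = self.gate(x)                               # (batch, n_experts)
        top_vals, top_idx = logits.topk(self.k, dim=-1)     # keep only the k best experts per row
        weights = torch.softmax(top_vals, dim=-1)           # renormalise over the selected experts
        y = torch.zeros_like(x)
        for slot in range(self.k):
            idx = top_idx[:, slot]                          # expert chosen at this slot, per row
            w = weights[:, slot].unsqueeze(-1)              # its renormalised gate weight
            for e in idx.unique().tolist():
                rows = idx == e
                # Route only the matching rows through expert e.
                y[rows] += w[rows] * self.experts[e](x[rows])
        return y


# Usage: 8 experts, but each input only activates 2 of them.
layer = SparseMoE(d_model=16, d_hidden=32, n_experts=8, k=2)
y = layer(torch.randn(4, 16))  # shape (4, 16)
```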