A gating approach to sparsely-gated MoE that routes each token to only its top-k experts.
“Top-k” refers to the k experts with the highest gating scores, as computed by the gating function for that token.
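
The selection step can be illustrated with a minimal sketch for a single token. The function name `top_k_gating`, the tie-breaking rule, and the choice to renormalize the selected scores with a softmax are assumptions made for illustration; actual routers differ in where they apply the softmax and in whether they add noise or load-balancing terms.

```python
import math


def top_k_gating(scores, k):
    """Return {expert_index: weight} for the k highest-scoring experts.

    `scores` holds one raw gating score (logit) per expert for one token.
    Renormalizing with a softmax over only the selected scores is an
    assumption of this sketch, not a fixed part of top-k gating.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest scores; ties go to the lower index.
    top = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Four experts, k = 2: only the two highest-scoring experts get the token.
weights = top_k_gating([0.1, 2.0, -1.0, 1.5], k=2)
print(sorted(weights))  # indices of the selected experts
```

Experts outside the returned mapping receive no weight, so they are never evaluated for this token. That skipped computation is what makes the layer sparse.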