A gating approach for sparsely-gated mixture-of-experts (MoE) models that routes each token to only the top-k experts.

“Top-k” refers to the k experts with the highest gating scores, as computed by the gating function.
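
The selection step can be sketched for a single token as follows. This is a minimal illustration, assuming the gating function yields one raw score per expert. The function name `top_k_gate` and the softmax renormalization over the selected scores are assumptions made for the example, not details given in the definition above.

```python
import math

def top_k_gate(scores, k):
    """Pick the k highest-scoring experts for one token.

    scores: raw gating scores, one per expert (hypothetical input).
    k: number of experts the token is routed to.
    Returns (expert_indices, weights), highest score first.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of experts")

    # Indices of the k largest scores, highest first.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Assumed normalization: softmax over the selected scores only,
    # so the k routing weights sum to 1.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    return top, weights

# Four experts, route to the best two.
indices, weights = top_k_gate([0.1, 2.0, -1.0, 1.5], k=2)
print(indices)  # [1, 3]
```

Experts outside the selected set receive no weight, so only k experts would need to run for this token. That is what makes the gating sparse.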