Similar to top-k gating, except noise is added to the gating logits during training to:

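  • Promote exploration, so that experts with slightly lower scores are still selected sometimes
  • Improve load balancing, so that tokens are spread more evenly across experts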
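
A minimal sketch of the idea in Python with NumPy. The text above does not specify the noise distribution, so this sketch assumes Gaussian noise scaled by a learned, softplus-activated projection, which is one common formulation. The names noisy_top_k_gating, w_gate, w_noise, and training are hypothetical and chosen only for illustration.

```python
import numpy as np


def noisy_top_k_gating(x, w_gate, w_noise, k, training=True, rng=None):
    """Return gate weights of shape (batch, n_experts) with k nonzeros per row.

    x:       (batch, d_model) token representations
    w_gate:  (d_model, n_experts) gating projection (hypothetical name)
    w_noise: (d_model, n_experts) noise-scale projection (assumed formulation)
    """
    rng = np.random.default_rng() if rng is None else rng

    logits = x @ w_gate
    if training:
        # Assumption: per-expert noise scale is softplus(x @ w_noise);
        # logaddexp(0, z) is a numerically stable softplus.
        noise_std = np.logaddexp(0.0, x @ w_noise)
        logits = logits + rng.standard_normal(logits.shape) * noise_std

    # Indices of the k largest (possibly noisy) logits in each row.
    top_idx = np.argpartition(logits, -k, axis=-1)[:, -k:]

    # Keep the top-k logits and set every other entry to -inf,
    # so the softmax assigns it exactly zero weight.
    masked = np.full_like(logits, -np.inf)
    top_vals = np.take_along_axis(logits, top_idx, axis=-1)
    np.put_along_axis(masked, top_idx, top_vals, axis=-1)

    # Softmax over the surviving logits.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)
```

With training=False the function reduces to plain top-k gating, since no noise is added. During training, the noise can change which experts fall in the top k for a given token, which is where the exploration and load-balancing effects come from.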