Similar to top-k gating, except that noise is added to the gating logits during training to promote exploration and improve load balancing.
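
To make the mechanism concrete, here is a minimal pure-Python sketch for a single token. It assumes one common formulation, in which the noise is standard Gaussian scaled per expert by a softplus of a second linear projection; the note above does not specify the noise model, so treat that choice as an assumption. The names `noisy_top_k_gating`, `w_gate`, and `w_noise` are hypothetical and chosen only for illustration.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def matvec(x, w):
    # x: length-d vector; w: d rows of n columns. Returns a length-n vector.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def noisy_top_k_gating(x, w_gate, w_noise, k, training=True, rng=random):
    """Return one gate weight per expert; only the top-k entries are nonzero."""
    logits = matvec(x, w_gate)
    if training:
        # Assumed noise model: per-expert scale = softplus(x . w_noise).
        scales = [softplus(s) for s in matvec(x, w_noise)]
        logits = [h + rng.gauss(0.0, 1.0) * s for h, s in zip(logits, scales)]
    # Keep the k largest (possibly noisy) logits and softmax over them only.
    top = sorted(range(len(logits)), key=lambda j: logits[j], reverse=True)[:k]
    m = max(logits[j] for j in top)
    exps = {j: math.exp(logits[j] - m) for j in top}
    total = sum(exps.values())
    return [exps[j] / total if j in exps else 0.0 for j in range(len(logits))]


# Example: 2-dimensional token, 4 experts, route to the top 2.
x = [1.0, -0.5]
w_gate = [[0.2, 1.0, -0.3, 0.5], [0.1, -0.4, 0.8, 0.0]]
w_noise = [[0.3, 0.3, 0.3, 0.3], [0.1, 0.1, 0.1, 0.1]]
train_gates = noisy_top_k_gating(x, w_gate, w_noise, k=2, rng=random.Random(0))
eval_gates = noisy_top_k_gating(x, w_gate, w_noise, k=2, training=False)
```

With `training=False` the noise is skipped and the function reduces to plain top-k gating. During training, the perturbation occasionally lets a lower-scoring expert enter the top k, which is how the noise encourages exploration and spreads tokens more evenly across experts.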