3.15.2024

Neural Networks with MC-SMoE: Merging and Compressing for Efficiency


The world of artificial intelligence is witnessing a significant stride forward with the introduction of MC-SMoE, a novel approach to enhance neural network efficiency. This technique, explored in the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy," aims to make Sparsely activated Mixture-of-Experts (SMoE) models substantially more memory- and compute-efficient.

Vanilla SMoE models face two major hurdles: high memory usage, since the feed-forward layers are duplicated into multiple expert copies, and redundancy among experts, since common learning-based routing policies tend to suffer from representation collapse. The critical question this paper addresses is whether we can craft a more compact SMoE model by consolidating expert information.
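To make the memory issue concrete, here is a minimal sketch of a top-1 SMoE layer in PyTorch. All names and dimensions below are illustrative rather than taken from the paper's code; the point is that each expert is a full feed-forward copy, so the parameter count grows linearly with the number of experts even though each token only activates one of them.

```python
# A minimal top-1 SMoE layer (illustrative sketch, not the paper's code).
import torch
import torch.nn as nn

class TinySMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, num_experts=8):
        super().__init__()
        # Each expert is a full FFN copy -- this duplication is the
        # source of the memory overhead described above.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                 # x: (tokens, d_model)
        logits = self.router(x)           # (tokens, num_experts)
        idx = logits.argmax(dim=-1)       # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```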

Conventional model merging methods have not been effective in expert merging for SMoE due to two key reasons: the overshadowing of critical experts by redundant information and the lack of appropriate neuron permutation alignment for each expert.
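As a rough illustration of what permutation alignment means, the sketch below matches the hidden neurons of one FFN expert to another before merging. The cosine-similarity criterion and the Hungarian assignment are illustrative choices on my part, not necessarily the paper's exact procedure; the key idea is that permuting a hidden layer (and the corresponding columns of the next layer) leaves the expert's function unchanged while putting matching neurons in matching positions.

```python
# Hedged sketch of neuron permutation alignment between two FFN experts.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_expert(W1_a, W1_b, W2_b):
    """Permute expert B's hidden neurons to best match expert A.
    W1_*: (d_hidden, d_model) first-layer weights; W2_b: (d_model, d_hidden)."""
    # Cosine similarity between every pair of hidden neurons.
    a = W1_a / np.linalg.norm(W1_a, axis=1, keepdims=True)
    b = W1_b / np.linalg.norm(W1_b, axis=1, keepdims=True)
    sim = a @ b.T                              # (d_hidden, d_hidden)
    # Maximize total similarity = minimize negative similarity.
    _, perm = linear_sum_assignment(-sim)
    # Apply the same permutation to rows of W1_b and columns of W2_b,
    # which leaves expert B's input-output behavior unchanged.
    return W1_b[perm], W2_b[:, perm]
```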

To tackle these issues, the paper proposes M-SMoE, which uses routing statistics to guide expert merging. The process first aligns the neuron permutations of the experts, then identifies dominant experts and groups the remaining experts around them, and finally merges each group into a single expert. The merge weights each expert by its activation frequency, so less significant experts contribute less to the result.
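In code, the frequency-weighted merge itself is a simple weighted average. The sketch below assumes the experts in a group have already been permutation-aligned to the dominant expert, and that `freqs` holds routing statistics (how often each expert was activated) collected on some calibration data; the function name is hypothetical.

```python
# Minimal sketch of frequency-weighted expert merging (post-alignment).
import numpy as np

def merge_group(expert_weights, freqs):
    """expert_weights: list of (d_hidden, d_model) arrays for one group,
    already permutation-aligned to the group's dominant expert.
    freqs: how often the router selected each expert."""
    w = np.asarray(freqs, dtype=np.float64)
    w = w / w.sum()                     # normalize activation frequencies
    # Weighted average: rarely activated experts contribute less.
    return sum(wi * Wi for wi, Wi in zip(w, expert_weights))
```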

The advanced technique, MC-SMoE (Merge, then Compress SMoE), goes a step further by decomposing the merged experts into low-rank and structurally sparse alternatives. Across 8 benchmarks, the method achieves up to 80% memory reduction and a 20% reduction in floating-point operations (FLOPs) with minimal performance loss.
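The compression step can be illustrated with plain truncated SVD. The paper's actual decomposition combines a low-rank term with structured sparsity; the sketch below shows only the low-rank half, with `rank` as an illustrative hyperparameter.

```python
# Hedged sketch of the "compress" step: factor a merged expert's weight
# matrix into a low-rank product via truncated SVD (low-rank half only).
import numpy as np

def low_rank_factor(W, rank):
    """Return U, V with U @ V ~= W, storing rank*(m+n) instead of m*n values."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]        # fold singular values into U
    V_r = Vt[:rank, :]
    return U_r, V_r

# Example: a 1024x4096 expert layer at rank 64 keeps ~8% of the parameters.
W = np.random.randn(1024, 4096)
U, V = low_rank_factor(W, rank=64)
print(U.shape, V.shape)                 # (1024, 64) (64, 4096)
```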

The MC-SMoE model is not just a leap forward in neural network design; it's a testament to the potential of artificial intelligence to evolve in more efficient and scalable ways.


Paper - "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
