Introducing Mixtral 8x7B: Mistral AI's Breakthrough Sparse Mixture-of-Experts Model

Mistral AI, on its steadfast mission to empower the developer community with cutting-edge open models, proudly presents Mixtral 8x7B—a high-quality sparse mixture of expert models (SMoE) with open weights. Under the Apache 2.0 license, Mixtral outshines benchmarks, surpassing Llama 2 70B with 6x faster inference and offering the best cost/performance trade-offs. This open-weight model proves to be a formidable competitor, even outperforming GPT3.5 on various standard benchmarks.

Mixtral Highlights:

  1. Handles a context of 32k tokens with grace.
  2. Multilingual capabilities: English, French, Italian, German, and Spanish.
  3. Demonstrates robust performance in code generation.
  4. Achieved an impressive score of 8.3 on MT-Bench as an instruction-following model.
  5. Pushing the Frontier of Open Models with Sparse Architectures

Mixtral is a decoder-only model utilizing a sparse mixture-of-experts network. With a unique feedforward block, it selects from 8 distinct parameter groups, enhancing model parameters while efficiently managing cost and latency. Despite its 46.7B total parameters, Mixtral utilizes only 12.9B parameters per token, maintaining processing speed and cost-effectiveness comparable to a 12.9B model.

Performance Comparison

Mixtral outshines Llama 2 70B and GPT3.5 across various benchmarks, offering a superior quality versus inference budget tradeoff. Detailed benchmarks reveal Mixtral's truthfulness and reduced biases compared to Llama 2, making it a strong contender in the open-source model landscape.

Instructed Models

Mistral introduces Mixtral 8x7B Instruct, optimized for careful instruction following. Scoring 8.30 on MT-Bench, it stands as the best open-source model, rivaling the performance of GPT3.5. Mistral can be fine-tuned to ban specific outputs, ensuring moderation in applications that demand it.

Open-Source Deployment Stack

To facilitate community usage, Mistral AI contributes changes to the vLLM project, integrating Megablocks CUDA kernels for efficient inference. Skypilot enables the deployment of vLLM endpoints on any cloud instance, providing accessibility to Mixtral.

Experience Mixtral on Our Platform

Mistral AI currently deploys Mixtral 8x7B behind the mistral-small endpoint, which is available in beta. Register now for early access to all generative and embedding endpoints.


Mistral AI extends gratitude to CoreWeave and Scaleway teams for their invaluable technical support during model training.

No comments:

Post a Comment