June 27, 2024

Gemma 2 is now available to researchers and developers


Artificial Intelligence (AI) holds immense potential to tackle some of humanity's most pressing challenges. However, to truly harness this potential, it is crucial that AI tools are accessible to everyone. This philosophy is at the core of the Gemma initiative. Earlier this year, we introduced Gemma, a family of lightweight, state-of-the-art open models derived from the same research and technology behind the Gemini models. The Gemma family has grown to include CodeGemma, RecurrentGemma, and PaliGemma, each tailored for specific AI tasks and available through integrations with partners like Hugging Face, NVIDIA, and Ollama.

Today, we are excited to officially launch Gemma 2 for researchers and developers worldwide. Available in 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 is more powerful and efficient than its predecessor and includes significant safety enhancements. Notably, the 27B model delivers performance competitive with models more than twice its size, and it does so while running inference on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs. This sets a new bar for efficiency and performance among open models.

Gemma 2’s architecture has been redesigned for exceptional performance and inference efficiency. At 27B, it delivers the best performance in its size class and is competitive with much larger models. The 9B model also leads its category, outperforming open models like Llama 3 8B. For a detailed performance breakdown, refer to the technical report. Gemma 2 also offers substantial efficiency and cost savings: it runs full-precision inference on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU, making high-performance AI more accessible and budget-friendly.

Optimized for blazing fast inference across various hardware, Gemma 2 runs efficiently on everything from powerful gaming laptops to high-end desktops and cloud-based setups. You can experience its full precision in Google AI Studio, unlock local performance with the quantized version on your CPU using Gemma.cpp, or deploy it on your home computer with an NVIDIA RTX or GeForce RTX via Hugging Face Transformers. This flexibility ensures that developers and researchers can seamlessly integrate Gemma 2 into their workflows.
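However you deploy it, the instruction-tuned Gemma checkpoints expect a turn-based prompt format. As a minimal, framework-agnostic sketch (the `<start_of_turn>`/`<end_of_turn>` markers follow Gemma's documented chat format; the helper function name here is our own, for illustration):

```python
def format_gemma_prompt(user_message: str) -> str:
    """Wrap a single user message in Gemma's chat turn markers.

    The instruction-tuned Gemma checkpoints are trained on prompts
    delimited by <start_of_turn>/<end_of_turn>; the trailing
    "<start_of_turn>model" line cues the model to begin its reply.
    """
    return (
        f"<start_of_turn>user\n{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


prompt = format_gemma_prompt("Why is the sky blue?")
print(prompt)
```

When loading Gemma 2 through Hugging Face Transformers, the tokenizer's `apply_chat_template` method typically produces this format for you, so hand-assembly like this is mainly useful with lower-level runtimes such as Gemma.cpp.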

Accessibility is a key feature of Gemma 2. It is available under a commercially-friendly Gemma license, which allows developers to share and commercialize their innovations. The model is compatible with major AI frameworks, including Hugging Face Transformers; JAX, PyTorch, and TensorFlow via native Keras 3.0; as well as vLLM, Gemma.cpp, Llama.cpp, and Ollama. Gemma 2 is also optimized with NVIDIA TensorRT-LLM for running on NVIDIA-accelerated infrastructure, and integration with NVIDIA’s NeMo is forthcoming. Fine-tuning options are available today with Keras and Hugging Face, with more parameter-efficient options in development.

Starting next month, Google Cloud customers can easily deploy and manage Gemma 2 on Vertex AI. Additionally, the new Gemma Cookbook offers practical examples and recipes to guide you in building applications and fine-tuning Gemma 2 models for specific tasks, making it easier to integrate Gemma with your preferred tools for tasks like retrieval-augmented generation.
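Retrieval-augmented generation recipes like those in the Cookbook generally follow the same basic pattern: retrieve the passages most relevant to a question, then splice them into the prompt sent to Gemma. A toy sketch of that pattern (the word-overlap retriever and function names below are illustrative assumptions, not the Cookbook's actual code):

```python
def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by naive word overlap with the question (toy retriever).

    Real RAG pipelines would use an embedding model and vector search here.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


docs = [
    "Gemma 2 comes in 9B and 27B parameter sizes.",
    "The Eiffel Tower is in Paris.",
    "Gemma 2 runs full-precision inference on a single H100 GPU.",
]
print(build_rag_prompt("What sizes does Gemma 2 come in?", docs))
```

The assembled string would then be passed to Gemma 2 through whichever framework you use for inference; swapping the toy retriever for a real vector store is the only structural change a production pipeline needs.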

Responsible AI development is a cornerstone of our approach. We provide developers and researchers with resources to build and deploy AI responsibly, including through our Responsible Generative AI Toolkit. The open-sourced LLM Comparator helps with in-depth evaluation of language models. Starting today, developers can use the companion Python library for comparative evaluations and visualize results in the app. We are also working on open sourcing our text watermarking technology, SynthID, for Gemma models.

Our robust internal safety processes ensure the integrity of Gemma 2. We filter pre-training data and conduct rigorous testing against comprehensive metrics to identify and mitigate biases and risks, publishing our results on public safety benchmarks. The first Gemma launch led to over 10 million downloads and numerous inspiring projects, like Navarasa’s AI model celebrating India’s linguistic diversity. With Gemma 2, developers can embark on even more ambitious projects, unlocking new levels of performance and potential.

Gemma 2 is now available in Google AI Studio for testing its full capabilities without hardware constraints. The model weights can be downloaded from Kaggle and Hugging Face Models, with Vertex AI Model Garden availability coming soon. To support research and development, Gemma 2 is free on Kaggle or via a free tier for Colab notebooks. First-time Google Cloud customers may qualify for $300 in credits, and academic researchers can apply for the Gemma 2 Academic Research Program for additional Google Cloud credits, with applications open through August 9.
