AILAB Blog: open-source


5.08.2024

Open-Source Text-to-Speech (TTS)

There are several open-source Text-to-Speech (TTS) systems available, each with unique features and capabilities. Here's a list of some well-known open-source TTS projects:

Mozilla TTS - An open-source TTS engine based on deep learning techniques, developed at Mozilla alongside its other speech projects such as Common Voice. It focuses on creating natural-sounding speech using neural networks, and its development has since continued in the Coqui TTS fork.
MaryTTS - A modular, multilingual TTS system written in Java, originally developed by the German Research Center for Artificial Intelligence (DFKI) together with Saarland University. It supports several languages and is known for its flexibility and quality.
eSpeak - A compact open-source software speech synthesizer for English and other languages, known for its simplicity and small footprint.
Festival Speech Synthesis System - Developed at the University of Edinburgh, Festival offers a general framework for building speech synthesis systems and ships with example modules.
Tacotron 2 (by Google) - Although not a complete TTS system on its own, Tacotron 2 is a neural network architecture for speech synthesis. Google has published the research, and open-source implementations by third parties are available.
Mimic (by Mycroft AI) - Mimic is an open-source TTS project that can produce high-quality speech. It has several versions, with Mimic 3 focusing on deep learning models.
Flite - A lightweight speech synthesis engine developed at Carnegie Mellon University, designed to run on small devices.
ESPnet-TTS - Part of the ESPnet project, this is a neural network-based TTS system that aims to produce high-quality speech synthesis.

These projects vary greatly in terms of complexity, quality, and the languages they support. Some are more research-oriented, while others are aimed at end-users or developers looking to integrate TTS into their applications.
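For developers who want to try one of these engines from an application, the simplest route is usually the command-line interface. The sketch below wraps the eSpeak CLI from Python. It assumes `espeak` or `espeak-ng` is installed and on the PATH. The helper names `build_espeak_command` and `speak` are invented for this example, and the default voice and speed are arbitrary.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", words_per_minute=150, wav_path=None):
    """Build an argument list for the espeak / espeak-ng command-line tool."""
    # Prefer espeak-ng when it is installed; otherwise fall back to "espeak".
    executable = shutil.which("espeak-ng") or shutil.which("espeak") or "espeak"
    command = [executable, "-v", voice, "-s", str(words_per_minute)]
    if wav_path is not None:
        # -w writes the audio to a WAV file instead of playing it.
        command += ["-w", wav_path]
    command.append(text)
    return command


def speak(text, **options):
    """Run the synthesizer if it is installed; return True on success."""
    command = build_espeak_command(text, **options)
    if shutil.which(command[0]) is None:
        return False  # no synthesizer found on this machine
    return subprocess.run(command, check=False).returncode == 0
```

Calling `speak("Hello from eSpeak", wav_path="hello.wav")` should write a WAV file when the tool is available and return `False` otherwise. The neural systems in the list (Coqui/Mozilla TTS, ESPnet-TTS, Mimic 3) need model downloads and have their own Python or CLI entry points, so a wrapper for them would look different.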

3.27.2024

Introducing DBRX: A New State-of-the-Art Open LLM

Databricks has created a new state-of-the-art open-source large language model (LLM) called DBRX. DBRX surpasses established open models on various benchmarks, including code, math, and general language understanding. Here's a breakdown of the key points:

What is DBRX?

Transformer-based decoder-only LLM trained with next-token prediction
Fine-grained mixture-of-experts (MoE) architecture (132B total parameters, 36B active parameters)
Pretrained on 12 trillion tokens of carefully curated text and code data
Uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA)
Achieves high performance on long-context tasks and RAG (Retrieval-Augmented Generation)
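To make the mixture-of-experts idea concrete, here is a toy sketch of top-k expert routing for a single token. It is not DBRX's implementation. The figure of 16 experts with 4 chosen per token comes from Databricks' public description of the model rather than from this post, and the scalar "experts" and router scores are made up so the example runs without any ML library.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Softmax over all experts, shifted by the max for numerical stability.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts and renormalize their weights to sum to 1.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_layer(token, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs for one token."""
    routes = top_k_route(router_logits, k)
    return sum(weight * experts[i](token) for i, weight in routes)


# Toy setup: 16 scalar "experts", of which 4 are used per token.
experts = [lambda x, scale=i + 1: scale * x for i in range(16)]
router_logits = [0.1 * i for i in range(16)]  # made-up router scores
routes = top_k_route(router_logits, k=4)
output = moe_layer(1.0, experts, router_logits, k=4)
```

Only the four selected experts are evaluated, which is why the active parameter count (36B) is much smaller than the total (132B). Real routers differ in details such as whether the softmax is applied before or after the top-k selection.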

How does DBRX compare?

Outperforms GPT-3.5 on most benchmarks and is competitive with closed models like Gemini 1.0 Pro
Achieves higher quality scores on code (HumanEval) and math (GSM8k) compared to other open models

Benefits of DBRX

Open-source and available for download and fine-tuning
Efficient training process (Databricks reports that its pretraining pipeline is nearly 4x more compute-efficient than the one used for its earlier MPT models)
Faster inference compared to similar-sized models due to MoE architecture
Integrates with Databricks tools and services for easy deployment
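A rough back-of-the-envelope calculation shows where the inference advantage comes from, using the 132B total and 36B active parameter counts quoted earlier in the post. The "about 2 FLOPs per active parameter per token" rule is a common approximation, not a Databricks figure. It ignores attention cost and memory bandwidth, and all 132B parameters still have to fit in memory.

```python
TOTAL_PARAMS = 132e9   # total parameters, from the post
ACTIVE_PARAMS = 36e9   # parameters active per token, from the post


def approx_flops_per_token(active_params):
    # Rule of thumb (assumption): ~2 FLOPs per active parameter per token.
    return 2 * active_params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
dense_vs_moe = approx_flops_per_token(TOTAL_PARAMS) / approx_flops_per_token(ACTIVE_PARAMS)

print(f"active fraction: {active_fraction:.1%}")
print(f"dense 132B model needs ~{dense_vs_moe:.2f}x the compute per token")
```

Under these assumptions a dense model of the same total size would need roughly 3.7 times the compute per generated token.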

Getting Started with DBRX

Available through Databricks Mosaic AI Foundation Model APIs (pay-as-you-go)
Downloadable from Databricks Marketplace for private hosting
Usable through Databricks Playground chat interface

Future of DBRX

Expected advancements and new features in the future
DBRX serves as a foundation for building even more powerful and efficient LLMs

Overall, DBRX is a significant development in the field of open LLMs, offering high-quality performance, efficient training, and ease of use.

1.30.2024

Unveiling H2O-Danube-1.8B: A Milestone in Language Model Efficiency

In the fast-evolving domain of natural language processing, the H2O-Danube-1.8B model emerges as a significant breakthrough. Developed by H2O.ai, this model stands on the shoulders of giants like Llama 2 and Mistral, propelling forward the efficiency and effectiveness of language models. With an impressive training regimen on 1 trillion tokens, H2O-Danube-1.8B defies conventional expectations, offering a compelling blend of performance and resourcefulness.

Beyond the scale of its training data, H2O-Danube-1.8B is also offered as a chat model, refined with Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO). The model is openly available under the Apache 2.0 license, inviting a broad spectrum of developers and researchers to engage with it, improve upon it, and tailor it to new applications.
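For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for one preference pair from sequence log-probabilities. It illustrates the published objective and is not H2O.ai's training code. The function name, the `beta` value of 0.1, and the example numbers are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) response pair."""
    # How much more the policy favors each response than the frozen reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), guarded against overflow for very negative logits.
    return math.log1p(math.exp(-logits)) if logits > -30 else -logits


# A policy identical to the reference sits at the neutral point, log(2).
neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# A policy that shifts probability toward the chosen response gets a lower loss.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss falls as the policy raises the likelihood of the preferred response relative to the reference model and lowers that of the rejected one. This is why DPO needs no separate reward model.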

The model's prowess is not merely in its architectural innovations but also in its remarkable achievements across various benchmarks, including commonsense reasoning, world knowledge, and reading comprehension. These feats not only illustrate the model's robust understanding and interaction capabilities but also mark a pivotal moment for AI, where access to advanced technology is increasingly democratized.

10.01.2023

The Rise and Impact of Llama: An AI Revolution

It's been an exciting journey ever since we embarked on the Llama project. Llama 1 was a breakthrough, Llama 2 added more spice, and with the release of Code Llama, the momentum has been nothing short of astonishing.

A Recap of Llama's Journey

In the seven months since Llama 1 was introduced, followed by Llama 2 and Code Llama, the community's response has been overwhelming. To put it into perspective:

Llama-based models have been downloaded over 30 million times through Hugging Face.

A staggering 10 million of these downloads occurred in the last 30 days.

Much like PyTorch, Llama is quickly evolving into a robust platform for global AI innovation.

The Llama Community's Exponential Growth

To say Llama has impacted the AI landscape would be an understatement. The growth has been characterized by:

Cloud Adoption: Giants like AWS, Google Cloud, and Microsoft Azure are hosting Llama models. Particularly, AWS's recent collaboration as the managed API partner for Llama 2 has been a game-changer in terms of accessibility.

Innovators' Choice: Startups and innovators like Anyscale, Replicate, and DoorDash are rooting for Llama as their foundational AI tool.

Open-Source Embrace: The open-source community has fine-tuned and released over 7,000 derivatives on Hugging Face, substantially improving performance on common benchmarks.

Booming Developer Community: Over 7,000 Llama-related projects are currently hosted on GitHub. From new tools to 'tiny' Llama versions for mobile platforms, the creativity knows no bounds.

Hardware Integration: Top-tier hardware platforms are optimizing for Llama, further enhancing its performance.

The release of Code Llama only solidified its presence, with rapid integration on many platforms, marking a pivotal moment for AI enthusiasts.

From Research to Global Phenomenon

Llama's origin was rooted in the power of large language models (LLMs). Initially developed by a team at FAIR, it sought to harness the prowess of LLMs for various innovative applications. The results? Groundbreaking improvements and diversifications by academic researchers and the wider community.

But Llama 1 was just the beginning. The need for broader accessibility brought Llama 2 to the forefront.

Our Philosophy Behind Releasing Llama Models

At Meta, we firmly believe in open source. The logic is simple:

Research: Harnessing collective wisdom to enhance AI capabilities.

Enterprise and Commercialization: Learning through startups and enterprises to uncover AI's vast potential.

Developer Ecosystem: Utilizing new tools and strategies emerging daily in the AI domain.

Meta has always been at the forefront of advocating for an open approach, and Llama is no exception.

Future Projections

With the AI realm advancing rapidly, here are our core focal points:

Multimodal Experiences: Beyond just text, AI can integrate various modes for richer experiences.

Safety and Responsibility: With AI's potential comes the imperative need for responsible development and application.

Community Emphasis: Like PyTorch, we visualize a developer community with a voice and agency, driving the future of AI innovation.

At AILab, we use Llama 2 in our daily operations, and a significant portion of our projects are built on various Llama 2 models. We would like to extend our gratitude to Meta for this invaluable opportunity.

9.17.2023

Introducing Falcon 180B: The Next-Gen Open-Source Language Model Surpassing Previous Benchmarks

The Technology Innovation Institute (TII) has released Falcon 180B, announced through the Hugging Face AI community. It is an open-source large language model (LLM) with 180 billion parameters, trained on 3.5 trillion tokens. This latest LLM surpasses prior open models, including the previously top-ranked LLaMA 2, in scale and performance. Falcon 180B, trained using Amazon SageMaker on up to 4,096 GPUs, competes closely with commercial models like Google's PaLM-2. The release signifies rapid advancement in LLMs, with Falcon 180B benefiting from techniques such as LoRAs and Nvidia's Perfusion. It is expected to see further improvement as the community fine-tunes it.

Hardware requirements

Model | Type | Kind | Memory | Example
Falcon 180B | Training | Full fine-tuning | 5120GB | 8x 8x A100 80GB
Falcon 180B | Training | LoRA with ZeRO-3 | 1280GB | 2x 8x A100 80GB
Falcon 180B | Training | QLoRA | 160GB | 2x A100 80GB
Falcon 180B | Inference | BF16/FP16 | 640GB | 8x A100 80GB
Falcon 180B | Inference | GPTQ/int4 | 320GB | 8x A100 40GB
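The memory figures in the hardware requirements above are simply the combined GPU memory of each example setup. The short sketch below reproduces them and, for comparison, estimates the raw size of the weights alone. Reading "8x 8x A100" as 8 nodes of 8 GPUs, and "2x A100" as a single node with 2 GPUs, is an assumption. The weight estimate ignores activations, optimizer state, and KV cache.

```python
def total_gpu_memory_gb(nodes, gpus_per_node, gb_per_gpu):
    """Combined GPU memory of a cluster, in GB."""
    return nodes * gpus_per_node * gb_per_gpu


def weight_footprint_gb(num_params, bits_per_param):
    """Approximate size of the model weights alone, in GB."""
    return num_params * bits_per_param / 8 / 1e9


# (nodes, GPUs per node, GB per GPU) for each row of the requirements list.
setups = {
    "Training, full fine-tuning": (8, 8, 80),
    "Training, LoRA with ZeRO-3": (2, 8, 80),
    "Training, QLoRA": (1, 2, 80),
    "Inference, BF16/FP16": (1, 8, 80),
    "Inference, GPTQ/int4": (1, 8, 40),
}
totals = {name: total_gpu_memory_gb(*cfg) for name, cfg in setups.items()}

FALCON_PARAMS = 180e9
bf16_weights_gb = weight_footprint_gb(FALCON_PARAMS, 16)
int4_weights_gb = weight_footprint_gb(FALCON_PARAMS, 4)

for name, gb in totals.items():
    print(f"{name}: {gb} GB")
```

The 16-bit weights alone come to about 360 GB, which explains why even inference needs a full multi-GPU node. The listed totals leave headroom for activations and cache on top of the weights.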

9.16.2023

SQLCoder: a state-of-the-art LLM for SQL generation

SQLCoder, an open-source product by Defog, converts natural language questions into SQL queries.
It outperforms many open-source models and, on Defog's evaluation, edges out gpt-3.5-turbo as well as text-davinci-003, a model more than 10 times its size.
You can test SQLCoder using the provided interactive demo.

Technical Details

SQLCoder is a 15B-parameter large language model (LLM), a fine-tuned version of StarCoder.
It was fine-tuned on hand-crafted SQL queries of varying complexity.
On certain individual database schemas, SQLCoder rivals or even surpasses GPT-4 in performance.

Motivation

Over the past three months, enterprises in healthcare, finance, and government have used SQLCoder.
The primary advantage: it can be self-hosted, ensuring sensitive data stays on the server.
The release is Defog's way of contributing back to the community, given they built upon existing models like StarCoder.

Approach

Defog crafted a unique dataset centered on text-to-SQL tasks derived from 10 varied schemas. An additional evaluation dataset was produced from 7 new schemas.
The dataset's complexity was ensured by selecting intricate schemas comprising 4-20 tables.
Each question was categorized based on difficulty, using a method inspired by the Spider dataset.
The model fine-tuning process was split into two stages, beginning with simpler questions, leading up to the more complex ones.
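The two-stage idea can be sketched as a simple curriculum split over a labeled dataset. This is an illustration, not Defog's pipeline. The `Example` class, the four difficulty labels (borrowed from Spider's easy/medium/hard/extra-hard scheme), and the assignment of labels to stages are all assumptions.

```python
from dataclasses import dataclass

# Assumed mapping of difficulty labels to fine-tuning stages.
STAGE_ONE_LABELS = {"easy", "medium"}
STAGE_TWO_LABELS = {"hard", "extra_hard"}


@dataclass
class Example:
    question: str
    sql: str
    difficulty: str


def split_into_stages(examples):
    """Partition examples into (simpler, harder) lists for two training stages."""
    known = STAGE_ONE_LABELS | STAGE_TWO_LABELS
    unknown = [ex for ex in examples if ex.difficulty not in known]
    if unknown:
        raise ValueError(f"unrecognized difficulty labels: {unknown}")
    stage_one = [ex for ex in examples if ex.difficulty in STAGE_ONE_LABELS]
    stage_two = [ex for ex in examples if ex.difficulty in STAGE_TWO_LABELS]
    return stage_one, stage_two


dataset = [
    Example("How many users are there?", "SELECT COUNT(*) FROM users", "easy"),
    Example("Top 3 customers by revenue per region?", "SELECT ...", "extra_hard"),
    Example("Average order value by month?", "SELECT ...", "medium"),
]
stage_one, stage_two = split_into_stages(dataset)
```

A training script would then fine-tune on `stage_one` first and continue from that checkpoint on `stage_two`.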

Evaluation

Assessing the accuracy of generated SQL is inherently tricky, because a single question can have multiple valid queries.
Therefore, Defog had to create a unique framework to gauge the correctness of SQL queries. They've open-sourced this framework and the accompanying dataset.
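One common way to handle multiple valid queries is to compare what the queries return rather than how they are written. The sketch below does this with Python's built-in `sqlite3` module. It is a simplified illustration of execution-based checking, not Defog's framework. The `results_match` helper and the `orders` table are invented for the example.

```python
import sqlite3
from collections import Counter


def results_match(connection, generated_sql, reference_sql):
    """Return True if both queries yield the same multiset of rows."""
    try:
        generated = connection.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # the generated query does not even run
    reference = connection.execute(reference_sql).fetchall()
    # Compare as multisets so that row order does not matter.
    return Counter(generated) == Counter(reference)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'ann', 10.0), (2, 'bob', 25.0), (3, 'ann', 5.0);
""")

reference = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
# Written differently (aliases, extra ORDER BY) but returns the same rows.
equivalent = ("SELECT o.customer, SUM(o.amount) AS total FROM orders AS o "
              "GROUP BY o.customer ORDER BY total DESC")
wrong = "SELECT customer, COUNT(*) FROM orders GROUP BY customer"

same = results_match(conn, equivalent, reference)
different = results_match(conn, wrong, reference)
```

This check accepts differently written but equivalent queries and rejects ones that return other data. A fuller framework would also need to handle column-order differences and questions where ordering is part of the expected answer.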

Results

SQLCoder excels against all notable models, save for GPT-4, based on Defog's evaluation mechanism.
Notably, it outperforms some models that are much larger.
For specific database schemas, its performance and responsiveness match or surpass OpenAI's GPT-4.

Future Prospects

Defog plans to enhance SQLCoder by:
Incorporating more curated data and broader questions.
Utilizing advanced training techniques like Reward Modeling and RLHF.
Introducing a specialized model for data analysis combining SQL and Python.

Exploration

The model can be explored and tested via Defog's interactive demo.

This summary encapsulates the primary features, approach, and future plans for SQLCoder by Defog.

Links:

SQL Coder Model