The Mystery of Miqu-1-70b: A Mistral Leak and its Implications in AI

The AI community recently buzzed with discussions about a mysterious AI model known as "miqu-1-70b." Speculations abound that this model could be a leaked version of Mistral's advanced AI model. This blog post delves into the details of this intriguing event, its implications, and the broader context in the AI industry.

The Beginning of the Saga

What we know so far:
  • An anon on /lmg/ posted a 70B model named "miqu," claiming it's good
  • It uses the same instruct format as Mistral-Instruct, with a 32k context
  • It performs extremely well in basic testing, giving answers similar to Mistral-Medium via Perplexity's API
  • miqu uses the Llama 2 tokenizer, and from basic testing Mistral-Medium appears to use it as well (anons are comparing prompt token counts)
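As a rough illustration, the [INST]-delimited template used by Mistral-Instruct (and, per the anons' comparisons, apparently by miqu) can be sketched as below; exact BOS/EOS token placement varies by implementation, so treat this as an approximation rather than the canonical template:

```python
# Sketch of the [INST]-style instruct format used by Mistral-Instruct.
# BOS/EOS placement is approximate and may differ between implementations.

def build_prompt(turns: list[tuple[str, str]], new_user_msg: str) -> str:
    """Format a conversation as a single [INST]-delimited prompt string."""
    prompt = "<s>"
    for user_msg, assistant_msg in turns:
        prompt += f"[INST] {user_msg} [/INST] {assistant_msg}</s>"
    prompt += f"[INST] {new_user_msg} [/INST]"
    return prompt

print(build_prompt([("Hi!", "Hello, how can I help?")], "What is 2+2?"))
```

Comparing the token counts of prompts formatted this way is how anons inferred that miqu and Mistral-Medium share a tokenizer.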

Mistral reaction

In a significant development, Arthur Mensch, co-founder and CEO of Mistral, acknowledged that an over-enthusiastic employee of one of their early access customers leaked a quantized and watermarked version of an older model. This revelation confirmed a connection between "miqu-1-70b" and Mistral's AI models, though not directly tying it to the current version of Mistral-Medium.

Implications in the AI Community

The "miqu-1-70b" episode reflects the dynamic nature of AI development and distribution, especially in the open-source community. It underscores the challenges in controlling the dissemination of powerful AI models and sparks discussions about responsible sharing and usage of such technologies.

Model Info

The model scores highly on MT-Bench, placing just behind Mistral-Medium.


The "miqu-1-70b" model's emergence and the subsequent revelations have stirred excitement and debate within the AI community. It highlights the thin line between innovation and control in the rapidly evolving field of artificial intelligence. As the industry continues to grow, events like these provide valuable lessons and insights into the future of AI development and distribution.


Unveiling H2O-Danube-1.8B: A Milestone in Language Model Efficiency

In the fast-evolving domain of natural language processing, the H2O-Danube-1.8B model emerges as a significant breakthrough. Developed by H2O.ai, this model stands on the shoulders of giants like Llama 2 and Mistral, propelling forward the efficiency and effectiveness of language models. With an impressive training regimen on 1 trillion tokens, H2O-Danube-1.8B defies conventional expectations, offering a compelling blend of performance and resourcefulness.

Beyond its expansive training dataset, the true marvel of H2O-Danube-1.8B lies in its adept use of advanced techniques, including Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT), which refine its capabilities as a chat model. This innovation is underscored by its open-source availability under the Apache 2.0 license, inviting a broad spectrum of developers and researchers to engage with, improve upon, and tailor the model to new applications.

The model's prowess is not merely in its architectural innovations but also in its remarkable achievements across various benchmarks, including commonsense reasoning, world knowledge, and reading comprehension. These feats not only illustrate the model's robust understanding and interaction capabilities but also mark a pivotal moment for AI, where access to advanced technology is increasingly democratized.


Exploring MambaByte: A Leap in Language Modeling

In the quest to advance language models, a groundbreaking study by researchers at Cornell University introduces "MambaByte." This innovative model, detailed in their recent publication, marks a significant shift from traditional subword tokenization approaches to a more efficient token-free system.

MambaByte is unique in its operation directly on raw bytes, bypassing the bias associated with subword tokenization. This shift, while leading to longer sequences, is adeptly managed through the model's design, ensuring computational efficiency. Notably, MambaByte outperforms many existing byte-level models and shows competitive results against state-of-the-art subword Transformers, despite managing longer sequences.
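The token-free idea is simple to illustrate: the model's input is just the raw UTF-8 byte sequence of the text, so no learned subword vocabulary (or its biases) is involved, at the cost of longer sequences:

```python
# Byte-level "tokenization" as used by MambaByte: the sequence is simply the
# UTF-8 bytes of the text, so no subword vocabulary is needed.

text = "Language modeling"
byte_sequence = list(text.encode("utf-8"))

print(len(text), "characters ->", len(byte_sequence), "byte tokens")
# A subword tokenizer would typically emit far fewer tokens for the same text,
# which is exactly the sequence-length trade-off MambaByte is designed to manage.
```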

The paper highlights MambaByte's efficiency, particularly in handling the computational challenges posed by longer byte sequences, a known issue for autoregressive Transformers. The model's architecture, based on a linear-time approach for sequence modeling, allows for faster inference and effective resource utilization.

In summary, MambaByte stands out as a promising alternative in the field of language modeling, particularly for tasks that benefit from token-free approaches. Its capability to efficiently process long sequences without compromising performance paves the way for more advanced and versatile language models in the future.


Prompt Engineering with Llama 2


Why now? The transformative potential of AI has advanced significantly since Vaswani et al. (2017) introduced transformer neural networks, primarily designed for machine translation. This innovation led to the era of generative AI, characterized by diffusion models for image creation and large language models (LLMs) as deep learning networks programmed using natural language. Unlike traditional ML models, these LLMs do not require task-specific training or fine-tuning for every new use case, heralding a new era of technological deployment and innovation. This process of steering language models with natural language to accomplish specific tasks is known as prompt engineering.

Llama Models

In 2023, Meta unveiled the Llama language models, including Llama Chat, Code Llama, and Llama Guard. These models represent the state-of-the-art in general-purpose LLMs and are available in various sizes:

Llama 2 Models

  • llama-2-7b: Base pretrained 7 billion parameter model
  • llama-2-13b: Base pretrained 13 billion parameter model
  • llama-2-70b: Base pretrained 70 billion parameter model
  • And several specialized chat and code fine-tuned models

Getting an LLM

Deploying large language models can be done through self-hosting, cloud hosting, or hosted APIs, each with its own advantages. Self-hosting offers privacy and security, cloud hosting provides customization, and hosted APIs are the most straightforward for beginners.

Hosted APIs

These are the simplest starting points for using LLMs. Key endpoints include:

  • completion: Generates a response to a given prompt.
  • chat_completion: Generates the next message in a message series, providing context for applications like chatbots.
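As a sketch, the two endpoints differ mainly in request shape. The field names below follow common hosted-API conventions and are illustrative, not tied to any particular provider:

```python
# Illustrative request payloads for the two endpoint styles.
# Field names are conventional, not specific to any one hosted API.

completion_request = {
    "model": "llama-2-70b",
    "prompt": "The typical color of the sky is ",
    "max_tokens": 64,
}

chat_completion_request = {
    "model": "llama-2-70b-chat",
    "messages": [
        {"role": "system", "content": "You answer concisely."},
        {"role": "user", "content": "What color is the sky?"},
        {"role": "assistant", "content": "Usually blue."},
        {"role": "user", "content": "And at sunset?"},  # model continues from here
    ],
}
```

The chat shape carries the conversation history explicitly, which is what makes multi-turn applications like chatbots possible.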


LLMs process information in chunks called tokens, which roughly correspond to words or word fragments. Each model has its own tokenization scheme and a maximum context length that your prompt cannot exceed.
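A minimal sketch of checking a prompt against the context window, using a crude whitespace count as a stand-in for a real tokenizer (Llama 2's actual SentencePiece tokenizer produces different counts, so this is only an approximation):

```python
# Stand-in tokenizer (whitespace split) to illustrate checking a prompt
# against a model's maximum context length. Real tokenizers count differently.

MAX_CONTEXT_TOKENS = 4096  # Llama 2's context window

def count_tokens(text: str) -> int:
    return len(text.split())  # crude proxy for a real tokenizer

def fits_in_context(prompt: str, reserved_for_output: int = 256) -> bool:
    # Leave headroom for the model's generated response
    return count_tokens(prompt) + reserved_for_output <= MAX_CONTEXT_TOKENS

print(fits_in_context("Summarize the following article: ..."))
```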

Notebook Setup

The guide includes a practical example using the Llama 2 chat model with Replicate and LangChain to set up a chat completion API.

Completion APIs

Llama 2 models tend to be verbose, explaining their rationale. We'll explore how to manage response length effectively.

Chat Completion APIs

This involves sending a list of structured messages to the LLM, providing it with context or history to continue the conversation.

LLM Hyperparameters

Temperature and top_p are two crucial hyperparameters that influence the creativity and determinism of the output.
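A toy sketch of what these two knobs do mathematically, using made-up logits for a four-word vocabulary:

```python
import math

# Minimal sketch of how temperature and top_p shape next-token sampling.
# The logits below are invented scores for a toy vocabulary.

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    # temperature < 1 sharpens the distribution; > 1 flattens it
    return [x / temperature for x in logits]

def top_p_filter(probs, top_p):
    # Keep the smallest set of tokens whose cumulative probability >= top_p
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, p in ranked:
        kept.append(idx)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

logits = {"blue": 4.0, "red": 2.0, "green": 1.0, "plaid": -2.0}
probs = softmax(apply_temperature(list(logits.values()), temperature=0.7))
allowed = top_p_filter(probs, top_p=0.9)
print([list(logits)[i] for i in allowed])
```

With a low temperature and tight top_p, sampling collapses onto the most likely token; raising either widens the set of tokens the model may choose from.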

Prompting Techniques

  • Explicit Instructions: Detailed instructions yield better results.
  • Stylization and Formatting: Adjusting the style or format of the response.
  • Restrictions: Limiting sources or types of information.
  • Zero-Shot and Few-Shot Prompting: Techniques using examples to guide the model's response.
  • Role Prompting: Assigning a specific role to the model for more consistent responses.
  • Chain-of-Thought: Encouraging step-by-step reasoning.
  • Self-Consistency: Enhancing accuracy by aggregating multiple responses.
  • Retrieval-Augmented Generation (RAG): Incorporating external information into prompts for more accurate responses.
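The few-shot and chain-of-thought techniques above reduce to prompt construction, which can be sketched as follows (the example reviews are invented for illustration):

```python
# Sketch of few-shot prompting with an optional chain-of-thought cue.
# The labeled examples are invented.

examples = [
    ("I loved this movie!", "positive"),
    ("The plot made no sense.", "negative"),
]

def few_shot_prompt(query: str, cot: bool = False) -> str:
    lines = ["Classify the sentiment of each review."]
    for review, label in examples:
        lines.append(f"Review: {review}\nSentiment: {label}")
    lines.append(f"Review: {query}")
    if cot:
        # Chain-of-thought cue: invites step-by-step reasoning before the answer
        lines.append("Let's think step by step before answering.")
    lines.append("Sentiment:")
    return "\n".join(lines)

print(few_shot_prompt("An instant classic.", cot=True))
```

Zero-shot prompting is simply the same template with the `examples` list empty.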

Program-Aided Language Models (PAL)

Combining LLMs with code generation for tasks like calculations.
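A minimal sketch of the PAL pattern: the model emits a short program and the host runs it, so the arithmetic is exact. The "generated" snippet below is hard-coded to stand in for a model's actual output:

```python
# Sketch of Program-Aided Language models (PAL): instead of asking the LLM for
# the final number, ask it to emit a program and execute that program.
# The snippet below stands in for model-generated code.

generated_code = """
apples = 23
eaten = 20
bought = 6
answer = apples - eaten + bought
"""

namespace: dict = {}
exec(generated_code, namespace)  # execute the model-written program
print(namespace["answer"])       # the arithmetic is done by Python, not the LLM
```

In production, model-generated code would be run in a sandbox rather than with a bare `exec`.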

Limiting Extraneous Tokens

Techniques to minimize unnecessary content in model responses.

Additional References

  • PromptingGuide.ai
  • LearnPrompting.org
  • Lil'Log Prompt Engineering Guide


OpenAI Unveils Groundbreaking Embedding Models and API Enhancements

OpenAI has recently announced significant updates to their AI models and API, marking a new era in machine learning and AI application development. The introduction of two new embedding models, text-embedding-3-small and text-embedding-3-large, promises enhanced performance in clustering and retrieval tasks. Additionally, updates to the GPT-4 Turbo models, including gpt-4-0125-preview and gpt-4-turbo-preview, showcase improvements in AI efficiency and capabilities. OpenAI also revised their text moderation model to ensure higher standards of privacy and security, assuring users that their data will not be used for model training. Furthermore, the new API usage management tools and reduced pricing for GPT-3.5 Turbo are aimed at better supporting the developer community. These advancements highlight OpenAI's commitment to pushing the boundaries of AI technology.
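Once texts are embedded, retrieval and clustering reduce to vector similarity. The 4-dimensional vectors below are invented for illustration (text-embedding-3-small actually returns 1536-dimensional vectors by default):

```python
import math

# Cosine similarity over embedding vectors, the core operation behind
# retrieval and clustering. The vectors are made up for illustration;
# real embedding models return much higher-dimensional vectors.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.1, 0.9, 0.2, 0.0]
doc_vecs = {
    "doc_about_cats": [0.1, 0.8, 0.3, 0.1],
    "doc_about_tax_law": [0.9, 0.0, 0.1, 0.4],
}
best = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))
print(best)
```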

P.S. OpenAI claims the new GPT-4 model version fixes the "laziness" in code generation.


Exploring the New Frontiers: Microsoft's Copilot Pro Revolutionizes Office Apps


Microsoft has once again pushed the boundaries of innovation with the introduction of Copilot Pro for its suite of Office AI apps. This groundbreaking tool is poised to transform how we interact with familiar applications like Word, Excel, PowerPoint, and Outlook.

At its core, Copilot Pro integrates advanced AI capabilities into these everyday tools, making them more intuitive and efficient. Imagine drafting emails in Outlook with an AI assistant that suggests content, or creating complex Excel spreadsheets with intelligent, AI-driven data analysis. This integration could drastically reduce the time and effort required for routine tasks.

The implications of Copilot Pro are vast for productivity. In a business setting, it could lead to more efficient workflow processes, enabling employees to focus on strategic thinking and creative problem-solving. For individual users, it's like having a personal assistant that understands the intricacies of Microsoft Office tools, providing support and enhancing the overall user experience.

However, with such advancements come concerns around privacy and job displacement. Microsoft will likely need to address how user data is handled and reassure users about the ethical use of AI in these applications. Moreover, the introduction of AI in workplace tools raises questions about the future role of human skills and expertise.

As we step into this new era of AI-enhanced productivity, it's clear that Microsoft's Copilot Pro is more than just an upgrade to Office apps. It's a glimpse into a future where AI and human intelligence work hand in hand to achieve more than ever before.

Stay tuned as we continue to explore the capabilities and impacts of this exciting development in the world of technology and productivity.


Retrieval-Augmented Generation for Large Language Models: A Survey

The landscape of Natural Language Processing (NLP) is rapidly evolving with the advent of Large Language Models (LLMs) like GPT-3 and its successors. Despite their formidable capabilities, these models encounter several practical challenges, such as the tendency to generate incorrect information (hallucinations), slow updates to their knowledge bases, and a general lack of transparency in their responses. Retrieval-Augmented Generation (RAG) addresses these issues by integrating the retrieval of relevant information from external knowledge bases before generating responses with LLMs.

The significance of RAG lies in its ability to improve the accuracy of answers and reduce the frequency of model-generated hallucinations, especially in tasks that demand extensive knowledge. It also allows for the easier integration of domain-specific knowledge, enhancing the model's adaptability to new or evolving information. This is achieved by combining the parametric knowledge of LLMs, which is learned during training and embedded within the model's parameters, with non-parametric knowledge from external databases.

This paper presents a comprehensive review of the development and implementation of RAG, highlighting three main paradigms:

  • Naive RAG: The basic form of RAG, which involves retrieving information and generating responses without much optimization.
  • Advanced RAG: An improved version that incorporates optimizations in the retrieval process and integrates pre- and post-retrieval processes.
  • Modular RAG: A more sophisticated and flexible approach that allows for the addition, removal, or reconfiguration of various components depending on the task at hand.
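The Naive RAG paradigm can be sketched in a few lines; the two-document corpus and word-overlap scoring below are toy stand-ins for the embedding index a real system would use:

```python
# Toy sketch of the Naive RAG loop: retrieve relevant text, stuff it into the
# prompt, then generate. Real systems use dense embeddings and a vector index
# instead of word overlap.

corpus = {
    "doc1": "The Eiffel Tower is located in Paris and opened in 1889.",
    "doc2": "Photosynthesis converts light energy into chemical energy.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda d: -len(words & set(corpus[d].lower().split())),
    )
    return [corpus[d] for d in ranked[:k]]

def build_rag_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_rag_prompt("When did the Eiffel Tower open?"))
```

Advanced and Modular RAG refine the same loop: query rewriting and reranking before and after retrieval, and swappable components around the retriever and generator.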

Each of these paradigms is dissected to understand the core components of RAG: the retriever, the generator, and the augmentation methods, with a focus on key technologies within each area.

Furthermore, the paper explores how to effectively evaluate RAG models, emphasizing key metrics and abilities and introducing the latest automatic evaluation framework. It culminates with a discussion on the future of RAG, touching upon directions for vertical optimization, horizontal scalability, and the broader technical stack and ecosystem of RAG technologies.

The evolution of RAG represents a significant stride toward more accurate, reliable, and transparent language models, marking it as one of the essential methods for implementing LLMs in real-world applications.


Unveiling Apple's ML-Ferret: Pioneering Multimodal AI in Image and Language Understanding

Apple's recent introduction of the ML-Ferret model marks a significant milestone in the field of artificial intelligence, particularly in the realm of Multimodal Large Language Models (MLLMs). Developed in collaboration with Cornell University, this open-source model integrates language comprehension with advanced image analysis, pushing the boundaries of AI technology.

Understanding ML-Ferret: A Technical Overview

The core functionality of ML-Ferret lies in its ability to analyze specific regions within images, identifying elements and integrating them into queries for contextual responses. This capability allows the model to not just recognize objects in an image but to provide deeper insights by leveraging surrounding elements. For instance, when highlighting an animal in a photo, Ferret can identify the species and offer related context based on other detected elements in the image.

Ferret was trained on 8 Nvidia A100 GPUs using the GRIT dataset, enabling it to describe small image regions with high precision and fewer errors. The GRIT dataset itself is a marvel, comprising over 1.1 million samples rich in spatial knowledge, ensuring Ferret's proficiency in handling complex multimodal tasks.

Practical Applications and Future Directions

The introduction of Ferret opens a world of possibilities for various applications, ranging from enhanced image search capabilities to assistive technology for the visually impaired. It could revolutionize educational tools, allowing interactive learning experiences, and even assist in robotics, helping machines understand commands involving object interactions.

Looking ahead, there are potential enhancements for Ferret, including increasing the model size for better performance and expanding the dataset collection to cover more varied and complex scenarios. This continuous development underscores Apple's commitment to advancing AI and offering groundbreaking solutions.

Ferret’s Impact on Apple Devices

The integration of Ferret into Apple devices could significantly enhance user experiences. From improved image-based interactions with Siri to augmented user assistance for accessibility, Ferret's capabilities might lead to a more intuitive and comprehensive search experience within Apple's ecosystem. For developers, Ferret offers an opportunity to create innovative applications across various domains by incorporating advanced image and language understanding.

Challenges and Scalability

Despite its potential, scaling Ferret poses certain challenges, especially in competing with larger models like GPT-4 due to infrastructure limitations. This situation calls for strategic decisions from Apple, potentially involving partnerships or a deeper commitment to open-source principles to leverage collective expertise and resources.


Apple's ML-Ferret represents a paradigm shift in AI, highlighting a nuanced understanding of visual content and language. This open-source approach not only invites collaboration and innovation but also reflects Apple's broader commitment to advancing AI technology. As Ferret's capabilities unfold, it holds the promise of reshaping how we interact with technology, emphasizing a more nuanced understanding of visual content in AI applications.


Mastering the Art of Prompt Engineering with GPT Models


In the rapidly evolving field of AI and machine learning, one area that has seen significant innovation is prompt engineering, especially with the advent of Generative Pre-trained Transformers (GPT). David Shapiro's "GPT Masterclass: 4 Years of Prompt Engineering in 16 Minutes" offers a deep dive into this fascinating world, outlining the crucial concepts and methods needed to master prompt engineering with language models like GPT and others.


Shapiro brings a wealth of experience, having been involved in prompt engineering since the era of GPT-2. Now, with GPT-4 revolutionizing how we interact with AI, understanding the nuances of prompt engineering has never been more critical.

The Three Types of Prompts

Shapiro explains that there are essentially three kinds of prompts in language model operations: reductive, transformational, and generative. These types encapsulate all other kinds of prompts and are foundational to understanding and mastering language models.

Reductive Operations:

  • Reductive operations involve taking a larger input and producing a smaller output.
  • Examples include summarization, which involves saying the same thing with fewer words, and extraction, commonly used in older NLP for tasks like question answering and named entity extraction.
  • Characterization is another aspect, where the language model characterizes either the text itself or the topic within the text. This can range from identifying whether a text is fiction, a scientific article, or code, to analyzing the code within a broader context.
  • Other forms of reductive operations include evaluations (measuring, grading, or judging content) and critiquing, which involves providing critical feedback to improve something.

Transformational Operations:

  • In transformational operations, the input and output are roughly the same size and/or meaning.
  • This includes reformatting, which changes the presentation of the content, and refactoring, a concept borrowed from programming, applied to both code and structured language.
  • Language change, restructuring, modification (like changing tone or style), and clarification (making something clearer) are other crucial transformational operations.

Generative Operations:

  • Generative operations, also known as expansion or magnification operations, involve a smaller input leading to a much larger output.
  • Drafting, planning, brainstorming, hypothesizing, and amplification are part of generative operations. These processes range from creating documents and planning projects to generating ideas and expanding on topics.

Understanding Bloom's Taxonomy in Language Models

Shapiro highlights the importance of Bloom's taxonomy in understanding the capabilities of language models. This taxonomy, comprising remember, understand, apply, analyze, evaluate, and create, shows that language models have attained most, if not all, of these cognitive capabilities.

The Concept of Latency and Emergence

Shapiro discusses the concepts of latency and emergence in language models. Latent content refers to the knowledge and capabilities embedded in the model, activated by correct prompting. Emergent capabilities, such as theory of mind, implied cognition, logical reasoning, and in-context learning, demonstrate the advanced intelligence of these models.

Hallucination Equals Creativity

A provocative point Shapiro makes is that hallucination and creativity are two sides of the same capability in language models. In his view, this behavior is not simply a flaw but evidence of the model's creative prowess.


David Shapiro's masterclass on prompt engineering with GPT models is a revelation in the field of AI and language processing. Understanding the three types of prompts and how they interact with the cognitive capabilities of language models opens up a world of possibilities for AI applications. As we venture deeper into the realm of advanced AI, mastering these concepts will be crucial for anyone looking to leverage the full potential of GPT and similar models.


The evolutionary tree of modern LLMs


Flourishing Canopy: The Era of Transformers
The canopy of our tree is dense with Transformer-based models, which have dominated the landscape since 2019. The GPT (Generative Pretrained Transformer) series, starting from GPT-1 and evolving rapidly to GPT-2 and GPT-3, has demonstrated a remarkable ability to generate human-like text. Meanwhile, models like T5 from Google and Microsoft’s Turing-NLG have pushed the boundaries of what's possible with language comprehension and generation.

Latest Blossoms: State-of-the-Art Models
Perched at the top of our tree are the latest and most advanced LLMs. Models like GPT-4 and others from various AI labs like OpenAI, Anthropic, and DeepMind are not only more powerful but also more nuanced in their understanding of language. They're capable of tasks ranging from writing essays to coding, and even creating art or music.

The Ecosystem: Open-Source vs. Closed-Source
An important aspect of our tree is the delineation between open-source and closed-source models. Open-source models, such as those distributed through Hugging Face's Transformers library, provide accessibility and transparency, allowing for widespread use and innovation. On the other hand, closed-source models, often developed by big tech companies, keep their inner workings under wraps, sometimes offering more powerful capabilities but less community insight and control.

Future Growth: The Path Ahead
As we look to the sky through the leaves of this tree, it’s clear that the future of LLMs holds even greater potential. With advancements in ethical AI, interpretability, and multi-modal capabilities, the next generation of LLMs is poised to be even more integrated into our digital lives.

The journey through the evolutionary tree of modern LLMs reveals a rapid and complex growth pattern, one that reflects both technological advancements and our deepening understanding of natural language. As we continue to nurture and develop these models, we can only imagine the heights they will reach and the ways they will transform our interaction with technology.


Revolution in Prompt Compression for Language Models: The Emergence of LLMLingua

LLMLingua is transforming the world of large language models (LLMs) with its innovative prompt compression technology. This approach allows LLMs to process information more efficiently, overcoming challenges like prompt length limits and high operational costs. By compressing prompts up to 20x with minimal performance loss, LLMLingua enables more effective utilization of LLMs in various applications. The technology integrates seamlessly with advanced NLP techniques, offering a practical solution for optimizing LLM performance. LLMLingua's advancements are not just technical feats; they represent a leap forward in making AI language processing more accessible and efficient.
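As a toy illustration of the compression idea only — LLMLingua itself uses a small language model to score and drop low-information tokens, not a stopword list — a prompt can often shed many words while keeping its key content:

```python
# Toy illustration of prompt compression. LLMLingua's actual method scores
# tokens with a small language model; dropping stopwords is a crude stand-in
# that merely shows the token budget shrinking while key content survives.

STOPWORDS = {"the", "a", "an", "of", "to", "is", "are", "that", "and", "in"}

def compress(prompt: str) -> str:
    kept = [w for w in prompt.split() if w.lower() not in STOPWORDS]
    return " ".join(kept)

prompt = "Summarize the main findings of the report that is attached in the email"
short = compress(prompt)
print(len(prompt.split()), "->", len(short.split()), "words:", short)
```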


Microsoft is offering free courses on AI

Microsoft - AI For Beginners Curriculum

12-week, 24-lesson curriculum exploring Artificial Intelligence (AI).

Covers Symbolic AI, Neural Networks, Computer Vision, Natural Language Processing, and more. Hands-on lessons, quizzes, and labs included.


Introduction to Artificial Intelligence

This course helps you grasp key concepts in artificial intelligence.

Designed for project managers, product managers, directors, executives, and students starting a career in AI.


What Is Generative AI?

Generative AI expert Pinar Seyhan Demirdag covers the basics of generative AI, with topics including:

• What it is

• How it works

• Different types of models

• Future predictions and ethical implications


Generative AI: The Evolution of Thoughtful Online Search

Learn more about the core concepts of generative AI-driven reasoning engines and how they differ from search engine strategy.


Streamlining Your Work with Microsoft Bing Chat

Discover the power of AI chatbots.

In this course, instructor Jess Stratton teaches you how to use Microsoft Bing Chat effectively.


Ethics in the Age of Generative AI

Learn how to address ethical concerns when deploying generative AI tools and products.



Apple's MLX Framework: Harnessing the Power of Apple Silicon for AI Innovation



Apple's foray into the AI development landscape has been bolstered by the introduction of the MLX Framework, a powerful tool designed specifically for Apple Silicon. This open-source framework, developed by Apple's machine learning research team, represents a significant step in AI and machine learning, offering developers a robust platform for creating and deploying sophisticated AI models on Apple devices.

MLX Framework Overview

At its core, MLX is an array framework optimized for machine learning on Apple's processors. It stands out for its efficient and flexible approach to machine learning, providing a platform that is particularly conducive for developers working with Apple hardware. The framework is inspired by popular platforms such as PyTorch, Jax, and ArrayFire, yet introduces unique features like a unified memory model, which allows arrays to live in shared memory, simplifying operations across different device types without the need for data copies.

Key Features and Capabilities

  1. Familiar APIs: MLX's design closely mirrors that of NumPy and PyTorch, offering a Python API as well as a fully-featured C++ API. This familiarity is crucial in lowering the learning curve for developers transitioning to MLX.
  2. Innovative Memory Model: A standout feature of MLX is its unified memory model. This approach means that arrays exist in shared memory, enabling operations on any supported device type without moving data. This is particularly beneficial for developers leveraging the integrated GPU in Apple Silicon.
  3. Efficient Computation: MLX supports lazy computation, meaning arrays are only materialized when necessary, enhancing computational efficiency. Additionally, its dynamic graph construction allows for changes in the shapes of function arguments without triggering slow compilations.
  4. Advanced AI Model Support: The framework is capable of supporting a range of AI models, including transformer language models, large-scale text generation, image generation, and speech recognition. This versatility makes it a valuable tool for a wide array of machine learning tasks.
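The lazy-computation idea can be illustrated with a tiny pure-Python analogue — this mimics the concept, not MLX's actual API: building an expression only records the computation, and values are materialized when explicitly requested.

```python
# Conceptual sketch of lazy evaluation as MLX applies it. This toy class
# mimics the idea (graph construction now, computation on demand); it is
# not MLX's API.

class Lazy:
    def __init__(self, fn):
        self.fn = fn
        self._value = None
        self.evaluated = False

    def __add__(self, other):
        # Record the addition instead of performing it
        return Lazy(lambda: self.materialize() + other.materialize())

    def materialize(self):
        if not self.evaluated:  # compute at most once, when actually needed
            self._value = self.fn()
            self.evaluated = True
        return self._value

a = Lazy(lambda: 2.0)
b = a + a                  # graph construction only; nothing computed yet
assert not b.evaluated
print(b.materialize())
```

In MLX itself, operations on `mlx.core` arrays are similarly deferred until evaluation is forced (for example with `mx.eval`), which lets the framework fuse and schedule work efficiently.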

Practical Applications

MLX shines in practical applications such as language model training, text generation, image generation, and speech recognition. It outperforms other frameworks like PyTorch in certain benchmarks, particularly in image generation speeds and larger batch sizes.


Apple's MLX Framework marks a significant milestone in the AI development sphere, especially for those working within the Apple ecosystem. Its introduction not only addresses technical challenges but also opens new avenues for AI and machine learning research and development on Apple devices. The framework’s design, inspired by existing popular platforms, combined with its unique features, positions it as a compelling choice for machine learning researchers and developers keen on exploring AI innovations on Apple hardware.

Further Exploration

For those interested in delving deeper into MLX and its capabilities, the GitHub repository for MLX provides extensive resources, including documentation, examples, and detailed commit information. This repository is a valuable resource for anyone looking to explore the practical applications and inner workings of the MLX Framework.