Monday, October 2, 2023

MIT 6.5940 Lecture, Fall 2023

Large generative models (e.g., large language models, diffusion models) have shown remarkable performance, but their enormous scale demands significant computation and memory resources. To make them more accessible, it is crucial to improve their efficiency. This course introduces efficient deep learning computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models, diffusion models, video recognition, and point cloud processing, and covers topics in quantum machine learning. Students will get hands-on experience deploying large language models (e.g., LLaMA 2) on a laptop.
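To give a flavor of one of the course topics, here is a minimal sketch of symmetric per-tensor int8 quantization, written in pure Python for clarity. Real deployments use per-channel scales and fused kernels; the numbers and function names here are illustrative only.

```python
# Symmetric int8 quantization: map floats to [-128, 127] with one shared
# scale, then recover approximate floats by multiplying back.

def quantize_int8(weights):
    """Quantize a list of floats to int8 with a shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Storing 8-bit integers plus one scale instead of 32-bit floats cuts weight memory roughly 4x, which is the basic trade-off the course's quantization lectures explore in depth.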

Sunday, October 1, 2023

The Rise and Impact of Llama: An AI Revolution

It's been an exciting journey ever since we embarked on the Llama project. Llama 1 was a breakthrough, Llama 2 added more spice, and with the release of Code Llama, the momentum has been nothing short of astonishing.

A Recap of Llama's Journey

In the seven months since the introduction of Llama 1, and through the subsequent releases of Llama 2 and Code Llama, the community's response has been overwhelming. To put it into perspective:

Llama-based models have been downloaded over 30 million times through Hugging Face.

A staggering 10 million of these downloads occurred in the last 30 days.

Drawing parallels with PyTorch, Llama is quickly evolving as a robust platform for global AI innovation.

The Llama Community's Exponential Growth

To say Llama has impacted the AI landscape would be an understatement. The growth has been characterized by:

Cloud Adoption: Giants like AWS, Google Cloud, and Microsoft Azure are hosting Llama models. Particularly, AWS's recent collaboration as the managed API partner for Llama 2 has been a game-changer in terms of accessibility.

Innovators' Choice: Startups and innovators like Anyscale, Replicate, and DoorDash are adopting Llama as their foundational AI tool.

Open-Source Embrace: With over 7,000 derivatives on Hugging Face, the open-source community has steadily enhanced model performance.

Booming Developer Community: Over 7,000 Llama-related projects are currently hosted on GitHub. From new tools to 'tiny' Llama versions for mobile platforms, the creativity knows no bounds.

Hardware Integration: Top-tier hardware platforms are optimizing for Llama, further enhancing its performance.

The release of Code Llama only solidified its presence, with rapid integration on many platforms, marking a pivotal moment for AI enthusiasts.

From Research to Global Phenomenon

Llama's origin was rooted in the power of large language models (LLMs). Initially developed by a team at FAIR, it sought to harness the prowess of LLMs for various innovative applications. The results? Groundbreaking improvements and diverse extensions from academic researchers and the wider community.

But Llama 1 was just the beginning. The need for broader accessibility brought Llama 2 to the forefront.

Our Philosophy Behind Releasing Llama Models

At Meta, we firmly believe in open source. The logic is simple:

Research: Harnessing collective wisdom to enhance AI capabilities.

Enterprise and Commercialization: Learning through startups and enterprises to uncover AI's vast potential.

Developer Ecosystem: Utilizing new tools and strategies emerging daily in the AI domain.

Meta has always been at the forefront of advocating for an open approach, and Llama is no exception.

Future Projections

With the AI realm advancing rapidly, here are our core focal points:

Multimodal Experiences: Beyond just text, AI can integrate various modes for richer experiences.

Safety and Responsibility: With AI's potential comes the imperative need for responsible development and application.

Community Emphasis: Like PyTorch, we visualize a developer community with a voice and agency, driving the future of AI innovation.

At AILab, we use Llama 2 daily in our operations, and a significant portion of our projects is built on various Llama 2 models. We would like to extend our gratitude to Meta for this invaluable opportunity.

Saturday, September 30, 2023

ChatGPT's Newest Upgrade: Real-time Web Browsing

In a world that values current and reliable information, OpenAI has stepped up its game with the latest enhancement to ChatGPT. The beloved AI, which previously had a knowledge cutoff in September 2021, is no longer restricted to that timeline. Excitingly, ChatGPT can now actively browse the internet, ensuring users receive up-to-the-minute and authoritative insights.

What does this mean for users? It means that every query can now be supplemented with direct links to sources from the web. So, not only will you get the vast knowledge already embedded in ChatGPT, but you'll also have the added benefit of real-time, sourced data from the expansive digital universe.

The future of AI-assisted searches and data extraction looks brighter with this evolution. Dive in and explore the limitless bounds of information with the newly empowered ChatGPT!

Friday, September 29, 2023

Machine Learning for Everybody – Full Course


"Machine Learning for Everybody – Full Course" is a comprehensive guide designed to introduce beginners to the fascinating world of machine learning (ML). This course takes you step-by-step from the foundational concepts to advanced techniques, ensuring that you gain a deep understanding of how ML algorithms work and how they can be applied to real-world scenarios. With illustrative examples, hands-on projects, and clear explanations, this course is perfect for anyone looking to dive into ML, whether you're a student, professional, or just a curious learner.
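As a taste of the kind of foundational concept such a course starts with, here is linear regression fitted by gradient descent in pure Python. The data, learning rate, and step count below are made up for illustration; any introductory ML course will cover this idea in its opening lessons.

```python
# Fit y = w*x + b by gradient descent on mean squared error.
# No libraries needed: the whole algorithm is two gradient formulas and
# a loop of small parameter updates.

def fit_line(xs, ys, lr=0.01, steps=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradients of mean squared error with respect to w and b
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # generated from y = 2x + 1
w, b = fit_line(xs, ys)    # w approaches 2, b approaches 1
```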

Thursday, September 28, 2023

Bridging The AI Gap: Microsoft and Meta Join Forces

The rapid development of AI technology is transforming the way we engage with the digital world, and it’s clear that the power of collaboration can amplify this transformation.

Last week, we provided a glimpse into our vision of creating a seamless AI copilot experience aimed at helping individuals effortlessly tackle any task. At the heart of this vision lies Bing. It's not just a search engine; it’s the underpinning of our AI experiences, ensuring they are deeply rooted in the freshest web data and information available.

While Microsoft’s AI innovations power a range of products within our ecosystem, our mission doesn’t stop there. We pride ourselves on being more than just a product company; we’re a platform that empowers others to realize their AI aspirations. This mindset has opened the doors to some exhilarating partnerships, and today, I’m overjoyed to share one such development.

We’re embarking on a new journey with Meta. Our collaboration will see the integration of Bing into Meta AI’s chat experiences. This means that users will receive answers that are not only accurate but also in tune with real-time search data. From engaging with Meta AI to chatting on platforms like WhatsApp, Messenger, and Instagram, users will witness an enriched AI interaction.

Our commitment to Meta is a testament to our shared ambition: harnessing AI to foster innovation and enhance user experiences. As we further this partnership, our primary goal remains - to infuse the magic of powerful and relevant AI into the tools and platforms that are indispensable to people’s daily lives.

Here's to shaping a future where technology intuitively complements every facet of our lives.

OpenAI and Jony Ive's Potential Collaboration on an AI Device

In the ever-evolving world of technology, leaders from top-notch firms often discuss possible innovations, and it seems like Jony Ive, the design genius behind many of Apple's iconic products, and Sam Altman, the head of OpenAI, are no exception.

Recent reports suggest that the duo is brainstorming a novel device with AI capabilities. Interestingly, this isn’t just a discussion between Altman and Ive. Masayoshi Son, the visionary behind SoftBank, also weighed in on the concept. However, whether Son will continue to be involved in this potential project remains uncertain.

Details about the device remain shrouded in mystery. The design, functionalities, and even the decision to take the idea from concept to reality are yet to be ascertained. However, the fact that both Ive and Altman have discussed potential designs does pique interest.

If realized, such a device could bolster OpenAI’s standing in the tech industry, providing it with a significant edge. The question of who would take responsibility for releasing the device is also in the air. While it remains speculative, Sam Altman's conversations with the hardware maker Humane hint at potential collaboration routes.

As always, such discussions amongst tech titans often lead to innovative products that can redefine user experiences. It remains to be seen what this collaboration might bring to the world of AI and technology.

At AILab, we have been diligently developing our AI device, Pocket AI, for over six months.

We are on track to release it in the coming year.

Stay tuned for more updates on this intriguing development!

Wednesday, September 27, 2023

Microsoft's Leap into Advanced AI: Bing’s Upgrades and Beyond

In a recent event in New York, Microsoft unveiled a series of AI-driven enhancements to Bing and various Windows features, signaling the tech giant's commitment to staying at the forefront of artificial intelligence innovation.

Bing Welcomes DALL-E 3

One of the headline announcements was the integration of OpenAI's DALL-E 3 model into Bing. This advancement follows Microsoft’s previous step, where it enabled consumers to generate images using DALL-E in Bing Chat earlier this year. At that time, Microsoft remained mum about the specific DALL-E version but has now confirmed the transition to DALL-E 3. This means users can expect more intricate image renderings, with a particular focus on the nuances of features like fingers, eyes, and shadows.

Promoting Responsible AI Use

Microsoft is not just aiming for better AI capabilities; it is equally focused on responsible AI usage. The latest iteration will add invisible digital watermarks to all AI-generated images, termed Content Credentials. Backed by cryptographic measures and abiding by the standards of the Coalition for Content Provenance and Authenticity (C2PA), this watermarking brings greater transparency to AI-generated imagery. It’s worth noting that other tech giants like Adobe, Intel, and Sony are also backing the C2PA initiative.
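To make the provenance idea concrete, here is a deliberately simplified sketch in the spirit of Content Credentials. Real C2PA manifests are embedded inside the image file and signed with X.509 certificates; this toy version just binds a small provenance record to the image bytes with an HMAC. Every name and key below is illustrative, not part of the actual C2PA specification.

```python
# Toy content-provenance record: hash the image, describe its origin,
# and sign the record so tampering with either is detectable.
import hashlib
import hmac
import json

SECRET_KEY = b"demo-signing-key"  # stand-in for a real signing certificate

def sign_provenance(image_bytes: bytes, generator: str) -> dict:
    """Build and sign a provenance record for the given image bytes."""
    record = {
        "generator": generator,
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify_provenance(image_bytes: bytes, record: dict) -> bool:
    """Check that the image matches the record and the record is unmodified."""
    claimed = {k: v for k, v in record.items() if k != "signature"}
    if claimed.get("sha256") != hashlib.sha256(image_bytes).hexdigest():
        return False
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])

img = b"\x89PNG...fake image bytes"
cred = sign_provenance(img, "DALL-E 3")
```

The design point is the same one C2PA makes: the watermark is only useful if it is cryptographically bound to both the content and the claim about where that content came from.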

A More Personal Bing Experience

Bing is also evolving to offer a more tailored search experience. Drawing upon your prior interactions with Bing Chat, the search engine will now provide answers that align more with your personal interests. Microsoft illustrates this with a simple example: If you've previously searched for your favorite sports team, Bing might notify you if that team has a match in a city you plan to visit.

Although this personalization might raise eyebrows, Microsoft assures users that they have the option to opt out. This means that if someone isn’t keen on having their chat history influence their search results, they can easily turn this feature off.

Making Searches More Efficient

Microsoft's research suggests that a significant chunk of users - more than 60%, to be exact - end up modifying their initial search query multiple times. This often arises due to the lack of personalized context. By tapping into a user's previous searches or current research trends, Microsoft believes the search process can be made more seamless and efficient.

Expansion to Microsoft 365

Lastly, the tech behemoth announced that Bing Chat Enterprise will now support multimodal Visual Search and Image Creator. This is great news for the 160 million-plus Microsoft 365 users who will soon benefit from enhanced AI chatbot capabilities in their workplace.

In Conclusion

Microsoft's recent announcements underscore their commitment to not only advancing AI capabilities but also ensuring its responsible use. As the line between technology and daily life continues to blur, it’s reassuring to see tech leaders like Microsoft prioritize both innovation and ethics. As users, all we can do is eagerly await these features and perhaps, keep tweaking those search queries a little less.

Tuesday, September 26, 2023

ChatGPT: The Next Evolution with Voice and Image Capabilities

OpenAI is thrilled to announce the rollout of new voice and image features in ChatGPT! This evolution offers a more intuitive interaction, allowing users to voice chat with ChatGPT and share images to give it visual context.

Broadening the Horizons: The What and Why

Using these new features, users can:

  • Snap and Share: Whether it's a fascinating landmark while traveling or a snapshot of the fridge's contents, ChatGPT can provide insights, recipes, and more.
  • Math Homework Assistance: Parents can help their children by snapping a photo of a math problem and receiving hints.
  • Availability: Over the next two weeks, Plus and Enterprise users can look forward to accessing these voice and image features. Voice capability will be available on both iOS and Android, while the image feature will be available across all platforms.

Diving Deeper into the Features

1. Engage in Voice Conversations with ChatGPT

Users can now verbally converse with ChatGPT, opening up a range of possibilities, such as requesting bedtime stories or settling debates.

Getting Started with Voice:

  • Navigate to Settings → New Features on the mobile app.
  • Opt into voice conversations.
  • Tap the headphone button on the home screen and choose a voice from five options.

This innovation is backed by a new text-to-speech model and leverages Whisper, OpenAI's open-source speech recognition system.

2. Chat About Images

By tapping the photo button, users can now provide ChatGPT with visual context.

Getting Started with Images:

  • For iOS or Android users, tap the plus button first.
  • Share the desired image or use the drawing tool for more specificity.

The image understanding hinges on the prowess of multimodal GPT-3.5 and GPT-4 models.

Safety and Gradual Deployment

At OpenAI, our mission is to foster AGI that's both safe and beneficial. Here's our approach:


Voice:

While voice technology heralds immense potential, it can also be misused. We are committed to limiting its scope to specific use cases, such as voice chat. Notable collaborations, such as with Spotify, are leveraging this technology responsibly.

Image Input:

Challenges with vision-based models include hallucinations and high-stakes interpretations. To ensure responsible deployment, we've taken significant measures:

  • User Experience: Collaborating with "Be My Eyes," an app for the visually impaired, has enriched our understanding of the feature's practical applications and limitations.
  • Technical Safeguards: We've curtailed ChatGPT's capability to analyze or comment on individuals to uphold privacy.

Feedback from real-world usage will be paramount in refining these safeguards.

Model Limitations:

ChatGPT excels in specific domains but has limitations, particularly with non-English and non-Roman scripts. Users are urged to use ChatGPT responsibly, especially for specialized topics.

Expanding Access

The excitement doesn't end here! Following the initial rollout to Plus and Enterprise users, we're eager to introduce these capabilities to a broader user base, including developers.

Stay tuned, and dive into the next-gen ChatGPT experience!

Monday, September 25, 2023

Diving into Deep Learning with PyTorch: A Beginner’s Guide

In this course, you'll learn all the fundamentals needed to get started with PyTorch and deep learning.

Deep Learning, with its potential to transform industries and the way we approach data, has taken the tech world by storm. If you've been curious about this revolutionary field and have been seeking a comprehensive introduction, then you're in the right place.

Why PyTorch?

PyTorch, developed by Facebook's AI Research lab, has rapidly gained popularity among researchers and developers alike. It is recognized for its dynamic computation graph, which means the graph builds on-the-fly as operations are created, making it highly flexible and intuitive. This is particularly useful for those just beginning their deep learning journey, as it allows for easy debugging and a more natural understanding of the flow of operations.
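The "define-by-run" idea behind PyTorch's dynamic graph can be illustrated without PyTorch at all. The toy `Value` class below (a deliberately minimal sketch, not PyTorch's actual API) records the graph as ordinary Python operations execute, then walks it backward to compute gradients:

```python
# A toy dynamic computation graph: each arithmetic operation records its
# inputs and a rule for propagating gradients, so the graph exists only
# because the Python code ran, exactly the define-by-run style PyTorch uses.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents      # edges recorded at execution time
        self.grad_fn = None         # how to push gradients to parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        out.grad_fn = lambda g: [(self, g), (other, g)]
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        out.grad_fn = lambda g: [(self, g * other.data), (other, g * self.data)]
        return out

    def backward(self):
        self.grad = 1.0
        stack = [self]
        while stack:
            node = stack.pop()
            if node.grad_fn is None:
                continue
            for parent, g in node.grad_fn(node.grad):
                parent.grad += g
                stack.append(parent)

x = Value(2.0)
y = Value(3.0)
z = x * y + x        # the graph is built as these lines run
z.backward()         # dz/dx = y + 1 = 4, dz/dy = x = 2
```

Because the graph is rebuilt on every forward pass, you can use plain Python control flow (loops, conditionals) to change the model's structure from one input to the next, which is precisely what makes debugging in PyTorch feel like debugging ordinary Python.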

What Will You Learn?

In this course, you'll be taken on a deep dive into the fascinating world of deep learning. Some highlights include:

  • Understanding the Basics: Grasp the fundamental concepts of neural networks, how they're structured, and how they function.
  • PyTorch Essentials: Get hands-on experience with PyTorch's tensors, autograd, and other essential components.
  • Building Neural Networks: By the end of this course, you'll be constructing your very own neural networks and training them to recognize patterns, images, and more.
  • Practical Applications: Witness the real-world utility of deep learning as you work on exciting projects and real-life datasets.

Beginner-Friendly Approach

This course is crafted keeping beginners in mind. Whether you're entirely new to programming, or an experienced developer wanting to switch to deep learning, you'll find the content accessible and engaging. The blend of theory and hands-on exercises ensures that you not only learn but also apply your newfound knowledge practically.


With the increasing demand for professionals skilled in deep learning and AI, there's no better time than now to dive in. By familiarizing yourself with PyTorch and deep learning fundamentals through this course, you're equipping yourself with the tools and knowledge necessary to be at the forefront of technological innovation.

Get started today, and embark on a journey of endless learning and opportunities!

Sunday, September 24, 2023

Unlocking Creative Horizons: DALL-E 3's Integration with ChatGPT and Enhanced Safety Measures

OpenAI’s DALL-E 3: The Next Evolution in Generative AI Visual Art

OpenAI has once again made a groundbreaking move in the realm of AI-driven art with the announcement of DALL-E 3, the third iteration of its generative AI visual art platform. With DALL-E’s proven capability to convert text prompts into artful images, this new version promises enhanced contextual understanding and user-friendly features.

What’s New with DALL-E 3?

One of the most exciting updates is the seamless integration of DALL-E 3 with ChatGPT. This feature allows users to leverage ChatGPT for generating detailed prompts, a task that could previously be a hurdle for those not adept at crafting specific prompts. By initiating a dialogue with ChatGPT, users can have the chatbot craft a descriptive paragraph which DALL-E 3 then interprets into creative visuals.

A striking demo was showcased to The Verge where Aditya Ramesh, the spearhead of the DALL-E team, used ChatGPT to brainstorm a logo for a hypothetical ramen restaurant situated in the mountains. The result? An imaginative art piece featuring a mountain adorned with ramen-inspired snowcaps, a broth-resembling waterfall, and pickled eggs artistically presented as garden stones. While the output was more artistic merch than a traditional logo, it exemplifies the innovative potential of DALL-E 3.

DALL-E’s Evolution: A Brief Look Back

The inception of DALL-E dates back to January 2021, pioneering the field before its counterparts like Stability AI and Midjourney. As DALL-E 2 emerged in 2022, OpenAI addressed certain concerns by introducing a waitlist system to regulate its access, primarily due to potential content biases and explicit image generations. The platform later became publicly accessible in September of the same year.

Now, with DALL-E 3, OpenAI is planning a phased release, initially rolling it out to ChatGPT Plus and ChatGPT Enterprise users, with research labs and API service access to follow in the fall. As of now, a timeline for a free public version remains under wraps.

Safety Enhancements in DALL-E 3

Amid the advancements, safety remains paramount. OpenAI has fortified DALL-E 3 with robust safety measures, rigorously tested by external red teamers. One notable advancement is the implementation of input classifiers designed to screen out explicit or potentially harmful prompts. Another significant upgrade ensures the inability to reproduce images of public figures when their names are explicitly mentioned in the prompt.

Sandhini Agarwal, OpenAI's policy researcher, expressed strong belief in these safety measures but also reminded users that continuous improvement is underway and perfection is still a work in progress.

Additionally, in response to concerns from the artist community, DALL-E 3 comes with an in-built ethical code: it won't attempt to recreate art in the style of living artists. OpenAI is also offering artists the option to prevent their art from being used in future AI iterations by allowing them to request removal of specific copyrighted images.

This move comes in light of legal challenges faced by DALL-E's competitors, Stability AI and Midjourney, and art platform DeviantArt, which were sued by artists alleging copyright infringements.

In Conclusion

DALL-E 3 stands as a testament to OpenAI's commitment to innovation, accessibility, and ethics in the ever-evolving domain of AI-generated art. As we await its broader release, the art and tech community watches with anticipation, eager to explore the limitless horizons that DALL-E 3 promises.