AILAB Blog: ChatGPT: The Next Evolution with Voice and Image Capabilities

OpenAI is thrilled to announce the rollout of new voice and image features in ChatGPT! These features make interaction more intuitive. Users can now have a voice conversation with ChatGPT, or share images to show it what they are talking about.

Broadening the Horizons: The What and Why

With these new features, users can:

Snap and Share: Photograph a fascinating landmark while traveling or the contents of the fridge. ChatGPT can then offer insights about the landmark, suggest recipes, and more.
Get Math Homework Help: Parents can help their children by snapping a photo of a math problem and asking ChatGPT for hints.

Availability: Over the next two weeks, these voice and image features will roll out to Plus and Enterprise users. Voice will be available on iOS and Android, while image input will be available on all platforms.

Diving Deeper into the Features

1. Engage in Voice Conversations with ChatGPT

Users can now talk with ChatGPT out loud. This opens up many new uses, such as asking for a bedtime story or settling a debate.

Getting Started with Voice:

Navigate to Settings → New Features on the mobile app.
Opt into voice conversations.
Tap the headphone button on the home screen and choose one of five voices.

Voice is powered by a new text-to-speech model. Whisper, OpenAI's open-source speech recognition system, transcribes the user's spoken words into text.

2. Chat About Images

By tapping the photo button, users can now provide ChatGPT with visual context.

Getting Started with Images:

For iOS or Android users, tap the plus button first.
Share the desired image, and use the drawing tool to highlight the part you want ChatGPT to focus on.

Image understanding is powered by multimodal GPT-3.5 and GPT-4 models.

Safety and Gradual Deployment

At OpenAI, our mission is to build AGI that is both safe and beneficial. Here's our approach:

Voice:

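The new voice technology can create realistic synthetic voices from just a few seconds of real speech. This enables many creative and accessibility-focused applications, but it also creates risks, such as bad actors impersonating public figures or committing fraud. We are therefore limiting its use to specific cases, such as voice chat. Select partners are also using it responsibly. For example, Spotify is piloting it to translate podcasts into additional languages in the podcasters' own voices.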

Image Input:

Vision-based models bring their own challenges, from hallucinations to users relying on the model's interpretation of images in high-stakes domains. To deploy this feature responsibly, we've taken the following measures:

User Experience: Our collaboration with "Be My Eyes," an app for blind and low-vision people, has deepened our understanding of the feature's practical uses and limitations.
Technical Safeguards: We've significantly limited ChatGPT's ability to analyze and make direct statements about people. ChatGPT is not always accurate, and these systems should respect individuals' privacy.

Feedback from real-world usage will be paramount in refining these safeguards.

Model Limitations:

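ChatGPT performs well in some domains but has limitations. For example, it transcribes English text from images reliably but performs poorly on some other languages, especially those written in non-Roman scripts. Users should use ChatGPT responsibly and verify its output, especially on specialized topics.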

Expanding Access

The excitement doesn't end here! Following the initial rollout to Plus and Enterprise users, we're eager to introduce these capabilities to a broader user base, including developers.

Stay tuned, and dive into the next-gen ChatGPT experience!

AILAB Blog

9.26.2023
