In a significant move toward enhancing voice interaction, OpenAI unveiled three new audio models on May 7, designed to help voice agents complete tasks in real time. The models are part of the company's developer platform, letting developers bring these capabilities into live conversations.
The new models are GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. The first is designed to handle complex requests, invoke tools, manage interruptions, and maintain context across long audio sessions. The second translates speech from over 70 languages into 13 target languages, making it well suited to fields such as customer support and education.
Event Details
The third model, GPT-Realtime-Whisper, provides real-time speech-to-text conversion, enabling automatic transcription, meeting notes, and workflow updates while speakers are still talking. Companies such as Zillow, Priceline, and Deutsche Telekom have begun testing the new models, reflecting broad interest in modern audio technologies.
Pricing for using the GPT-Realtime-2 model starts at $32 per million audio tokens, while GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute. These prices reflect a trend towards offering advanced audio services at competitive rates.
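To make the quoted rates concrete, here is a minimal cost-estimation sketch using only the figures stated above. The function names and the flat-rate assumption are illustrative; real billing may be tiered (for example, separate input and output token rates), so treat this as a rough back-of-the-envelope calculator, not official pricing logic.

```python
# Rates as quoted in the article (USD); assumed flat for this sketch.
REALTIME2_PER_MILLION_AUDIO_TOKENS = 32.00  # GPT-Realtime-2
TRANSLATE_PER_MINUTE = 0.034                # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017                  # GPT-Realtime-Whisper

def realtime2_cost(audio_tokens: int) -> float:
    """Estimated cost of a GPT-Realtime-2 session from its audio-token count."""
    return audio_tokens / 1_000_000 * REALTIME2_PER_MILLION_AUDIO_TOKENS

def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimated cost of a session billed by the minute."""
    return minutes * rate_per_minute

# Example: a 30-minute customer-support call.
print(f"Translate, 30 min: ${per_minute_cost(30, TRANSLATE_PER_MINUTE):.2f}")  # $1.02
print(f"Whisper,   30 min: ${per_minute_cost(30, WHISPER_PER_MINUTE):.2f}")    # $0.51
```

At these rates, an hour of continuous transcription with GPT-Realtime-Whisper would come to roughly one dollar, which illustrates the competitive pricing the paragraph above describes.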
Background & Context
OpenAI is considered one of the leading companies in the field of artificial intelligence, having made significant advancements in developing language and audio models. Since the launch of ChatGPT, the company has attracted the attention of numerous developers and businesses looking to leverage AI technologies to enhance their services. These new models come at a time when the world is increasingly relying on voice interaction as a means of communication.
Historically, speech recognition and machine translation technologies have faced significant challenges related to accuracy and speed. With technological advancements, these solutions have become more effective, allowing their use in various fields such as education, healthcare, and customer service.
Impact & Consequences
The new models represent an important step toward smoother interaction between humans and machines. They are expected to improve the user experience across many applications, making it easier for companies to serve their customers well. They may also help reduce language barriers between cultures, promoting global communication.
Furthermore, the use of these technologies in fields such as education can open new horizons for learners, allowing them to access educational content in multiple languages easily. Improving customer service through the use of voice agents can also contribute to increased customer satisfaction and brand loyalty.
Regional Significance
In the Arab region, these models could have a significant impact on how companies interact with their customers. With the growing reliance on technology across various sectors, these solutions can enhance the experience of Arab users, particularly in areas such as e-commerce and technical support. The ability to provide real-time translation may also facilitate communication between Arab companies and global markets.
In conclusion, the launch of these new audio models by OpenAI represents an important step towards enhancing the use of artificial intelligence in daily life, opening new avenues for interaction between humans and technology.
