Dev Day Holiday Edition—12 Days of OpenAI: Day 9
Introduction to Developer Day
- Olivier Godement introduces the session, emphasizing its focus on developers and startups building on the OpenAI API.
- Over 2 million developers from more than 200 countries are engaged with the API, showcasing its global reach.
New Features and Models Announcement
- Michelle Pokrass announces the launch of new features in the API, including function calling, structured outputs, and developer messages.
- Developer messages allow for enhanced control over model instructions, improving interaction quality.
- A new "reasoning_effort" parameter is introduced, letting developers control how much compute the model spends thinking and match effort to problem complexity.
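The two announcements above can be sketched as a single request payload. This is a minimal illustration, assuming the field names from the announcement (`developer` role, `reasoning_effort`); the model name and message contents are placeholders.

```python
import json

# Sketch of a Chat Completions request combining a developer message
# (steers instruction-following) with the reasoning_effort parameter.
payload = {
    "model": "o1",
    "reasoning_effort": "low",  # assumed values: "low" | "medium" | "high"
    "messages": [
        # Developer messages give the application author enhanced
        # control over how the model interprets instructions.
        {"role": "developer", "content": "Answer tersely, as a math tutor."},
        {"role": "user", "content": "What is 7 * 8?"},
    ],
}

print(json.dumps(payload, indent=2))
```

Lower effort spends fewer reasoning tokens on simple problems; higher effort lets the model think longer on hard ones.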
Vision Inputs Demonstration
- The introduction of vision inputs aims to assist in fields like manufacturing and science.
- A live demo showcases how the model can detect errors in a filled-out text form using vision capabilities.
Error Detection in Forms
- The demo highlights two specific errors made while calculating adjusted gross income (AGI).
- The model successfully identifies arithmetic mistakes and incorrect standard deduction values based on provided images.
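A request like the form-checking demo can be sketched as a multi-part user message that mixes text and an image. The structure follows the Chat Completions vision format; the URL and prompt text are placeholders, not the demo's actual inputs.

```python
import json

# Sketch of a vision request: one user message carrying both a text
# instruction and an image of the filled-out form to inspect.
payload = {
    "model": "o1",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Check this tax form for arithmetic errors."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/form.png"}},
            ],
        }
    ],
}

print(json.dumps(payload, indent=2))
```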
Function Calling Feature Explained
- The function calling feature allows models to interact with backend APIs for accurate data retrieval.
- Users do not see backend interactions; all processing occurs behind the scenes for a seamless experience.
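The round trip described above can be sketched as follows. The `get_standard_deduction` function, its schema, and the dispatch helper are hypothetical stand-ins for whatever backend the application exposes; only the tool-schema shape follows the API's function-calling format.

```python
import json

# Hypothetical backend function the model is allowed to call.
def get_standard_deduction(filing_status: str, year: int) -> dict:
    table = {("single", 2023): 13850, ("married_joint", 2023): 27700}
    return {"deduction": table.get((filing_status, year))}

# Tool schema sent with the request so the model knows when and how
# to call the function.
tools = [{
    "type": "function",
    "function": {
        "name": "get_standard_deduction",
        "description": "Look up the standard deduction for a tax year.",
        "parameters": {
            "type": "object",
            "properties": {
                "filing_status": {"type": "string"},
                "year": {"type": "integer"},
            },
            "required": ["filing_status", "year"],
        },
    },
}]

# When the model emits a tool call, the app executes it and feeds the
# result back; the end user never sees this exchange.
def dispatch(tool_call: dict) -> str:
    args = json.loads(tool_call["arguments"])
    fn = {"get_standard_deduction": get_standard_deduction}[tool_call["name"]]
    return json.dumps(fn(**args))

# Simulated tool call, shaped as it would arrive from the model:
print(dispatch({"name": "get_standard_deduction",
                "arguments": '{"filing_status": "single", "year": 2023}'}))
# -> {"deduction": 13850}
```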
Structured Outputs Implementation
- A demonstration of structured outputs shows how users can request corrections formatted according to a specified JSON schema.
Enhancements in Model Functionality and Real-Time API
Structured Outputs and Corrections
- The model provides a list of corrections, detailing reasons for errors, locations, new values, and old values. This allows for a user-friendly UI that highlights discrepancies in PDF outputs.
- The structured output is beneficial for applications requiring automatic JSON extraction rather than rendering markdown from the model's output.
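The corrections schema described above might look like the sketch below. The field names (`reason`, `location`, `old_value`, `new_value`) mirror the bullet points here, not the demo's exact schema, and the `response_format` wrapper follows the structured-outputs API shape.

```python
# JSON schema for a list of corrections; "strict" mode guarantees the
# model's reply conforms, so it can be parsed without fallbacks.
corrections_schema = {
    "name": "corrections",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "corrections": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "reason": {"type": "string"},
                        "location": {"type": "string"},
                        "old_value": {"type": "string"},
                        "new_value": {"type": "string"},
                    },
                    "required": ["reason", "location",
                                 "old_value", "new_value"],
                    "additionalProperties": False,
                },
            }
        },
        "required": ["corrections"],
        "additionalProperties": False,
    },
}

# Passed with the request so the reply is guaranteed to be this JSON:
response_format = {"type": "json_schema", "json_schema": corrections_schema}
```

Because the reply is guaranteed JSON, a UI can highlight each discrepancy directly instead of parsing markdown.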
Internal Evaluations and Performance Metrics
- Internal evaluations are conducted to assess feature performance before release; these tests focus on API use cases.
- The o1 model shows significant improvement over GPT-4o in function-calling accuracy, deciding correctly when (and when not) to call a function.
- o1 also excels at structured outputs, adhering to the requested format more reliably than previous models, which reduces off-distribution errors.
Coding Evaluation Results
- In coding evaluations on LiveBench (an open-source coding benchmark), o1 significantly outperforms both o1-preview and GPT-4o.
Latency Improvements and Future Developments
- o1 retains its reasoning capabilities even when structured outputs are enabled, while posting improved performance metrics over earlier versions.
- The new model uses 60% fewer reasoning tokens than o1-preview, resulting in faster responses and reduced costs for applications.
Upcoming Features and Real-Time API Updates
- There is high demand for o1 pro within the API; it is under development but not yet available to users.
Real-Time Voice Experiences with WebRTC Support
Introduction to Real-Time API Capabilities
- The Real-Time API enables developers to create advanced voice experiences similar to ChatGPT with AI assistants integrated into their applications.
Benefits of WebRTC Integration
- WebRTC support enhances the Real-Time API by providing low-latency video streaming benefits such as echo cancellation and adaptive bitrate management.
Simplified Application Development Process
- A demo illustrates how easy it is to set up an audio element and peer connection with WebRTC, compared to the previous WebSocket integration.
Code Comparison: WebSockets vs. WebRTC
- The WebSocket integration required extensive code (200–250 lines) plus additional complexity such as back-pressure management; WebRTC reduces this dramatically.
Successful Demo Execution
Introduction to New Developments in Real-Time API
Overview of the Code Execution Process
- The process involves copying and pasting roughly 12 lines of code, running the script, and changing only the API token to get it working.
- The speaker expresses excitement about sharing this code with others, anticipating innovative applications from users.
Introduction of Fawn on the Lawn Toy
- A new toy called "Fawn on the Lawn" is introduced, built around a tiny microcontroller roughly the size of a penny.
- The discussion shifts to how WebRTC support in the Real-Time API makes it practical to run playful voice experiences on hardware this small.
Use Cases for Microcontrollers
- The potential applications for small microcontrollers are vast; they can be integrated into wearables or home devices for context-aware assistance.
- Users can easily set up these microcontrollers by connecting them via USB and configuring Wi-Fi settings without needing soldering or hardware expertise.
Updates to Real-Time API Pricing and Features
Cost Reductions and New SDK Support
- Significant cost reductions announced: GPT-4o audio tokens become 60% cheaper, while GPT-4o mini audio tokens become 10x cheaper than before.
- A Python SDK is being introduced to simplify integration with the real-time API.
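The announced cuts are easy to illustrate with arithmetic. This sketch uses a placeholder base rate, not the real price sheet, purely to show what "60% cheaper" and "10x cheaper" mean per million audio tokens.

```python
# Placeholder base rate in USD per 1M audio tokens (NOT the real price).
old_rate_per_1m = 100.00

gpt4o_new = old_rate_per_1m * 0.4  # 60% cheaper -> pay 40% of the old rate
mini_new = old_rate_per_1m / 10    # 10x cheaper -> pay one tenth

print(gpt4o_new, mini_new)  # 40.0 10.0
```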
Enhancements in Fine-Tuning Capabilities
- The session turns to developers' customization needs; preference fine-tuning is introduced as a new method available in the API.
Understanding Preference Fine Tuning
Explanation of Fine-Tuning Methods
- Preference fine-tuning utilizes direct preference optimization to align models better with user preferences, enhancing performance based on feedback.
- Current methods include supervised fine-tuning (exact input/output pairs), while preference fine-tuning focuses on preferred versus non-preferred responses.
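The preferred-versus-non-preferred distinction shows up directly in the training data. This sketch assumes the record layout described in OpenAI's fine-tuning documentation (`input` / `preferred_output` / `non_preferred_output`); verify field names against the current docs before use, and treat the contents as placeholders.

```python
import json

# One preference fine-tuning record: instead of an exact input/output
# pair (supervised fine-tuning), it pairs a preferred response with a
# non-preferred one for the same input.
record = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize the refund policy."}
        ]
    },
    "preferred_output": [
        {"role": "assistant",
         "content": "Refunds are issued within 14 days of purchase."}
    ],
    "non_preferred_output": [
        {"role": "assistant",
         "content": "Our policy, which was drafted in 2019, states..."}
    ],
}

# Training files are JSONL: one record like this per line.
print(json.dumps(record))
```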
Practical Applications of Fine-Tuning
- Typical use cases include customer support chatbots and content moderation where specific stylistic guidelines are necessary.
Demonstration of Fine-Tuning Process
Step-by-Step Guide on Implementing Fine-Tuning
- A demonstration shows how easy it is to start fine-tuning within the platform UI by selecting methods and uploading training data.
Fine-Tuning AI Models: Insights and Developments
Overview of Fine-Tuning Process
- The fine-tuning process is initiated with default hyperparameters, which can take several minutes to hours depending on dataset size. Once completed, the model can be sampled like any base model in the API.
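Kicking off such a job from the API can be sketched as the payload below. The `method` block with type `"dpo"` follows the announced API shape for direct preference optimization; the model snapshot and file ID are placeholders.

```python
import json

# Sketch of a fine-tuning job request using the preference (DPO) method.
job = {
    "model": "gpt-4o-2024-08-06",     # placeholder model snapshot
    "training_file": "file-abc123",   # placeholder uploaded-file ID
    "method": {
        "type": "dpo",
        "dpo": {
            # Omitting hyperparameters falls back to the defaults
            # mentioned above; "auto" leaves tuning to the platform.
            "hyperparameters": {"beta": "auto"},
        },
    },
}

print(json.dumps(job, indent=2))
```

Once the job finishes, the resulting model ID can be sampled like any base model.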
Early Access and Results
- Early access for preference fine-tuning has been granted to select partners, yielding promising results. For instance, Rogo AI is enhancing an AI assistant for financial analysts by refining user queries.
- With preference fine-tuning, Rogo AI improved their internal benchmark accuracy from 75% (base model) to over 80%, showcasing the effectiveness of this approach.
Availability and Documentation
- Preference fine-tuning will soon be available for GPT-4o and GPT-4o mini at the same cost per training token as supervised fine-tuning. Developers are encouraged to consult the documentation for implementation details.
New Features and SDK Support
- The team has rolled out new features including Real-Time API updates with simpler WebRTC integration, alongside preference fine-tuning capabilities aimed at customizing models effectively.
- Official support for Go and Java SDKs has been introduced, providing developers with necessary tools across all API endpoints similar to existing Python and Node SDKs.
User Experience Enhancements
- A streamlined login/signup process allows users to obtain an API key quickly without extensive terms of service agreements.
- Recent conference content has been released on YouTube, offering valuable insights into developments within OpenAI's ecosystem.
Community Engagement