OpenAI has recently launched a new feature called Predicted Outputs for developers using GPT-4o and GPT-4o-mini. The feature is designed to reduce response latency: by letting developers supply a “prediction string,” an anticipated segment of the output, it can significantly shorten response times for repetitive tasks and minor document edits.
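In practice, the prediction is passed alongside the usual chat messages. Below is a minimal sketch of the pattern, assuming the OpenAI Python SDK (v1+) and the `prediction` parameter shape (`{"type": "content", "content": ...}`) described in OpenAI's launch documentation; the example text and prompt are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# The existing document: most of it is expected to reappear unchanged
# in the model's output, which makes it a good prediction string.
existing_text = (
    "Our refund policy lasts 30 days. If 30 days have gone by since your "
    "purchase, unfortunately we cannot offer you a refund or exchange."
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": "Change the refund window from 30 days to 60 days. "
                       "Respond only with the full revised text.",
        },
        {"role": "user", "content": existing_text},
    ],
    # The unedited text doubles as the prediction of the output.
    prediction={"type": "content", "content": existing_text},
)

print(completion.choices[0].message.content)
```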
According to OpenAI, in many use cases most of an LLM's (large language model's) output is known before generation. Generating output tokens is usually the highest-latency step when using an LLM, so if the model can skip regenerating the parts of the output it already knows, latency drops accordingly: cutting 50% of the generated output tokens can potentially cut user-perceived latency by roughly 50%.
Users who have tested this feature have found it to be most useful for updating existing text or making small changes to code, such as renaming variables or rephrasing specific content. In these scenarios, the AI response can closely match the provided input, leading to faster responses and lower costs.
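Code refactoring illustrates this well: a variable rename leaves almost every line of a file intact, so the unmodified source itself serves as the prediction. A hedged sketch of that scenario, reusing the client and the assumed `prediction` parameter shape from the earlier example:

```python
source_code = '''class User:
    def __init__(self, username):
        self.username = username

    def greet(self):
        return f"Hello, {self.username}!"
'''

# Ask for a rename; pass the original source as the prediction, since
# most lines will survive the edit verbatim.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Rename the `username` attribute to `email`. "
                       "Respond only with code.",
        },
        {"role": "user", "content": source_code},
    ],
    prediction={"type": "content", "content": source_code},
)

print(completion.choices[0].message.content)
```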
However, the feature may not be as beneficial for creating unique, original content, where responses cannot be easily anticipated in advance. OpenAI recommends using this feature in controlled, predictable tasks to maximize efficiency, particularly in contexts that require frequent minor adjustments.
In conclusion, OpenAI’s Predicted Outputs feature is a valuable tool for developers using LLMs, especially for tasks that involve repetitive or minor changes. By supplying the expected portions of the output in advance, developers can significantly improve efficiency and reduce response times, making the feature a worthwhile addition to the toolkit.