Anthropic recently introduced prompt caching to its API, a feature that lets developers store context between API calls and avoid repeating large prompts. The feature is currently in public beta for the Claude 3.5 Sonnet and Claude 3 Haiku models, with support for the Claude 3 Opus model planned soon. Prompt caching, a technique described in a 2023 paper, lets users retain frequently used context across sessions, making it possible to include additional background information at a fraction of the usual cost.
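For a sense of what this looks like in practice, the beta exposes caching through a `cache_control` marker on prompt blocks plus an opt-in beta header. The sketch below is a minimal example using the Anthropic Python SDK; the file name and prompt text are placeholders, and very short prefixes below a model-specific minimum token count are not cached.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder for a large, reusable block of context (a knowledge base,
# long instructions, many-shot examples, etc.).
large_context = open("knowledge_base.txt").read()

# The first request writes the marked prefix to the cache at the higher
# cache-write rate; later requests with an identical prefix read it back.
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta opt-in
    system=[
        {
            "type": "text",
            "text": large_context,
            "cache_control": {"type": "ephemeral"},  # mark this block cacheable
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
)
print(response.content[0].text)
```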
Prompt caching offers several advantages to users and developers. One key benefit is the ability to load a large body of context into a prompt once and refer back to it across multiple conversations with the model, which both improves the user experience and makes it easier to steer the model's responses. Early adopters reported significant speed and cost improvements across use cases such as embedding full knowledge bases, 100-shot examples, and entire conversation turns in prompts. Cached prompts are also billed at a much lower per-token rate: Anthropic says reading a cached prompt costs significantly less than the base input token price.
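Continuing the sketch above, the payoff comes on later requests that re-send the same marked prefix: the cached portion is read back instead of being reprocessed at full price. The usage field names below match what Anthropic documented for the beta, though depending on SDK version they may only be exposed under a beta-specific namespace.

```python
# Re-sending the identical cached prefix in a later request should hit the
# cache; only the new user turn is processed (and billed) at the full rate.
followup = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": large_context,  # must match the cached prefix exactly
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Now list the open questions."}],
)

# The beta's usage block breaks out cache activity.
print(followup.usage.cache_creation_input_tokens)  # 0 on a cache hit
print(followup.usage.cache_read_input_tokens)      # size of the cached prefix
```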
For Claude 3.5 Sonnet users, writing a prompt to the cache costs $3.75 per million tokens, while reading a cached prompt costs only $0.30 per million tokens. This is a substantial saving, given that the base price for input tokens on the Sonnet model is $3 per million. In other words, by paying a 25% premium upfront to cache the prompt, users pay one-tenth of the base input price on every subsequent use. Similarly, Claude 3 Haiku users pay $0.30 per million tokens to cache prompts and $0.03 per million tokens to read them back.
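To make the arithmetic concrete, here is a back-of-the-envelope comparison at the Sonnet rates quoted above; the prefix size and call count are hypothetical.

```python
# Prices per million tokens for Claude 3.5 Sonnet (from the figures above).
BASE_INPUT = 3.00    # $/MTok, uncached input
CACHE_WRITE = 3.75   # $/MTok, writing a prompt into the cache
CACHE_READ = 0.30    # $/MTok, reading a cached prompt

prefix_tokens = 100_000   # hypothetical large shared context
calls = 50                # hypothetical number of requests reusing it

# Without caching, the full prefix is billed at the base rate on every call.
without_cache = calls * prefix_tokens / 1e6 * BASE_INPUT

# With caching, the prefix is written once, then read cheaply thereafter.
with_cache = (prefix_tokens / 1e6 * CACHE_WRITE
              + (calls - 1) * prefix_tokens / 1e6 * CACHE_READ)

print(f"without caching: ${without_cache:.2f}")  # $15.00
print(f"with caching:    ${with_cache:.2f}")     # ~$1.85
```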
While prompt caching offers clear benefits, it also comes with limitations users should be aware of. It is not yet available for the Claude 3 Opus model, although pricing has been announced: writing to the cache on Opus will cost $18.75 per million tokens, and reading a cached prompt will cost $1.50 per million tokens. In addition, Anthropic's prompt cache has a five-minute lifetime that is refreshed each time the cached content is used, which may limit its usefulness for applications that need longer retention periods.
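One consequence of the five-minute, refresh-on-use lifetime is that an application expecting a lull can keep a prefix warm by touching it periodically. The helper below is purely a hypothetical sketch of that pattern, not part of Anthropic's API, and each keep-alive ping is itself billed at the cache-read rate, so it only pays off when the prefix is large and reuse is imminent.

```python
import time

# Anthropic's cache entries live for five minutes, and the clock resets on use.
CACHE_TTL_SECONDS = 5 * 60

def keep_cache_warm(send_cached_request, interval_seconds=240):
    """Hypothetical keep-alive: periodically re-send a request containing the
    cached prefix so the entry never expires during a lull.
    `send_cached_request` is any callable that issues a request reusing the
    cached prefix (e.g. the follow-up call sketched earlier)."""
    assert interval_seconds < CACHE_TTL_SECONDS  # must ping before expiry
    while True:
        send_cached_request()        # any use of the prefix resets the TTL
        time.sleep(interval_seconds)
```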
Anthropic's introduction of prompt caching is part of its broader effort to compete with other AI platforms, such as Google and OpenAI, by offering developers cost-effective options. Ahead of the Claude 3 release, Anthropic cut its token prices in a bid to attract third-party developers to its platform. Nor is prompt caching unique to Anthropic: platforms such as Lamina and OpenAI offer similar features to lower the cost of using their models. Prompt caching is a valuable tool for optimizing cost and efficiency, but users should weigh its benefits against its limitations and consider their specific use cases before building it into their workflows.