SeaLink

Prompt caching

Anthropic's prompt caching cuts input-token cost by 50–90% on repeated system prompts and long contexts. SeaLink passes the cache_control hint through in the request body and bills based on actual cache hits.

When to enable caching

  • System prompt > ~5KB called repeatedly (chatbots, agent loops)
  • RAG with a fixed retrieval prefix + varying questions (see the loop sketch after the example below)
  • Code assistant pinning a whole file as context
  • Long-doc Q&A: many questions over the same doc

Example

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sealink.asia/v1",
    api_key="<your-sealink-key>",
)

# Long, reusable system prompt — cached on the first call.
with open("knowledge_base.md") as f:
    SYSTEM = f.read()  # imagine ~50KB

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {
            "role": "system",
            "content": SYSTEM,
            # SeaLink passes this hint through to Anthropic.
            "cache_control": {"type": "ephemeral"},
        },
        {"role": "user", "content": "Question 1"},
    ],
)

# Subsequent calls within ~5 minutes pay ~10% of the cached prefix cost.
print(resp.usage.prompt_tokens_details.cached_tokens)  # 12500
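The RAG and long-doc patterns from the checklist above follow directly: keep the fixed prefix in the cached system message and vary only the user turn. A minimal sketch, continuing from the example (client and SYSTEM as defined above; the questions are placeholders):

Python
# Continues the example above: `client` and `SYSTEM` are already defined.
for q in ["Question 1", "Question 2", "Question 3"]:  # placeholder queries
    resp = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[
            {
                "role": "system",
                "content": SYSTEM,  # identical prefix on every call -> cache hit
                "cache_control": {"type": "ephemeral"},
            },
            {"role": "user", "content": q},
        ],
    )
    print(q, "->", resp.usage.prompt_tokens_details.cached_tokens, "cached tokens")

Only the first call writes the cache; as long as the prefix stays byte-identical and calls land within the ~5-minute TTL, every later call bills the prefix at the hit price.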

Models that support caching

  • claude-sonnet-4-6 · claude-haiku-4-5 · claude-opus-4-7 (5-min TTL; cache hits billed at ~10% of the input price)
  • gpt-4o · gpt-4o-mini · o3-mini (automatic, no hint needed)
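For the Claude models the hint has to be present in the request body; the SDK example above adds it to the message dict, and SeaLink forwards it unchanged. If you call the REST endpoint directly, the equivalent raw request looks roughly like this (a sketch, assuming the OpenAI-compatible /chat/completions route implied by the base_url above):

Python
import requests

resp = requests.post(
    "https://api.sealink.asia/v1/chat/completions",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {
                "role": "system",
                "content": "<long reusable prompt>",
                "cache_control": {"type": "ephemeral"},  # forwarded as-is
            },
            {"role": "user", "content": "Question 1"},
        ],
    },
    timeout=60,
)
print(resp.json()["usage"]["prompt_tokens_details"]["cached_tokens"])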

Real savings example

A customer-support bot with a 12,500-token system prompt handling 10,000 calls/month. No cache: 12,500 tokens × 10,000 calls × $3/1M input tokens = $375/month. With caching at a 90% hit rate: 1,000 misses pay full price (~$37.50) and 9,000 hits pay ~10% of it (~$33.75), so roughly $71/month, a saving of about $304/month.
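The same arithmetic as a quick sketch, using the figures above (swap in your own prices and traffic):

Python
INPUT_PRICE = 3.00 / 1_000_000   # $ per input token (claude-sonnet-4-6)
HIT_PRICE = 0.10 * INPUT_PRICE   # cache hits bill at ~10% of input price

prompt_tokens, calls, hit_rate = 12_500, 10_000, 0.90

no_cache = prompt_tokens * calls * INPUT_PRICE
with_cache = (
    prompt_tokens * calls * (1 - hit_rate) * INPUT_PRICE  # misses
    + prompt_tokens * calls * hit_rate * HIT_PRICE        # hits
)
print(f"no cache ${no_cache:.2f} / cached ${with_cache:.2f} "
      f"/ saved ${no_cache - with_cache:.2f}")
# no cache $375.00 / cached $71.25 / saved $303.75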

Seeing cache hits in your console

SeaLink's usage page records cached_tokens for every call, and the dashboard shows a daily cache-hit-rate chart (live in v1).
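You can also tally hits client-side from the usage block on each response. A minimal sketch (the CacheStats helper is ours, not a SeaLink or OpenAI API):

Python
class CacheStats:
    # Local tally of cached vs. total prompt tokens.
    def __init__(self):
        self.prompt_tokens = 0
        self.cached_tokens = 0

    def record(self, resp):
        self.prompt_tokens += resp.usage.prompt_tokens
        details = resp.usage.prompt_tokens_details
        self.cached_tokens += (details.cached_tokens or 0) if details else 0

    @property
    def hit_rate(self):
        return self.cached_tokens / max(self.prompt_tokens, 1)

stats = CacheStats()
# call stats.record(resp) after each request, then:
# print(f"local cache hit rate: {stats.hit_rate:.1%}")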