SeaLink

Prompt caching

Anthropic's prompt caching cuts input-token cost by 50–90% on repeated system prompts and long contexts. SeaLink passes the cache_control hint through in the request body and bills based on actual cache hits.

When to enable caching

  • System prompt > ~5KB called repeatedly (chatbots, agent loops)
  • RAG with a fixed retrieval prefix + varying questions (see the loop sketch after the example below)
  • Code assistant pinning a whole file as context
  • Long-doc Q&A: many questions over the same doc

Example

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sealink.asia/v1",
    api_key="<your-sealink-key>",
)

# Long, reusable system prompt — cached on the first call.
with open("knowledge_base.md") as f:
    SYSTEM = f.read()  # imagine ~50KB

resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[
        {
            "role": "system",
            "content": SYSTEM,
            # SeaLink passes this hint through to Anthropic.
            "cache_control": {"type": "ephemeral"},
        },
        {"role": "user", "content": "Question 1"},
    ],
)

# Subsequent calls within ~5 minutes pay ~10% of the cached prefix cost.
print(resp.usage.prompt_tokens_details.cached_tokens)  # 12500
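The RAG and long-doc patterns from the checklist above follow directly: keep the fixed prefix in the cached system message and vary only the user turn. A minimal sketch, continuing from the example (client and SYSTEM as defined above; the questions are placeholders):

Python
# Continues the example above: `client` and `SYSTEM` are already defined.
for q in ["Question 1", "Question 2", "Question 3"]:  # placeholder queries
    resp = client.chat.completions.create(
        model="claude-sonnet-4-6",
        messages=[
            {
                "role": "system",
                "content": SYSTEM,  # identical prefix on every call -> cache hit
                "cache_control": {"type": "ephemeral"},
            },
            {"role": "user", "content": q},
        ],
    )
    print(q, "->", resp.usage.prompt_tokens_details.cached_tokens, "cached tokens")

Only the first call writes the cache; as long as the prefix stays byte-identical and calls land within the ~5-minute TTL, every later call bills the prefix at the hit price.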

Models that support caching

  • claude-sonnet-4-6 · claude-haiku-4-5 · claude-opus-4-7 (5-min TTL; cache hits billed at ~10% of the input price)
  • gpt-4o · gpt-4o-mini · o3-mini (automatic, no hint needed)
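For the Claude models the hint has to be present in the request body; the SDK example above adds it to the message dict, and SeaLink forwards it unchanged. If you call the REST endpoint directly, the equivalent raw request looks roughly like this (a sketch, assuming the OpenAI-compatible /chat/completions route implied by the base_url above):

Python
import requests

resp = requests.post(
    "https://api.sealink.asia/v1/chat/completions",
    headers={"Authorization": "Bearer <your-sealink-key>"},
    json={
        "model": "claude-sonnet-4-6",
        "messages": [
            {
                "role": "system",
                "content": "<long reusable prompt>",
                "cache_control": {"type": "ephemeral"},  # forwarded as-is
            },
            {"role": "user", "content": "Question 1"},
        ],
    },
    timeout=60,
)
print(resp.json()["usage"]["prompt_tokens_details"]["cached_tokens"])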

Real savings example

A customer-support bot with a 12,500-token system prompt handling 10,000 calls/month. No cache: 12,500 tokens × 10,000 calls × $3/1M input tokens = $375/month. With caching at a 90% hit rate: 1,000 misses pay full price (~$37.50) and 9,000 hits pay ~10% of it (~$33.75), so roughly $71/month, a saving of about $304/month.
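The same arithmetic as a quick sketch, using the figures above (swap in your own prices and traffic):

Python
INPUT_PRICE = 3.00 / 1_000_000   # $ per input token (claude-sonnet-4-6)
HIT_PRICE = 0.10 * INPUT_PRICE   # cache hits bill at ~10% of input price

prompt_tokens, calls, hit_rate = 12_500, 10_000, 0.90

no_cache = prompt_tokens * calls * INPUT_PRICE
with_cache = (
    prompt_tokens * calls * (1 - hit_rate) * INPUT_PRICE  # misses
    + prompt_tokens * calls * hit_rate * HIT_PRICE        # hits
)
print(f"no cache ${no_cache:.2f} / cached ${with_cache:.2f} "
      f"/ saved ${no_cache - with_cache:.2f}")
# no cache $375.00 / cached $71.25 / saved $303.75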

Seeing cache hits in your console

SeaLink's usage page records cached_tokens for every call, and the dashboard shows a daily cache-hit-rate chart (live in v1).
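You can also tally hits client-side from the usage block on each response. A minimal sketch (the CacheStats helper is ours, not a SeaLink or OpenAI API):

Python
class CacheStats:
    # Local tally of cached vs. total prompt tokens.
    def __init__(self):
        self.prompt_tokens = 0
        self.cached_tokens = 0

    def record(self, resp):
        self.prompt_tokens += resp.usage.prompt_tokens
        details = resp.usage.prompt_tokens_details
        self.cached_tokens += (details.cached_tokens or 0) if details else 0

    @property
    def hit_rate(self):
        return self.cached_tokens / max(self.prompt_tokens, 1)

stats = CacheStats()
# call stats.record(resp) after each request, then:
# print(f"local cache hit rate: {stats.hit_rate:.1%}")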