Model
The underlying AI brain that reads your prompt and writes a reply.
- Different models have different strengths: Claude is strong on long context and code; GPT and Gemini cover multimodal work; Qwen, DeepSeek, Kimi, and GLM cover Chinese and SEA language tasks.
- SeaLink covers 10 model ecosystems so you don't have to commit to one vendor.
When it matters: When you decide cost vs. quality vs. speed for your use case.
Token
The unit models use to measure text. ~4 English chars = 1 token. ~1.5 Chinese chars = 1 token.
- "Hello world" = ~3 tokens. "你好世界" = ~3 tokens. A typical chat reply might be 200-500 tokens.
- Both your input and the model's output are billed in tokens. Output usually costs 3-5x more than input.
- SeaLink shows tokens-per-call estimates everywhere — try /tools/tokenizer.
When it matters: Whenever you forecast monthly cost or hit a model's context limit.
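The rules of thumb above can be turned into a quick estimator. This is a heuristic only (real tokenizers vary); use /tools/tokenizer or a library like tiktoken for exact counts:

```python
import math

def estimate_tokens(text: str) -> int:
    """Heuristic: ~4 ASCII chars or ~1.5 CJK chars per token."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return math.ceil(other / 4 + cjk / 1.5)

print(estimate_tokens("Hello world"))  # → 3
print(estimate_tokens("你好世界"))      # → 3
```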
Context window
How many tokens a model can read in one shot.
- GPT-4o-mini: 128K tokens (~300 pages). Claude Sonnet: 200K. Gemini 2.5 Pro: 1M (~2,000 pages).
- If your input + history + reply would exceed it, the request is rejected (HTTP 413). Switch to a longer-context model or trim the history.
When it matters: When you summarize long documents, do whole-codebase analysis, or run multi-turn agents.
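Trimming history can be as simple as keeping the system prompt plus the newest messages that fit. A minimal sketch; the `count_tokens` callback and message shape are illustrative, not a SeaLink API:

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the system prompt plus as many recent messages as fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):          # walk newest-first
        cost = count_tokens(m["content"])
        if cost > budget:
            break                     # older messages get dropped
        kept.append(m)
        budget -= cost
    return system + kept[::-1]        # restore chronological order
```

Dropping oldest-first is the simplest policy; summarizing the dropped turns into one short message is a common refinement.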
API key
A secret string that authenticates your app with SeaLink.
- Looks like sk-sealink-... — treat it like a password: never commit it to git or paste it into chat.
- Each SeaLink account can have multiple keys with different model whitelists, monthly budgets, and expiry dates.
- If a key leaks: rotate it from /dashboard/keys. Create the replacement key and point your app at it first; revoking the old key takes effect immediately.
When it matters: Every time you ship code that calls SeaLink.
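One simple way to keep the key out of source control is to read it from the environment. The `SEALINK_API_KEY` variable name here is our convention, not something SeaLink mandates:

```python
import os

def load_sealink_key() -> str:
    """Read the key from the environment so it never lands in git."""
    key = os.environ.get("SEALINK_API_KEY")
    if not key:
        raise RuntimeError("SEALINK_API_KEY is not set; create one at /dashboard/keys")
    return key
```

Pair this with a `.env` file that is listed in `.gitignore`, or your platform's secret store.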
RAG (Retrieval-Augmented Generation)
Pattern: retrieve relevant docs first, then ask the model to answer using them as context.
- Step 1: embed your documents into vectors with an embedding model such as text-embedding-3-large.
- Step 2: at query time, embed the question, find top-k similar chunks.
- Step 3: send chunks + question to a chat model (e.g. Qwen Plus).
- Why use it: cheaper than fine-tuning, more current than the model's training data, and the model can cite sources.
When it matters: When you want the model to answer based on YOUR documents, not its general knowledge.
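The three steps can be sketched end to end with a toy bag-of-words "embedding" standing in for text-embedding-3-large; the retrieval logic is the same, only the embedding quality differs:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(question, docs, k=2):
    """Step 2: rank stored chunks by similarity to the query embedding."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "SeaLink supports streaming replies.",
    "Refunds are processed within 5 days.",
    "Keys can be rotated from the dashboard.",
]
# Step 3 would send the retrieved chunks plus the question to a chat model.
print(top_k("Can I rotate my key from the dashboard?", docs, k=1))
```

In production you would precompute the document embeddings (step 1) and store them in a vector database rather than re-embedding per query.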
Tool use / Function calling
Letting the model call your code (search DB, send email, query API) instead of just answering text.
- You declare what tools the model can use. The model decides if/when to call them and returns the call's arguments.
- You execute the call (in your code), pass the result back, and the model continues.
- All major SeaLink models support this with identical syntax — see /docs/function-calling.
When it matters: Building an agent, a customer-support bot that books appointments, or anything that mixes natural language with structured actions.
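The "you execute the call" step might look like this in Python. `get_weather` and the call shape are illustrative stand-ins for whatever tools and wire format your app uses:

```python
import json

def get_weather(city: str) -> str:
    """Illustrative tool; a real one would hit a weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def handle_tool_call(call):
    """Run the function the model asked for and package the result message."""
    args = json.loads(call["arguments"])          # models send args as JSON text
    result = TOOLS[call["name"]](**args)
    return {"role": "tool", "name": call["name"], "content": result}

# Shape of a model-issued call (illustrative):
call = {"name": "get_weather", "arguments": '{"city": "Singapore"}'}
print(handle_tool_call(call)["content"])  # → Sunny in Singapore
```

The returned message goes back into the conversation so the model can continue with the tool's result.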
Streaming
Receive the reply token-by-token as it's generated, instead of waiting for the whole thing.
- Cuts perceived latency from 3-10s (long reply) to ~200-500ms first token. Users see text appear live, like ChatGPT.
- Enable with stream:true in your request, then iterate the response chunks with the OpenAI SDK (sync or async).
When it matters: User-facing chat UIs. Don't bother for short replies (<100 tokens) or background batch jobs.
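The consumption pattern, sketched with a stubbed chunk iterator; in a real app the deltas come from a stream:true response:

```python
def consume_stream(chunks, on_delta):
    """Accumulate streamed text deltas, invoking a UI callback after each one."""
    parts = []
    for delta in chunks:
        parts.append(delta)
        on_delta("".join(parts))  # e.g. re-render the chat bubble here
    return "".join(parts)

# With the OpenAI SDK the deltas come from a stream=True response:
#   for chunk in client.chat.completions.create(..., stream=True):
#       text = chunk.choices[0].delta.content or ""
print(consume_stream(["Hel", "lo ", "world"], on_delta=lambda s: None))  # → Hello world
```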
Prompt caching
Reuse a long fixed prompt across many calls — cached tokens are billed at a steep discount (e.g. 10% of the input price on Anthropic) instead of full price.
- Common use: a 12K-token system prompt called 10K times/month. Without cache: ~$375/month. With cache (90% hit rate, cached tokens at 10%): ~$71/month.
- Anthropic and OpenAI both support it. SeaLink passes the headers through transparently.
When it matters: Customer-support bots, RAG with fixed system prompt, agents that loop over the same context.
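The savings arithmetic as a small helper. The $3.125/M input price is back-solved from the $375 figure (12K tokens × 10K calls = 120M tokens); the 10% cached rate is an assumption that matches Anthropic's pricing, not a SeaLink guarantee:

```python
def monthly_input_cost(prompt_tokens, calls, price_per_m,
                       hit_rate=0.0, cached_rate=0.10):
    """Input-token cost when a share of tokens is billed at the cached rate."""
    millions = prompt_tokens * calls / 1e6
    full = millions * (1 - hit_rate)            # cache misses at full price
    cached = millions * hit_rate * cached_rate  # hits at the discounted rate
    return (full + cached) * price_per_m

print(round(monthly_input_cost(12_000, 10_000, 3.125)))                # → 375
print(round(monthly_input_cost(12_000, 10_000, 3.125, hit_rate=0.9)))  # → 71
```

Note this ignores cache-write surcharges (Anthropic bills cache writes at a premium), so real bills land slightly higher.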