Model
The underlying AI brain that reads your prompt and writes a reply.
- Different models have different strengths: Claude is strong on long context and code; GPT and Gemini cover multimodal work; Qwen, DeepSeek, Kimi, and GLM cover Chinese and SEA language tasks.
- SeaLink covers 10 model ecosystems so you don't have to commit to one vendor.
When it matters: When you decide cost vs. quality vs. speed for your use case.
Token
The unit models use to measure text. ~4 English chars = 1 token. ~1.5 Chinese chars = 1 token.
- "Hello world" = ~3 tokens. "你好世界" = ~3 tokens. A typical chat reply might be 200-500 tokens.
- Both your input and the model's output are billed in tokens. Output usually costs 3-5x more than input.
- SeaLink shows tokens-per-call estimates everywhere — try /tools/tokenizer.
When it matters: Whenever you forecast monthly cost or hit a model's context limit.
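The rules of thumb above can be turned into a quick estimator. This is a heuristic only (real tokenizers vary); use /tools/tokenizer or a library like tiktoken for exact counts:

```python
import math

def estimate_tokens(text: str) -> int:
    """Heuristic: ~4 ASCII chars or ~1.5 CJK chars per token."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return math.ceil(other / 4 + cjk / 1.5)

print(estimate_tokens("Hello world"))  # → 3
print(estimate_tokens("你好世界"))      # → 3
```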
Context window
How many tokens a model can read in one shot.
- GPT-4o-mini: 128K tokens (~300 pages). Claude Sonnet: 200K. Gemini 2.5 Pro: 1M (~2,000 pages).
- If your input + history + reply would exceed it, the request is rejected (HTTP 413). Switch to a longer-context model or trim the history.
When it matters: When you summarize long documents, do whole-codebase analysis, or run multi-turn agents.
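Trimming history can be as simple as keeping the system prompt plus the newest messages that fit. A minimal sketch; the `count_tokens` callback and message shape are illustrative, not a SeaLink API:

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the system prompt plus as many recent messages as fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept = []
    for m in reversed(rest):          # walk newest-first
        cost = count_tokens(m["content"])
        if cost > budget:
            break                     # older messages get dropped
        kept.append(m)
        budget -= cost
    return system + kept[::-1]        # restore chronological order
```

Dropping oldest-first is the simplest policy; summarizing the dropped turns into one short message is a common refinement.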
API key
A secret string that authenticates your app with SeaLink.
- Looks like sk-sealink-... — treat it like a password: never commit it to git or paste it into chat.
- Each SeaLink account can have multiple keys with different model whitelists, monthly budgets, and expiry dates.
- If a key leaks: rotate it from /dashboard/keys. Create the replacement key and point your app at it first; revoking the old key takes effect immediately.
When it matters: Every time you ship code that calls SeaLink.
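One simple way to keep the key out of source control is to read it from the environment. The `SEALINK_API_KEY` variable name here is our convention, not something SeaLink mandates:

```python
import os

def load_sealink_key() -> str:
    """Read the key from the environment so it never lands in git."""
    key = os.environ.get("SEALINK_API_KEY")
    if not key:
        raise RuntimeError("SEALINK_API_KEY is not set; create one at /dashboard/keys")
    return key
```

Pair this with a `.env` file that is listed in `.gitignore`, or your platform's secret store.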
RAG (Retrieval-Augmented Generation)
Pattern: retrieve relevant docs first, then ask the model to answer using them as context.
- Step 1: embed your documents into vectors with an embedding model such as text-embedding-3-large.
- Step 2: at query time, embed the question, find top-k similar chunks.
- Step 3: send chunks + question to a chat model (e.g. Qwen Plus).
- Why use it: cheaper than fine-tuning, more current than the model's training data, and the model can cite sources.
When it matters: When you want the model to answer based on YOUR documents, not its general knowledge.
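The three steps can be sketched end to end with a toy bag-of-words "embedding" standing in for text-embedding-3-large; the retrieval logic is the same, only the embedding quality differs:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(question, docs, k=2):
    """Step 2: rank stored chunks by similarity to the query embedding."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "SeaLink supports streaming replies.",
    "Refunds are processed within 5 days.",
    "Keys can be rotated from the dashboard.",
]
# Step 3 would send the retrieved chunks plus the question to a chat model.
print(top_k("Can I rotate my key from the dashboard?", docs, k=1))
```

In production you would precompute the document embeddings (step 1) and store them in a vector database rather than re-embedding per query.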
Tool use / Function calling
Letting the model call your code (search DB, send email, query API) instead of just answering text.
- You declare what tools the model can use. The model decides if/when to call them and returns the call's arguments.
- You execute the call (in your code), pass the result back, and the model continues.
- All major SeaLink models support this with identical syntax — see /docs/function-calling.
When it matters: Building an agent, a customer-support bot that books appointments, or anything that mixes natural language with structured actions.
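The "you execute the call" step might look like this in Python. `get_weather` and the call shape are illustrative stand-ins for whatever tools and wire format your app uses:

```python
import json

def get_weather(city: str) -> str:
    """Illustrative tool; a real one would hit a weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def handle_tool_call(call):
    """Run the function the model asked for and package the result message."""
    args = json.loads(call["arguments"])          # models send args as JSON text
    result = TOOLS[call["name"]](**args)
    return {"role": "tool", "name": call["name"], "content": result}

# Shape of a model-issued call (illustrative):
call = {"name": "get_weather", "arguments": '{"city": "Singapore"}'}
print(handle_tool_call(call)["content"])  # → Sunny in Singapore
```

The returned message goes back into the conversation so the model can continue with the tool's result.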
Streaming
Receive the reply token-by-token as it's generated, instead of waiting for the whole thing.
- Cuts perceived latency from 3-10s (long reply) to ~200-500ms first token. Users see text appear live, like ChatGPT.
- Enable with stream:true in your request, then iterate the response chunks with the OpenAI SDK (sync or async).
When it matters: User-facing chat UIs. Don't bother for short replies (<100 tokens) or background batch jobs.
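The consumption pattern, sketched with a stubbed chunk iterator; in a real app the deltas come from a stream:true response:

```python
def consume_stream(chunks, on_delta):
    """Accumulate streamed text deltas, invoking a UI callback after each one."""
    parts = []
    for delta in chunks:
        parts.append(delta)
        on_delta("".join(parts))  # e.g. re-render the chat bubble here
    return "".join(parts)

# With the OpenAI SDK the deltas come from a stream=True response:
#   for chunk in client.chat.completions.create(..., stream=True):
#       text = chunk.choices[0].delta.content or ""
print(consume_stream(["Hel", "lo ", "world"], on_delta=lambda s: None))  # → Hello world
```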
Prompt caching
Reuse a long fixed prompt across many calls — cached tokens are billed at a steep discount (e.g. 10% of the input price on Anthropic) instead of full price.
- Common use: a 12K-token system prompt called 10K times/month. Without cache: ~$375/month. With cache (90% hit rate, cached tokens at 10%): ~$71/month.
- Anthropic and OpenAI both support it. SeaLink passes the headers through transparently.
When it matters: Customer-support bots, RAG with fixed system prompt, agents that loop over the same context.
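The savings arithmetic as a small helper. The $3.125/M input price is back-solved from the $375 figure (12K tokens × 10K calls = 120M tokens); the 10% cached rate is an assumption that matches Anthropic's pricing, not a SeaLink guarantee:

```python
def monthly_input_cost(prompt_tokens, calls, price_per_m,
                       hit_rate=0.0, cached_rate=0.10):
    """Input-token cost when a share of tokens is billed at the cached rate."""
    millions = prompt_tokens * calls / 1e6
    full = millions * (1 - hit_rate)            # cache misses at full price
    cached = millions * hit_rate * cached_rate  # hits at the discounted rate
    return (full + cached) * price_per_m

print(round(monthly_input_cost(12_000, 10_000, 3.125)))                # → 375
print(round(monthly_input_cost(12_000, 10_000, 3.125, hit_rate=0.9)))  # → 71
```

Note this ignores cache-write surcharges (Anthropic bills cache writes at a premium), so real bills land slightly higher.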