The Kimi K2.5 API provides developers with programmatic access to Moonshot AI's flagship model. This comprehensive guide covers everything from authentication to advanced integration patterns, helping you build powerful AI applications with Kimi K2.5.
What is the Kimi K2.5 API?
The Kimi K2.5 API is a RESTful interface that lets developers integrate Kimi K2.5's capabilities into their applications. Built on the OpenAI-compatible API format, it works with existing tools and frameworks out of the box while supporting Kimi K2.5's distinctive features, such as a 256K-token context window and multimodal inputs.
Key Features
| Feature | Description |
|---|---|
| 256K Context Window | Process documents up to ~200 pages |
| Multimodal Support | Text, image, and document inputs |
| Streaming Responses | Real-time token generation |
| Function Calling | Tool use and agentic workflows |
| OpenAI Compatible | Drop-in replacement for OpenAI SDK |
| Context Caching | Reduced costs for repeated context |
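Since the context window is measured in tokens rather than pages, it helps to sanity-check input size before sending a request. Below is a minimal sketch using the rough ~4-characters-per-token heuristic for English text; the actual count depends on the model's tokenizer, so leave headroom:

```python
CONTEXT_WINDOW = 256_000  # Kimi K2.5 context window, in tokens

def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

def fits_in_context(document: str, reserved_for_output: int = 2_000) -> bool:
    """Check whether a document plus an output budget fits in the window."""
    return estimate_tokens(document) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("word " * 1000))   # a short text easily fits: True
print(fits_in_context("a" * 2_000_000))  # ~500K estimated tokens: False
```

For precise budgeting, use the tokenizer or token-counting endpoint documented by the provider rather than this heuristic.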
Getting Started with Kimi K2.5 API
1. Obtain API Credentials
Sign up for an API key at the Moonshot AI Platform:
- Create an account on the Moonshot AI developer portal
- Navigate to API Keys section
- Generate a new API key
- Store securely (environment variables recommended)
2. API Base URL
https://api.moonshot.cn/v1

3. Authentication
All API requests require authentication via Bearer token:
Authorization: Bearer YOUR_API_KEY

Kimi K2.5 API Code Examples
Python Integration
Basic Chat Completion
```python
import openai

# Configure client
client = openai.OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

# Simple completion
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain machine learning in simple terms."}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)
```

Streaming Responses
```python
import openai

client = openai.OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

stream = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Multimodal Request with Image
```python
import openai
import base64

client = openai.OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

# Read and encode image
with open("chart.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this chart and summarize the key trends."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```

JavaScript/Node.js Integration
Basic Request with Fetch
```javascript
const response = await fetch('https://api.moonshot.cn/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    model: 'kimi-k2.5',
    messages: [
      { role: 'system', content: 'You are a coding assistant.' },
      { role: 'user', content: 'Generate a React component for a todo list.' }
    ],
    temperature: 0.7,
    max_tokens: 2000
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

Using OpenAI SDK
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'your-kimi-api-key',
  baseURL: 'https://api.moonshot.cn/v1'
});

async function generateCode() {
  const stream = await client.chat.completions.create({
    model: 'kimi-k2.5',
    messages: [
      { role: 'user', content: 'Create a Python API with FastAPI' }
    ],
    stream: true
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  }
}

generateCode();
```

cURL Examples
```bash
# Basic chat completion
curl https://api.moonshot.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {"role": "user", "content": "Hello, Kimi!"}
    ]
  }'

# With system prompt and parameters
curl https://api.moonshot.cn/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [
      {"role": "system", "content": "You are a Python expert."},
      {"role": "user", "content": "Explain decorators."}
    ],
    "temperature": 0.3,
    "max_tokens": 1500
  }'
```

Advanced Kimi K2.5 API Features
Function Calling / Tool Use
```python
import openai

client = openai.OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "user", "content": "What's the weather in Beijing?"}
    ],
    tools=tools
)

# Check if model wants to call a function
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function to call: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```

Long Context Document Processing
```python
import openai

client = openai.OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

# Read a long document
with open("annual_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {
            "role": "user",
            "content": f"Summarize the key financial metrics from this report:\n\n{document}"
        }
    ],
    max_tokens=2000
)

print(response.choices[0].message.content)
```

Kimi K2.5 API Pricing
| Type | Price | Unit |
|---|---|---|
| Context Cache Hit | $0.10 | / 1M tokens |
| Context Cache Miss | $0.60 | / 1M tokens |
| Output Tokens | $3.00 | / 1M tokens |
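The table above translates directly into a per-request cost estimate. Here is a small sketch using the listed prices (USD per million tokens):

```python
# Prices from the table above, in USD per 1M tokens
PRICE_CACHE_HIT = 0.10
PRICE_CACHE_MISS = 0.60
PRICE_OUTPUT = 3.00

def estimate_cost(hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Estimated request cost in USD, split by billing category."""
    return (hit_tokens * PRICE_CACHE_HIT
            + miss_tokens * PRICE_CACHE_MISS
            + output_tokens * PRICE_OUTPUT) / 1_000_000

# e.g. 50K cached input tokens, 10K uncached input tokens, 2K output tokens:
print(round(estimate_cost(50_000, 10_000, 2_000), 4))  # 0.017
```

Note how the input side dominates only when cache misses are large; shifting repeated context into cache hits cuts its cost by 6x.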
Cost Optimization Tips
- Use Context Caching: For repeated contexts, caching reduces costs significantly
- Stream Responses: For real-time applications, streaming improves UX
- Optimize Prompts: Clear, concise prompts reduce token usage
- Batch Requests: Process multiple items in a single request when possible
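Context caching generally rewards a byte-identical repeated prompt prefix, so keep the stable part of the prompt (system instructions plus a long document) unchanged across calls and vary only the final user turn. A minimal sketch of that pattern (the exact caching mechanics are defined by the provider's documentation; this only ensures the prefix stays identical):

```python
SYSTEM_PROMPT = "You are an analyst. Answer using only the report below."
REPORT = "(long annual report text loaded once at startup)"

# Stable prefix shared by every request; only the final user turn varies.
PREFIX = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": REPORT},
]

def build_messages(question: str) -> list:
    # Concatenation returns a new list, so the shared prefix is never mutated.
    return PREFIX + [{"role": "user", "content": question}]

m1 = build_messages("Summarize revenue trends.")
m2 = build_messages("List the top risks.")
# m1[:2] == m2[:2], so a prefix-based cache can reuse the expensive part.
```

Pass the result of `build_messages()` as the `messages` argument in the chat completion calls shown earlier.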
Error Handling
Common HTTP Status Codes
| Code | Meaning | Resolution |
|---|---|---|
| 200 | Success | Request completed successfully |
| 400 | Bad Request | Check request format and parameters |
| 401 | Unauthorized | Verify API key |
| 429 | Rate Limited | Reduce request frequency |
| 500 | Server Error | Retry with exponential backoff |
Python Error Handling Example
```python
import openai
from openai import RateLimitError, APIError

client = openai.OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

try:
    response = client.chat.completions.create(
        model="kimi-k2.5",
        messages=[{"role": "user", "content": "Hello"}]
    )
except RateLimitError:
    print("Rate limit exceeded. Please wait before retrying.")
except APIError as e:
    print(f"API error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
```

Best Practices for Kimi K2.5 API
1. Secure API Key Management
```python
import os
import openai
from dotenv import load_dotenv

load_dotenv()
api_key = os.getenv("KIMI_API_KEY")

client = openai.OpenAI(
    api_key=api_key,
    base_url="https://api.moonshot.cn/v1"
)
```

2. Implement Retry Logic
```python
import time
from functools import wraps

def retry_with_backoff(max_retries=3):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for i in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if i == max_retries - 1:
                        raise
                    time.sleep(2 ** i)  # Exponential backoff
            return None
        return wrapper
    return decorator

@retry_with_backoff(max_retries=3)
def call_kimi_api(messages):
    return client.chat.completions.create(
        model="kimi-k2.5",
        messages=messages
    )
```

3. Request Batching
```python
# Instead of multiple individual requests
requests = [
    "Summarize paragraph 1",
    "Summarize paragraph 2",
    "Summarize paragraph 3"
]

# Batch into a single request
batch_prompt = "Summarize each of these paragraphs:\n\n" + "\n\n".join(
    f"{i+1}. {req}" for i, req in enumerate(requests)
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": batch_prompt}]
)
```

Integration Examples
FastAPI Backend
```python
from fastapi import FastAPI
from pydantic import BaseModel
import openai

app = FastAPI()

client = openai.OpenAI(
    api_key="your-kimi-api-key",
    base_url="https://api.moonshot.cn/v1"
)

class ChatRequest(BaseModel):
    message: str
    temperature: float = 0.7

@app.post("/chat")
async def chat(request: ChatRequest):
    response = client.chat.completions.create(
        model="kimi-k2.5",
        messages=[{"role": "user", "content": request.message}],
        temperature=request.temperature
    )
    return {"response": response.choices[0].message.content}
```

FAQ
How do I get a Kimi K2.5 API key?
Visit the Moonshot AI Platform, create an account, and generate an API key from the developer dashboard.
Is the Kimi K2.5 API compatible with OpenAI?
Yes, the Kimi K2.5 API uses the OpenAI-compatible format. You can use the OpenAI SDK by changing the base_url to https://api.moonshot.cn/v1.
What is the rate limit for Kimi K2.5 API?
Rate limits are based on cumulative recharge amount (Tier0-Tier5), not Free/Pro/Enterprise plans. For example, Tier0 (¥0 recharge) is 1 concurrent request, 3 RPM, 500,000 TPM, and 1,500,000 TPD. Check the official limits page for the latest values.
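At the lowest tiers it can be worth pacing requests client-side rather than reacting to 429 errors. Below is a minimal sketch that spaces calls evenly under an RPM limit, using the Tier0 figure of 3 RPM quoted above (the class and method names are illustrative, not part of any SDK):

```python
def min_interval(rpm: int) -> float:
    """Minimum seconds between requests to stay under an RPM limit."""
    return 60.0 / rpm

class RequestPacer:
    """Tracks the last request time and computes the wait before the next one."""

    def __init__(self, rpm: int):
        self.interval = min_interval(rpm)
        self.last_call = None

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next request is allowed (0 if ready)."""
        if self.last_call is None:
            return 0.0
        return max(0.0, self.interval - (now - self.last_call))

    def record(self, now: float):
        self.last_call = now

pacer = RequestPacer(rpm=3)        # Tier0: 3 requests per minute
pacer.record(now=100.0)
print(pacer.wait_time(now=105.0))  # 15.0 (20s spacing, 5s already elapsed)
```

In a real client you would call `time.sleep(pacer.wait_time(time.monotonic()))` before each request and `pacer.record(time.monotonic())` after it; keep the retry-with-backoff pattern from Best Practices as a fallback for any 429s that slip through.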
Does Kimi K2.5 API support streaming?
Yes, set stream=True in your request to receive tokens as they're generated, enabling real-time responses.
Can I use Kimi K2.5 API for image analysis?
Yes, Kimi K2.5 API supports multimodal inputs including images. Use base64-encoded images in your messages.
How much does Kimi K2.5 API cost?
Pricing starts at $0.10/1M tokens for cache hits, $0.60/1M for cache misses, and $3.00/1M for output tokens.