Kimi K2.5 Open Source: License, Weights & Self-Hosting Guide 2026

Feb 10, 2026

The Kimi K2.5 open source release represents a significant milestone in democratizing access to state-of-the-art AI. Unlike proprietary models from OpenAI and Anthropic, Moonshot AI has released Kimi K2.5 with open weights under a Modified MIT License, enabling researchers, developers, and enterprises to run, modify, and deploy the model on their own terms.

Kimi K2.5 Open Source Overview

What Does "Open Source" Mean for Kimi K2.5?

Kimi K2.5 is released as an open-weights model, which means:

  • ✅ Model weights are publicly downloadable
  • ✅ Can be run locally or on private infrastructure
  • ✅ Fine-tuning and modification allowed
  • ✅ Commercial use permitted (with limitations)
  • ✅ No API dependency required
  • ❌ Training data not public
  • ❌ Full training code not released

Comparison with Other "Open" Models

| Model | Open Weights | Training Data | Commercial Use | True Open Source |
|---|---|---|---|---|
| Kimi K2.5 | ✅ Yes | ❌ No | ✅ Modified MIT | ⚠️ Partial |
| Llama 3.1 | ✅ Yes | ❌ No | ✅ Yes (with limits) | ⚠️ Partial |
| Mistral | ✅ Yes | ❌ No | ✅ Yes | ⚠️ Partial |
| GPT-4 | ❌ No | ❌ No | ❌ API only | ❌ No |
| Claude | ❌ No | ❌ No | ❌ API only | ❌ No |

Kimi K2.5 Modified MIT License Explained

License Overview

The Modified MIT License allows broad usage while including some restrictions for high-volume commercial deployments.

What You CAN Do

| Permission | Details |
|---|---|
| Use | Personal, academic, and commercial use |
| Modify | Fine-tune and adapt the model |
| Distribute | Share modifications and derivatives |
| Private Use | Deploy on private infrastructure |
| Sublicense | Include in larger projects |

What You CANNOT Do (Restrictions)

| Restriction | Threshold | Details |
|---|---|---|
| Attribution Requirement | >100M MAU or >$20M monthly revenue | Must prominently display "Kimi K2.5" in user-facing products/services |
| Harmful Use | Any | Weapons, surveillance, etc. |

Full License Text

The complete license is available at: https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/LICENSE

Key clauses:

Permission is hereby granted, free of charge, to any person obtaining a copy
of this model and associated documentation files (the "Model"), to deal
in the Model without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Model, subject to the following conditions:

1. If your product/service exceeds 100 million monthly active users
   or $20 million in monthly revenue, you must prominently display "Kimi K2.5"
   in all user-facing products/services.

2. Standard MIT-style notice obligations still apply.
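The attribution clause reduces to a simple either/or threshold check. A minimal sketch of that logic (the constants mirror the clause above; this is an illustration, not legal advice):

```python
# Thresholds from the Modified MIT License clause above
MAU_THRESHOLD = 100_000_000       # 100 million monthly active users
REVENUE_THRESHOLD = 20_000_000    # $20 million monthly revenue (USD)

def attribution_required(mau: int, monthly_revenue_usd: float) -> bool:
    """True if the prominent-display ("Kimi K2.5") clause applies.
    Either threshold alone is enough to trigger it."""
    return mau > MAU_THRESHOLD or monthly_revenue_usd > REVENUE_THRESHOLD
```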

Downloading Kimi K2.5 Weights

Official Distribution Channels

| Source | URL | Format |
|---|---|---|
| HuggingFace | huggingface.co/moonshotai/Kimi-K2.5 | PyTorch / Safetensors |
| ModelScope | modelscope.cn/models/MoonshotAI/Kimi-K2.5 | PyTorch |

Download Methods

Using HuggingFace Hub

from huggingface_hub import snapshot_download

# Download the complete model; interrupted downloads resume automatically
model_path = snapshot_download(
    repo_id="moonshotai/Kimi-K2.5",
    local_dir="./kimi-k2-5"
)

Using Git LFS

# Install Git LFS
git lfs install

# Clone repository
git clone https://huggingface.co/moonshotai/Kimi-K2.5.git

# Or sparse checkout for specific files
git clone --filter=blob:none https://huggingface.co/moonshotai/Kimi-K2.5.git

Using wget/curl

# Download specific files
wget https://huggingface.co/moonshotai/Kimi-K2.5/resolve/main/config.json
wget https://huggingface.co/moonshotai/Kimi-K2.5/resolve/main/model.safetensors.index.json
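Partial downloads are a common failure mode with a checkpoint split across dozens of shards. A small sketch to verify every shard is present, assuming the standard HuggingFace weight-index layout (`model.safetensors.index.json` with a `weight_map` of tensor name → shard file):

```python
import json
import os

def shard_files(index_path):
    """Return the sorted set of unique shard filenames referenced
    by a safetensors weight index (model.safetensors.index.json)."""
    with open(index_path) as f:
        index = json.load(f)
    # weight_map maps each tensor name to the shard file containing it
    return sorted(set(index["weight_map"].values()))

def missing_shards(index_path, local_dir):
    """List shards named in the index that are absent from local_dir."""
    return [s for s in shard_files(index_path)
            if not os.path.exists(os.path.join(local_dir, s))]
```

Run `missing_shards("./kimi-k2-5/model.safetensors.index.json", "./kimi-k2-5")` after downloading; an empty list means all shards are accounted for.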

Model Files Structure

kimi-k2-5/
├── config.json              # Model configuration
├── tokenizer.json           # Tokenizer vocab
├── tokenizer_config.json    # Tokenizer settings
├── model.safetensors.index.json  # Weight index
├── model-00001-of-000064.safetensors  # Shard 1
├── model-00002-of-000064.safetensors  # Shard 2
├── ...
├── model-00064-of-000064.safetensors  # Shard 64
├── generation_config.json   # Generation defaults
└── LICENSE                  # License file

Self-Hosting Kimi K2.5

Hardware Requirements

Moonshot's official deployment guide does not publish a single fixed "minimum" hardware spec. It provides reference setups:

| Scenario | Official example | Notes |
|---|---|---|
| vLLM / SGLang inference | Single-node H200 with tensor parallel size 8 | Reference command from Moonshot deployment guide |
| KTransformers + SGLang inference | 8x NVIDIA L20 + 2x Intel 6454S | CPU+GPU heterogeneous inference example |
| LoRA SFT with KTransformers + LLaMA-Factory | 2x RTX 4090 + Intel 8488C (with large RAM/swap) | Fine-tuning example from guide |

Deployment Options

1. Local Deployment with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "./kimi-k2-5",  # Local path
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "./kimi-k2-5",
    trust_remote_code=True
)

# Inference
inputs = tokenizer("Hello, Kimi!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))

2. Production Deployment with vLLM

# Install vLLM
uv pip install -U vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly

# Start server
vllm serve ./kimi-k2-5 \
    -tp 8 \
    --mm-encoder-tp-mode data \
    --trust-remote-code \
    --tool-call-parser kimi_k2 \
    --reasoning-parser kimi_k2
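Once running, vLLM exposes an OpenAI-compatible API on port 8000 by default. A minimal stdlib client sketch; the base URL and model name are assumptions and must match your `vllm serve` invocation:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, max_tokens: int = 100) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def query(base_url: str, payload: dict) -> dict:
    """POST the payload to a running vLLM server and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Requires the server above to be running:
# out = query("http://localhost:8000", build_chat_request("./kimi-k2-5", "Hello, Kimi!"))
# print(out["choices"][0]["message"]["content"])
```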

3. Docker Deployment

FROM nvidia/cuda:12.1.0-devel-ubuntu22.04

WORKDIR /app

# Install Python and dependencies (the base CUDA image ships without pip)
RUN apt-get update && apt-get install -y python3 python3-pip && \
    pip3 install torch vllm transformers

# Copy model (or mount as volume)
COPY ./kimi-k2-5 /models/kimi-k2-5

# Expose port
EXPOSE 8000

# Start server
CMD python -m vllm.entrypoints.openai.api_server \
    --model /models/kimi-k2-5 \
    --tensor-parallel-size 4 \
    --host 0.0.0.0 \
    --port 8000
Build and run:

docker build -t kimi-k2-5 .
docker run --gpus all -p 8000:8000 kimi-k2-5

4. Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kimi-k2-5
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kimi-k2-5
  template:
    metadata:
      labels:
        app: kimi-k2-5
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args:
        - --model
        - /models/kimi-k2-5
        - --tensor-parallel-size
        - "4"
        volumeMounts:
        - name: model
          mountPath: /models
        resources:
          limits:
            nvidia.com/gpu: "4"
      volumes:
      - name: model
        persistentVolumeClaim:
          claimName: kimi-model-pvc
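The Deployment above still needs a Service to expose the vLLM port inside the cluster. A minimal sketch (port 8000 is vLLM's default; the `kimi-model-pvc` claim referenced above is assumed to exist already):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kimi-k2-5
spec:
  selector:
    app: kimi-k2-5
  ports:
  - port: 8000
    targetPort: 8000
```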

Cloud Deployment Options

AWS

# Using Deep Learning AMI
aws ec2 run-instances \
    --image-id ami-xxx \
    --instance-type p4d.24xlarge \
    --key-name my-key

# Deploy with ECS/EKS
# Use GPU-optimized instances (p4d, p5)

Google Cloud Platform

# Using Deep Learning VM
gcloud compute instances create kimi-server \
    --zone=us-central1-a \
    --machine-type=a2-highgpu-4g \
    --image-family=pytorch-latest-gpu

Azure

# Using NC-series VMs
az vm create \
    --resource-group myRG \
    --name kimi-vm \
    --size Standard_NC24ads_A100_v4

Fine-Tuning Open Source Kimi K2.5

LoRA Fine-Tuning

from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "./kimi-k2-5",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Configure LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA
model = get_peft_model(model, lora_config)

# Training configuration
training_args = TrainingArguments(
    output_dir="./kimi-k2-5-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    save_steps=100
)

# Train
from trl import SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer
)
trainer.train()
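To see why LoRA is so much cheaper than full fine-tuning: each adapted weight matrix W (d_out × d_in) gains two low-rank factors of shapes (d_out × r) and (r × d_in), so the trainable count per matrix is r·(d_in + d_out). A quick sketch (the 7168 dimension is illustrative, not Kimi K2.5's actual projection shape):

```python
def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds for one target matrix:
    A is (r, d_in) and B is (d_out, r), so r * (d_in + d_out) total."""
    return r * (d_in + d_out)

# Illustrative: a square 7168x7168 projection with r=16 (matching the config above)
full = 7168 * 7168
lora = lora_params_per_matrix(7168, 7168, 16)
print(f"LoRA trains {lora / full:.4%} of the full matrix's parameters")
```

With r=16 this trains well under 1% of each targeted matrix, which is why the 2x RTX 4090 SFT setup in the hardware table is feasible at all.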

Full Fine-Tuning (requires significant resources)

# For full fine-tuning, use DeepSpeed or FSDP
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# ZeRO stage 3 shards parameters, gradients, and optimizer state across GPUs
deepspeed_plugin = DeepSpeedPlugin(zero_stage=3)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)

model = AutoModelForCausalLM.from_pretrained(
    "./kimi-k2-5",
    trust_remote_code=True
)

# DeepSpeed config for distributed training
# Requires 8x H100 or equivalent

Quantization for Resource-Constrained Deployment

8-bit Quantization

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True
)

model = AutoModelForCausalLM.from_pretrained(
    "./kimi-k2-5",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

4-bit Quantization

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    "./kimi-k2-5",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
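A rough rule of thumb for weight memory is parameters × bits / 8. Note that for a 1T-parameter MoE, all expert weights must reside in memory even though only ~32B are activated per token, so quantization shrinks the resident footprint rather than the compute per token. A back-of-envelope sketch (weights only, ignoring KV cache and activation memory):

```python
def weight_memory_gb(n_params: float, bits: int) -> float:
    """Approximate weight-only memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9

# Rough footprint of a 1T-parameter model at common precisions
for bits, name in [(16, "fp16"), (8, "int8"), (4, "nf4")]:
    print(f"{name}: ~{weight_memory_gb(1e12, bits):,.0f} GB of weights")
```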

GGUF Format (llama.cpp)

Community-converted GGUF files are available, but they are not official Moonshot releases:

# Example community repo
# https://huggingface.co/unsloth/Kimi-K2.5-GGUF

# Inspect available files/tags in the chosen repo before running llama.cpp

Open Source Community

Contributing

While the base model weights are fixed, community contributions include:

  • Fine-tuned variants: Domain-specific adaptations
  • Quantized versions: Optimized for different hardware
  • Integration libraries: SDKs for various frameworks
  • Deployment tools: Kubernetes charts, Docker images

| Project | Description | Link |
|---|---|---|
| MoonshotAI/Kimi-K2.5 | Official code repository | github.com/MoonshotAI/Kimi-K2.5 |
| moonshotai/Kimi-K2.5 (HF) | Official open-weights model card and files | huggingface.co/moonshotai/Kimi-K2.5 |
| unsloth/Kimi-K2.5-GGUF | Community GGUF conversion example | huggingface.co/unsloth/Kimi-K2.5-GGUF |
| KTransformers docs | K2.5 deployment and tuning references | github.com/kvcache-ai/ktransformers |

Compliance and Best Practices

License Compliance Checklist

  • Review Modified MIT License thoroughly
  • Check user count if offering public service
  • Review separate API terms before reselling hosted access
  • Include license in distributions
  • Attribute Moonshot AI appropriately

Security Considerations

# Implement input validation
def validate_input(text):
    max_length = 100000  # Limit input size
    if len(text) > max_length:
        raise ValueError("Input too long")
    
    # Block harmful prompts
    blocked_terms = ["..."]  # Your list
    for term in blocked_terms:
        if term in text.lower():
            raise ValueError("Blocked content detected")
    
    return text
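Beyond input validation, a self-hosted endpoint typically needs per-client rate limiting. A minimal token-bucket sketch (the rate and burst values are illustrative, not recommendations):

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter: sustains `rate` requests/sec
    and allows bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Usage: one bucket per client, checked before each generate() call
# bucket = TokenBucket(rate=2.0, capacity=10)
# if not bucket.allow():
#     raise RuntimeError("Rate limit exceeded")
```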

Monitoring and Logging

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def generate_with_logging(prompt):
    logger.info(f"Generation request: {len(prompt)} chars")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=100)

    logger.info(f"Generated: {output.shape[-1]} tokens")
    return output

FAQ

Is Kimi K2.5 truly open source?

Kimi K2.5 is open weights under a Modified MIT License. This provides significant freedom compared to closed API-only models, though it doesn't include training data or full training code (which is standard for large AI models).

Can I use Kimi K2.5 commercially?

Yes, with conditions. The Modified MIT License permits commercial use for most businesses. If your product/service exceeds 100M MAU or $20M monthly revenue, you must prominently display "Kimi K2.5" in user-facing interfaces.

How much does self-hosting cost?

Costs vary significantly by region, cloud provider, quantization strategy, and throughput target. Use current cloud calculators and the official deployment guide examples as your baseline when budgeting.

Can I modify and redistribute Kimi K2.5?

Yes, you can modify, fine-tune, and distribute your modified versions under the same license terms. This includes creating derivative models.

How does Kimi K2.5 compare to Llama 3 for open source use?

| Aspect | Kimi K2.5 | Llama 3 / 3.1 |
|---|---|---|
| Parameters | 1T total / 32B activated | Varies by checkpoint |
| Context | 256K | Varies by version (up to 128K in Llama 3.1) |
| License | Modified MIT | Llama Community License |
| Commercial | Allowed with attribution threshold clause | Allowed with license-specific restrictions |
| Hardware | Typically demanding at full quality | Often lighter for smaller checkpoints |
