Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out

VentureBeat

12 Jun 2026

Moonshot AI released Kimi K2.7-Code this week, an open-source update to its K2 coding model family, claiming leaner reasoning and double-digit performance gains. K2.7-Code is built on the same trillion-parameter mixture-of-experts architecture as its predecessor, K2.6, and drops in through an OpenAI-compatible API, which matters for teams already running K2.6 in production gateways.

When K2.6 launched in April, it topped OpenRouter's weekly LLM leaderboard, a ranking based on actual API routing decisions by developers rather than self-reported benchmark scores.

Moonshot AI says K2.7-Code addresses what it calls "overthinking," cutting thinking-token usage by 30% compared with K2.6. For teams running agentic workflows, a reduction of that size would translate directly into lower inference costs. Practitioners, however, have already begun publicly questioning whether the efficiency gain holds up on independent benchmarks.

What Kimi K2.7-Code is

K2.7-Code is released under a Modified MIT license, with weights available on …

This report comes from VentureBeat. Full coverage and background context are available at the original source.