Kimi K2.7-Code cuts thinking tokens 30%, but practitioners say the benchmarks don't check out
Moonshot AI released Kimi K2.7-Code this week, an open-source update to its K2 coding model family, claiming leaner reasoning and double-digit performance gains. K2.7-Code is built on the same trillion-parameter mixture-of-experts architecture as its predecessor K2.6, and drops in via an OpenAI-compatible API, which matters for teams already running K2.6 in production gateways.

When K2.6 launched in April, it topped OpenRouter's weekly LLM leaderboard, a ranking based on actual API routing decisions by developers, not self-reported benchmark scores.

Moonshot AI says K2.7-Code addresses what it calls "overthinking," reducing thinking-token usage by 30% compared to K2.6, a number that would directly affect inference costs for teams running agentic workflows. Whether that efficiency gain holds on independent benchmarks is a question practitioners have already started raising publicly.

What Kimi K2.7-Code is

K2.7-Code is released under a Modified MIT license, with weights available on
This report comes from VentureBeat.

