Kimi K2.6 vs Claude Opus 4.7
Five identical prompts, two harnesses. A frontier closed model against an open-weight underdog.
Kimi K2.6 dropped two days ago and the benchmarks looked strong. I don't take benchmarks at face value. They get gamed, leaked into training, and rarely reflect the kind of work you hand a model on a Tuesday morning.
So I ran it head-to-head against Claude Opus 4.7. Five identical prompts, two harnesses.
K2.6 ran on opencode via OpenRouter. Opus 4.7 ran on Claude Code. These are the harnesses most people actually use, so this reflects the choice you'd make if you were picking a default for next week.
The setup
Five prompts, each designed to stress a different part of the stack.
A real-time canvas physics simulation: 5,000 particles, mouse interaction, sliders for viscosity and gravity. The kind of thing where you can feel when a model gets the math wrong.
A pathfinding race: eight algorithms running in parallel on the same grid, with an interactive maze editor. Algorithm correctness plus visual taste, in one shot.
A 3D solar system in Three.js: clickable planets, orbital mechanics, smooth camera transitions, dressed up like a NASA mission control panel.
A Remotion explainer video about the rise of open source models: 40 seconds, 60fps, four scenes, multiple camera movements.
A one-shot recreation of a SaaS dashboard from a Dribbble shot, described to both models entirely in text. No image input.
The full prompts are at the end.
01. Fluid simulation
Both models hit 60fps with 5,000 particles. Both got the physics right. Both let you crank gravity to zero and watch the particles float in a way that felt physically believable, not just animated.
The difference was small and subjective. Opus had more particles visible at the top of the simulation and let you shift-click to repel, a small interaction detail K2.6 didn't include. K2.6's UI looked different but worked the same.
I had to remind myself partway through that one of these models is open-weight and an order of magnitude cheaper to run. That reframes the test. When two models tie on capability, the cheaper one effectively wins. I'm scoring on output quality, not economics, so this one is a draw.
02. Pathfinding race
Both models implemented all eight algorithms correctly. Both let you draw walls, drop start and end points, generate random mazes, and race the algorithms. Both worked on the first try.
K2.6 went with a barcode-style aesthetic I didn't expect and kind of liked. Opus played it safer.
One bit of weirdness: Opus looked like it was cheating on some maze layouts. I'd block its path and it would still find a way through. Smart? Yes. Suspiciously smart when the other algorithms in the same grid were stalling out? Also yes. Not enough to swing the verdict.
This one comes down to taste. Both tied on function. K2.6 one-shot it with no errors at a fraction of the price.
03. 3D solar system
This is where the gap got real.
Opus's version had UI flicker: a persistent visual glitch when navigating between planets. The orbital data was more detailed (Venus got 224.7 days versus K2.6's rounded 225), but I'm not buying precision at the cost of a buggy interface.
Then the thing that decided it: Opus's planets were dots. Colored spheres, no surface detail. K2.6's planets had texture. Earth looked like Earth. The sun glowed brighter when you zoomed close to it. Mercury, Venus, and Mars all had visual character. The difference between "here's a 3D scene" and "here's something that feels like mission control."
The classification panel showed up. The orbital parameters showed up. When I pulled the camera close to the sun on K2.6's version, it bloomed. That's the kind of detail you don't get unless the model actually understood the brief.
Opus also lost Saturn from its default framing. I had to scroll to find it, and when I did, it was flickering too. Clean win for K2.6.
Opus's planets were dots. Kimi's had texture. A small detail that turns into a big one.
04. Remotion explainer
This is my favorite test of the five, and the one Opus has historically dominated. Remotion is a code-driven motion graphics library. You're not generating animations so much as building them frame-precise with TypeScript, splines, springs, and timing math.
K2.6's output worked, but it didn't impress. I've used Remotion extensively and know what good looks like; this was serviceable, nothing more.
Opus skipped the "closed models" opening frame I'd written into the prompt, which was annoying. But the rest of the composition flowed better. Transitions felt choreographed instead of stitched, and the closing scene landed. Opus wins this round easily.
There's a reason Opus is the default for motion graphics work. This test confirmed it.
05. Dashboard recreation
I gave both models a Dribbble shot of a SaaS analytics dashboard called Nexus (charts, sidebar, the works), described entirely in text. No image input. Just the prompt.
Both pulled it off. Both surprised me. The bar charts animated on load. The grouped stacked bars rendered correctly. The sidebar had the right hierarchy. This is the kind of prompt that used to break models a year ago.
Where Opus fell apart: chart rendering was off in a few specific places. The half-donut at the bottom didn't quite work, and a few card sections had weird spacing. K2.6 handled the tab management cleanly (it sort of simulated the browser environment the prompt asked for) and the overall layout felt more cohesive on first paint.
Both lost points for not handling mobile responsiveness. I didn't ask for it in the prompt, so that's fair.
A tough call. They both did well. The fact that K2.6 pulled this off in one shot, at its price point, with no follow-up corrections, gives it the round.
The verdict
Kimi K2.6 is the clear winner. Two outright wins, two ties, one loss. The loss was on Remotion, the one prompt I expected Opus to dominate because it always has.
The thing I keep coming back to: Kimi K2.6 is open-weight. Practically free, all things considered. It's not just keeping up with Opus 4.7, it's beating it on tasks where I expected the gap to be obvious. The 3D solar system is the example I'll keep telling people about. Opus gave me dots. Kimi gave me Earth.
Will I switch defaults? For most day-to-day work, probably yes. K2.6 needs a bit more prompting care to consistently hit the bar Opus hits effortlessly, but it's not far off. The cost difference matters when you're running hundreds of generations a week.
For motion graphics work I'm staying on Opus. That's not close.
For everything else, K2.6 has earned a serious look.
What this means for local AI users
K2.6 isn't a local model. At its full size you're running it through OpenRouter or one of the hosted providers. The trend it represents is the one that matters: open-weight models are catching up faster than the closed-source roadmap admits.
Six months ago this comparison would have been a blowout. Today it's two wins, two ties, and a loss, in the open model's favor. A year from now the question won't be "can the open model match the frontier," it'll be "why am I still paying frontier prices."
If you're already running smaller models locally (Qwen, Gemma, Llama) and wondering when the open ecosystem becomes a real default, this comparison is your answer. For most tasks, we're already there. The rest is deployment.
The five prompts
Reproducible end to end. Run them yourself against any model and score them however you like.
01. Fluid simulation
Build a single-file HTML artifact: a live particle-based fluid simulation with 5,000 particles rendered on canvas at 60fps. Include mouse interaction (attract/repel on click), adjustable viscosity and gravity sliders, and a color mode that shifts particle hue based on velocity. Use vanilla JS only, no libraries. Include a subtle glow effect and a minimal dark UI with monospaced readouts for FPS, particle count, and average velocity.
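For reference, the per-frame math this prompt hinges on is a damped Euler step. A minimal sketch of that step (the function name and parameter shape are mine, not either model's output):

```typescript
interface Particle { x: number; y: number; vx: number; vy: number; }

// One integration step: gravity accelerates, viscosity damps, walls bounce.
// dt is the frame delta in seconds (~1/60 at 60fps); w/h are canvas bounds.
function step(p: Particle, gravity: number, viscosity: number, dt: number,
              w: number, h: number): void {
  p.vy += gravity * dt;          // gravity pulls down
  p.vx *= 1 - viscosity * dt;    // viscosity bleeds off velocity
  p.vy *= 1 - viscosity * dt;
  p.x += p.vx * dt;
  p.y += p.vy * dt;
  // inelastic wall bounce, clamped back into bounds
  if (p.x < 0 || p.x > w) { p.vx *= -0.8; p.x = Math.max(0, Math.min(w, p.x)); }
  if (p.y < 0 || p.y > h) { p.vy *= -0.8; p.y = Math.max(0, Math.min(h, p.y)); }
}
```

Setting `gravity` to zero leaves only the damping term, which is why zero-g particles drift and slow rather than freeze; that's the "physically believable" behavior both models nailed.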
02. Pathfinding race
Create a single-file interactive artifact that visualizes 8 pathfinding algorithms (A*, Dijkstra, BFS, DFS, Greedy Best-First, Bidirectional, Jump Point Search, Theta*) running simultaneously on the same grid in a 4x2 layout. User can draw walls by clicking and dragging, place start/end points, and hit "race" to watch all 8 solve in parallel with smooth animations. Show per-algorithm stats (nodes explored, path length, time). Make it production-quality: clean typography, proper color theory, no generic AI aesthetic.
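Of the eight, BFS is the baseline the rest get judged against, since it guarantees the shortest path on an unweighted grid. A self-contained sketch (the grid encoding, 0 = open and 1 = wall, is my assumption, not from either model):

```typescript
// Breadth-first search on a grid; returns shortest path length in steps,
// or -1 if the goal is unreachable. 0 = open cell, 1 = wall.
function bfs(grid: number[][], start: [number, number],
             goal: [number, number]): number {
  const rows = grid.length, cols = grid[0].length;
  const dist = grid.map(row => row.map(() => -1)); // -1 = unvisited
  dist[start[0]][start[1]] = 0;
  const queue: [number, number][] = [start];
  while (queue.length > 0) {
    const [r, c] = queue.shift()!;
    if (r === goal[0] && c === goal[1]) return dist[r][c];
    for (const [dr, dc] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const nr = r + dr, nc = c + dc;
      if (nr >= 0 && nr < rows && nc >= 0 && nc < cols &&
          grid[nr][nc] === 0 && dist[nr][nc] === -1) {
        dist[nr][nc] = dist[r][c] + 1;
        queue.push([nr, nc]);
      }
    }
  }
  return -1;
}
```

The "blocked path" sanity check from section 02 falls out of the `-1` return: if a drawn wall truly seals off the goal, every correct algorithm should report no path, which is why a solver that still "finds a way through" looks suspicious.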
03. 3D solar system
Build a single-file HTML artifact using Three.js: an interactive solar system where each planet is clickable and reveals a detailed info panel with orbital mechanics data. Include realistic relative orbital speeds, a time controller (pause/1x/100x/10000x), camera controls that smoothly transition when a planet is selected, and a "trajectory prediction" mode that draws dotted lines showing where each planet will be in X days. Add a subtle starfield background and make the UI feel like a NASA mission control panel.
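The "trajectory prediction" mode is just phase math on the orbital period. A sketch under the circular-orbit simplification (real orbits are elliptical; the function name is mine):

```typescript
// Position of a planet on a circular orbit after `days` of simulated time.
// periodDays is the sidereal period (e.g. ~365.25 for Earth, ~224.7 for Venus).
function orbitPosition(radius: number, periodDays: number,
                       days: number): { x: number; z: number } {
  const angle = (2 * Math.PI * days) / periodDays; // fraction of one full orbit
  return { x: radius * Math.cos(angle), z: radius * Math.sin(angle) };
}
```

Drawing the dotted prediction line is then sampling `orbitPosition` at future `days` values, and the pause/1x/100x/10000x time controller only scales how fast `days` advances per frame.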
04. Remotion explainer (40s, 60fps, 4 scenes)
Build a complete Remotion v4 composition called OpenSourceRise for an educational explainer about the rise of open source AI models. The video must be exactly 40 seconds at 60fps (2400 frames total), 1920x1080, with 4 distinct scenes of 10 seconds each (600 frames per scene). Use TypeScript, spring animations from Remotion, interpolate with proper easing (Easing.bezier), and a vibrant color palette built on electric purple (#7C3AED), neon cyan (#06B6D4), hot coral (#FB7185), and lime (#84CC16) against a deep navy (#0B1020) background. All typography should use a modern geometric sans-serif (Inter or Space Grotesk loaded via @remotion/google-fonts).
Scene 1 (0:00–0:10), The Closed Era. Open on a single massive black monolith labeled "CLOSED MODELS" centered on screen. Camera slowly dollies in from a wide shot while smaller locked-padlock icons orbit around it in 3D-feeling parallax. At frame 300, the title "2020–2022" stamps in with a glitch effect. End the scene with the camera pushing directly into the monolith until the screen goes black.
Scene 2 (0:10–0:20), The Crack. Hard cut to a bright burst. A single glowing seed labeled "LLaMA" appears at center and cracks open like a fracture spreading across the screen, spawning animated nodes for Mistral, Falcon, Qwen, DeepSeek, and Kimi. Each node flies in from a different edge with spring physics and connects to the others via animated bezier curves that draw themselves. Camera performs a slow orbit (simulated via transform rotateY + perspective) around the growing network. Add floating stat callouts: "1000+ models on HuggingFace", "Weights released", "Apache 2.0".
Scene 3 (0:20–0:30), The Explosion. Camera zooms out rapidly to reveal a massive grid of 60+ model logos/names pulsing in sync to an implied beat (scale 1.0 → 1.08 every 20 frames, staggered). Overlay an animated bar chart in the lower third showing parameter counts growing from 7B → 70B → 400B+ with numbers counting up smoothly. Include a ticker at the top scrolling right-to-left with benchmark names (MMLU, HumanEval, GPQA, SWE-Bench). Camera does a smooth horizontal whip-pan across the grid.
Scene 4 (0:30–0:40), The Future is Open. Everything collapses inward into a single glowing sphere that expands to fill the screen. Reveal the closing title "THE FUTURE IS OPEN" with each letter dropping in on spring physics, staggered by 3 frames. Below it, a subtitle fades in: "Frontier intelligence, in everyone's hands." Camera slowly pulls back while particles drift outward. Final 30 frames: hold on the composition with a subtle breathing scale animation.
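All four scenes lean on the same primitive: mapping a frame number into a value range. A dependency-free sketch of that mapping (Remotion's own `interpolate` adds easing and extrapolation options on top; this is just the clamped linear core):

```typescript
// Linear frame-to-value mapping with clamping: at inputRange[0] return
// outputRange[0], at inputRange[1] return outputRange[1], linear in between.
function interpolateFrame(frame: number, inputRange: [number, number],
                          outputRange: [number, number]): number {
  const [i0, i1] = inputRange;
  const [o0, o1] = outputRange;
  const t = Math.min(1, Math.max(0, (frame - i0) / (i1 - i0)));
  return o0 + t * (o1 - o0);
}

// Scene 1's dolly-in, for example, drives scale off frames 0-600:
const scale = interpolateFrame(300, [0, 600], [1, 2]); // halfway -> 1.5
```

This is why Remotion rewards models with good timing math: every dolly, stagger, and count-up in the spec above is ultimately one of these mappings keyed off the current frame.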
05. Nexus dashboard recreation
Build a single-file HTML artifact recreating a SaaS analytics dashboard called Nexus with pixel-level attention to layout, spacing, and visual hierarchy. No libraries except Tailwind (via CDN) and Lucide icons (via CDN). All charts must be hand-built with inline SVG. Do not use Chart.js, Recharts, or any charting library. Use a light theme on a soft gray page background (#F3F4F6) with white cards, rounded-2xl corners, and a subtle 1px border (#E5E7EB). Accent color is indigo/violet (#6366F1 to #8B5CF6 gradient range) with teal (#14B8A6) and lime (#A3E635) as secondary chart colors. Typography is Inter, tight letter-spacing on numbers.
Shell. Simulate a macOS browser chrome at the very top: traffic light dots (red, yellow, green), back/forward arrows, a tab bar, and a centered URL pill showing "nexus.io". Below the chrome, render the app itself inside a rounded container with a soft drop shadow.
Left sidebar (240px wide, white background, full height). Top: "Nexus" wordmark with a small violet logo mark. Search input with "Search" placeholder and a ⌘F keyboard hint on the right. Three grouped nav sections with small uppercase gray labels. GENERAL: Dashboard (active, with violet background tint, violet text, left icon), Payment, Customers, Message (with a gray "8" badge on the right). TOOLS: Product, Invoice, Analytics, Automation (with a small violet "BETA" pill). SUPPORT: Settings, Security, Help. Each nav item has a Lucide icon on the left, 14px text, and generous vertical padding. Bottom of sidebar: a small card showing "Team Marketing" with an avatar, and an "Upgrade Plan" button below it. Footer micro-text: "© 2023 Nexus.io Inc."
Top bar. Left: "Dashboard" as a large 28px semibold heading. Right: four controls in a row: a date range pill showing "Oct 18 – Nov 18", a "Monthly" dropdown, a Filter button with icon, and an "Export" button with icon. Far right: three circular icon buttons (bag, bell with red dot, settings) and a user chip showing avatar + "Young Alaska" + "Business" subtitle.
Top stat row (3 cards, equal width). Page Views: 12,450 with a green "15.8% ↗" pill. Total Revenue: $363.95 with a red "34.0% ↘" pill. Bounce Rate: 86.5% with a green "24.2% ↗" pill. Each card has a tiny icon next to the label and an info ⓘ icon on the right.
Middle row, Sales Overview (2/3 width) and Total Subscriber (1/3 width). Sales Overview card: Label + "$9,257.51" large number + small green "+$143.50 increased" caption. Top right: "Filter" and "Sort" buttons. The chart itself is a grouped stacked bar chart for Oct, Nov, Dec, each month has 4 stacked bars side-by-side in different widths/heights, using blue → violet → teal → light teal gradients. Above each group, show a floating value label with a connector line (e.g., "$2,988.20", "$1,765.09", "$4,005.65"). Bottom legend: China, UK, USA, Canada with colored dots. Total Subscriber card: Label + "24,473" large number + green "+749 increased" caption with "Weekly" dropdown top right. Chart is a vertical bar chart with 7 bars (Sun–Sat) where Tue is the tallest and highlighted in solid violet with "3,874" label floating above it; other bars are light gray.
Bottom row, Sales Distribution (left) and List of Integration (right). Sales Distribution: three metrics, Website $374.82, Mobile App $241.60, Other $213.42, above a horizontal half-donut gauge chart in violet/teal with a gray track. "Monthly" dropdown top right. List of Integration: Table with columns APPLICATION, TYPE, RATE, PROFIT. Three rows. Stripe (blue S logo), Finance, 40% progress bar, $650.00. Zapier (orange logo), CRM, 80% progress bar, $720.50. Shopify (green bag logo), Marketplace, 20% progress bar, $432.25. Each row has a checkbox on the left. "See All" link top right. RATE column shows a thin horizontal progress bar with the percentage text beside it.
Polish requirements. All numbers are tabular/monospaced-feeling (use font-variant-numeric: tabular-nums). Every card has consistent 24px internal padding. Icons are 16–18px, 1.5 stroke width, never larger than the text next to them. The active nav item must have a visible left-edge treatment (subtle bar or background tint, not both). Charts must render as real SVG with gradients defined in <defs>, not CSS backgrounds faking it. Everything must be responsive down to 1280px wide without breaking the layout. The entire thing should look like a finished product screenshot, not a wireframe.
Deliver a single self-contained HTML file that renders this dashboard on load with no interactivity required beyond hover states.
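The "hand-built inline SVG" requirement mostly comes down to emitting `<rect>` elements with computed geometry. A minimal sketch of the Total Subscriber bar chart's layout math (dimensions and the function name are my assumptions; the fills match the prompt's palette):

```typescript
// Emit <rect> markup for a vertical bar chart scaled to chartHeight.
// The tallest bar gets the violet accent fill, the rest light gray,
// matching the Tue-highlighted chart in the Total Subscriber card.
function barChartSvg(values: number[], chartHeight: number,
                     barWidth: number, gap: number): string {
  const max = Math.max(...values);
  return values.map((v, i) => {
    const h = (v / max) * chartHeight;
    const x = i * (barWidth + gap);
    const y = chartHeight - h;               // SVG y grows downward
    const fill = v === max ? '#8B5CF6' : '#E5E7EB';
    return `<rect x="${x}" y="${y}" width="${barWidth}" height="${h}" rx="4" fill="${fill}"/>`;
  }).join('\n');
}
```

Getting this kind of scaling and flipped y-axis right, without a charting library to hide it, is exactly where the two models' renders diverged in section 05.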