Local AI Coding is Finally Good Enough - Summary

Summary

The video tests whether recent local AI models can write useful code in real‑world, production‑scale TypeScript (Excalidraw) and Rust (Warp) repositories, using a frontier model (Anthropic’s Opus 4.7) as a baseline.

- **Setup:** The author runs two quantized Qwen models via llama.cpp on an AMD Ryzen Threadripper 9980X with a Radeon AI PRO R9700 GPU (32 GB VRAM) and 128 GB of RAM:
* Qwen‑3‑Coder‑Next (an 80B‑parameter MoE with ~3B active parameters per token), run with MoE CPU offload.
* Qwen‑3.6‑27B (a dense 27B model), which fits entirely in GPU VRAM.

- **Tasks:** For each codebase the model receives an “easy” task (adding a highlighter mode in Excalidraw; adding a `/clearhistory` slash command in Warp) and a “hard” task (creating a five‑pointed star shape in Excalidraw; implementing command bookmarks in Warp).

- **Results:**
* Both local models produce code that compiles and behaves correctly on the easy tasks, but their implementations are less architecturally clean than Opus 4.7 (e.g., the highlighter is stored as visual style only, not as a semantic flag).
* On the hard tasks, where the local models' code runs at all, it introduces subtle bugs or misses UI details (e.g., star‑collision code incorrectly reused for diamonds; no keybind assigned to the new shape; incomplete UI integration). Opus 4.7 generally follows the existing patterns more closely, though it sometimes misinterprets the exact intent (e.g., deleting whole conversations instead of clearing history).
* Qwen‑3‑Coder‑Next (the larger MoE model) fails to compile the Warp bookmark feature because of numerous type and API mismatches, showing a clear ceiling for current local models on complex, tightly coupled code.

- **Performance:** The local models took at least **5× longer** than Opus 4.7 to complete the same tasks.

- **Takeaway:**
* Frontier models still outperform local models in code quality, speed, and architectural fidelity.
* However, when data‑privacy, air‑gap, or compliance constraints prohibit sending code to the cloud, locally run models can be **useful assistants**, provided the user gives very specific, broken‑down prompts and iterates on the output.
* In such settings, treating the local model like a junior programmer that needs clear, incremental instructions yields practical help on mundane or well‑scoped tasks, freeing the developer to focus on more interesting work.
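
The highlighter difference noted under Results (visual style only versus a semantic flag) can be sketched in TypeScript. This is an illustration only, not Excalidraw's actual data model: the `FreeDrawElement` type, the `isHighlighter` field, and the width and opacity values are all hypothetical names and numbers chosen for the example.

```typescript
// Hypothetical, simplified element type; NOT Excalidraw's real schema.
interface FreeDrawElement {
  type: "freedraw";
  strokeWidth: number;
  opacity: number; // 0-100
  isHighlighter?: boolean; // assumed name for a semantic flag
}

// Style-only approach: the stroke looks like a highlighter,
// but nothing in the data records that it is one.
function createStyleOnlyHighlighter(): FreeDrawElement {
  return { type: "freedraw", strokeWidth: 12, opacity: 40 };
}

// Semantic approach: the intent is stored on the element itself.
function createSemanticHighlighter(): FreeDrawElement {
  return { type: "freedraw", strokeWidth: 12, opacity: 40, isHighlighter: true };
}

// Later code (export, restyling, restoring the tool state) can only
// recognize a highlighter stroke reliably when the flag exists.
function isHighlighterStroke(el: FreeDrawElement): boolean {
  return el.isHighlighter === true;
}

console.log(isHighlighterStroke(createStyleOnlyHighlighter()));
console.log(isHighlighterStroke(createSemanticHighlighter()));
```

Both elements would render the same way, which is why the style-only version can pass a type check and a visual test while still losing the "this is a highlighter" intent in saved data.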

In short, local AI models are now “good enough” to aid development in restricted environments, but they are not yet a drop‑in replacement for state‑of‑the‑art cloud‑based coding assistants.
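
As a rough check on the setup described above, the following sketch estimates the memory needed for model weights alone. The summary says only that the models are "quantized", so the 4 bits per weight used here is an assumption, and the estimate ignores KV cache, activations, and runtime overhead.

```typescript
// Approximate weight footprint in GB:
// (billions of parameters) * (bits per weight) / (8 bits per byte).
function weightFootprintGB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * bitsPerWeight) / 8;
}

const VRAM_GB = 32; // GPU memory stated in the setup
const ASSUMED_BITS_PER_WEIGHT = 4; // assumption: exact quantization is not given

const dense27 = weightFootprintGB(27, ASSUMED_BITS_PER_WEIGHT); // 13.5
const moe80 = weightFootprintGB(80, ASSUMED_BITS_PER_WEIGHT); // 40

console.log(`27B dense: ~${dense27} GB of weights, fits in VRAM: ${dense27 < VRAM_GB}`);
console.log(`80B MoE:   ~${moe80} GB of weights, fits in VRAM: ${moe80 < VRAM_GB}`);
```

Under that assumption the dense 27B model leaves headroom in 32 GB of VRAM, while the 80B model's weights alone exceed it, which is consistent with the author running the MoE model with CPU offload.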

Facts

1. The speaker had wanted to make this video for a long time but could not, because local AI was not effective at coding.
2. Recently, local AI has improved and is now capable of coding tasks.
3. The video demonstrates local AI coding in a real TypeScript codebase (Excalidraw) and a real Rust codebase (Warp).
4. Excalidraw and Warp are production software used by thousands or millions of people.
5. Frontier models from OpenAI and Anthropic are hosted in the cloud and are subsidized, making them cheaper and better for many users.
6. Some developers cannot use cloud models because of ITAR‑controlled code, HIPAA regulations, IP‑sensitive work, or hedge‑fund policies that prohibit data from leaving the building.
7. Alternatives such as BAAs, FedRAMP, or GovCloud still require company approvals for provider, model, region, and data flow.
8. Certain companies prohibit any use of third‑party processors.
9. The speaker’s friends either handwrite code or run local AI models on their own hardware.
10. The local AI models used are Qwen‑3‑Coder‑Next (an 80‑billion‑parameter MoE model with ~3 billion active parameters per token) and Qwen‑3.6‑27B (a 27‑billion‑parameter dense model).
11. The frontier baseline model used is Opus 4.7.
12. The hardware consists of an AMD Ryzen Threadripper 9980X CPU, an AMD Radeon AI PRO R9700 GPU with 32 GB VRAM, 128 GB DDR5 RAM, and Ubuntu 26.04.
13. The quantized Qwen‑3.6‑27B model runs fully on the GPU; the Qwen‑3‑Coder‑Next model uses llama.cpp with MoE CPU offload.
14. The speaker used llama.cpp to run the models.
15. For each codebase, each model received two tasks: an easier task (follow existing patterns) and a harder task (understand system and touch architecture).
16. In Excalidraw, the easier task was to add a highlighter mode to the free‑draw tool.
17. Both Opus and Qwen‑3.6 passed the TypeScript check and implemented the highlighter feature.
18. Opus implemented highlighter as a real property on free‑draw elements; Qwen‑3.6 created the element with modified normal properties (stroke width and opacity) without preserving the highlighter intent in the data model.
19. The harder Excalidraw task was to create a five‑pointed‑star shape.
20. Both Opus and Qwen‑3‑Coder‑Next got the star to work.
21. Opus overrode the existing “5” keybind; Qwen‑3‑Coder‑Next did not assign a keybind.
22. Opus kept star‑specific geometry separate; Qwen‑3‑Coder‑Next generalized diamond and star collision handling, causing diamond collision to use star points (a bug).
23. In Warp, the easier task was to add a clear‑history slash command.
24. Opus 4.7 and Qwen‑3.6 both passed cargo check.
25. Opus wired the command through the existing workspace action and confirmation dialogue but deleted the entire conversation instead of clearing history for the current pane.
26. Qwen‑3.6 truncated the conversation from the first exchange after a second Enter press, but the history persisted after restarting the application.
27. The harder Warp task was to implement command bookmarks.
28. Opus added a command bookmarks module, persistence changes, a SQLite schema, terminal context‑menu handling, and a left‑panel view, but created a new panel type not integrated into the existing UI, omitted the expected icon, and inserted the command without executing it.
29. Qwen‑3‑Coder‑Next touched persistence, terminal view, action wiring, left‑panel UI, and schema changes, but the build failed with 47 compiler errors (missing UI variables, wrong Warp UI APIs, type mismatches, missing enum variants, non‑exhaustive matches, moved values).
30. The local AI models required at least five times longer to complete the tasks than Opus.
31. The speaker notes that breaking tasks into smaller steps and providing more context improves local model performance.
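
The star‑collision bug in facts 19–22 can be illustrated with a small TypeScript sketch. It is one plausible reduction of the problem, not the code any model generated and not Excalidraw's hit‑testing code: every function name here is hypothetical, and the 0.4 inner‑radius ratio is an arbitrary choice.

```typescript
type Point = [number, number];
type ShapeType = "diamond" | "star";

// Vertices of a five-pointed star centred at (cx, cy): 10 points that
// alternate between the outer and inner radius, starting at the top.
function starPoints(cx: number, cy: number, outer: number, inner: number): Point[] {
  const pts: Point[] = [];
  for (let i = 0; i < 10; i++) {
    const r = i % 2 === 0 ? outer : inner;
    const angle = -Math.PI / 2 + (i * Math.PI) / 5;
    pts.push([cx + r * Math.cos(angle), cy + r * Math.sin(angle)]);
  }
  return pts;
}

// Vertices of a diamond with the same centre and half-diagonal r.
function diamondPoints(cx: number, cy: number, r: number): Point[] {
  return [[cx, cy - r], [cx + r, cy], [cx, cy + r], [cx - r, cy]];
}

// Standard ray-casting point-in-polygon test.
function pointInPolygon(p: Point, poly: Point[]): boolean {
  let inside = false;
  for (let i = 0, j = poly.length - 1; i < poly.length; j = i++) {
    const [xi, yi] = poly[i];
    const [xj, yj] = poly[j];
    const crosses =
      (yi > p[1]) !== (yj > p[1]) &&
      p[0] < ((xj - xi) * (p[1] - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Separate geometry per shape: a hit test never borrows another outline.
function hitTest(type: ShapeType, p: Point, cx: number, cy: number, r: number): boolean {
  const outline = type === "star" ? starPoints(cx, cy, r, r * 0.4) : diamondPoints(cx, cy, r);
  return pointInPolygon(p, outline);
}

// The reported bug, reduced to its essence: a "generalized" helper that
// always uses star vertices, so diamonds inherit the star's concave gaps.
function buggyHitTest(_type: ShapeType, p: Point, cx: number, cy: number, r: number): boolean {
  return pointInPolygon(p, starPoints(cx, cy, r, r * 0.4));
}

const probe: Point = [0, 60]; // inside the diamond, in the notch between the star's lower points
console.log(hitTest("diamond", probe, 0, 0, 100), buggyHitTest("diamond", probe, 0, 0, 100));
```

The probe point lies inside the diamond but in a notch of the star, so a shared helper that silently uses star points would report a miss on a diamond the user clearly clicked.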
