Qwen 3.6 vs Gemma 4: I Built the Same App With Both Locally

Name: Qwen 3.6 vs Gemma 4: I Built the Same App With Both Locally
Uploaded: 2026-05-13T18:42:29.000Z
Duration: 20 min 35 s

Comparing AI Models: Qwen 3.6 vs. Gemma 4

Introduction to the Comparison

The speaker has tested Qwen 3.6 and considers it ready to replace their current main model, Gemma 4.

The goal is to compare both models based on personal suitability rather than objective benchmarking.

A comprehensive test will be designed to explore the limits of each model.

Test Design and Application Concept

The speaker recalls needing a markdown file viewer for macOS, which inspires the test project.

They plan to build a desktop app primarily for viewing markdown files, with some editing capabilities, using the Tauri framework.

Model Specifications and Setup

Both models are compared using their largest dense versions: Qwen has 27 billion parameters and Gemma has 31 billion.

The models will run on a desktop computer accessed via a local network from a MacBook, emphasizing the importance of sufficient graphics card memory.
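To make the point about graphics card memory concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy for the two parameter counts mentioned above. The bit widths (16, 8, and 4 bits per weight) are assumptions for illustration; the summary does not say which quantization was used, and real usage is higher once the context cache and runtime overhead are included.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Ignores the KV cache, activations, and runtime overhead, so treat
    the result as a lower bound rather than a full requirement.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Parameter counts come from the summary; bit widths are assumed.
MODELS = {"Qwen 3.6 (27B dense)": 27, "Gemma 4 (31B dense)": 31}
ASSUMED_BIT_WIDTHS = (16, 8, 4)

if __name__ == "__main__":
    for name, params in MODELS.items():
        for bits in ASSUMED_BIT_WIDTHS:
            gib = weight_memory_gib(params, bits)
            print(f"{name} at {bits}-bit: ~{gib:.1f} GiB for weights")
```

Under these assumptions, even at 4 bits per weight both models need well over 12 GiB for weights alone, which is why the amount of graphics card memory decides whether they can run on a single desktop GPU.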

Implementing with Qwen 3.6

Initial Steps with Qwen

A project description file is created in both model folders, starting with Qwen's implementation.

The first task given to Qwen is to analyze the description and create an implementation plan broken down into smaller tasks.

Results from Qwen's Implementation

After about four minutes, Qwen produces a detailed development plan divided into phases and specific tasks.

The speaker initiates the project setup in OpenCode, allowing the model to review all relevant files before proceeding.

Stress Testing Qwen's Capabilities

Execution of Tasks by Qwen

To stress-test the model, all tasks are requested at once instead of phase-by-phase execution.

It takes approximately 46 minutes for Qwen to complete its work on this complex task.

Issues Encountered During Launch

When the speaker tries to launch the application generated by Qwen, errors arise related to the server startup configuration.

Additional minor issues are identified in Rust code that require manual fixes before successful launch.

Evaluating Output from Qwen

Functionality Assessment

Despite initial errors, the application launches successfully after adjustments; basic functionality appears promising.

Features like text input and real-time preview work correctly, but some toolbar buttons do not respond as expected.

Transitioning to Gemma 4

Setting Up Gemma for Comparison

Moving on to Gemma 4, the same project files are used as for Qwen, and the tasks are repeated verbatim for consistency.

Performance Insights from Gemma

Gemma completes its planning stage faster than Qwen, in around two and a half minutes, while producing a comparable breakdown of tasks.

Implementation Process with Gemma

Task Execution Speed

Gemma finishes implementing its plan in just 20 minutes, less than half the 46 minutes taken by Qwen, and clearly lists the completed tasks at the end.
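The timings reported in this summary (about 4 and 46 minutes for Qwen, about 2.5 and 20 minutes for Gemma) can be compared with a few lines of arithmetic. This is only a sketch over those approximate figures; the dictionary layout and function names are made up for illustration.

```python
# Approximate minutes reported in the summary for each phase.
TIMINGS = {
    "Qwen 3.6": {"planning": 4.0, "implementation": 46.0},
    "Gemma 4": {"planning": 2.5, "implementation": 20.0},
}


def total_minutes(model: str) -> float:
    """Planning plus implementation time for one model."""
    return sum(TIMINGS[model].values())


def ratio(phase: str, slower: str, faster: str) -> float:
    """How many times longer `slower` took than `faster` on one phase."""
    return TIMINGS[slower][phase] / TIMINGS[faster][phase]


if __name__ == "__main__":
    for phase in ("planning", "implementation"):
        r = ratio(phase, "Qwen 3.6", "Gemma 4")
        print(f"{phase}: Qwen took {r:.1f}x as long as Gemma")
    for model in TIMINGS:
        print(f"{model}: {total_minutes(model):.1f} minutes in total")
```

These figures cover generation time only; they do not include the manual debugging each app needed before it would launch.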

Launching Issues Identified

Launch issues similar to those seen with Qwen occur; they relate specifically to Rust code for filesystem access and require debugging.

Final Evaluation of Both Models

Successful Completion

After the configuration issues are resolved, both applications function correctly, with working text input and rendering features.

Conclusion on Model Performance

Both models performed well under stress-testing conditions, but the speaker notes these differences:

Qwen: More detailed planning but more initial errors needing correction pre-launch.

Gemma: Faster execution but missed certain functionalities outlined in its own plan (e.g., toolbar buttons).

Future Considerations

The speaker intends to use both models going forward and asks viewers to share their experiences with either model.

Video description

Which local LLM should be my daily coding driver: Qwen 3.6 or Gemma 4? Instead of running abstract benchmarks, I gave both models the exact same real-world task: build a cross-platform markdown viewer/editor desktop app using Tauri. Same prompt, same hardware, same workflow through OpenCode. Here's what happened.

I'm running the 27B Qwen 3.6 and the 31B Gemma 4, both Dense models, because for code generation Dense architectures consistently deliver better results than MoE at comparable sizes.

⏱️ Timestamps:
00:00 Intro
00:28 The test idea
01:40 Hardware and model setup
02:38 Qwen 3.6: planning phase
03:41 Qwen 3.6: full project implementation (46 minutes)
04:22 Launching Qwen's app: debugging and results
06:14 Gemma 4: planning phase
07:15 Gemma 4: full project implementation (20 minutes)
07:33 Power consumption note
08:03 Launching Gemma's app: debugging and results
09:26 Final comparison and conclusions

Which model are you using locally for coding tasks? Drop your experience in the comments; I'd genuinely like to hear what's working for you.

👍 If this comparison helped, leave a like and subscribe for more local AI, self-hosted tools, and single-board computing content.

#LocalLLM #Qwen #Gemma #SelfHosted #AICoding