Qwen 3.6 vs Gemma 4: I Built the Same App With Both Locally


Comparing AI Models: Qwen 3.6 vs. Gemma 4

Introduction to the Comparison

  • The speaker has tested Qwen 3.6 and thinks it may be ready to replace their current main model, Gemma 4.
  • The goal is to compare both models based on personal suitability rather than objective benchmarking.
  • A comprehensive test will be designed to explore the limits of each model.

Test Design and Application Concept

  • The speaker recalls needing a Markdown file viewer for macOS, which inspires the test project.
  • The plan is to build a desktop app, mainly for viewing Markdown files but with some editing capabilities, using the Tauri framework.

Model Specifications and Setup

  • Both models are compared using their largest dense versions: Qwen 3.6 with 27 billion parameters and Gemma 4 with 31 billion.
  • The models run on a desktop computer that the speaker accesses from a MacBook over the local network; the speaker stresses that the desktop's graphics card needs enough memory to hold them.
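The point about graphics card memory can be made concrete with a back-of-the-envelope estimate: the memory needed for the weights is roughly the parameter count times the bits per weight, divided by 8. The sketch below is illustrative only. The parameter counts come from the comparison, but the 16-, 8- and 4-bit widths are assumptions, since the video does not say which quantization was used, and the estimate ignores the KV cache, context length, and runtime overhead, which all add more on top.

```python
# Back-of-the-envelope VRAM estimate for model weights only.
# Ignores KV cache, activations, and runtime overhead, which add more on top.
# Parameter counts come from the comparison; the bit widths are illustrative
# assumptions, since the video does not state which quantization was used.

MODELS = {"Qwen 3.6 (27B)": 27e9, "Gemma 4 (31B)": 31e9}
BIT_WIDTHS = {"16-bit": 16, "8-bit": 8, "4-bit": 4}


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Return the memory needed to store the weights, in decimal gigabytes."""
    bytes_total = params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


# Print the estimate for every model/bit-width combination.
for name, params in MODELS.items():
    for label, bits in BIT_WIDTHS.items():
        print(f"{name} @ {label}: ~{weight_memory_gb(params, bits):.1f} GB")
```

At 4 bits per weight this gives about 13.5 GB for the 27B model and 15.5 GB for the 31B model before any context is allocated, so a 16 GB card would already be tight. Real 4-bit file formats often store somewhat more than 4 bits per weight, which pushes the numbers up further.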

Implementing with Qwen 3.6

Initial Steps with Qwen

  • The same project description file is placed in each model's folder, and work starts with Qwen.
  • Qwen's first task is to analyze the description and create an implementation plan broken down into smaller tasks.

Results from Qwen's Planning

  • After about four minutes, Qwen produces a detailed development plan divided into phases and specific tasks.
  • The speaker then initializes the project in OpenCode, letting the model review all relevant files before proceeding.

Stress Testing Qwen's Capabilities

Task Execution by Qwen

  • To stress-test the model, all tasks are requested at once instead of phase by phase.
  • Qwen takes about 46 minutes to complete this large, multi-phase job.

Issues Encountered During Launch

  • When the speaker tries to launch the application Qwen generated, errors appear related to the server startup configuration.
  • A few minor issues in the Rust code also need manual fixes before the app launches successfully.

Evaluating Output from Qwen

Functionality Assessment

  • After these fixes, the application launches successfully, and its basic functionality looks promising.
  • Text input and real-time preview work correctly, but some toolbar buttons do not respond as expected.

Transitioning to Gemma 4

Setting Up Gemma for Comparison

  • For Gemma 4, the same project files are used as for Qwen, and the prompts are repeated verbatim for consistency.

Performance Insights from Gemma

  • Gemma finishes the planning stage faster than Qwen, in about two and a half minutes, and produces a comparable breakdown of tasks.

Implementation Process with Gemma

Task Execution Speed

  • Gemma finishes implementing its plan in about 20 minutes, less than half the time Qwen needed, and clearly lists the completed tasks at the end.
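As a quick check of the speed comparison, the reported wall-clock times can be expressed as ratios. This is a minimal sketch: the minute values are the approximate figures from the summary (about 4 and 2.5 minutes for planning, about 46 and 20 minutes for implementation), and the dictionary layout and function name are purely illustrative.

```python
# Approximate wall-clock times reported in the comparison, in minutes.
TIMES = {
    "planning": {"Qwen 3.6": 4.0, "Gemma 4": 2.5},
    "implementation": {"Qwen 3.6": 46.0, "Gemma 4": 20.0},
}


def gemma_fraction(stage: str) -> float:
    """Gemma's time for a stage as a fraction of Qwen's time for that stage."""
    times = TIMES[stage]
    return times["Gemma 4"] / times["Qwen 3.6"]


# Print each stage's ratio as a percentage with one decimal place.
for stage in TIMES:
    print(f"{stage}: Gemma took {gemma_fraction(stage):.1%} of Qwen's time")
```

On these figures Gemma's planning took 62.5% of Qwen's time and its implementation run about 43%, so 20 of 46 minutes is somewhat under half.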

Launching Issues Identified

  • Launch problems similar to Qwen's appear; this time they involve filesystem access in the Rust code and take some debugging to resolve.

Final Evaluation of Both Models

Successful Completion

  • Once the configuration issues are resolved, both applications run, and text input and Markdown rendering work correctly.

Conclusion on Model Performance

  • Both models handled the stress test well, with a few notable differences:
    • Qwen: more detailed planning, but more initial errors to fix before launch.
    • Gemma: faster execution, but it missed some functionality outlined in its own plan (e.g., toolbar buttons).

Future Considerations

  • The speaker plans to keep using both models and invites viewers to share their own experiences with either one.

Video description

Which local LLM should be my daily coding driver: Qwen 3.6 or Gemma 4? Instead of running abstract benchmarks, I gave both models the exact same real-world task: build a cross-platform markdown viewer/editor desktop app using Tauri. Same prompt, same hardware, same workflow through OpenCode. Here's what happened.

I'm running the 27B Qwen 3.6 and the 31B Gemma 4, both dense models, because for code generation dense architectures consistently deliver better results than MoE at comparable sizes.

⏱️ Timestamps:
  • 00:00 Intro
  • 00:28 The test idea
  • 01:40 Hardware and model setup
  • 02:38 Qwen 3.6: planning phase
  • 03:41 Qwen 3.6: full project implementation (46 minutes)
  • 04:22 Launching Qwen's app: debugging and results
  • 06:14 Gemma 4: planning phase
  • 07:15 Gemma 4: full project implementation (20 minutes)
  • 07:33 Power consumption note
  • 08:03 Launching Gemma's app: debugging and results
  • 09:26 Final comparison and conclusions

Which model are you using locally for coding tasks? Drop your experience in the comments; I'd genuinely like to hear what's working for you.

👍 If this comparison helped, leave a like and subscribe for more local AI, self-hosted tools, and single-board computing content.

#LocalLLM #Qwen #Gemma #SelfHosted #AICoding