Qwen 3.6 vs Gemma 4: I Built the Same App With Both Locally
Comparing AI Models: Quen 3.6 vs. Gemma 4
Introduction to the Comparison
- The speaker has tested Quen 3.6 and found it ready to replace their current main model, Gemma 4.
- The goal is to compare both models based on personal suitability rather than objective benchmarking.
- A comprehensive test will be designed to explore the limits of each model.
Test Design and Application Concept
- The speaker recalls a need for a markdown file viewer for Mac OS, which inspires the test project.
- They plan to build a desktop app primarily for viewing markdown files with some editing capabilities using the Tori framework.
Model Specifications and Setup
- Both models will be compared using their largest dense versions; Quen has 27 billion parameters while Gemma has 31 billion.
- The models will run on a desktop computer accessed via a local network from a MacBook, emphasizing the importance of sufficient graphics card memory.
Implementing with Quen 3.6
Initial Steps with Quen
- A project description file is created in both model folders, starting with Quen's implementation.
- The first task given to Quen is to analyze the description and create an implementation plan broken down into smaller tasks.
Results from Quen's Implementation
- After about four minutes, Quen produces a detailed development plan divided into phases and specific tasks.
- The speaker initiates the project setup in Open Code, allowing the model to review all relevant files before proceeding.
Stress Testing Quen's Capabilities
Execution of Tasks by Quen
- To stress-test the model, all tasks are requested at once instead of phase-by-phase execution.
- It takes approximately 46 minutes for Quen to complete its work on this complex task.
Issues Encountered During Launch
- Upon attempting to launch the application generated by Quen, errors arise related to server startup configurations.
- Additional minor issues are identified in Rust code that require manual fixes before successful launch.
Evaluating Output from Quen
Functionality Assessment
- Despite initial errors, the application launches successfully after adjustments; basic functionality appears promising.
- Features like text input and real-time preview work correctly but some toolbar buttons do not respond as expected.
Transitioning to Gemma 4
Setting Up Gemma for Comparison
- Moving on to Gemma 4, similar project files are used as those for Quen; tasks are repeated verbatim for consistency.
Performance Insights from Gemma
- Gemma completes its planning stage faster than Quen at around two and a half minutes while producing a comparable breakdown of tasks.
Implementation Process with Gemma
Task Execution Speed
-Gemma finishes implementing its plan in just 20 minutes—half the time taken by Quen—while also listing completed tasks clearly at completion.
Launching Issues Identified
- Similar launch issues occur as seen with Quen; problems relate specifically to Rust code concerning filesystem access requiring debugging efforts.
Final Evaluation of Both Models
Successful Completion
- After resolving configuration issues, both applications function correctly showcasing effective text input and rendering features.
Conclusion on Model Performance
- While both models performed well under stress testing conditions, differences noted include:
- Quen: More detailed planning but more initial errors needing correction pre-launch.
- Gemma: Faster execution but missed certain functionalities outlined in its own plan (e.g., toolbar buttons).
Future Considerations
- Speaker expresses intent to use both models moving forward while seeking audience feedback on their experiences with either model.