SuperGemma-4 (26B) UNCENSORED + Hermes, OpenClaw, OpenCode: THIS IS SO CRAZY!!!

Introduction to Super Gemma 4

Overview of Super Gemma 4

  • The video introduces Super Gemma 4, a community fine-tuned version of Google's Gemma 4, aimed at local power users.
  • This specific release is the uncensored MLX 4-bit V2 by Jun Song on Hugging Face and is not an official Google product.

Features and Improvements

  • Super Gemma 4 is designed for less restricted local model use, enhancing usability for agent workflows compared to the stock version.
  • The original Gemma 4 has strong features like native system prompt support and function calling but lacks the openness desired by some users.

Performance Metrics

Benchmarks and Claims

  • The creator claims that Super Gemma 4 offers improved performance with a benchmark score of 95.8 versus the original's 91.4.
  • It is reported to average 46.2 tokens per second, with gains claimed across code, logic, Korean, and browser tasks.

Practical Usability

  • Unlike some uncensored models that become chaotic or impractical, this version aims to maintain practical utility while being more open.

Setup Instructions

Installation Process

  • To use Super Gemma 4 on Apple silicon, install MLX-LM via pip and start the local server with the commands provided in the video; a rough sketch follows below.
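  • As a minimal sketch (assuming a standard mlx-lm install; the repo id and port below are placeholders rather than the exact values shown in the video):

    # Install the MLX-LM package (Apple Silicon / macOS).
    pip install mlx-lm

    # Start a local OpenAI-compatible server for the model.
    # Replace the placeholder with the actual "SuperGemma 4 26B uncensored
    # MLX 4-bit v2" repo id or local path from Hugging Face.
    python -m mlx_lm.server \
      --model <supergemma-4-mlx-4bit-v2-repo-or-path> \
      --port 8080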

Important Notes

  • Users are advised against manually forcing a chat template path during setup to avoid corrupting responses.

Integration with Tools

Using with Hermes Agent

  • Once set up, any tool compatible with OpenAI endpoints can utilize Super Gemma 4; Hermes agent is highlighted as a suitable option.
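  • As a quick sanity check that the server speaks the standard OpenAI chat route (the port matches the server sketch above; the model value is a placeholder):

    # Send a test chat completion to the local server.
    curl http://127.0.0.1:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "supergemma-4-26b-uncensored-mlx-4bit-v2",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}]
          }'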

Configuration Steps

  • Users can configure Hermes to point at their local MLX server through its custom OpenAI route, so requests go to the local model instead of a cloud API; one illustrative way to wire that up is sketched below.
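  • One illustrative setup uses the standard OpenAI-client environment variables; the video does not show whether Hermes reads these exact variables or uses its own config file, so treat the names as an assumption:

    # Point an OpenAI-compatible tool at the local MLX server.
    # (Assumption: the tool honors these standard variables; Hermes may
    # expose its own "custom OpenAI route" setting instead.)
    export OPENAI_BASE_URL="http://127.0.0.1:8080/v1"
    export OPENAI_API_KEY="local-no-key-needed"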

OpenClaw Compatibility

Alternative Use Cases

  • OpenClaw can also leverage Super Gemma 4 through its custom provider path instead of relying on cloud APIs.
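  • Before pointing OpenClaw's custom provider at the server, it is worth confirming the endpoint responds; assuming the local server exposes the standard model-listing route, a quick check looks like this:

    # List the models the local OpenAI-compatible server exposes.
    # (Assumes the server implements the standard /v1/models route.)
    curl http://127.0.0.1:8080/v1/models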

Memory Management

  • There are options available for tuning memory limits within OpenClaw settings if needed.

Gemma 4: Exploring the GGUF Version

Overview of GGUF Version for Non-Mac Users

  • The GGUF version is introduced as an alternative for users not on Mac, specifically mentioning a "super Gemma 4 26B uncensored GGUF V2" available on Hugging Face.
  • This version is designed for broader compatibility with tools like llama.cpp, LM Studio, Jan, and Open Web UI, making it suitable for Windows or Linux users.
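  • As a rough sketch of the llama.cpp route (the GGUF filename, quantization, and context size are placeholders, not values from the video):

    # Serve the GGUF build through llama.cpp's OpenAI-compatible server.
    llama-server -m ./supergemma-4-26b-uncensored-v2-Q4_K_M.gguf \
      --port 8080 \
      -c 8192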

Features and Improvements

  • The GGUF variant ships with a neutral embedded chat template to mitigate prompt-template bugs in earlier releases that could trigger unintended coding modes or spurious tool calls.
  • It aims to enhance the chat experience by providing cleaner interactions when run through local servers compatible with OpenAI interfaces.

Target Audience and Use Cases

  • Super Gemma 4 serves as an uncensored option for those seeking less filtered outputs while maintaining practical applications in agent workflows such as coding and logic tasks.
  • The model is particularly convenient for Mac users, who can run it easily with MLX, and for those using Hermes Agent or OpenClaw, since it can be integrated into existing assistant stacks.

Community Perspective

  • The speaker expresses enthusiasm about this community-driven release, highlighting its balance between being uncensored yet practical. If successful in real-world use, it could become a preferred local variant of Gemma 4.

Video description

In this video, I'll be telling you about SuperGemma 4, a community fine-tuned uncensored version of Gemma 4 26B, and how you can run it locally with MLX on Apple Silicon or use the GGUF version on other platforms for agent workflows.

Key Takeaways:

  🚀 SuperGemma 4 is a community uncensored fine-tune of Gemma 4 26B aimed at text, coding, planning, and tool-use.
  🍎 The MLX 4-bit v2 release is built for Apple Silicon and can be run as a local OpenAI-compatible server.
  🤖 SuperGemma 4 pairs especially well with Hermes Agent for terminal-first local AI workflows.
  🧠 It can also be connected to OpenClaw as a local reasoning model for assistant and automation tasks.
  ⚡ The model card claims improved speed, better benchmark scores, and stronger performance for code, logic, Korean, and browser tasks.
  💾 Realistically, 24GB of unified memory is the minimum comfortable floor, while 32GB or more is the better choice.
  🪟 A GGUF v2 version is also available for llama.cpp, LM Studio, Jan, Open WebUI, and other non-Mac setups.
  ✅ Overall, SuperGemma 4 looks like one of the more practical uncensored Gemma variants for local agent work.