SuperGemma-4 (26B) UNCENSORED + Hermes, OpenClaw, OpenCode: THIS IS SO CRAZY!!!

Introduction to Super Gemma 4

Overview of Super Gemma 4

  • The video introduces Super Gemma 4, a community fine-tuned version of Google's Gemma 4, aimed at local power users.
  • This specific release is the uncensored MLX 4-bit V2 by Jun Song on Hugging Face and is not an official Google product.

Features and Improvements

  • Super Gemma 4 is designed for less restricted local model use, enhancing usability for agent workflows compared to the stock version.
  • The original Gemma 4 has strong features like native system prompt support and function calling but lacks the openness desired by some users.

Performance Metrics

Benchmarks and Claims

  • The creator claims that Super Gemma 4 offers improved performance with a benchmark score of 95.8 versus the original's 91.4.
  • It is reported to average 46.2 tokens per second, with gains claimed across code, logic, Korean, and browser tasks.

Practical Usability

  • Unlike some uncensored models that become chaotic or impractical, this version aims to maintain practical utility while being more open.

Setup Instructions

Installation Process

  • To use Super Gemma 4 on Apple silicon, install MLX-LM via pip and start the local server with the commands provided in the video; a rough sketch follows below.
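  • As a minimal sketch (assuming a standard mlx-lm install; the repo id and port below are placeholders rather than the exact values shown in the video):

    # Install the MLX-LM package (Apple Silicon / macOS).
    pip install mlx-lm

    # Start a local OpenAI-compatible server for the model.
    # Replace the placeholder with the actual "SuperGemma 4 26B uncensored
    # MLX 4-bit v2" repo id or local path from Hugging Face.
    python -m mlx_lm.server \
      --model <supergemma-4-mlx-4bit-v2-repo-or-path> \
      --port 8080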

Important Notes

  • Users are advised against manually forcing a chat template path during setup to avoid corrupting responses.

Integration with Tools

Using with Hermes Agent

  • Once set up, any tool compatible with OpenAI endpoints can utilize Super Gemma 4; Hermes agent is highlighted as a suitable option.
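  • As a quick sanity check that the server speaks the standard OpenAI chat route (the port matches the server sketch above; the model value is a placeholder):

    # Send a test chat completion to the local server.
    curl http://127.0.0.1:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "supergemma-4-26b-uncensored-mlx-4bit-v2",
            "messages": [{"role": "user", "content": "Say hello in one sentence."}]
          }'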

Configuration Steps

  • Users can configure Hermes to point at their local MLX server through its custom OpenAI route, so requests go to the local model instead of a cloud API; one illustrative way to wire that up is sketched below.
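  • One illustrative setup uses the standard OpenAI-client environment variables; the video does not show whether Hermes reads these exact variables or uses its own config file, so treat the names as an assumption:

    # Point an OpenAI-compatible tool at the local MLX server.
    # (Assumption: the tool honors these standard variables; Hermes may
    # expose its own "custom OpenAI route" setting instead.)
    export OPENAI_BASE_URL="http://127.0.0.1:8080/v1"
    export OPENAI_API_KEY="local-no-key-needed"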

OpenClaw Compatibility

Alternative Use Cases

  • OpenClaw can also leverage Super Gemma 4 through its custom provider path instead of relying on cloud APIs.
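  • Before pointing OpenClaw's custom provider at the server, it is worth confirming the endpoint responds; assuming the local server exposes the standard model-listing route, a quick check looks like this:

    # List the models the local OpenAI-compatible server exposes.
    # (Assumes the server implements the standard /v1/models route.)
    curl http://127.0.0.1:8080/v1/models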

Memory Management

  • There are options available for tuning memory limits within OpenClaw settings if needed.

Gemma 4: Exploring the GGUF Version

Overview of GGUF Version for Non-Mac Users

  • The GGUF version is introduced as an alternative for users not on Mac, specifically mentioning a "super Gemma 4 26B uncensored GGUF V2" available on Hugging Face.
  • This version is designed for broader compatibility with tools like llama.cpp, LM Studio, Jan, and Open Web UI, making it suitable for Windows or Linux users.
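  • As a rough sketch of the llama.cpp route (the GGUF filename, quantization, and context size are placeholders, not values from the video):

    # Serve the GGUF build through llama.cpp's OpenAI-compatible server.
    llama-server -m ./supergemma-4-26b-uncensored-v2-Q4_K_M.gguf \
      --port 8080 \
      -c 8192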

Features and Improvements

  • The GGUF variant ships with a neutral embedded chat template to mitigate prompt-template bugs in earlier releases that could trigger unintended coding modes or spurious tool calls.
  • It aims to enhance the chat experience by providing cleaner interactions when run through local servers compatible with OpenAI interfaces.

Target Audience and Use Cases

  • Super Gemma 4 serves as an uncensored option for those seeking less filtered outputs while maintaining practical applications in agent workflows such as coding and logic tasks.
  • The model is particularly convenient for Mac users, who can run it easily with MLX, and for those using Hermes Agent or OpenClaw, since it can be integrated into existing assistant stacks.

Community Perspective

  • The speaker expresses enthusiasm about this community-driven release, highlighting its balance between being uncensored yet practical. If successful in real-world use, it could become a preferred local variant of Gemma 4.

Video description

In this video, I'll be telling you about SuperGemma 4, a community fine-tuned uncensored version of Gemma 4 26B, and how you can run it locally with MLX on Apple Silicon or use the GGUF version on other platforms for agent workflows.

Key Takeaways:

  🚀 SuperGemma 4 is a community uncensored fine-tune of Gemma 4 26B aimed at text, coding, planning, and tool-use.
  🍎 The MLX 4-bit v2 release is built for Apple Silicon and can be run as a local OpenAI-compatible server.
  🤖 SuperGemma 4 pairs especially well with Hermes Agent for terminal-first local AI workflows.
  🧠 It can also be connected to OpenClaw as a local reasoning model for assistant and automation tasks.
  ⚡ The model card claims improved speed, better benchmark scores, and stronger performance for code, logic, Korean, and browser tasks.
  💾 Realistically, 24GB of unified memory is the minimum comfortable floor, while 32GB or more is the better choice.
  🪟 A GGUF v2 version is also available for llama.cpp, LM Studio, Jan, Open WebUI, and other non-Mac setups.
  ✅ Overall, SuperGemma 4 looks like one of the more practical uncensored Gemma variants for local agent work.