Claude's leaked source code is absolutely UNHINGED!
Anthropic's AI Safety and the Source Code Leak
Overview of Anthropic's Commitment to Safety
- Anthropic is an AI company emphasizing safety, having created a 30-page policy on responsible AI.
- The company has previously declined Pentagon contracts due to safety concerns.
- On March 31st, they accidentally released their entire source code online due to a single line missing in a settings file.
Details of the Source Code Leak
- This incident marked the second occurrence of such a leak within 13 months, raising questions about internal processes.
- The leaked code included sensitive features like stealth mode for hiding AI authorship in open-source contributions.
Technical Aspects of the Leak
- Developers typically scramble code before shipping it; however, a source map linking back to the original version was inadvertently included.
- A previous similar leak occurred on Cloud Code's launch day in February 2025, where developers found another answer key buried in the download.
Consequences and Reactions
- The recent leak gained significant attention after being highlighted by an intern at a security firm, leading to millions of views on social media.
- An open bug report indicated that Anthropic’s build tool may have caused this repeated error during the shipping process.
Insights from the Leaked Code
- Within the leaked source code were over 1,900 files containing hidden features and unreleased models.
- Notable unreleased features included "Chyros," which automates background tasks while users sleep, and "Ultra plan," which offloads tasks to remote servers.
Hidden Features and Internal Names
- The code also revealed internal names for unreleased models like Capiara and Numbat, along with references to Opus 4.7—an unannounced model.
- A Tamagotchi-style virtual pet named Buddy was discovered within Claude Code, featuring various species and RPG stats.
Undercover Mode: A Controversial Feature
- One surprising feature was "Undercover mode," designed for employees contributing to public projects without revealing their affiliation with Anthropic.
- This mode instructively strips any identifiable information from commit messages or pull requests made by employees.
Concerns Over Anthropic's Security Practices
Internal Code Names and Leaks
- The discussion begins with the concern that companies like Anthropic, which use internal code names, may prioritize secrecy over transparency. This raises questions about whether they are hiding AI authorship from the communities they serve.
- Just five days before a significant source code leak, another incident occurred where a misconfigured website exposed around 3,000 internal files related to an unreleased AI model named Mythos, which was deemed to pose serious cybersecurity risks.
Company Response and Community Skepticism
- Anthropic attributed the leaks to human error in release packaging rather than a security breach, claiming it was an honest mistake without any personnel consequences. However, this explanation was met with skepticism from the community.
- The trustworthiness of a company focused on AI safety is questioned when it makes repeated mistakes in software packaging. This inconsistency raises concerns about their capability to manage artificial general intelligence safely.
Defensive Measures and Community Reactions
- A "poison pill" mechanism is embedded within Claude Code that sends fake features alongside real ones during server communication to deter competitors from copying its functionality.
- Following the leaks, some community members advocated for open-sourcing the code while others quickly forked it and began creating Python ports. Malicious actors also exploited the situation by uploading fake software packages that contained malware.
Implications of Code Quality and Development Practices
Insights into Claude Code Functionality
- The video explores how Claude Code operates behind the scenes; when conversations become lengthy, a secondary smaller Claude processes them to summarize key points while discarding irrelevant information.
- Internally at Anthropic, employees have access to a powerful tool called Tungsten that allows direct control over virtual terminals—capabilities not available in public versions of Claude.
Security Concerns Arising from Summarization Process
- A critical finding reveals that poisoned instructions can survive through summaries generated by the second Claude. This means attackers could potentially manipulate outcomes without needing direct interaction with the AI.
Naming Conventions as Safety Mechanisms
- The function names within the code itself serve as warnings against unsafe practices; for instance, one function explicitly instructs developers not to log secrets.
- Comments left by developers reveal casual attitudes towards coding standards; one engineer noted uncertainty about performance improvements yet proceeded with deployment anyway.
AI Contributions and Quality Control Issues
Automation vs Manual Processes
- The head of Cloud Code disclosed that all contributions were generated by AI itself—raising alarms about quality control given past errors in deployment processes.
Verification Methods Employed
- To catch bugs post-deployment, Anthropic uses Claude as a built-in verification agent designed to run tests on its own code but relies on manual checks instead of automated systems—a practice criticized for being prone to oversight.
This structured summary captures key discussions surrounding Anthropic's recent security incidents and development practices while providing timestamps for easy reference back to specific moments in the transcript.
AI Code Writing: A Double-Edged Sword?
The Complexity of AI-Generated Code
- The AI is programmed to monitor its own performance, attempting to eliminate laziness in code writing. However, it still allows errors, such as a missing settings line, to slip through.
- A file named "ripple" contains 5,000 lines of code and relies on 219 imports. This complexity categorizes it as a "god object," which is problematic due to its extensive dependencies.
- Another file, "print ts," exceeds even this size with nearly 5,594 lines. While large files can be acceptable in some contexts (like the Linux kernel), these are not maintained by expert engineers but rather by AI.
- The design choice for large files stems from efficiency; splitting code into smaller files slows down the AI's processing time and increases costs. Thus, the current structure is optimized for AI functionality rather than human readability.
Implications of Current Practices
- There are concerns about whether this approach signifies a troubling trend in software development—where code written and organized by AI may lead to unmanageable complexity that goes unnoticed.
Future Developments in AI Software
- Anthropic appears to be developing advanced tools beyond simple coding assistance, including Chyros (an always-on agent), Ultra Plan (a project planning tool), and Tungsten Tool (exclusive for employees).
- Features like undercover mode and poison pill strategies indicate an intention to protect proprietary information while hinting at the creation of an autonomous AI workforce. The release of source code raises questions about transparency and trust in cloud-based solutions.