Stop Fixing Your Claude Skills. Autoresearch Does It For You

How to Improve Claude Code Skills with Auto Research

Introduction to Claude Code Skills

  • The speaker is enthusiastic about Claude Code skills but notes that they are unreliable, producing the intended output only 70% of the time.
  • The goal is to combine Claude Code skills with a new AI development called Auto Research to improve reliability and accuracy.

Overview of Auto Research

  • Auto Research was introduced by Andrej Karpathy, a former OpenAI member. It lets agents autonomously optimize a process through repeated experiments.
  • In this context, the focus is on improving skills over time by refining prompts using the Auto Research methodology.

Key Components of the Auto Research Repo

  • The relevant files in the GitHub repo are prepare.py, train.py, and program.md.
  • While prepare.py deals with machine learning specifics, train.py and program.md are crucial for skill improvement.

Implementing Skill Improvement

  • Users will provide a prompt in program.md instructing an agent to enhance the skill based on Auto Research methods.
  • Evaluation criteria (eval metrics) will be established to measure skill performance improvements over time.

Real-world Application Example

  • The speaker shares a personal experience of running Auto Research on an old app, cutting its load time from 1100ms to 67ms through iterative testing.
  • The speaker cites an 81.3% improvement in performance metrics from this method, suggesting that similar gains are possible for skill accuracy.

Essential Ingredients for Successful Implementation

  • Three key components are necessary:
  • An objective metric (e.g., evaluation pass rate).
  • A reliable measurement tool that operates without human intervention.
  • Something tangible to change (e.g., skill instructions or prompts).
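The three ingredients above map onto a simple keep-if-better loop. The following is a minimal sketch, not the code from the Auto Research repo: `optimize`, `toy_evaluate`, `toy_mutate`, and `RULES` are hypothetical names, and the toy evaluator merely checks for rule strings so the example runs without calling any model.

```python
import random
from typing import Callable


def optimize(prompt: str,
             evaluate: Callable[[str], float],
             mutate: Callable[[str], str],
             rounds: int = 20) -> tuple[str, float]:
    """Keep a mutated prompt only when it scores higher than the best so far."""
    best_prompt, best_score = prompt, evaluate(prompt)
    for _ in range(rounds):
        candidate = mutate(best_prompt)   # the thing we are allowed to change
        score = evaluate(candidate)       # automated measurement, no human in the loop
        if score > best_score:            # objective metric decides what survives
            best_prompt, best_score = candidate, score
    return best_prompt, best_score


# Toy stand-ins (assumptions for illustration only).
RULES = ["use pastel colors", "flow left to right", "no numbers"]


def toy_evaluate(prompt: str) -> float:
    """Fraction of rules mentioned in the prompt, standing in for an eval pass rate."""
    return sum(rule in prompt for rule in RULES) / len(RULES)


def toy_mutate(prompt: str) -> str:
    """Append one randomly chosen rule, standing in for an agent's edit."""
    return prompt + "\n- " + random.choice(RULES)


if __name__ == "__main__":
    random.seed(0)
    best, score = optimize("Draw a diagram.", toy_evaluate, toy_mutate)
    print(f"best score: {score:.2f}")
```

In a real setup, `evaluate` would run the skill several times and score the outputs, and `mutate` would be the agent rewriting the skill's instructions.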

Conclusion and Future Implications

  • By leveraging these strategies, users can continuously improve their skills while also generating valuable research data that can inform future iterations or models.

Skill Improvement Through Evaluation

Setting Up the Skill Evaluation Process

  • The agent will receive instructions to evaluate its performance against a suite of tests, aiming for continuous improvement every five minutes.

Understanding Prompts and Their Variability

  • Skills are defined as prompts, which can yield different results each time they are run due to their inherent noise. A standardized approach is necessary for consistent quality improvement.

Importance of Repeated Testing

  • To assess skill outputs reliably, run the skill multiple times and look at the mode (most frequent result) and median (middle value) of the results, since AI outputs are samples from a distribution rather than a single fixed answer.

Benchmarking Performance

  • Just like academic testing assesses knowledge, skills must be benchmarked using binary questions (yes/no) to evaluate their effectiveness systematically.

Criteria for High-Quality Diagrams

  • Four criteria have been established for evaluating diagrams:
  • Legibility and grammatical correctness of text.
  • Adherence to a defined color palette (pastel colors).
  • Linear structure (left-to-right or top-to-bottom orientation).
  • Absence of numbers or ordinals in the design.
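One way to encode these four criteria as binary questions is sketched below. The question wording and the names `CRITERIA` and `score_diagram` are assumptions for illustration. In practice the yes/no answers would come from a model inspecting each image, which is left out here.

```python
# Each criterion is a yes/no question; the keys are hypothetical identifiers.
CRITERIA = {
    "legible_text": "Is all text legible and grammatically correct?",
    "pastel_palette": "Does the diagram stick to the pastel color palette?",
    "linear_layout": "Does the diagram flow left-to-right or top-to-bottom?",
    "no_numbers": "Is the diagram free of numbers and ordinals?",
}


def score_diagram(answers: dict[str, bool]) -> int:
    """Count how many of the four binary criteria a single diagram passes."""
    missing = CRITERIA.keys() - answers.keys()
    if missing:
        raise ValueError(f"missing answers for: {sorted(missing)}")
    return sum(answers[name] for name in CRITERIA)


if __name__ == "__main__":
    example = {
        "legible_text": True,
        "pastel_palette": True,
        "linear_layout": False,
        "no_numbers": True,
    }
    print(score_diagram(example))  # 3
```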

Creating an Automated Evaluation System

Initial Setup Requirements

  • Communication with Claude Code is essential; this example uses an Antigravity window with the Claude Code extension.

Utilizing External Resources

  • The Andrej Karpathy Auto Research repository needs to be accessed and integrated into the evaluation process.

Defining the Evaluation Test Suite

  • A voice transcription tool called Wispr Flow will be used to instruct the system on building a self-improving skill system based on predefined constraints.

Execution Plan for Diagram Generation

  • Every two minutes, ten diagrams will be generated based on specific functions. These will undergo evaluation through the test suite, adjusting prompts as needed until optimal results are achieved.

Overview of Diagram Generator Skill

  • The diagram generator skill focuses on creating clean hand-drawn style diagrams from natural language inputs, emphasizing clarity and professionalism in design.

How to Optimize Skills Using Auto Research

Overview of the Process

  • The output is designed to resemble a whiteboard sketch with pastel colors and simple icons. The process involves sending requests to Nano Banana Pro 2, which generates content that can be pasted into Excalidraw.
  • Each generation costs approximately 2 cents, so 50 tests of 10 images each come to roughly $10, which the speaker considers a positive return on investment given potential ad revenue from YouTube videos.
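The cost estimate can be checked with a few lines of arithmetic. This sketch assumes each test generates 10 images, matching the batch size described elsewhere in these notes, and works in whole cents to avoid floating-point rounding. `total_cost_cents` is a hypothetical helper.

```python
def total_cost_cents(cents_per_image: int, images_per_test: int, tests: int) -> int:
    """Total spend in cents for a full optimization run."""
    return cents_per_image * images_per_test * tests


if __name__ == "__main__":
    cents = total_cost_cents(cents_per_image=2, images_per_test=10, tests=50)
    print(f"${cents / 100:.2f}")  # $10.00
```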

Scoring Mechanism

  • The speaker clarifies their scoring mechanism: generating 10 images evaluated against four criteria, resulting in a maximum score of 40.
  • A real-time dashboard displays results, showing initial scores and improvements over iterations. For example, one experiment improved from a score of 32 to 37.
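The scoring described above, 10 images times 4 binary criteria for a maximum of 40, can be sketched as follows. `batch_score` is a hypothetical name, and the pass/fail grid is made-up data chosen to reproduce a 37 out of 40 result.

```python
def batch_score(results: list[list[bool]]) -> tuple[int, int]:
    """Return (score, max_score), where results[i][j] is True
    when image i passed criterion j."""
    max_score = sum(len(row) for row in results)
    score = sum(sum(row) for row in results)
    return score, max_score


if __name__ == "__main__":
    # Hypothetical batch: 10 images, 4 criteria each, 3 failed checks in total.
    results = [[True] * 4 for _ in range(10)]
    results[0][1] = False
    results[3][0] = False
    results[7][3] = False
    score, max_score = batch_score(results)
    print(f"{score}/{max_score}")  # 37/40
```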

Importance of Evaluation Criteria

  • Different users may have varying definitions of "good," emphasizing the need for personalized evaluation metrics. Time invested in running evaluations significantly impacts outcomes.
  • Recommendations include defining simple yes/no evaluation criteria to streamline the assessment process and improve efficiency.

Automation and Iteration

  • The system autonomously runs evaluations every two minutes while mutating prompts based on previous results. This method can be applied across various skills for optimization.
  • The speaker plans to create a meta skill that optimizes all skills in their repository by leveraging this automated research approach.
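To make "mutating prompts based on previous results" concrete, the sketch below tallies which criteria failed most often and turns that into a revision request for the agent. This is an illustrative assumption about how feedback could be passed along, not the mechanism shown in the video. `failure_counts`, `mutation_request`, and the criterion names are hypothetical.

```python
from collections import Counter


def failure_counts(results: list[dict[str, bool]]) -> Counter:
    """Count failures per criterion across one batch of evaluated outputs."""
    counts: Counter = Counter()
    for answers in results:
        for name, passed in answers.items():
            if not passed:
                counts[name] += 1
    return counts


def mutation_request(skill_text: str, results: list[dict[str, bool]]) -> str:
    """Build an instruction asking the agent to revise the skill,
    listing the most frequently failed criteria first."""
    counts = failure_counts(results)
    if not counts:
        return "All checks passed; leave the skill unchanged."
    worst = ", ".join(f"{name} x{n}" for name, n in counts.most_common())
    return (
        "Revise the skill instructions below to fix the most frequent failures.\n"
        f"Failed criteria: {worst}\n\n{skill_text}"
    )


if __name__ == "__main__":
    batch = [
        {"pastel_palette": False, "no_numbers": True},
        {"pastel_palette": False, "no_numbers": False},
    ]
    print(mutation_request("Draw a clean hand-drawn style diagram.", batch))
```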

Tips for Effective Evaluation

  • Successful runs produce high-quality outputs with minimal errors (e.g., achieving a score of 39 out of 40). Simple binary evaluations are recommended for clarity.
  • Avoid overly strict criteria that could lead models to optimize for irrelevant factors rather than quality content.

Conclusion and Resources

  • Users are encouraged to adopt these strategies; the resources are shared without barriers such as email sign-ups or gatekeeping.
  • Emphasizing simplicity in evaluation will yield better results; complex scoring systems may introduce variability that complicates outcomes.

Auto Research Applications

Exploring the Versatility of Auto Research

  • Auto research can be applied to a multitude of areas beyond just skills and prompts, including websites and landing pages.
  • It is useful for split testing various elements such as titles, thumbnails, and emails.
  • The speaker emphasizes the flexibility of auto research, suggesting it can be utilized in virtually any context desired by users.
  • There is an ongoing evolution within the ecosystem as individuals discover more effective methods for implementing auto research over time.
  • For those who find certain aspects confusing, particularly the Claude Code portions, additional resources are recommended for further understanding.

Video description

Join Maker School & get customer #1 guaranteed: https://skool.com/makerschool/about
Watch my NEW 2026 Claude Code course: https://www.youtube.com/watch?v=QoQBzR1NIqI
Work with my team: https://dub.sh/work-with-me-gnn
The free SKILL.md: https://drive.google.com/drive/folders/14nUSxV8cpi5OI2OQxhBqyeuN92ERTMX1

Free multi-hour courses
→ Claude Code (4hr full course): https://www.youtube.com/watch?v=QoQBzR1NIqI
→ Vibe Coding w/ Antigravity (6hr full course): https://www.youtube.com/watch?v=gcuR_-rzlDw
→ Agentic Workflows (6hr full course): https://www.youtube.com/watch?v=MxyRjL7NG18
→ N8N (6hr full course, 890K+ views): https://www.youtube.com/watch?v=2GZ2SNXWK-c

Summary
You can now automatically improve your Claude Code skills utilizing the principles of Karpathy's "autoresearch" combined with evals. In this video, I give you a step-by-step, end-to-end walkthrough of how to do so. I also give you guys an example skill you could use to do this for yourself!

My software, tools, & deals (some give me kickbacks, thank you!)
Instantly: https://link.nicksaraev.com/instantly-short
Anymailfinder: https://link.nicksaraev.com/amf-short
Apify: https://console.apify.com/sign-up (30% off with code 30NICKSARAEV)
n8n: https://n8n.partnerlinks.io/h372ujv8cw80
Rize: https://link.nicksaraev.com/rize-short (25% off with promo code NICK)

Follow me on other platforms
Instagram: https://www.instagram.com/nick_saraev
Twitter/X: https://twitter.com/nicksaraev
Blog: https://nicksaraev.com

Why watch?
If this is your first view: hi, I'm Nick! TLDR: I spent six years building automated businesses with Make.com (most notably 1SecondCopy, a content company that hit 7 figures). Today a lot of people talk about automation, but I've noticed that very few have practical, real world success making money with it. So this channel is me chiming in and showing you what *real* systems that make *real* revenue look like. Hopefully I can help you improve your business, and in doing so, the rest of your life. Like, subscribe, and leave me a comment if you have a specific request! Thanks.

Chapters
00:00 Introduction to Claude Code Skills
01:47 The Concept of Autoresearch
03:40 Ingredients for Successful Autoresearch
05:25 Evaluating Skills with Evals
08:51 Setting Up for Autoresearch
10:30 The Diagram Generator Skill
12:33 Optimizing Skills with Autoresearch
14:16 Tips for Effective Evals
15:35 Conclusion and Further Applications