RepoSwarm - Giving AI Agents Architecture Context Across All Your Repos - Roy Osherove

Name: RepoSwarm - Giving AI Agents Architecture Context Across All Your Repos - Roy Osherove
Uploaded: 2025-12-27T15:19:42.000Z
Duration: 1 h 28 min 1 s

Introduction to AI Tools and Architectures

Audience Engagement

The speaker begins by asking the audience about their experience with AI coding tools such as Cursor and Claude Code.

Queries are posed regarding the size of organizations represented, specifically those with over 100, 500, or 1,000 GitHub repositories.

Speaker Background

Roy Osherove introduces himself as chief architect at Verbit AI and the author of several books on unit testing and leadership.

He mentions his blog, robotpaper.ai, which covers various topics including unit testing and AI.

The Role of AI in Software Development

Benefits of Writing Books

Roy discusses how writing books allows him to leverage AI tools for code reviews by prompting them to follow his style.

Importance of Readable Code

He emphasizes the significance of creating readable, maintainable, and trustworthy code.

Refactoring Legacy Code

A recommended prompt for refactoring legacy code is shared: instruct the AI to apply the techniques from Michael Feathers' book Working Effectively with Legacy Code.

Transforming Workflows with AI

New Working Methods

The talk focuses on how AI can enable new workflows rather than relying solely on traditional methods.

Information Condensation

Roy highlights that AI can transform large amounts of information into concise formats for future use.

Utilizing NotebookLM for Presentations

Slide Generation Experiment

Approximately 30% to 40% of the presentation slides were generated with NotebookLM from the speaker's blog posts and source code.

Effectiveness of New Tools

The speaker notes that he retained about 60% of what NotebookLM generated because of its high quality.

Challenges in Managing Multiple Repositories

Chief Architect Responsibilities

As chief architect at Verbit, Roy oversees compliance across approximately 400 repositories while ensuring good practices are followed.

Documentation Issues

He points out that documentation quality is often poor across repositories, making it difficult to manage effectively.

Real-world Scenario: Consolidating Monitoring Tools

Example Challenge

Roy describes a scenario in which consolidating monitoring tools across repositories could take days or weeks without a system like RepoSwarm.

How to Ensure Accurate Data Handling in HIPAA-Compliant Applications

Importance of Up-to-Date Documentation

Organizations often lack current diagrams showing how data is handled across their applications, which is crucial for HIPAA compliance.

Many existing diagrams are outdated and irrelevant, necessitating time-consuming recreations that may not be completed effectively.

Challenges with Developer Resources

Developers frequently struggle to find documentation on new services or event schemas, leading to inefficient manual processes during code reviews.

The introduction of AI coding agents requires context about feature usage and component standards, which is often missing from existing documentation.

Liabilities in Current Documentation Practices

Standard README files in repositories often lack comprehensive onboarding information, making it difficult for new developers to get up to speed quickly.

Architecture documentation tends to be static and poorly formatted for AI readability, resulting in a loss of valuable insights over time.

Introduction of the RepoSwarm Tool

To address these issues, the RepoSwarm tool was developed during a company hackathon. It aims to automate the documentation process using AI.

The tool compiles snapshots of all repositories into a centralized repository called "Architecture Hub," containing markdown files that are easy for AI agents to read.

Features and Benefits of Architecture Hub

Each repository's markdown file includes sections like high-level overviews, dependencies, and security analyses generated through specific prompts.

The architecture hub ensures continuous updates by regenerating content daily without needing backward compatibility adjustments when new components are added.

Standardizing Deployment Processes

Importance of Standardization in Deployment

The speaker emphasizes the need for standardizing deployment processes across various teams within the company, as there are currently multiple types of deployments for front-end and back-end applications that lack uniformity.

Challenges in Achieving Standardization

The process of gathering information from different teams to understand their deployment practices is time-consuming. The speaker suggests that while it may be difficult to achieve full standardization initially, significant progress can be made by focusing on key areas like authentication.

Authentication and Authorization Issues

Identifying common authentication libraries and authorization models used across teams is crucial. There may be inconsistencies where different teams utilize varying models, which could benefit from a centralized approach.

Data Mapping and Security Checks

Understanding data flow, storage schemas, and security checks is essential. The speaker mentions involving security experts to prioritize top vulnerabilities in repositories for timely fixes.

Documentation Practices

Comprehensive documentation of dependencies on AI services (e.g., Gemini, Claude), feature flags, and prompt security checks is necessary. This ensures all relevant information is captured systematically.

Utilizing AI for Documentation

Benefits of AI in Documentation

A single markdown file can serve as a comprehensive repository that remains up-to-date with daily commits. This allows for easy querying and retrieval of information across multiple repositories.

Advantages Over Traditional Documentation

Unlike traditional documentation that can become stale, this method ensures continuous updates and readability by AI tools, enhancing accessibility to critical information.

Querying Information Efficiently

Users can leverage tools like Cursor to quickly access answers about monitoring tools or other queries within minutes, achieving a high level of trust in the responses provided by AI systems.

Exploring Repository Structures

Overview of Repository Contents

Each file includes the repository name, the project's purpose, major dependencies taken from package.json files, and the directory structure, all presented in markdown format.

Utilizing Advanced Tools for Queries

Tools with retrieval-augmented search index the content, so users can get accurate answers without manually supplying extensive context.

Practical Applications with Cursor

Demonstrating Cursor's Capabilities

The speaker shares experiences using Cursor for generating reports based on user queries about programming languages or other topics relevant to the repositories being documented.

Ensuring Consistency in Responses

Initial experiments revealed inconsistencies in AI-generated answers; thus, employing CLI tools alongside well-crafted prompts became essential for improving accuracy and efficiency when generating reports.

Generating Reports with CLI Tools

Overview of Commands and Rules

The process begins with a set of commands outlined in a markdown file, which serves as a prompt for generating reports.

Specific instructions include reading all .arch.md files while excluding the prompts, diagrams, and reports folders, to avoid false positives from previous reports.
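The file selection rule above can be sketched in a few lines. This is an illustration only: the `.arch.md` suffix and the folder names `prompts`, `diagrams`, and `reports` are assumptions based on how the talk describes the hub, and both function names are hypothetical.

```python
from pathlib import Path, PurePosixPath

# Assumed folder names and file suffix; adjust to the actual hub layout.
EXCLUDED_DIRS = {"prompts", "diagrams", "reports"}
ARCH_SUFFIX = ".arch.md"


def is_report_input(relative_path: str) -> bool:
    """True if the path is an architecture file outside the excluded folders."""
    path = PurePosixPath(relative_path)
    in_excluded_dir = not EXCLUDED_DIRS.isdisjoint(path.parts[:-1])
    return path.name.endswith(ARCH_SUFFIX) and not in_excluded_dir


def collect_arch_files(hub_root: Path) -> list[Path]:
    """Walk the hub and return the files a report is allowed to read."""
    return sorted(
        p for p in hub_root.rglob(f"*{ARCH_SUFFIX}")
        if is_report_input(p.relative_to(hub_root).as_posix())
    )
```

Keeping the predicate separate from the directory walk makes the exclusion rule easy to check without touching the file system.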

Report Generation Process

Report generation applies Cursor rules and requires that only factual information from the files is used, to avoid hallucinations.

A new directory is created for storing reports in memory, and the system starts planning its next moves based on the provided prompt.

Execution and Results

The tool utilizes various CLI tools to search for monitoring types within repositories; however, there are doubts about finding substantial results.

It’s noted that running the same prompt multiple times may yield different results due to the non-deterministic nature of models.
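The search step described above, looking for monitoring types across repositories, can be approximated with a plain keyword scan. This is a minimal sketch, not the commands the agent actually ran: the term list is hypothetical apart from OpenTelemetry, which the talk mentions, and `find_mentions` is an illustrative name. A deterministic scan like this can also serve as a cross-check on a non-deterministic model answer.

```python
import re

# Hypothetical search terms; only OpenTelemetry is named in the talk.
MONITORING_TERMS = ["OpenTelemetry", "Datadog", "Prometheus"]


def find_mentions(docs: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Map each term to the repositories whose architecture file mentions it."""
    hits: dict[str, list[str]] = {term: [] for term in terms}
    for repo, text in sorted(docs.items()):
        for term in terms:
            # Whole-word, case-insensitive match to limit false positives.
            if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
                hits[term].append(repo)
    return hits


docs = {
    "billing": "Uses OpenTelemetry for tracing.",
    "search": "Metrics are exported via prometheus.",
    "auth": "No monitoring configured.",
}
mentions = find_mentions(docs, MONITORING_TERMS)
```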

Quality Assurance and Feedback

After generating reports, they are sent to tech leads for review to ensure completeness and identify any missed critical information.

Continuous improvement of prompts has been emphasized as essential for better outcomes over time.

Insights on Report Content

Each report includes an overview summarizing key findings such as technologies used (e.g., OpenTelemetry), along with architecture diagrams.

Generating a report takes approximately five minutes even across multiple repositories; adjustments can be made based on specific needs or contexts.

Model Selection Considerations

Different models can be tested for better results; options like Codex or Grok may provide enhanced outputs but at higher costs.

An example report generated today shows repository distributions by language within an architecture hub context.

Repository Management Practices

Only repositories with commits in the past 12 months are indexed to maintain relevance; this ensures focus on active projects.

Advanced Reporting Techniques in Software Development

Overview of Advanced Reporting

The speaker introduces a more advanced example of reporting, emphasizing the potential for extensive analysis and insights once the output is generated.

Time constraints are acknowledged, with 15 minutes remaining for the discussion. The speaker highlights various applications such as onboarding new developers, conducting architecture reviews, and performing compliance security checks.

Real-World Application Scenarios

A common scenario discussed involves deprecating a service within a company, necessitating identification of all dependencies across teams to ensure proper communication and planning.

Manual processes can be time-consuming; however, utilizing advanced reporting tools can significantly reduce workload by automating up to 80-90% of tasks related to service updates.

Monitoring Service Analysis

The speaker describes challenges in consolidating API gateways when transitioning to new systems like Kong while managing existing ones.

Continuous monitoring allows for daily updates on architectural documents based on new commits across repositories, creating an evolving snapshot of architecture history.

Architectural Decision Records (ADRs)

The concept of architecture history enables teams to ask insightful questions about past decisions and changes over time.

ADRs serve as documentation for significant architectural decisions; they help understand project evolution and rationale behind current states.

Utilizing AI for Enhanced Reporting

The analogy of tree rings illustrates how historical data can reveal patterns in architectural changes over time.

AI tools were employed to generate creative reports from existing data, including prompts that facilitate the creation of ADR records based on git history sampling.
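The talk does not say how the git history is sampled. One simple possibility, shown here purely as an assumption, is to pick evenly spaced commits so that the sample spans the whole life of the repository; `sample_evenly` is a hypothetical helper.

```python
def sample_evenly(commits: list[str], sample_size: int) -> list[str]:
    """Pick up to sample_size commits spread evenly from first to last."""
    if sample_size <= 0:
        return []
    if len(commits) <= sample_size:
        return list(commits)
    if sample_size == 1:
        return [commits[0]]
    # Fractional step so the first and last commits are always included.
    step = (len(commits) - 1) / (sample_size - 1)
    return [commits[round(i * step)] for i in range(sample_size)]


history = [f"c{i}" for i in range(10)]  # oldest to newest, placeholder IDs
sampled = sample_evenly(history, 4)
```

The sampled commits, with their messages and diffs, would then be passed to the ADR prompt as context.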

Examples from Real Projects

Specific examples include decisions made regarding serverless implementations and database integrations that were not previously documented but are now captured through enhanced reporting methods.

Documenting Git History and Tech Adoption

Importance of Documenting Git History

The speaker emphasizes the significance of documenting git history, which is often overlooked. This documentation serves as a valuable resource for tracking changes and decisions made within projects.

Custom Commands for Tech Adoption

A discussion on tech adoption and deprecation highlights the use of custom commands in project management. These commands can be tailored to specific needs, enhancing workflow efficiency.

Overview of Repository Architecture

The speaker provides insights into the architecture behind the tool, mentioning that it runs on Temporal, a workflow orchestration engine designed for durability and error handling.

Repository Discovery and Management

GitHub Integration

The system integrates with GitHub using a special token to discover repositories automatically based on a predefined JSON file (repos.json). This allows for easy updates without manual intervention.

Filtering Repositories

Repositories are filtered by activity: those with no commits in the past year are excluded, and specific repositories can be skipped when they are not relevant or useful for documentation purposes.
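The filtering rule can be sketched as follows. The 365-day window approximates "one year", and the function names, the skip list, and the sample repositories are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# "One year" approximated as 365 days; an assumption for illustration.
ACTIVE_WINDOW = timedelta(days=365)


def is_active(last_commit_at: datetime, now: datetime) -> bool:
    """True if the most recent commit falls inside the activity window."""
    return now - last_commit_at <= ACTIVE_WINDOW


def select_repos(
    last_commits: dict[str, datetime],
    now: datetime,
    skip: frozenset[str] = frozenset(),
) -> list[str]:
    """Return active repositories that are not on the skip list."""
    return sorted(
        name
        for name, committed_at in last_commits.items()
        if name not in skip and is_active(committed_at, now)
    )


now = datetime(2025, 12, 27, tzinfo=timezone.utc)
last_commits = {
    "billing": datetime(2025, 6, 1, tzinfo=timezone.utc),
    "legacy-importer": datetime(2023, 1, 1, tzinfo=timezone.utc),
    "sandbox": datetime(2025, 12, 1, tzinfo=timezone.utc),
}
selected = select_repos(last_commits, now, skip=frozenset({"sandbox"}))
```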

Workflow Execution and Prompting

Fan Out Process

Once repositories are identified, a fan-out process initiates new workflows to analyze each repository's structure, execute prompts, store results, and perform cleanup tasks efficiently.
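The fan-out shape can be illustrated with plain `asyncio`. This is a stand-in for the pattern only, not the workflow engine's API: the step names mirror the description above, the repository names are placeholders, and the concurrency limit is an assumption.

```python
import asyncio

# Step names taken from the description above; the work itself is stubbed out.
STEPS = ("analyze_structure", "run_prompts", "store_results", "cleanup")


async def investigate_repo(name: str) -> dict:
    """Stand-in for one child workflow handling a single repository."""
    completed = []
    for step in STEPS:
        await asyncio.sleep(0)  # placeholder for the real work of this step
        completed.append(step)
    return {"repo": name, "steps": completed}


async def fan_out(repo_names: list[str], max_parallel: int = 4) -> list[dict]:
    """Start one investigation per repository, bounded by max_parallel."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(name: str) -> dict:
        async with semaphore:
            return await investigate_repo(name)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(name) for name in repo_names))


results = asyncio.run(fan_out(["billing", "search", "auth"]))
```

A durable workflow engine adds what this sketch lacks: retries, persistence of progress, and recovery after a crash.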

Role of Prompts in Analysis

Prompts are central to the product's functionality; they determine how effectively data is interrogated across different repositories. Base prompts apply universally while specialized prompts cater to specific repository types.
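A minimal sketch of how base and specialized prompts might be combined is shown below. The prompt names and repository types are invented for illustration; the talk does not list them.

```python
# Hypothetical prompt names; base prompts run for every repository.
BASE_PROMPTS = ["overview", "dependencies", "security"]

# Hypothetical specialized prompts, keyed by repository type.
TYPE_PROMPTS = {
    "frontend": ["components"],
    "backend": ["api_endpoints", "data_stores"],
}


def prompts_for(repo_type: str) -> list[str]:
    """Base prompts first, then any prompts specific to the repository type."""
    return BASE_PROMPTS + TYPE_PROMPTS.get(repo_type, [])
```

An unknown repository type falls back to the base prompts alone, so new kinds of repositories still get a baseline analysis.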

Enhancing Prompt Efficiency

Contextual Injection Between Prompts

The system allows results from higher-level prompts to inform lower-level ones, enhancing context and improving overall analysis quality. This sequential prompting strategy optimizes information retrieval.
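Sequential prompting with context injection can be sketched like this. The dictionary keys (`name`, `text`, `context`) and the `ask` callable are assumptions made for the example; `ask` stands in for whatever model call the system uses.

```python
from typing import Callable


def run_prompt_chain(
    prompts: list[dict], ask: Callable[[str], str]
) -> dict[str, str]:
    """Run prompts in order, prepending earlier results named in 'context'."""
    results: dict[str, str] = {}
    for prompt in prompts:
        # Inject only results that have already been produced.
        context = "\n\n".join(
            f"## {name}\n{results[name]}"
            for name in prompt.get("context", [])
            if name in results
        )
        text = f"{context}\n\n{prompt['text']}" if context else prompt["text"]
        results[prompt["name"]] = ask(text)
    return results
```

For example, a security prompt that lists `overview` in its `context` receives the overview result ahead of its own instructions, so it does not have to rediscover what the repository does.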

Structure of Prompt JSON

A detailed overview of the prompt JSON structure reveals its role in organizing shared repository information. The core directory contains essential base prompts crucial for effective operation.

Understanding Prompt Management and Optimization

Importance of Versions in Prompts

The discussion highlights the significance of prompt versions, noting that changes to a prompt version can trigger a workflow run even without new commits.

This optimization is designed to save tokens by only executing prompts that have changed or when code modifications occur.
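The re-run decision can be sketched as a comparison of the stored commit and prompt version against the current ones. An in-memory dictionary stands in for the real cache, and the function names and key layout are assumptions.

```python
# Cache maps (repo, prompt name) to the (commit SHA, prompt version) last run.
Cache = dict[tuple[str, str], tuple[str, str]]


def needs_run(
    cache: Cache, repo: str, prompt: str, commit_sha: str, prompt_version: str
) -> bool:
    """True if the code or the prompt changed since the last recorded run."""
    return cache.get((repo, prompt)) != (commit_sha, prompt_version)


def record_run(
    cache: Cache, repo: str, prompt: str, commit_sha: str, prompt_version: str
) -> None:
    """Remember what was analyzed so unchanged work can be skipped next time."""
    cache[(repo, prompt)] = (commit_sha, prompt_version)
```

Bumping a prompt's version invalidates only that prompt's entry, so a prompt edit re-runs one prompt per repository rather than the whole investigation.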

Caching Mechanisms

The system utilizes caching mechanisms such as DynamoDB or local files to enhance efficiency, running only specific prompts if they have been altered.

There is mention of prompt inheritance and tagging based on file types, which aids in context chaining for more effective security checks.

Investigation Process

Investigating a repository typically takes between 10 and 15 minutes, with shorter times for smaller repositories. Costs range from $0.30 to $1.00 per repository, depending on the model used.

Plans are underway to improve logging during investigations, enhancing transparency and tracking.

Future Developments

Future enhancements include integrating Slack notifications for critical updates and automating documentation indexing for easier querying without direct access.

Cross-repository reasoning aims to allow investigations across multiple repositories simultaneously, verifying dependencies and facilitating future agent interactions.