Ollama + Claude Code = 99% CHEAPER
How to Run Claude Code for Free
Introduction to Claude Code
- The video presents two ways to run Claude Code for free: locally on your own machine, or via OpenRouter.
- An analogy is introduced: Claude Code is the car, while the AI model (such as Claude) is the engine.
Understanding Claude Code
- Claude Code acts as a harness around models like Opus or Sonnet, guiding how they organize folders and execute tasks.
- Users can switch from closed-source models (which require payment for API access) to open-source or self-hosted local models that are free.
Open Source vs. Closed Source Models
- Open-source models allow users to modify and run them freely, whereas closed-source models (e.g., Sonnet, GPT, Gemini) restrict access and require payment per token.
- The car analogy illustrates that open source allows users to inspect and modify the engine, while closed source keeps it locked.
Performance Comparison
- A performance chart shows that while closed-source models generally outperform open-source ones, the gap is narrowing.
- Some current open-source models surpass older versions of popular closed-source models like Sonnet 3.7 in performance metrics.
Limitations of Open Source Models
- Despite improvements in open-source models, heavy workloads may still favor closed-source options like Opus 4.6 or GPT 5.4 for their reliability.
- Compatibility issues with open-source models may stem from differences in training data or context windows relative to what Claude Code expects.
Recent Developments in Open Source Models
- The release of Google's Gemma 4 model highlights advancements in smaller yet high-performing open source options.
- Smaller model sizes are advantageous as they require less hardware capacity for local hosting.
Legal Considerations
- Using different local or open-source models with Anthropic's agent harness does not violate their terms of service; it's permissible under their guidelines.
Next Steps
- The tutorial proceeds to demonstrate how to run a local model with Ollama inside Claude Code.
How to Download and Use AI Models from Ollama
Step 1: Accessing Ollama and Downloading a Model
- Begin by visiting ollama.com and downloading the version for your operating system, such as Windows.
- Navigate to OpenRouter to explore rankings of various AI models. Note that this page may load slowly due to the volume of data.
- Select the Qwen 3.5 model, which comes in several sizes and includes benchmark comparisons against other open-source and closed-source models.
- Consider your computer's specifications (RAM, CPU, GPU) when choosing a model size; you can ask Claude Code for guidance on suitable options.
- Opt for the Qwen 3.5 9-billion-parameter model (6.6 GB); it may not be the strongest option, but it serves well for demonstration purposes.
Step 2: Downloading and Running the Model
- Execute the command `ollama pull` followed by the model name in your terminal, within VS Code or whichever environment you plan to use.
- The download time will vary based on model size; expect an estimated time displayed during the process.
- Understand that some models are only available via cloud usage, while others can be downloaded locally; check indicators next to each model for details.
- Once downloaded, run `ollama run` with the same model name to interact with Qwen 3.5 directly on your machine.
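Putting the two steps together, a terminal session might look like the following sketch; the `qwen3:8b` tag is an assumption, so copy the exact tag for your chosen size from the model's page on ollama.com.

```shell
# Download the model weights locally (tag is illustrative; pick the
# size that fits your RAM/GPU from the model's page)
ollama pull qwen3:8b

# Start an interactive chat with the downloaded model
ollama run qwen3:8b

# List everything installed locally, with sizes
ollama list
```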
Step 3: Using Claude Code with Local Models
- After interacting with Qwen 3.5, exit chat mode with Ctrl+D and prepare to launch Claude Code with your local model.
- In Ollama, use a launch command such as `ollama launch claude`, which lets you select the model Claude Code will run on.
- Choose between installed models or pull new ones like Qwen 3.5 when launching Claude Code; make sure you are in the correct directory during this process.
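As a sketch of the launch step (the subcommand follows the video's description; verify it against `ollama --help` for your installed version, and the project path is hypothetical):

```shell
# Work from the project directory Claude Code should operate in
cd ~/projects/demo-app   # hypothetical path

# Launch Claude Code wired to a local Ollama model; an interactive
# picker lets you choose among installed (or newly pulled) models
ollama launch claude
```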
Step 4: Account Management and Subscription Options
- Be aware of account management processes when launching Claude; options include subscription plans or API key authorization for access.
Using Local and Cloud Models in AI Development
Introduction to Model Usage
- The speaker discusses the initial setup of using an Anthropic account, mentioning a $5 credit purchase that is not actually consumed due to the use of free open-source models.
- Emphasizes the importance of switching from a paid model to local or open-source models to avoid charges.
Interaction with Claude Code
- Demonstrates interacting with Claude Code, noting that local models ensure privacy but may run slower than cloud servers.
- Notes the lack of visibility into the processing steps when using local models, contrasting it with cloud-based interactions where tool calls are visible.
File Creation and Context Management
- Describes an attempt to create a file named "Qwen 3.5" and write a joke inside it, noting that context limits affected performance.
- Discusses running commands in the terminal to adjust context limits for better model performance.
Custom Model Configuration
- Introduces creating a custom model based on "Qwen 3.5", allowing for an increased context window.
- Explains how users can request assistance from Claude for command execution related to model configuration.
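The context-window adjustment described above can be done with an Ollama Modelfile; a minimal sketch, in which the base tag, custom name, and context size are all assumptions to adapt to your setup:

```shell
# Modelfile syntax: derive a new model from a base model and raise its
# context window via the num_ctx parameter (values here are illustrative)
cat > Modelfile <<'EOF'
FROM qwen3:8b
PARAMETER num_ctx 32768
EOF

# Register the custom model under a new name, then run it
ollama create qwen3-32k -f Modelfile
ollama run qwen3-32k
```

A larger `num_ctx` costs more RAM/VRAM, so raise it only as far as your hardware allows.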
Transitioning Between Local and Cloud Models
- Shares success in executing commands that allow writing text files, including humorous content generated by the AI.
- Outlines steps for launching cloud-hosted models like MiniMax M2, noting the prompts for signing in and connecting.
Cost Considerations in AI Model Usage
- Highlights that while initial usage may seem free, there are costs associated with running advanced models either locally (requiring hardware investment) or through subscriptions on cloud platforms.
- Stresses the need for balancing quality versus price when selecting AI solutions, considering factors such as usage limits and payment structures.
Testing New AI Models
Performance Comparison of AI Models
- The speaker compares a new model to the previous local version, Qwen 3.5, noting that the new model is faster and more efficient with a larger parameter count.
- The speaker successfully runs a "morning coffee skill" using the new model, highlighting its improved performance and visibility compared to smaller models.
- The experience feels similar to standard Claude Code, since the model can spin up multiple agents quickly, underscoring the benefits of larger cloud-hosted models.
When to Use Open Source Models
- The speaker discusses scenarios for utilizing open-source models, particularly for low-stakes or high-volume tasks like summarizing files or searching through codebases.
- Other applications include research tasks such as web searches and organizing information, where top-tier models are not necessary.
- Caution is advised when coding; simpler tests can be run on these models but should be verified later with more powerful options.
Switching Between Local and Cloud Models
- If primary services like Claude are down or if session limits are reached, users may switch temporarily to local models for productivity.
- To integrate OpenRouter into an existing setup, change environment variables so that API calls are redirected from Anthropic's API to OpenRouter's services.
Setting Up OpenRouter
- Users need an OpenRouter account and should load funds (e.g., $5-$10) to raise request limits beyond the free-usage caps.
- Configuration changes in the project settings are necessary; users must add specific environment variables pointing to their OpenRouter credentials instead of the defaults.
Final Configuration Steps
- A JSON file containing configuration details is used; care must be taken regarding sensitive information like API keys during setup.
- Users will replace certain tokens in their configuration with those from their OpenRouter account, leaving some fields blank as instructed.
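Claude Code reads its endpoint and credentials from environment variables, so the redirection can be sketched as shell exports; the base URL and the key placeholder are assumptions, and the exact values should be copied from OpenRouter's documentation and your own account:

```shell
# Point Claude Code's Anthropic-style API calls at OpenRouter
# (URL is an assumption; confirm it in OpenRouter's docs)
export ANTHROPIC_BASE_URL="https://openrouter.ai/api"

# Use your OpenRouter key instead of an Anthropic key (placeholder value)
export ANTHROPIC_AUTH_TOKEN="sk-or-your-key-here"
```

The same key/value pairs can instead live in the project's settings file if you prefer per-project configuration over shell exports.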
Understanding OpenRouter Configuration
Importance of Complete Configuration
- Configuring all parameters, not just those suggested by the OpenRouter documentation, is essential to avoid unintended charges from paid models like Haiku and Sonnet.
- Users may unknowingly incur costs if they omit certain variables in their configuration, leading to default usage of paid models instead of free options.
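The point about omitted variables can be sketched as follows; the model identifiers are placeholders, so pick real ones from OpenRouter's model list (the `:free` suffix marks its no-cost variants):

```shell
# Main model used for heavy requests (placeholder ID)
export ANTHROPIC_MODEL="qwen/qwen3-coder:free"

# Claude Code also uses a small, fast model for background and tool
# calls; leaving this unset can silently fall back to a paid default
export ANTHROPIC_SMALL_FAST_MODEL="qwen/qwen3-coder:free"
```

Setting both to free models is what keeps every category of request off the paid defaults.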
Transitioning to Free Models
- After correcting the configuration, the system successfully utilizes free models for tool calls without incurring charges.
- An overview of the OpenRouter Free model is provided, highlighting its similarity to Ollama and showcasing the various free models available.
Limitations and Recommendations
- While the OpenRouter Free model helps avoid rate limits by routing requests to whichever free model is available, it gives no control over which specific model is called.
- A demonstration shows successful file creation using the OpenRouter Free model at no cost.
Switching Models for Enhanced Functionality
Utilizing the Qwen 3.6 Model
- Instructions are given for switching from the OpenRouter Free model to a specific Qwen 3.6 model for better performance while remaining cost-free.
- Upon launching with the new model name, successful interaction confirms that it operates effectively without additional charges.
Exploring Web Search Capabilities
- The discussion shifts towards testing web search capabilities with Google's new Gemma 4 model; however, limitations arise due to lack of direct web search access.
- Alternative methods such as fetches are attempted but face challenges in executing web searches effectively.
Cost Management Strategies in Claude Code
Monitoring Usage Limits
- Users are advised to monitor their account limits regarding daily and minute-based call restrictions when using free models.
Cost Comparison Between Models
- A comparison between models reveals significant cost differences; cheaper alternatives can yield substantial savings when integrated into Claude Code workflows.
- Combining OpenRouter configurations with Claude Code can cut expenses dramatically, potentially far below the cost of the default setup.