How To Install CODE LLaMA LOCALLY (TextGen WebUI)
Installing Code Llama Locally
In this section, the speaker demonstrates how to install Code Llama locally. Code Llama is a code-specialized model based on Llama 2 that, according to the speaker, outperforms GPT-4 on coding tasks.
Installing Text Generation Web UI and Anaconda
- To begin, make sure Anaconda is installed on your machine for managing Python versioning issues.
- Install Text Generation Web UI by copying the URL from its GitHub page and using it to clone the code.
- Create a new conda environment with `conda create -n tg python=3.10.9`.
- Activate the conda environment with `conda activate tg`.
Installing Requirements and Torch Libraries
- Clone the code repository for Text Generation Web UI and navigate into the cloned folder.
- Install all required packages by running `python -m pip install -r requirements.txt`.
- If the torch libraries are not already installed, run `pip3 install torch torchvision torchaudio --index-url` followed by a specific URL (provided in comments).
- If you encounter a "CUDA is not available" error, use `conda install pytorch torchvision torchaudio cudatoolkit=<version>` to install a PyTorch build with CUDA support.
Handling Module Installation Errors
- If you encounter a "module not found" error for chardet, install it directly with `python -m pip install chardet`.
- If you encounter the same error for cchardet, install it the same way with `python -m pip install cchardet`.
Running Text Generation Web UI Server
- Start the server by running `python server.py` in the terminal.
- Access Text Generation Web UI through the provided URL.
Downloading Code Lama Model
This section covers downloading different versions of Code Llama models from Hugging Face, including TheBloke's quantized builds.
Choosing the Model Version
- Visit the WizardCoder-Python-13B model card on Hugging Face.
- Select the desired model version based on size (e.g., 7 billion, 13 billion, or 34 billion parameters).
- Alternatively, download the raw unquantized model directly from Meta.
Downloading the Model
- Copy the URL next to the model name.
- Switch back to Text Generation Web UI and navigate to the "Model" tab.
- Paste the copied URL to initiate the download of the Code Llama model.
Using Quantized Models
This section explains how quantized versions of Code Llama models can be used for faster execution without significant loss in quality.
Benefits of Quantization
- Quantization allows compressed models to run more quickly.
- The quality of quantized models is usually not significantly compromised.
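The intuition behind these two points can be sketched in a few lines of Python. This is a toy single-scale int8 scheme for illustration only, not the GPTQ method these models actually use: each weight is stored as a small integer plus one shared scale factor, and the rounding error is bounded by half the scale.

```python
# Toy illustration of weight quantization (NOT the GPTQ algorithm
# used by the real models; just the underlying idea).

def quantize_int8(weights):
    """Map floats to int8 range [-127, 127] using a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -0.07, 0.44]  # stand-in for model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each value now fits in 8 bits instead of 32, and the rounding
# error is at most scale / 2 per weight:
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_err)
```

Real quantization schemes (GPTQ, GGUF, etc.) are considerably more sophisticated, but the trade-off is the same: much smaller and faster weights at the cost of a small, bounded approximation error.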
Accessing Quantized Models
- Visit TheBloke's Hugging Face page for quantized versions of Code Llama models.
- Choose a specific format, such as GPTQ, a post-training quantization method for transformer models.
Conclusion
The speaker concludes by summarizing the installation process for Code Llama locally and highlights different options available for downloading and using Code Llama models.
Model Selection and Configuration
In this section, the speaker discusses model selection and configuration when using the ExLlama_HF loader.
Selecting the Model
- The speaker mentions that they downloaded a model and need to select it.
- They recommend choosing the ExLlama_HF loader instead of the default AutoGPTQ.
- Different model loaders can be experimented with, as some work better with certain quantization formats.
Configuring Parameters
- The max sequence length is set to 16,000 tokens by default but can be extended up to 100,000 tokens.
- The speaker suggests setting the max new tokens parameter to 2048.
- For coding tasks, they recommend setting the temperature to its minimum value so the model's output is more deterministic and less creative.
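Why a minimal temperature reduces creativity can be sketched with plain Python: temperature divides the logits before the softmax, so a very low temperature concentrates almost all probability on the top-scoring token. This is a generic illustration of sampling, not code from the web UI:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; low temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# At t = 1.0 the probability mass is spread across tokens; at t = 0.1
# nearly all of it lands on the top token, so sampling is almost greedy.
```

This is exactly the behavior you want for code generation: the model reliably picks its best guess instead of exploring lower-probability (more "creative") alternatives.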
Prompt Template and Task Completion
This section covers using prompt templates and completing coding tasks with the selected model.
Using Prompt Template
- The default tab contains a prompt template that describes a task.
- Users need to input their specific prompt in this section.
Completing Coding Task
- As an example, the speaker writes a prompt asking for Python code to output numbers from 1 to 100.
- After submitting the prompt, two examples of completed code are generated successfully.
- It is mentioned that this setup runs locally on GPU but also provides options for CPU usage if no GPU is available.
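For reference, a minimal Python solution to that example prompt looks like the following (the model's actual generated output may differ in style):

```python
# Print the numbers from 1 to 100, one per line
numbers = list(range(1, 101))  # range's end is exclusive, so use 101
for number in numbers:
    print(number)
```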
Conclusion and Benefits
This section concludes the discussion on local coding assistance with ExLlama_HF and highlights its benefits.
- By following the steps outlined in previous sections, users can have a fully functional and capable coding assistant on their local computer.
- The model, loaded via ExLlama_HF, proves to be extremely useful for coding tasks.
- The speaker emphasizes that this setup allows users to have a powerful coding assistant without relying on external services.