If you like the idea of ChatGPT, Google Gemini, Microsoft Copilot, or any of the other AI assistants, you may still have concerns about privacy, cost, and more. That's where Llama 2 comes in. Llama 2 is an open-source large language model developed by Meta, and it's available in variants ranging from 7 billion to 70 billion parameters.
Because it's an open-source LLM, you can modify it and run it however you like, on any device with capable enough hardware. If you want to give it a try on a Linux, Mac, or Windows machine, it's easy to do.
Requirements
You'll need the following to run Llama 2 locally:
- One of the best Nvidia GPUs (you can use AMD on Linux)
- An internet connection
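Before going further, it can help to confirm that your system actually sees an Nvidia GPU and how much VRAM it has. The snippet below is a minimal sketch that assumes the Nvidia driver is installed, since that's what provides the nvidia-smi tool. The check_gpu function name is just an illustrative helper, and AMD users on Linux would need their own vendor's tools instead.

```shell
# check_gpu is a hypothetical helper name, not part of any tool.
check_gpu() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # Print one line per GPU: its name and total memory.
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
      || echo "nvidia-smi is installed but could not query a GPU"
  else
    echo "nvidia-smi not found: the Nvidia driver may not be installed"
  fi
}

check_gpu
```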
How to run Llama 2 on a Mac or Linux using Ollama
If you have a Mac or a Linux machine, you can use Ollama to run Llama 2. It's by far the easiest method on any platform, as it requires minimal setup. All you need is time to download the LLM, as it's a large file.
Step 1: Download Ollama
The first thing you'll need to do is download Ollama. It runs on Mac and Linux and makes it easy to download and run multiple models, including Llama 2. You can even run it in a Docker container with GPU acceleration if you'd like to have it easily configured.
Once Ollama is downloaded, extract it to a folder of your choice and run it.
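If you go the Docker route mentioned above, the commands look roughly like this. This is a sketch that assumes Docker and the NVIDIA Container Toolkit are already installed, and that you're using the ollama/ollama image published on Docker Hub. The two function names are illustrative wrappers so you can run each step when you're ready.

```shell
# Start the Ollama server in a container with access to all Nvidia GPUs.
# Assumption: Docker and the NVIDIA Container Toolkit are set up on the host.
start_ollama_container() {
  docker run -d --gpus=all \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
}

# Open an interactive chat with Llama 2 inside the running container.
# The model is downloaded on first use if it isn't already present.
run_llama2_in_container() {
  docker exec -it ollama ollama run llama2:13b
}
```

Call start_ollama_container once, then run_llama2_in_container whenever you want to chat. The named volume keeps downloaded models around between container restarts.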
Step 2: Download the Llama 2 model
Once Ollama is installed, run the following command to pull the 13 billion parameter Llama 2 model.
ollama pull llama2:13b
This may take a while, so give it time to run. It's a 7.4GB file and may be slow on some connections.
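If you're wondering why a 13 billion parameter model is only a 7.4GB download, it comes down to quantization. As a rough sketch, assuming the default download stores weights at about 4.5 bits each (an assumption about the 4-bit format used, not something stated above), you can estimate the file size as parameters times bits per weight, divided by eight. The estimate_gb helper below is illustrative.

```shell
# estimate_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Prints the approximate size in GB: billions of parameters * bits / 8.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 13 4.5   # 4-bit quantized 13B model: prints 7.3
estimate_gb 13 16    # same model at 16-bit precision: prints 26.0
```

The estimate lands close to the 7.4GB file. It ignores file metadata and anything stored at higher precision, so treat it as a ballpark figure only.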
Step 3: Run Llama 2 and interact with it
Next, run the following command to launch and interact with the model.
ollama run llama2:13b
This will then launch the model, and you can interact with it. You're done!
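Ollama also exposes a local HTTP API while it's running, which is handy if you'd rather script your prompts than type them. The sketch below assumes the server is listening on its default address of localhost:11434 and that you pulled the 13 billion parameter model. The build_request and ask_llama names are illustrative helpers, and the prompt must not contain double quotes or backslashes because this sketch does no JSON escaping.

```shell
# Assumption: Ollama is listening on its default local address.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for Ollama's /api/generate endpoint.
# $1 = model name, $2 = prompt (not escaped, so avoid quotes and backslashes)
build_request() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$1" "$2"
}

# Send a prompt to the local server and print the raw JSON reply.
ask_llama() {
  build_request "llama2:13b" "$1" | curl -s "$OLLAMA_URL/api/generate" -d @-
}

# Example (requires Ollama to be running):
# ask_llama "Why is the sky blue?"
```

The generated text is in the response field of the JSON that comes back.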
How to run Llama 2 on Windows using a web GUI
If you're using a Windows machine, there's no need to fret: it's still easy to set up, though it takes a few more steps. You'll clone a GitHub repository and run it locally, and that's all you need to do.
Step 1: Download and run the Llama 2 Web GUI
If you're familiar with Stable Diffusion and running it locally through a Web GUI, this is basically the same idea for text generation. oobabooga's text generation Web UI GitHub repository is inspired by that and works in very much the same way.
- Download oobabooga's text generation Web UI repository from GitHub
- Run start_windows.bat, start_linux.sh, or start_macos.sh depending on what platform you're using
- Select your GPU and allow it to install everything that it needs
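If you're comfortable with the command line, the download and launch steps can be sketched as follows. This assumes Git is installed and that the repository lives at github.com/oobabooga/text-generation-webui. The start_script_for and install_webui names are illustrative helpers, not part of the project.

```shell
# Map the output of `uname -s` to the launcher script for that platform.
start_script_for() {
  case "$1" in
    Linux)  echo "start_linux.sh" ;;
    Darwin) echo "start_macos.sh" ;;
    *)      echo "start_windows.bat" ;;
  esac
}

# Clone the repository and start the installer for the current platform.
# Assumption: the repository URL below is the one the steps refer to.
install_webui() {
  git clone https://github.com/oobabooga/text-generation-webui || return 1
  cd text-generation-webui || return 1
  script="$(start_script_for "$(uname -s)")"
  case "$script" in
    *.bat) echo "Run $script from Command Prompt or by double-clicking it" ;;
    *)     bash "./$script" ;;
  esac
}
```

Running install_webui on Linux or macOS kicks off the same installer you'd get by launching the script manually. On Windows, it only reminds you which file to run.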
Step 2: Access the Llama 2 Web GUI
Once the installation finishes, the script will give you a local IP address to connect to the web GUI. Open it in your browser and you should see the web GUI.
Click around and familiarize yourself with the UI. The first thing you'll see is a chat window, but it won't work until you load a model.
Step 3: Load a Llama 2 model
Now you'll need to load a model. This will take some time, as the model has to be downloaded first, but you can do that from inside the Web GUI.
- Click the Model tab at the top
- On the right, enter TheBloke/Llama-2-13B-chat-GPTQ and click Download
- While it's downloading, you should see a progress bar in your command prompt as it fetches the relevant files
- When it finishes, refresh the model list on the left and click the downloaded model
- Click Load, making sure that the model loader says GPTQ-for-LLaMa
It may take a moment to load, as these models require a lot of VRAM.
Step 4: Interact with Llama 2!
All going well, you should now have Llama 2 running on your PC! You can interact with it through your browser even without an internet connection, so long as you have the hardware necessary to run it. On my RTX 4080 with 16GB of VRAM, it generates nearly 20 tokens per second, which is significantly faster than you'll find on most free plans for LLMs like ChatGPT.
If you'd prefer, you could also try LM Studio, which offers pre-built models based on Llama 2.