Run local LLMs with ease on Mac and Windows thanks to LM Studio

Large language models (LLMs) like the ones behind ChatGPT, Google Gemini, and Microsoft Copilot all run in the cloud, which basically means they run on somebody else's computer. They're also costly to run, which is why each of those services offers a paid tier that'll set you back $20 a month. However, you can run many language models, such as Llama 2, locally, and with LM Studio you can run pretty much any open LLM on your own machine with ease.

Setting up LM Studio on Windows and Mac is ridiculously easy, and the process is the same for both platforms. It should also work on Linux, though we aren't using it for this tutorial.

By Adam Conway

LM Studio requirements

You'll need just a couple of things to run LM Studio:

Apple Silicon Mac (M1/M2/M3) with macOS 13.6 or newer
Windows / Linux PC with a processor that supports AVX2 (typically newer PCs)
16GB+ of RAM is recommended. For PCs, 6GB+ of VRAM is recommended
NVIDIA/AMD GPUs supported
An (optionally fast) internet connection to download models

If you have the above, then you're ready to go. I'm using an RTX 4080 with 16GB of VRAM, and since it's one of the best graphics cards, my text generation is quick.
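As a rough way to sanity-check a machine against the list above, here's a minimal Python sketch. It assumes a Linux-style /proc/cpuinfo for the AVX2 check (Windows and macOS expose CPU features differently), and the function names has_avx2 and check_recommendations are made up for illustration. The thresholds are simply the recommended figures from the requirements list.

```python
import platform


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def check_recommendations(ram_gb, vram_gb=None):
    """Return a list of warnings for anything below the recommended specs.

    vram_gb is optional because the 6GB VRAM figure applies to PCs only.
    """
    warnings = []
    if ram_gb < 16:
        warnings.append(f"{ram_gb}GB of RAM is below the recommended 16GB")
    if vram_gb is not None and vram_gb < 6:
        warnings.append(f"{vram_gb}GB of VRAM is below the recommended 6GB")
    return warnings


if __name__ == "__main__":
    # Hypothetical example values; swap in your own machine's numbers.
    for warning in check_recommendations(ram_gb=32, vram_gb=16):
        print("Warning:", warning)

    # Assumption: only Linux exposes CPU flags at this path.
    if platform.system() == "Linux":
        with open("/proc/cpuinfo") as f:
            print("AVX2 supported:", has_avx2(f.read()))
```

Since these are recommendations rather than hard limits, the sketch only returns warnings instead of refusing outright.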

Step 1: Download and launch LM Studio

The start page of LM Studio, introducing users to the software

You'll first need to download LM Studio from its website for whatever platform you're on. The download is roughly 400MB, so it may take a bit of time depending on the speed of your internet connection. Once it's downloaded, launch it, and it should look like the above screenshot.
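If you want a ballpark figure for how long that will take, the arithmetic is simple enough to sketch. This is only an estimate that assumes your connection sustains its advertised speed, and download_seconds is a hypothetical helper name.

```python
def download_seconds(size_mb: float, speed_mbps: float) -> float:
    """Estimate download time in seconds.

    size_mb is in megabytes and speed_mbps is in megabits per second,
    so the size is multiplied by 8 to convert bytes to bits.
    """
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    return size_mb * 8 / speed_mbps


if __name__ == "__main__":
    # The speeds below are example values, not measurements.
    for speed in (25, 100, 500):
        print(f"{speed} Mbps: about {download_seconds(400, speed):.0f} seconds")
```

The same function works for the multi-gigabyte models in the next step: pass the model's size in megabytes instead of 400.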

Step 2: Choose a model to download

Next, choose a model to download by clicking the magnifying glass and looking through the options available. Most of these models are several gigabytes in size and may take a while to download. I'm using Zephyr-7B because it's small for an LLM and easy to use, but there are a lot of different LLMs to choose from. Google's recently released Gemma model is available too if you want to give it a try, and so is Mixtral 8x7B.
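To see why these downloads run to several gigabytes, you can estimate a model's size from its parameter count and the number of bits used to store each weight. The sketch below is a back-of-the-envelope calculation, not how LM Studio reports sizes: estimate_size_gb is a hypothetical name, and real model files tend to be somewhat larger because of metadata and layers kept at higher precision.

```python
def estimate_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate model size in gigabytes (1GB = 10^9 bytes).

    Each weight takes bits_per_weight / 8 bytes, so a model with
    params_billion * 10^9 weights needs params_billion * bits / 8 GB.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # A 7B model such as Zephyr-7B at a few common precisions.
    for bits in (4, 8, 16):
        size = estimate_size_gb(7, bits)
        print(f"7B parameters at {bits}-bit: roughly {size:.1f}GB")
```

This is also why quantized (lower-bit) versions of a model are the usual choice for consumer hardware: a 4-bit file is about a quarter of the size of a 16-bit one.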

Have a browse around, do some research, and see if any catch your eye. Zephyr is a model trained to be an assistant, so it can be useful once set up. Once you've chosen one, do the following:

Wait for it to finish downloading.
Click the Speech Bubble on the left.
At the top, select your model.
Wait for it to load.

Step 3: Converse!

LM Studio Zephyr 7B's response, acting as a voice assistant

It's seriously that simple: you've now downloaded and set up an LLM that you can talk to locally. At this point, you can enable GPU acceleration on the right-hand side to speed up responses if you want, though it's not necessary. I run LM Studio on my RTX 4080 with 20 GPU layers, but you may need more or fewer depending on how much VRAM your GPU has.
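For a starting point on the GPU layers setting, here's a rough sketch of how you might estimate how many layers fit in VRAM. It rests on simplifying assumptions that won't hold exactly in practice: the model's size is spread evenly across its layers, and a fixed amount of VRAM is set aside for the context and everything else using the GPU. The name layers_that_fit, the 4GB model size, and the 2GB reserve are all illustrative. The 32-layer figure matches 7B models like Zephyr-7B.

```python
import math


def layers_that_fit(model_gb: float, total_layers: int,
                    vram_gb: float, reserve_gb: float = 2.0) -> int:
    """Estimate how many model layers fit in the available VRAM.

    Assumes each layer takes an equal share of model_gb and that
    reserve_gb of VRAM is kept free for context and other GPU use.
    """
    per_layer_gb = model_gb / total_layers
    budget_gb = max(0.0, vram_gb - reserve_gb)
    return min(total_layers, math.floor(budget_gb / per_layer_gb))


if __name__ == "__main__":
    # Hypothetical 4GB, 32-layer model on GPUs with different VRAM.
    for vram in (4, 6, 16):
        layers = layers_that_fit(4.0, 32, vram)
        print(f"{vram}GB of VRAM: try about {layers} GPU layers")
```

Treat the result as a first guess. If the model fails to load or responses slow down, lower the number and try again.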

Why use an LLM locally?

Privacy, primarily

If you're wondering why you would want to use an LLM locally, there are a few reasons. The first, and the one that concerns most people, is privacy. LLMs are powerful tools for organization and planning, and some of what you share with them may be sensitive. Likewise, if you want to ask an LLM about private code (for example, while debugging it), you should never send it to a cloud-based one.

These reasons only scratch the surface, too. Some local LLMs are tuned toward specific use cases that Gemini, ChatGPT, and Copilot don't cater to. As already mentioned, Zephyr is trained as a virtual assistant, and that level of specificity isn't there in every LLM. Definitely give LM Studio a try if you're interested in trying one out because it's never been easier to run your own LLM!