DeepSeek is a cutting-edge AI model designed for natural language processing, offering powerful capabilities such as text generation, summarization, and reasoning. It can run locally on Linux, making it an excellent choice for users who want privacy, control, and offline access to AI.
One of DeepSeek’s strengths is its flexibility: while it can run on CPU-only systems, performance improves significantly with a dedicated GPU. On a CPU, response times are slower, and larger models may require substantial RAM. With a GPU, DeepSeek generates responses much faster by leveraging parallel processing, which makes real-time interaction smoother.
This guide will walk you through the installation and setup of DeepSeek on Ubuntu or Debian-based Linux distributions, ensuring you can get started with AI on your own machine, whether you have a high-end GPU or not.
In this tutorial you will learn:
- How to install and configure Ollama for running DeepSeek
- How to optimize system resources for best performance

| Category | Requirements, Conventions or Software Version Used |
|---|---|
| System | Ubuntu/Debian, at least 16GB RAM recommended |
| Software | Ollama, Python 3.8+, DeepSeek-R1 models |
| Other | At least 10GB free disk space (more for larger models) |
| Conventions | `#` – requires given linux commands to be executed with root privileges, either directly as a root user or by use of the `sudo` command; `$` – requires given linux commands to be executed as a regular non-privileged user |
Prerequisites
Before we begin, ensure your system meets the minimum requirements. While DeepSeek can run on a CPU-only machine, having a high-performance processor and sufficient RAM will improve execution speed.
If a compatible GPU is installed, Ollama will automatically detect and utilize it for accelerated processing. If no GPU is found, a message will be displayed indicating that the model is running on the CPU.
No manual configuration is required.
DeepSeek’s 671B model is one of the largest openly available AI models: its weights alone take up hundreds of gigabytes or more of storage, and running it requires multiple data-center GPUs. Yet the smaller variants, down to the 1.5B model, can still produce useful results on consumer hardware.
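Before installing anything, it can help to compare the machine against the figures in the requirements table. The following is a minimal sketch, not an official tool: the helper names `kib_to_gib` and `meets_minimum` and the `CHECK_PATH` variable are made up for illustration, and the path to check is an assumption, since where models end up on disk depends on how Ollama is installed.

```shell
#!/bin/sh
# Pre-flight sketch: compare total RAM and free disk space with the
# figures from the requirements table (16 GB RAM, 10 GB free disk).

# Convert KiB to GiB, rounded to the nearest whole number. The kernel
# reports slightly less than the installed RAM, so rounding down would
# flag a 16 GB machine as having only 15 GB.
kib_to_gib() {
    echo $(( ($1 + 524288) / 1048576 ))
}

# Print "ok" when value >= minimum, otherwise "low".
meets_minimum() {
    if [ "$1" -ge "$2" ]; then echo "ok"; else echo "low"; fi
}

# CHECK_PATH is an assumption: point it at the filesystem where the
# models will be stored.
CHECK_PATH=${CHECK_PATH:-/}

# MemTotal is reported in kB; fall back to 0 if it cannot be read.
mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
# With -P and -k, column 4 of df is the available space in KiB.
disk_kib=$(df -Pk "$CHECK_PATH" | awk 'NR==2 {print $4}')

mem_gib=$(kib_to_gib "${mem_kib:-0}")
disk_gib=$(kib_to_gib "${disk_kib:-0}")

echo "RAM:  ${mem_gib} GiB ($(meets_minimum "$mem_gib" 16))"
echo "Disk: ${disk_gib} GiB free on ${CHECK_PATH} ($(meets_minimum "$disk_gib" 10))"
```

A "low" result does not prevent installation; it only suggests starting with one of the smaller models.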
Installation Steps
- Install Ollama: Ollama is required to run DeepSeek models. It provides an optimized local runtime for running machine learning models efficiently.
First, ensure that curl is installed on your system. If it’s not installed, install it with:
$ sudo apt install curl
Once curl is available, download and run the official Ollama installation script:
$ curl -fsSL https://ollama.com/install.sh | sh
After installation, verify that Ollama is installed correctly by checking its version:
$ ollama --version
Additionally, ensure that the Ollama service is running with:
$ systemctl is-active ollama.service

- Download DeepSeek-R1: Now, fetch the model you want to run. DeepSeek-R1 models vary in size, balancing speed and accuracy based on your hardware capabilities. Larger models provide better reasoning and accuracy but require more RAM, VRAM, and disk space.
To install the 7B model as an example, run:
$ ollama pull deepseek-r1:7b
Available DeepSeek-R1 Models, Hardware Requirements and Recommendations

| Model | Parameters | Disk Space Required | Minimum RAM | Recommended GPU VRAM | Performance |
|---|---|---|---|---|---|
| deepseek-r1:1.5b | 1.5 Billion | 3 GB | 8 GB | 4 GB | Fastest, low memory usage, less accuracy |
| deepseek-r1:7b | 7 Billion | 15 GB | 16 GB | 8 GB | Good balance of speed and accuracy |
| deepseek-r1:8b | 8 Billion | 17 GB | 24 GB | 10 GB | Similar to 7B, slightly improved reasoning |
| deepseek-r1:14b | 14 Billion | 30 GB | 32 GB | 16 GB | Better understanding, needs more RAM |
| deepseek-r1:32b | 32 Billion | 70 GB | 64 GB | 24 GB | High accuracy, slower response times |
| deepseek-r1:70b | 70 Billion | 160 GB | 128 GB | 48 GB | Very accurate, slow inference speed |
| deepseek-r1:671b | 671 Billion | 1.5 TB | 512 GB+ | Multiple GPUs, 100 GB+ VRAM | Cutting-edge accuracy, extremely slow |

Choosing the Right Model:
- 1.5B – 7B models: Best for everyday tasks, chat applications, and lightweight inference.
- 8B – 14B models: Balanced models offering improved reasoning while staying relatively efficient.
- 32B – 70B models: Highly advanced, suitable for research and deep analysis, but require substantial resources.
- 671B model: Requires data-center-level hardware. Used for cutting-edge AI research.
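The choice can also be scripted. As a rough sketch, the function below maps an amount of RAM in GB to the largest model tag whose "Minimum RAM" figure in the table above still fits. The function name `suggest_model` is hypothetical, and the thresholds simply mirror the table; they are not something Ollama enforces.

```shell
# Suggest the largest DeepSeek-R1 tag whose "Minimum RAM" figure from
# the table fits within the given amount of RAM (in GB).
# Below 8 GB the smallest model is still returned, although it may not
# run comfortably.
suggest_model() {
    ram_gb=$1
    if   [ "$ram_gb" -ge 512 ]; then echo "deepseek-r1:671b"
    elif [ "$ram_gb" -ge 128 ]; then echo "deepseek-r1:70b"
    elif [ "$ram_gb" -ge 64 ];  then echo "deepseek-r1:32b"
    elif [ "$ram_gb" -ge 32 ];  then echo "deepseek-r1:14b"
    elif [ "$ram_gb" -ge 24 ];  then echo "deepseek-r1:8b"
    elif [ "$ram_gb" -ge 16 ];  then echo "deepseek-r1:7b"
    else                             echo "deepseek-r1:1.5b"
    fi
}

suggest_model 16   # prints deepseek-r1:7b
suggest_model 48   # prints deepseek-r1:14b
```

The result can be passed straight to the pull command, for example:
$ ollama pull "$(suggest_model 16)"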
NOTE
Even with 512+ GB RAM and multiple GPUs with 100+ GB VRAM, the DeepSeek-R1:671B model remains slow due to its massive 671 billion parameters, requiring an immense number of calculations per response. While multiple GPUs improve overall throughput, they don’t significantly reduce latency for a single request, as data movement, memory bandwidth, and computational limits create bottlenecks. Even high-end AI infrastructure struggles with this scale, making smaller models (7B–14B) far more practical for real-time applications. The 671B model is best suited for research and large-scale AI experiments, where precision outweighs speed.
If you are unsure, start with `deepseek-r1:7b` as a general-purpose model.
- Begin Using DeepSeek: Once the model is downloaded, you can start interacting with it directly.
To run the DeepSeek-R1 model, use:
$ ollama run deepseek-r1:7b
You can explore more advanced usage and configurations in the Ollama Documentation.

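- Use a Local API for Integration: If you need to interact with DeepSeek programmatically, use Ollama’s local HTTP API, which listens on port 11434. If the Ollama service from the installation step is already running, the API is available and there is no need to start anything; otherwise, start the server in the background with:
$ ollama serve &
Then send a request:
$ curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:7b", "prompt": "Hello, how are you?"}'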
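By default the generate endpoint streams its answer as a series of JSON objects, one per line. For scripting, it is often easier to ask for a single JSON object by adding `"stream": false` to the request and reading its `response` field. The sketch below wraps this in shell functions. The names `json_escape`, `build_payload`, `generate`, and the `API_BASE` variable are hypothetical, the escaping is deliberately simplistic, and `generate` assumes that `jq` is installed.

```shell
# Escape backslashes and double quotes so a single-line string can be
# embedded in JSON. Newlines and control characters are NOT handled.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a non-streaming request body for /api/generate.
# Usage: build_payload MODEL PROMPT
build_payload() {
    printf '{"model": "%s", "prompt": "%s", "stream": false}\n' \
        "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send the request and print only the generated text.
# API_BASE is a made-up variable for this sketch; it defaults to the
# local Ollama address used in this guide. Requires curl and jq.
generate() {
    build_payload "$1" "$2" |
        curl -fsS "${API_BASE:-http://localhost:11434}/api/generate" -d @- |
        jq -r '.response'
}

# Show the request body that would be sent.
build_payload "deepseek-r1:7b" "Hello, how are you?"
```

With the Ollama service running and the model already pulled, a one-off question then looks like this:
$ generate deepseek-r1:7b "Hello, how are you?"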
Conclusion
DeepSeek offers various model sizes, each with different hardware requirements. If your system struggles with larger models, consider using a smaller variant like `1.5b`. Running DeepSeek without a GPU is possible, but a dedicated GPU significantly improves response times.