How to Run DeepSeek Locally on Your Machine (Beginner's Guide)
Learn how to run DeepSeek R1 locally on your Mac or PC using Ollama. Running open-source AI locally ensures total privacy for your sensitive code and data.
When the AI revolution started, it seemed like you had two choices: pay $20 a month for a cloud service like ChatGPT, or build a $50,000 server rack in your basement. Running state-of-the-art AI required massive data centers owned by trillion-dollar tech companies.
DeepSeek changed that assumption. By releasing open weights for its R1 and V3 models, along with smaller distilled versions of R1, it made top-tier reasoning available to anyone with a capable laptop. The community quickly packaged these smaller models to run entirely offline on standard consumer hardware, including a reasonably powerful MacBook or Windows laptop.
Why would you want to run an AI locally instead of just using a website? Absolute privacy, no per-request fees, no server-side content filtering, and total ownership. If you are a developer working on proprietary code, or a lawyer handling confidential client files, you may not be allowed to upload that data to a public cloud API. Running DeepSeek locally solves that problem.
Understanding Hardware Requirements
Before you begin, you need to understand two concepts: parameters and quantization. DeepSeek models come in different sizes, measured in billions of parameters (e.g., 7B, 14B, 32B, 70B). Larger models are generally more capable, but they need more RAM (memory) to run.
To make these models fit on consumer laptops, developers compress them using a process called quantization, which stores each weight in fewer bits (usually 4-bit or 8-bit instead of 16-bit). This drastically reduces the memory requirement at the cost of a small loss in output quality, which is usually more noticeable at 4-bit than at 8-bit.
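As a back-of-the-envelope check, the weights alone take about parameters × bits-per-weight ÷ 8 bytes, so a 7B model at 4-bit needs roughly 3.5GB before any overhead. The sketch below applies that formula; the 20% overhead for the context cache and runtime buffers is an assumed allowance for illustration, not a measured figure, and real files vary because some layers are kept at higher precision.

```python
def estimate_model_gb(params_billion: float, bits_per_weight: float, overhead: float = 0.2) -> float:
    """Back-of-the-envelope memory estimate (in GB) for a quantized model.

    Weights take params * bits / 8 bytes; with params in billions this gives GB directly.
    `overhead` is an assumed 20% allowance for the context cache and runtime buffers.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb * (1 + overhead), 1)


for size in (7, 14, 32, 70):
    four_bit = estimate_model_gb(size, 4)
    eight_bit = estimate_model_gb(size, 8)
    print(f"{size}B: ~{four_bit} GB at 4-bit, ~{eight_bit} GB at 8-bit")
```

The 4-bit estimates (about 4GB for 7B, 8GB for 14B, 42GB for 70B) line up with the RAM tiers below.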
What can your computer run?
- 8GB RAM (Standard Laptop): You can run the 7B or 8B parameter models. These are fast and great for basic coding and writing, but might struggle with highly complex logic puzzles.
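- 16GB to 32GB RAM (Pro Laptop/MacBook Pro M-Series): The sweet spot. You can comfortably run the 14B model, and the 32B model once you reach 32GB. In DeepSeek's published benchmarks, these distilled models compete with much larger models, including GPT-4-class systems, on several math and coding tests.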
- 64GB+ RAM (Desktop Workstation): You can run the massive 70B parameter models, giving you a top-tier reasoning engine running completely offline.
These tiers only tell you whether a model will load. How fast it runs depends mostly on your GPU; see the GPU and speed section below for details.
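The RAM tiers above can be written as a simple lookup. This is a sketch that encodes this guide's rule of thumb, not official requirements; the Ollama tags it returns (`deepseek-r1:7b` and so on) are the distilled variants described in the next section, and you should still check the actual download size before pulling.

```python
def suggest_ollama_tag(ram_gb: float) -> str:
    """Map installed RAM to a DeepSeek R1 distill tag using this guide's tiers.

    The thresholds are the rule of thumb from the list above, not official requirements.
    """
    if ram_gb >= 64:
        return "deepseek-r1:70b"
    if ram_gb >= 32:
        return "deepseek-r1:32b"
    if ram_gb >= 16:
        return "deepseek-r1:14b"
    if ram_gb >= 8:
        return "deepseek-r1:7b"
    raise ValueError("Under 8GB of RAM: even the 7B model is likely to struggle.")


print(suggest_ollama_tag(16))  # prints deepseek-r1:14b
```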
Which DeepSeek Model Should You Actually Download?
The parameter count (7B, 14B, 32B) tells you the model size. But there is a second dimension: which variant of DeepSeek you are downloading. The names on Hugging Face and Ollama are not self-explanatory.
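DeepSeek-R1 (full): The original reasoning model, a 671B-parameter mixture-of-experts network. Even heavily quantized, it needs several hundred gigabytes of memory, which puts it out of reach of consumer hardware. The 70B model you may see listed (roughly 40GB at 4-bit) is not the full R1 but one of the distilled variants below.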
DeepSeek-R1 Distilled: These are the versions you actually want for local use. DeepSeek fine-tuned open-source Llama (Meta) and Qwen (Alibaba) models on R1's reasoning outputs, producing smaller models that retain much of R1's reasoning capability. On Hugging Face they carry names like DeepSeek-R1-Distill-Qwen-14B; in Ollama they appear as tags such as deepseek-r1:7b or deepseek-r1:14b. Any DeepSeek R1 option you see in LM Studio or Ollama at 70B or below is a distilled variant, and that is what you want.
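DeepSeek-V3: A standard (non-reasoning) chat model. It skips the thinking phase, so it responds faster than R1 and suits conversational tasks, summarisation, and quick rewrites. It does not show chain-of-thought output. However, V3 is the same 671B-parameter size as the full R1 and has no official small distilled versions, so it is not a realistic local option on consumer hardware. If you want speed over depth locally, use a smaller R1 distill or a general-purpose model such as Llama or Qwen instead.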
Practical rule: Start with deepseek-r1:7b on Ollama or the equivalent GGUF file on LM Studio. If it runs well, step up to 14B. Only go above 14B if you have 32GB of RAM and have confirmed that GPU offload is working. For how to get the most out of R1 once it is running, see our DeepSeek prompts guide.
GPU and Apple Silicon: The Speed Multiplier
RAM determines whether a model can load. Your GPU determines how fast it runs — and the gap is significant.
NVIDIA GPUs (CUDA): If your machine has a dedicated NVIDIA card with at least 6GB of VRAM, both LM Studio and Ollama will detect it automatically and offload model layers to the GPU. The result is 3–5x faster token generation compared to CPU-only inference on the same machine. [SOURCE NEEDED] A laptop with 16GB RAM and an RTX 3060 will typically run the 14B model faster than a desktop with 32GB RAM and no GPU.
Apple Silicon (M-series Macs): M1 through M4 chips use unified memory, meaning the CPU and GPU share the same physical pool of RAM. This makes local inference unusually fast compared to equivalent-spec Intel or AMD machines. An M3 MacBook Pro with 16GB can run the 14B model faster than many Windows laptops with dedicated GPUs. [SOURCE NEEDED] When LM Studio shows "Metal" as the inference backend on a Mac, it is using the GPU correctly.
AMD GPUs: Support is improving via ROCm but remains less reliable than NVIDIA. If you have an AMD GPU, check the current LM Studio and Ollama documentation before relying on GPU offload. [SOURCE NEEDED — verify current AMD ROCm support status]
No GPU / Intel integrated graphics: The model runs on CPU only. Expect roughly 2–6 tokens per second on a 7B model, depending on the CPU. That is usable for short tasks but slow for long outputs.
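Whatever your hardware, Ollama users can check whether offload is actually happening: `ollama ps` lists loaded models and how they are split between CPU and GPU, and the same data is available from the local API's `/api/ps` endpoint. The sketch below reads the `size` and `size_vram` fields (both in bytes) from that endpoint, assuming Ollama is on its default port; run it while a model is loaded, for example right after a prompt.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local API address


def fetch_running_models(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/ps: lists the models Ollama currently has loaded in memory."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


def gpu_share(ps_response: dict) -> dict:
    """Fraction (0.0 to 1.0) of each loaded model that sits in GPU memory.

    Uses the `size` and `size_vram` fields (both in bytes) from /api/ps.
    """
    shares = {}
    for model in ps_response.get("models", []):
        total = model.get("size", 0)
        shares[model["name"]] = model.get("size_vram", 0) / total if total else 0.0
    return shares
```

Usage: `print(gpu_share(fetch_running_models()))`. A value near 0.0 means the model is running on the CPU only.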
Speed Expectations Before You Start
Token speed varies significantly by hardware and model size. These are approximate figures based on community benchmarks [SOURCE NEEDED].
| Hardware | Model | Approx. tokens/sec | Usability |
|---|---|---|---|
| M3 MacBook Pro 16GB | 7B distilled | ~40–60 t/s | Excellent |
| M3 MacBook Pro 16GB | 14B distilled | ~20–30 t/s | Good |
| Windows laptop, RTX 3060 6GB VRAM | 7B distilled | ~35–50 t/s | Good |
| Windows laptop, no GPU | 7B distilled | ~3–6 t/s | Slow but usable |
| Windows desktop, RTX 4090 24GB | 32B distilled | ~25–35 t/s | Good |
| Any machine, 8GB RAM, CPU only | 7B distilled | ~2–5 t/s | Slow |
The general rule: if you are below 5 tokens/sec, check that GPU offload is enabled in LM Studio (right sidebar → GPU Offload slider).
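To measure your own speed with Ollama, `ollama run deepseek-r1 --verbose` prints an "eval rate" after each reply. If you call the local API instead, a completed response includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), and the rate is their ratio. A minimal sketch; the 5 tokens/sec threshold is simply this guide's rule of thumb.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from a completed Ollama /api/generate or /api/chat response.

    eval_count = tokens generated; eval_duration = time spent generating, in nanoseconds.
    """
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        raise ValueError("No eval_duration in response; was the request completed?")
    return response["eval_count"] / (duration_ns / 1e9)


def should_check_gpu_offload(tps: float, threshold: float = 5.0) -> bool:
    """Rule of thumb from this guide: below ~5 tokens/sec, check GPU offload."""
    return tps < threshold
```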
Method 1: The Easiest Way (LM Studio)
If you want a graphical user interface that looks much like the ChatGPT website, LM Studio is the easiest way to start. It requires no command-line experience.
Step 1: Download LM Studio
Navigate to lmstudio.ai and download the application for Windows, Mac, or Linux. Installation works like any other desktop application.
Step 2: Search for DeepSeek
Open LM Studio. On the home page, there is a prominent search bar. Type "DeepSeek R1." LM Studio connects directly to Hugging Face (the GitHub of AI models) and will display a list of available downloads.
Step 3: Choose the Right Model Size
You will see many options uploaded by different users, with names like `lmstudio-community/DeepSeek-R1-Distill-Qwen-7B-GGUF`. Look at the file sizes. As a rule of thumb, the file size should be roughly half of your total RAM. If you have 16GB of RAM, look for a file of about 8 to 9GB (typically a 4-bit build of the 14B model). Click "Download."
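The half-RAM rule can be expressed as a small filter over the file sizes LM Studio lists. This is a sketch of the rule of thumb only; `pick_model_file` is a hypothetical helper, and you supply the names and sizes shown in the download list yourself.

```python
from typing import Dict, Optional


def pick_model_file(file_sizes_gb: Dict[str, float], ram_gb: float) -> Optional[str]:
    """Pick the largest download that fits the 'about half your RAM' rule of thumb.

    file_sizes_gb maps each file name to its size in GB, as shown in LM Studio's list.
    Returns None if nothing fits.
    """
    budget_gb = ram_gb / 2
    fitting = {name: size for name, size in file_sizes_gb.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)
```

Treat the budget as a guideline rather than a hard limit: a 4-bit 14B file of around 9GB sits just above half of 16GB and usually still loads.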
Step 4: Load and Chat
Once the download is complete, click the "Chat" icon on the left sidebar. At the top of the screen, select the DeepSeek model you just downloaded from the dropdown menu. LM Studio will load the model into your RAM. Once it finishes loading, simply type your prompt into the bottom chat box and hit enter.
Congratulations — you are now chatting with a capable reasoning model running entirely on your own hardware.
Tool Comparison: LM Studio vs Ollama vs Jan.ai
Three tools dominate local AI on consumer hardware. If you are following this guide for the first time, start with LM Studio (Method 1 above). If you are a developer who wants to call the model from scripts or connect it to a code editor, use Ollama (Method 2 below).
| | LM Studio | Ollama | Jan.ai |
|---|---|---|---|
| Interface | GUI (desktop app) | CLI + local API | GUI (desktop app) |
| Setup difficulty | Beginner | Developer | Beginner |
| VS Code integration | Via API | Via Continue.dev | Via API |
| Model library | Hugging Face (vast) | Ollama library (~hundreds) | Hugging Face + Ollama |
| Best for | Non-developers, first-time setup | Developers, API automation | Alternative GUI option |
| Free | Yes | Yes | Yes |
Method 2: The Developer Way (Ollama)
If you are a developer and want to use DeepSeek to write code directly inside your code editor (like VS Code), a lightweight background service is more convenient than a desktop chat app. Ollama is a popular choice for running local models from the command line and over an API.
Step 1: Install Ollama
Go to Ollama.com and download the installer. Once installed, Ollama runs silently in the background of your computer, exposing a local API on port `11434`.
Step 2: Download and Run DeepSeek
Open your Terminal (Mac/Linux) or Command Prompt (Windows). Type the following command and hit Enter:
```shell
ollama run deepseek-r1
```
Ollama will automatically download the default DeepSeek R1 model (a small distilled variant; run `ollama show deepseek-r1` afterwards to see its exact parameter count) and start a chat session right in your terminal. You can start typing prompts immediately.
Pro Tip: If you have a powerful machine and want a larger model, you can specify the size: `ollama run deepseek-r1:14b` or `ollama run deepseek-r1:32b`.
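Because Ollama exposes a local API on port 11434, you can also call the model from your own scripts. A minimal sketch using only Python's standard library, assuming the default port and that you have already pulled `deepseek-r1:7b`: it posts to Ollama's `/api/generate` endpoint with `stream` disabled so the reply arrives as one JSON object. The temperature of 0.6 follows DeepSeek's recommendation for R1 models, and the returned text may include the model's reasoning section.

```python
import json
import urllib.request


def build_generate_request(prompt: str, model: str = "deepseek-r1:7b", temperature: float = 0.6) -> dict:
    """JSON body for Ollama's POST /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a token stream
        "options": {"temperature": temperature},
    }


def generate(prompt: str, base_url: str = "http://localhost:11434", **kwargs) -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Reasoning models can think for a while, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)["response"]
```

Usage: `print(generate("Explain a race condition in two sentences.", model="deepseek-r1:14b"))`.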
Step 3: Integrate with VS Code (Continue.dev)
Running the model in the terminal is fun, but to make it useful, you want it inside your code editor.
- Open Visual Studio Code.
- Go to Extensions and install Continue (an open-source AI coding assistant).
- Open the Continue sidebar. Click the "+" button at the bottom to add a new model.
- Select "Ollama" as the provider.
- Select "DeepSeek-R1" from the model list.
Note: Continue saves this setup in ~/.continue/ on your machine (as config.json in older releases; newer releases use config.yaml). If the connection to Ollama fails, that file is the first place to check: confirm that apiBase is set to http://localhost:11434.
You now have a local, offline version of GitHub Copilot. You can highlight proprietary code in your editor, ask DeepSeek to find the bug, and it will process the request securely on your local hardware.
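If you are on an older Continue release that uses config.json, a short script can check the Ollama entries for you. This sketch assumes the legacy layout (a top-level `models` list whose entries have `provider`, `title`, and optional `apiBase` keys) and does not read config.yaml. Entries without an explicit `apiBase` are flagged so you can decide whether to add one.

```python
import json
from pathlib import Path
from typing import List, Optional, Tuple
from urllib.parse import urlparse

CONFIG_PATH = Path.home() / ".continue" / "config.json"  # legacy Continue location


def ollama_entries(config: dict) -> List[Tuple[str, Optional[str]]]:
    """(title, apiBase) for every model whose provider is 'ollama'."""
    return [
        (m.get("title", m.get("model", "?")), m.get("apiBase"))
        for m in config.get("models", [])
        if m.get("provider") == "ollama"
    ]


def is_local_ollama(api_base: Optional[str]) -> bool:
    """True if api_base points at Ollama's default local port."""
    if not api_base:
        return False
    parsed = urlparse(api_base)
    return parsed.hostname in ("localhost", "127.0.0.1") and parsed.port == 11434


def check_continue_config(path: Path = CONFIG_PATH) -> None:
    """Print each Ollama entry and whether its apiBase looks right."""
    config = json.loads(path.read_text(encoding="utf-8"))
    for title, api_base in ollama_entries(config):
        status = "OK" if is_local_ollama(api_base) else "check this entry"
        print(f"{title}: {api_base or '(apiBase not set)'} -> {status}")
```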
What to Do Once It's Running
Here are three tasks worth trying in the first session to confirm the model is working and to get a feel for its capabilities.
Test 1 — Confirm reasoning is active
Type: What is 17 multiplied by 23? Show your work step by step. (The correct answer is 391.)
If the model is an R1 or distilled R1 variant, you should see its reasoning appear before the final answer, either as a `<think>` block or as a collapsible "Thinking" section, depending on the app. If you get a direct answer with no reasoning, you may have downloaded a non-reasoning model (such as V3 or a general chat model) by mistake.
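If you call the model from a script rather than a chat app, you may want to separate the reasoning from the final answer. A minimal sketch, assuming the raw output contains a closing `</think>` tag; it also tolerates a missing opening tag, since some setups insert `<think>` through the chat template rather than in the generated text.

```python
from typing import Tuple


def split_reasoning(output: str) -> Tuple[str, str]:
    """Split raw R1 output into (reasoning, answer).

    Everything before the first </think> is treated as reasoning; a leading <think>
    tag is dropped if present. Without a </think> tag, the whole text is the answer.
    """
    if "</think>" not in output:
        return "", output.strip()
    reasoning, _, answer = output.partition("</think>")
    return reasoning.replace("<think>", "", 1).strip(), answer.strip()


raw = "<think>\n17 x 20 = 340, 17 x 3 = 51, 340 + 51 = 391\n</think>\n\n17 x 23 = 391"
thinking, answer = split_reasoning(raw)
print(answer)  # prints 17 x 23 = 391
```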
Test 2 — Code debugging
Paste this broken Python function and ask it to find and fix the bug:
```python
def add_numbers(a, b)
    return a + b
```
A working local setup should identify the missing colon and produce corrected code. If it produces garbled output, reduce the model size or lower the temperature (see the troubleshooting section below).
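For reference, the fix is the colon after the function signature, so a correct answer should look something like this:

```python
def add_numbers(a, b):
    """Return the sum of a and b."""
    return a + b


print(add_numbers(2, 3))  # prints 5
```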
Test 3 — Private document analysis
Drag a text file into LM Studio's chat window (or, with Ollama, paste the file's contents into your prompt). Ask a specific question about its contents. This confirms the local setup is processing your documents without sending them anywhere: your network monitor should show no outbound traffic from LM Studio or Ollama during the response. For a full list of effective prompts to use with the reasoning model, see our DeepSeek prompts guide.
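The same test can be scripted. A minimal sketch, assuming Ollama on its default port: it reads a text file, wraps it in a prompt, and refuses to send anything if the server address is not on this machine. `ask_about_file` and the prompt wording are illustrative, not part of Ollama, and very long files may be cut off by the model's context window.

```python
import ipaddress
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def is_loopback(url: str) -> bool:
    """True if the URL's host is this machine (localhost, 127.x.x.x, or ::1)."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # a hostname such as example.com, not an IP literal
        return False


def build_document_prompt(document: str, question: str) -> str:
    """Wrap a document and a question into a single prompt."""
    return (
        "Answer the question using only the document below.\n\n"
        f"--- DOCUMENT ---\n{document}\n--- END DOCUMENT ---\n\n"
        f"Question: {question}"
    )


def ask_about_file(path: str, question: str, model: str = "deepseek-r1:7b",
                   base_url: str = "http://localhost:11434") -> str:
    """Send a local file to a local Ollama model; refuse non-loopback servers."""
    if not is_loopback(base_url):
        raise ValueError(f"Refusing to send the document to a non-local server: {base_url}")
    prompt = build_document_prompt(Path(path).read_text(encoding="utf-8"), question)
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)["response"]
```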
Why DeepSeek R1 is Different (The "Thinking" Phase)
When you run DeepSeek R1 locally, you will notice something unique. When you give it a prompt, it does not answer immediately. Instead, it starts generating text inside a `<think>` block.
This is the "Chain of Thought" reasoning process. R1 is breaking down your problem, writing pseudo-code, testing its own logic, and correcting its mistakes before it ever gives you the final answer. Do not interrupt this process. Longer reasoning generally helps on harder problems, so let it finish even when the thinking section runs long.
Troubleshooting Common Local AI Issues
Running massive neural networks on a laptop can occasionally cause issues. Here is how to fix them:
- The model is generating gibberish: This usually means you ran out of RAM, or the model's "Temperature" is set too high. DeepSeek recommends a temperature between 0.5 and 0.7 (0.6 is the suggested value) for R1 models; set it in LM Studio's settings.
- The computer is completely frozen: You downloaded a model that is too large for your RAM. The computer is trying to use your slow hard drive as "Swap Memory." Force restart your computer and download a smaller model (e.g., switch from 32B to 14B).
- It is generating text incredibly slowly (1 word a second): This usually means the model is not using your GPU (graphics card). In LM Studio, find the "GPU Offload" setting in the right sidebar and move the slider to the maximum. This pushes the computation to your graphics card, which is vastly faster than your CPU.
- Windows Defender blocked the Ollama installer: If you downloaded it from ollama.com, this is typically a false positive. Ollama is open-source, and you can inspect the code on GitHub. Click "More info" then "Run anyway" in the Windows SmartScreen prompt. If your organisation's IT policy blocks unsigned executables, use LM Studio instead, which is code-signed.
- "Error: model not found" in Ollama: The Ollama model registry uses specific naming. `deepseek-r1` is valid; `deepseekr1` or `deepseek_r1` are not. Run `ollama list` to see what is installed. Run `ollama pull deepseek-r1:7b` to download explicitly rather than relying on the `run` command to auto-download.
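If you script against Ollama, the same check is available from the local API: `/api/tags` returns the installed models, matching what `ollama list` shows. A sketch, assuming the default port. Note that a name without a tag, such as `deepseek-r1`, refers to `deepseek-r1:latest`, which is why it will not match a model you pulled as `deepseek-r1:7b`.

```python
import json
import urllib.request
from typing import Set


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/tags: the API equivalent of `ollama list`."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def installed_models(tags_response: dict) -> Set[str]:
    """Set of installed model names, e.g. {'deepseek-r1:7b'}."""
    return {m["name"] for m in tags_response.get("models", [])}


def normalise_tag(name: str) -> str:
    """Ollama treats a name without a tag as name:latest."""
    return name if ":" in name else f"{name}:latest"


def is_installed(name: str, tags_response: dict) -> bool:
    """True if `name` (after adding :latest if needed) is installed."""
    return normalise_tag(name) in installed_models(tags_response)
```

Usage: `is_installed("deepseek-r1:7b", fetch_tags())`. If it returns False, run `ollama pull deepseek-r1:7b`.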
Conclusion
Running a capable reasoning model like DeepSeek R1 entirely offline on a standard laptop removes the cloud dependency. Developers, researchers, and writers get permanent access to high-tier intelligence without paying a monthly subscription or exposing their data.
Whether you choose the user-friendly LM Studio or the developer-focused Ollama, setting up local AI takes less time than installing a video game. The future of AI is not just in the cloud; it is running quietly in the background of your own laptop.
Next Reads: Best DeepSeek Prompts Guide — DeepSeek vs Claude for Developers
FAQ
Is it free to run DeepSeek locally?
Yes. DeepSeek R1 model weights are open-source, and tools like Ollama used to run them locally are completely free.
How much RAM do I need to run DeepSeek?
You can run the distilled 7B models on 8GB of RAM. However, for serious coding and reasoning tasks, it is recommended to run the 14B or 32B models, which require 16GB to 32GB of RAM.
Does DeepSeek need an internet connection to work?
No. Once the model file is downloaded to your computer, it runs entirely offline. You can disconnect your Wi-Fi, take your laptop into the woods, and DeepSeek will still be able to write code, generate text, and solve math problems.
Can local DeepSeek search the live web for current events?
No. Once downloaded, the model runs entirely from its training data with no network access, so its knowledge has a fixed cutoff date. [SOURCE NEEDED: confirm R1 distilled knowledge cutoff] If you need real-time information, use the deepseek.com web interface with its Web Search toggle enabled, or connect the DeepSeek API to a separate search tool. For purely local tasks (analysing code, reviewing documents, working through logic problems) the knowledge cutoff rarely matters.
Can I run multiple models and switch between them?
Yes. Both Ollama and LM Studio support multiple downloaded models. In Ollama, run `ollama list` to see everything installed, then `ollama run [model-name]` to switch. In LM Studio, use the model dropdown in the chat interface to switch without restarting the app. Each model loads fresh into RAM when selected, so switching can take anywhere from a few seconds to around half a minute, depending on model size and disk speed. You can keep a DeepSeek R1 distill for reasoning tasks and a faster general-purpose model such as Llama 3 or Mistral for conversational tasks installed side by side.
About the author
Generative Report Desk
The editorial team behind Generative Report covers AI tools, model releases, practical workflows, and the business impact of generative AI.