Human-edited AI reporting for readers who want signal, not sludge.

RSS

How to Run DeepSeek Locally on Your Machine (Beginner's Guide)

Learn how to run DeepSeek R1 locally on your Mac or PC using Ollama. Running open-source AI locally ensures total privacy for your sensitive code and data.

By Generative Report Desk Apr 14, 2026 Updated Jun 26, 2026 11 min read
Terminal screen showing code execution
AI Coding Generative AI

When the AI revolution started, it seemed like you had two choices: pay $20 a month for a cloud-based API like ChatGPT, or build a $50,000 server rack in your basement. Running state-of-the-art AI required massive data centers owned by trillion-dollar tech companies.

DeepSeek changed that assumption. By releasing the open-source weights for their R1 and V3 models, they made top-tier reasoning models available to anyone with a laptop. The community quickly optimized these models to run entirely offline on standard consumer hardware — a reasonably powerful MacBook or Windows laptop included.

Why would you want to run an AI locally instead of just using a website? Absolute privacy, zero censorship, zero latency costs, and total ownership. If you are a developer working on proprietary code, or a lawyer handling confidential client files, you cannot upload that data to a public cloud API. Running DeepSeek locally is the solution.

Understanding Hardware Requirements

Before you begin, you need to understand the concept of "Quantization" and "Parameters." DeepSeek models come in different sizes, measured in billions of parameters (e.g., 7B, 14B, 32B, 70B). The larger the model, the smarter it is, but the more RAM (memory) it requires to run.

To make these massive models fit on consumer laptops, developers compress them using a process called quantization (usually turning them into a 4-bit or 8-bit format). This drastically reduces the memory requirement with only a tiny loss in "smartness."

What can your computer run?

  • 8GB RAM (Standard Laptop): You can run the 7B or 8B parameter models. These are fast and great for basic coding and writing, but might struggle with highly complex logic puzzles.
  • 16GB to 32GB RAM (Pro Laptop/MacBook Pro M-Series): The sweet spot. You can comfortably run the 14B or 32B models. These models are incredibly capable and rival GPT-4 in many benchmarks.
  • 64GB+ RAM (Desktop Workstation): You can run the massive 70B parameter models, giving you a top-tier reasoning engine running completely offline.

RAM determines whether a model can load. Your GPU determines how fast it runs — see the GPU and speed section below for details.

Which DeepSeek Model Should You Actually Download?

The parameter count (7B, 14B, 32B) tells you the model size. But there is a second dimension: which variant of DeepSeek you are downloading. The names on Hugging Face and Ollama are not self-explanatory.

DeepSeek-R1 (full): The original reasoning model. The full versions are extremely large — the 70B variant requires roughly 40GB of storage and memory. Most consumer hardware cannot run the full R1 above 7B comfortably.

DeepSeek-R1 Distilled: These are the versions you actually want for local use. Meta's and Alibaba's open-source architectures (Llama and Qwen) were fine-tuned using R1's reasoning outputs, producing smaller models that retain most of R1's reasoning capability. The distilled variants appear in Ollama as deepseek-r1:7b or deepseek-r1:14b. When LM Studio or Ollama shows you a DeepSeek R1 option at 7B or 14B, it is almost certainly a distilled variant — this is what you want. [SOURCE NEEDED — confirm distilled model naming in Ollama library]

DeepSeek-V3: A standard (non-reasoning) chat model. Faster to respond than R1 because it skips the thinking phase. Better for conversational tasks, summarisation, and quick rewrites. It does not show chain-of-thought output. If you want speed over depth, V3 is the better local choice.

Practical rule: Start with deepseek-r1:7b on Ollama or the equivalent GGUF file on LM Studio. If it runs well, step up to 14B. Only go above 14B if you have 32GB RAM and confirmed GPU offload is working. For how to get the most out of R1 once it is running, see our DeepSeek prompts guide.

GPU and Apple Silicon: The Speed Multiplier

RAM determines whether a model can load. Your GPU determines how fast it runs — and the gap is significant.

NVIDIA GPUs (CUDA): If your machine has a dedicated NVIDIA card with at least 6GB of VRAM, both LM Studio and Ollama will detect it automatically and offload model layers to the GPU. The result is 3–5x faster token generation compared to CPU-only inference on the same machine. [SOURCE NEEDED] A laptop with 16GB RAM and an RTX 3060 will run the 14B model faster than a desktop with 32GB RAM and no GPU.

Apple Silicon (M-series Macs): M1 through M4 chips use unified memory — RAM and GPU memory are the same physical pool. This makes local inference unusually fast compared to equivalent-spec Intel or AMD machines. An M3 MacBook Pro with 16GB handles the 14B model faster than most Windows laptops with dedicated GPUs. [SOURCE NEEDED] When LM Studio shows "Metal" as the inference backend on a Mac, it is using the GPU correctly.

AMD GPUs: Support is improving via ROCm but remains less reliable than NVIDIA. If you have an AMD GPU, check the current LM Studio and Ollama documentation before relying on GPU offload. [SOURCE NEEDED — verify current AMD ROCm support status]

No GPU / Intel integrated graphics: The model runs on CPU only. Expect 1–4 tokens per second on a 7B model — usable for short tasks but slow for long outputs.

Speed Expectations Before You Start

Token speed varies significantly by hardware and model size. These are approximate figures based on community benchmarks [SOURCE NEEDED].

HardwareModelApprox. tokens/secUsability
M3 MacBook Pro 16GB7B distilled~40–60 t/sExcellent
M3 MacBook Pro 16GB14B distilled~20–30 t/sGood
Windows laptop, RTX 3060 8GB VRAM7B distilled~35–50 t/sGood
Windows laptop, no GPU7B distilled~3–6 t/sSlow but usable
Windows desktop, RTX 4090 24GB32B distilled~25–35 t/sGood
Any machine, 8GB RAM, CPU only7B distilled~2–5 t/sSlow

The general rule: if you are below 5 tokens/sec, check that GPU offload is enabled in LM Studio (right sidebar → GPU Offload slider).

Method 1: The Easiest Way (LM Studio)

If you want a graphical user interface that looks exactly like the ChatGPT website, LM Studio is the absolute best way to start. It requires zero command-line experience.

Step 1: Download LM Studio

Navigate to LMStudio.ai and download the application for Windows, Mac, or Linux. The installation process is identical to installing any standard software.

Step 2: Search for DeepSeek

Open LM Studio. On the home page, there is a prominent search bar. Type "DeepSeek R1." LM Studio connects directly to Hugging Face (the GitHub of AI models) and will display a list of available downloads.

Step 3: Choose the Right Model Size

You will see many options uploaded by different users (often names like `TheBloke/DeepSeek-R1-GGUF`). Look at the file sizes. As a rule of thumb, the file size should be roughly half of your total RAM. If you have 16GB of RAM, download an 8GB model file (usually the 14B parameter version). Click "Download."

Step 4: Load and Chat

Once the download is complete, click the "Chat" icon on the left sidebar. At the top of the screen, select the DeepSeek model you just downloaded from the dropdown menu. LM Studio will load the model into your RAM. Once it finishes loading, simply type your prompt into the bottom chat box and hit enter.

Congratulations — you are now chatting with a capable reasoning model running entirely on your own hardware.

Tool Comparison: LM Studio vs Ollama vs Jan.ai

Three tools dominate local AI on consumer hardware. If you are following this guide for the first time, start with LM Studio (Method 1 below). If you are a developer who wants to call the model from scripts or connect it to a code editor, use Ollama (Method 2).

LM StudioOllamaJan.ai
InterfaceGUI (desktop app)CLI + local APIGUI (desktop app)
Setup difficultyBeginnerDeveloperBeginner
VS Code integrationVia APIVia Continue.devVia API
Model libraryHugging Face (vast)Ollama library (~hundreds)Hugging Face + Ollama
Best forNon-developers, first-time setupDevelopers, API automationAlternative GUI option
FreeYesYesYes

Method 2: The Developer Way (Ollama)

If you are a developer and you want to use DeepSeek to write code directly inside your code editor (like VS Code), LM Studio is too heavy. You need a background service. Ollama is the industry standard for running local models via command-line and API.

Step 1: Install Ollama

Go to Ollama.com and download the installer. Once installed, Ollama runs silently in the background of your computer, exposing a local API on port `11434`.

Step 2: Download and Run DeepSeek

Open your Terminal (Mac/Linux) or Command Prompt (Windows). Type the following command and hit Enter:

ollama run deepseek-r1

Ollama will automatically download the default DeepSeek R1 model (usually the 7B version) and start a chat session right in your terminal. You can start typing prompts immediately.

Pro Tip: If you have a powerful machine and want a larger model, you can specify the size: `ollama run deepseek-r1:14b` or `ollama run deepseek-r1:32b`.

Step 3: Integrate with VS Code (Continue.dev)

Running the model in the terminal is fun, but to make it useful, you want it inside your code editor.

  1. Open Visual Studio Code.
  2. Go to Extensions and install Continue (an open-source AI coding assistant).
  3. Open the Continue sidebar. Click the "+" button at the bottom to add a new model.
  4. Select "Ollama" as the provider.
  5. Select "DeepSeek-R1" from the model list.

Note: After completing this setup, Continue writes a config.json file to ~/.continue/ on your machine. If the connection to Ollama fails, that file is the first place to check — confirm the apiBase is set to http://localhost:11434.

You now have a local, offline version of GitHub Copilot. You can highlight proprietary code in your editor, ask DeepSeek to find the bug, and it will process the request securely on your local hardware.

What to Do Once It's Running

Here are three tasks worth trying in the first session to confirm the model is working and to get a feel for its capabilities.

Test 1 — Confirm reasoning is active

Type: What is 17 multiplied by 23? Show your work step by step.

If the model is an R1 or distilled R1 variant, you should see a <think> block appear before the final answer. If you get a direct answer with no thinking block, you may have downloaded a V3 model or a non-reasoning variant by mistake.

Test 2 — Code debugging

Paste this broken Python function and ask it to find and fix the bug:

def add_numbers(a, b)
    return a + b

A working local setup should identify the missing colon and produce corrected code. If it produces garbled output, reduce the model size or check that GPU offload is enabled.

Test 3 — Private document analysis

Drag a text file into LM Studio's chat window (or use ollama run with file input). Ask a specific question about its contents. This confirms the local setup is processing your documents without sending them anywhere — your network monitor should show zero outbound traffic during the response. For a full list of effective prompts to use with the reasoning model, see our DeepSeek prompts guide.

Why DeepSeek R1 is Different (The "Thinking" Phase)

When you run DeepSeek R1 locally, you will notice something unique. When you give it a prompt, it does not answer immediately. Instead, it starts generating text inside a `` block.

This is the "Chain of Thought" reasoning process. R1 is breaking down your problem, writing pseudo-code, testing its own logic, and correcting its mistakes before it ever gives you the final answer. Do not interrupt this process. The longer it thinks, the more accurate the final output will be.

Troubleshooting Common Local AI Issues

Running massive neural networks on a laptop can occasionally cause issues. Here is how to fix them:

  • The model is generating gibberish: This usually means you ran out of RAM, or the model's "Temperature" is set too high. Open LM Studio settings and ensure the Temperature is set between 0.6 and 0.8.
  • The computer is completely frozen: You downloaded a model that is too large for your RAM. The computer is trying to use your slow hard drive as "Swap Memory." Force restart your computer and download a smaller model (e.g., switch from 32B to 14B).
  • It is generating text incredibly slowly (1 word a second): This usually means the model is not using your GPU (graphics card). In LM Studio, go to the right sidebar and check the box that says "GPU Offload." Set the slider to "Max." This pushes the computation to your graphics card, which is vastly faster than your CPU.
  • Windows Defender blocked the Ollama installer: This is a false positive. Ollama is open-source — you can inspect the code on GitHub. Click "More info" then "Run anyway" in the Windows SmartScreen prompt. If your organisation's IT policy blocks unsigned executables, use LM Studio instead, which is code-signed.
  • "Error: model not found" in Ollama: The Ollama model registry uses specific naming. deepseek-r1 is valid; deepseekr1 or deepseek_r1 are not. Run ollama list to see what is installed. Run ollama pull deepseek-r1:7b to download explicitly rather than relying on the run command to auto-download.

Conclusion

Running a capable reasoning model like DeepSeek R1 entirely offline on a standard laptop removes the cloud dependency. Developers, researchers, and writers get permanent access to high-tier intelligence without paying a monthly subscription or exposing their data.

Whether you choose the user-friendly LM Studio or the developer-focused Ollama, setting up local AI takes less time than installing a video game. The future of AI is not just in the cloud; it is running quietly in the background of your own laptop.


Next Reads: Best DeepSeek Prompts GuideDeepSeek vs Claude for Developers

Sources used in this report

  1. Ollama Official Site
  2. DeepSeek-R1 — GitHub Repository
  3. Ollama — Run LLMs Locally
  4. LM Studio — Local AI Model Runner

FAQ

Is it free to run DeepSeek locally?

Yes. DeepSeek R1 model weights are open-source, and tools like Ollama used to run them locally are completely free.

How much RAM do I need to run DeepSeek?

You can run the distilled 7B models on 8GB of RAM. However, for serious coding and reasoning tasks, it is recommended to run the 14B or 32B models, which require 16GB to 32GB of RAM.

Does DeepSeek need an internet connection to work?

No. Once the model file is downloaded to your computer, it runs entirely offline. You can disconnect your Wi-Fi, take your laptop into the woods, and DeepSeek will still be able to write code, generate text, and solve math problems.

Can local DeepSeek search the live web for current events?

No — once downloaded, the model runs entirely from its training data with no network access. Its knowledge has a fixed cutoff date. [SOURCE NEEDED — confirm R1 distilled knowledge cutoff] If you need real-time information, use the deepseek.com web interface with its Web Search toggle enabled, or use the DeepSeek API with a search plugin. For purely local tasks — analysing code, reviewing documents, working through logic problems — the knowledge cutoff rarely matters.

Can I run multiple models and switch between them?

Yes. Both Ollama and LM Studio support multiple downloaded models. In Ollama, run ollama list to see everything installed, then ollama run [model-name] to switch. In LM Studio, use the model dropdown in the chat interface to switch without restarting the app. Each model loads fresh into RAM when selected, so switching takes 15–30 seconds on most machines. You can have DeepSeek R1 for reasoning tasks and DeepSeek V3 (or Llama 3 or Mistral) for faster conversational tasks installed side by side.

About the author

G

Generative Report Desk

The editorial team behind Generative Report covers AI tools, model releases, practical workflows, and the business impact of generative AI.

Related reports