Installing and Running DeepSeek Models Locally on Windows 11

Running large language models (LLMs) like DeepSeek locally offers privacy, customization, and offline access. This guide provides a step-by-step walkthrough for installing and using DeepSeek models on Windows 11, including optimizations for different hardware setups (CPU/GPU).


DeepSeek is a family of open-source LLMs developed by the Chinese company DeepSeek AI. These models are designed for tasks like text generation, summarization, and question answering. Popular variants include:

  • deepseek-llm-7b-chat: A 7-billion parameter model optimized for conversational tasks.
  • deepseek-math-7b: Fine-tuned for mathematical reasoning.
  • Smaller models like deepseek-coder-1.3b for low-resource environments.

Why Run DeepSeek Locally?

  • Privacy: Process sensitive data without relying on cloud APIs.
  • Offline Access: Use the model without an internet connection.
  • Customization: Fine-tune the model for specific tasks.
  • Cost Savings: Avoid API fees for frequent usage.

Hardware Requirements

Component     Minimum (7B Model)   Recommended (7B Model)
RAM           16 GB                32 GB
VRAM (GPU)    8 GB                 16 GB (e.g., RTX 4080)
Storage       15 GB (FP16)         30 GB (unquantized)

Notes:

  • For CPU-only setups, use quantized models (GGUF/GGML) to reduce memory usage.
  • Smaller models (e.g., deepseek-coder-1.3b) require fewer resources.

Installation Steps

Install Python and Dependencies

  1. Download Python 3.10+
  • Visit python.org.
  • Check “Add Python to PATH” during installation.
  2. Verify Installation
    Open PowerShell and run:
   python --version  # Should show Python 3.10+
   pip --version     # Ensure pip is installed

Set Up a Virtual Environment

  1. Create a Project Folder
   mkdir deepseek-project && cd deepseek-project
  2. Create and Activate a Virtual Environment
   python -m venv venv
   .\venv\Scripts\activate  # Activates the environment

Install PyTorch and Transformers

  1. Install PyTorch
  • For NVIDIA GPU (CUDA 12.1):
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  • For CPU-only:
    pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
  2. Install Hugging Face Libraries
   pip install transformers accelerate huggingface_hub

Download the DeepSeek Model

  1. Create a Hugging Face Account
  • Sign up at huggingface.co.
  • Generate an access token under Settings > Access Tokens.
  2. Log In via CLI
   huggingface-cli login

Paste your token when prompted.

  3. Download the Model
    The weights are downloaded and cached automatically the first time you load the model in Python (see the inference script below), or you can pre-fetch them as shown next.
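As a minimal sketch, you can pre-download the weights with huggingface_hub's snapshot_download, which caches all model files locally. The local_dir shown here is an assumed example path; omit it to use the default cache:

from huggingface_hub import snapshot_download

# Pre-fetch all model files from the Hub into a local folder.
snapshot_download(
    repo_id="deepseek-ai/deepseek-llm-7b-chat",
    local_dir="C:/models/deepseek-llm-7b-chat",  # assumed example path; omit to use the default cache
)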

Running the Model

Basic Inference Script

Create a file deepseek_demo.py:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",    # Auto-detects GPU if available
    torch_dtype="auto",   # Load in the checkpoint's native precision (FP16) instead of FP32
)

prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Run the script:

python deepseek_demo.py

Optimizing Performance

Quantization (Reduce Memory Usage)

Use the bitsandbytes library for 4-bit quantization:

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_4bit=True  # Requires `bitsandbytes`
)


Install bitsandbytes:

pip install bitsandbytes
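
Note that recent transformers releases prefer an explicit BitsAndBytesConfig over the load_in_4bit shortcut. A minimal sketch, assuming a recent transformers version:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit quantization config, preferred on newer transformers releases.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # run compute in FP16 for speed
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-chat",
    device_map="auto",
    quantization_config=bnb_config,
)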

CPU Optimization

For GGUF quantized models:

  1. Install llama-cpp-python:
   pip install llama-cpp-python
  2. Download a GGUF model from Hugging Face (e.g., TheBloke/deepseek-llm-7B-chat-GGUF), then load it as shown below.
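
A minimal usage sketch with llama-cpp-python; the GGUF filename is an assumed example (use whichever quantization level you downloaded, e.g., Q4_K_M):

from llama_cpp import Llama

# Load a quantized GGUF model on the CPU; the file path is an assumed example.
llm = Llama(
    model_path="./deepseek-llm-7b-chat.Q4_K_M.gguf",  # assumed filename
    n_ctx=2048,    # context window size
    n_threads=8,   # match your physical CPU core count
)

output = llm("Explain quantum computing in simple terms.", max_tokens=200)
print(output["choices"][0]["text"])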

Alternative Methods

Using Ollama

DeepSeek models are available in Ollama’s library:

  1. Install Ollama for Windows.
  2. Run:
   ollama run deepseek-llm:7b

LM Studio

A user-friendly GUI for running local LLMs:

  1. Download LM Studio.
  2. Search for “DeepSeek” in the app and download the model.

Troubleshooting

Issue                  Solution
CUDA Out of Memory     Use a smaller model or enable quantization.
Slow CPU Inference     Use GGUF quantization with n_threads=8.
Model Not Loading      Check Hugging Face token permissions.
Dependency Errors      Update packages with pip install --upgrade.

FAQs

Can I use DeepSeek commercially?

Check the model’s license on Hugging Face. Terms vary by model, and some releases attach conditions to commercial use.

How does DeepSeek compare to Llama 3?

DeepSeek specializes in coding and mathematical tasks, while Llama 3 is more general-purpose.

Is a GPU mandatory?

No, but CPU inference will be significantly slower.

How do I fine-tune DeepSeek?

Use Hugging Face’s Trainer class with a custom dataset.
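
A rough sketch, assuming a small JSON dataset with a "text" field per record and enough VRAM; for a 7B model you would typically combine this with a parameter-efficient method such as LoRA (via the peft library). The file and output paths are assumed placeholders:

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "deepseek-ai/deepseek-llm-7b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# "train.json" is an assumed placeholder: one record per example with a "text" field.
dataset = load_dataset("json", data_files="train.json")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="deepseek-finetuned",   # assumed output folder
        per_device_train_batch_size=1,     # keep memory usage low
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    # mlm=False produces causal-LM labels (inputs shifted by one token)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()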


Conclusion

Running DeepSeek locally on Windows 11 is straightforward with tools like Hugging Face Transformers. For best performance, use a GPU with quantization or opt for smaller models. Explore alternative tools like LM Studio for a no-code experience.

Next Steps:

  • Experiment with fine-tuning using custom datasets.
  • Join the DeepSeek community for updates.
  • Explore quantized models for resource-constrained setups.