Ollama — Run Any LLM on Your Laptop
Run Llama, Mistral, and more locally for free
Ollama is a free, open-source tool that lets you run large language models directly on your laptop or desktop — no cloud, no API keys, no subscriptions. If you have a computer with at least 8GB of RAM, you can run models like Llama 3.3, DeepSeek, and Mistral locally in minutes. For Indian users concerned about data privacy, internet costs, or simply wanting unlimited AI access, Ollama is one of the best tools available in 2026.
What You'll Learn
- How to install Ollama on Windows, macOS, and Linux
- How to download and run your first local LLM
- Which models work best on different hardware configurations
- How to use Ollama for coding, writing, and Indian language tasks
- How to connect Ollama to other tools and UIs
What Is Ollama and Why Use It?
Ollama is a command-line tool that manages and runs open-source LLMs on your local machine. Think of it as a package manager for AI models — you type one command, and the model downloads and starts running. No account creation, no payment, no data sent anywhere.
The key advantages over cloud AI services like ChatGPT or Gemini are straightforward: zero cost per message, complete privacy since nothing leaves your machine, and no rate limits or usage caps. The trade-off is that local models require decent hardware and are generally less capable than the largest cloud models.
For developers, Ollama also provides a local API endpoint at localhost:11434 that is compatible with the OpenAI API format, making it easy to integrate into your applications without paying for API calls.
India Note: Ollama is particularly valuable in India where internet connectivity can be intermittent in many areas. Once you download a model (typically 2-8GB), you can use it completely offline — during travel, in areas with poor connectivity, or simply to avoid using mobile data.
Model Comparison Table
| Model | Parameters | RAM needed | Best for | Download size | |-------|-----------|------------|----------|---------------| | Llama 3.3 | 70B | 40GB+ | Highest quality general use | ~43GB | | Llama 3.2 | 3B | 4GB | Fast, low-end hardware | ~2GB | | Mistral 7B | 7B | 8GB | General use, balanced | ~4.1GB | | DeepSeek Coder | 6.7B | 8GB | Code generation, debugging | ~3.8GB | | Phi-3 Mini | 3.8B | 4-6GB | Ultra-low-end laptops, mobile | ~2.3GB |
How to Install and Run Ollama
Step 1: Install Ollama on Your System
Installation takes under two minutes on any operating system.
macOS:
brew install ollama
Or download the installer from ollama.com.
Linux (Ubuntu/Debian):
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download the installer from the official Ollama website. Works on Windows 10 and 11 without WSL (Windows Subsystem for Linux).
After installation, verify it works:
ollama --version
Step 2: Download and Run Your First Model
To download and run a model, use the ollama run command:
ollama run mistral
This downloads the Mistral 7B model (approximately 4.1GB) and starts an interactive chat session in your terminal. The first run takes a few minutes to download; subsequent runs start in seconds.
Start with Mistral 7B or Llama 3.2:3B if you have 8GB RAM. These are the most reliable entry-point models for Indian hardware configurations.
Step 3: Learn the Essential Commands
# List downloaded models
ollama list
# Download a model without starting chat
ollama pull deepseek-coder
# Remove a model to free disk space
ollama rm mistral
# Run with a specific prompt (non-interactive)
ollama run mistral "Write a Python function to validate Indian PAN card numbers"
# Run with a system prompt
ollama run llama3.2 --system "You are a helpful assistant who answers in Hindi"
Step 4: Connect to a GUI Interface
While the terminal works fine, most users prefer a graphical interface. Install Open WebUI for a ChatGPT-like experience:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main
Then open http://localhost:3000 in your browser. You get a full chat interface with conversation history, model switching, and file uploads — all running locally.
Step 5: Integrate with Your Development Tools
For coding assistance in VS Code, install the Continue extension (free, open-source). In Continue's settings:
- Set provider to "Ollama"
- Set model to "deepseek-coder" or "codellama"
- Ensure Ollama is running in the background
You now have a free, offline AI coding assistant in VS Code — no subscription, no API key.
Best Models for Indian Hardware
Not every model runs well on every machine. Here is a practical guide based on common laptop configurations available in India:
| RAM | Recommended Models | Performance | |-----|-------------------|-------------| | 8GB | Phi-3 Mini (3.8B), Llama 3.2 (3B) | Smooth, 10-15 tokens/sec on CPU | | 16GB | Mistral 7B, DeepSeek Coder, Llama 3.2 | Good, 8-12 tokens/sec on CPU | | 32GB | Llama 3.3 (smaller quant), CodeLlama 34B | Usable, 5-8 tokens/sec on CPU | | GPU (6GB+ VRAM) | Any 7-8B model | Fast, 30-50+ tokens/sec |
For coding tasks, use DeepSeek Coder V2 or CodeLlama. For general conversation in Hindi and Indian languages, Llama 3.3 handles Hindi, Tamil, and Telugu reasonably well — better than older models but still weaker than cloud models like Gemini.
If you have a laptop in the ₹40,000-60,000 range with 16GB RAM (common configurations from Lenovo, HP, and Dell available on Flipkart and Amazon India), the 7B parameter models will run comfortably.
India Note: Students at IITs, NITs, and engineering colleges often have access to lab computers with GPUs. You can install Ollama on these machines for significantly faster inference. If your college has NVIDIA GPUs, Ollama automatically detects and uses them.
Using Ollama as a Local API
Developers can use Ollama as a drop-in replacement for cloud AI APIs:
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "Write a Python function to validate Aadhaar numbers"
}'
This OpenAI-compatible API means you can point any application that supports the OpenAI API format to localhost:11434 and use local models instead. This is especially useful when building AI applications or experimenting with RAG pipelines without incurring API costs.
Data Privacy Use Cases in India
Ollama is the right choice for privacy-sensitive work that is common in Indian professional contexts:
- Legal documents: Draft NDAs, contracts, and legal correspondence without sending confidential information to cloud servers
- Financial data: Analyze balance sheets, audit reports, or tax documents locally
- Healthcare: Process patient data or medical research in compliance with India's DPDP Act (Digital Personal Data Protection)
- Government work: NIC and other government IT departments increasingly look for on-premise AI solutions — Ollama-based setups qualify
Frequently Asked Questions
What are the minimum laptop specs to run Ollama in India?
You need at least 8GB RAM to run small 7B parameter models like Mistral 7B or Phi-3 Mini. A laptop in the ₹40,000-60,000 range with 8-16GB RAM runs these models at 8-15 tokens per second on CPU.
Which Ollama models work best for Indian users?
For general use: Llama 3.2 (3B) on 8GB RAM. For coding: DeepSeek Coder. For Indian language tasks: Llama 3.3 handles Hindi better than most local models. Phi-3 Mini is best for very low-end hardware.
How do I use Ollama with VS Code for coding assistance?
Install the Continue extension in VS Code, set the model provider to Ollama, and select your preferred local model. Ollama must be running in the background. This gives you a free, offline coding assistant with no API key required.
What are the data privacy benefits of using Ollama?
With Ollama, your prompts and data never leave your computer. This is critical for sensitive work — legal documents, financial data, patient records. For Indian companies subject to data residency requirements, Ollama is a compliant solution.
How much does Ollama cost compared to cloud AI APIs?
Ollama is completely free after the one-time model download. Running 1 million tokens through Claude costs $3-15; through ChatGPT $2-10. With Ollama, 1 million tokens costs you only electricity — approximately ₹2-5 in India.
Common Issues and Fixes
Ollama is running slowly: On CPU-only machines, slower speeds are expected. Try smaller models (Phi-3 Mini or Llama 3.2:3B). For permanent speed improvement, add more RAM or a dedicated GPU.
Model download fails: India's internet speeds can be slow for large model files (2-43GB). Use a stable Wi-Fi connection, not mobile data, for model downloads. Downloads resume automatically if interrupted — just re-run ollama pull [model-name].
Out of memory error: The model you are trying to run requires more RAM than you have. Switch to a smaller parameter model. Check RAM requirements in the model comparison table above.
Ollama not found after installation on Windows: Restart your terminal or command prompt after installation. Windows sometimes requires a new terminal session to pick up newly installed CLI tools.
Port 11434 already in use: Another process is using the default port. Set OLLAMA_HOST=0.0.0.0:11435 before starting Ollama to use a different port.
Ollama vs Alternatives
| Tool | Cost | Ease of use | Best for | |------|------|-------------|----------| | Ollama | Free | Easy (CLI) | Developers, privacy-first users | | LM Studio | Free | Very easy (GUI) | Non-technical users | | Jan.ai | Free | Easy (GUI + API) | Combined GUI and API use | | GPT4All | Free | Easy (GUI) | Beginners, simple tasks | | llama.cpp | Free | Hard (compile) | Maximum performance, customization |
Ollama is the most popular choice for developers because of its clean CLI, automatic GPU detection, and OpenAI-compatible API. LM Studio is better for non-technical users who want a visual interface.
Official Resources
- Ollama Official Website — Download and documentation
- Ollama GitHub Repository — Source code, issues, and model library
- Ollama Model Library — Browse all available models
- Open WebUI — Best GUI for Ollama
- Continue Extension — Free AI coding assistant for VS Code using Ollama
- Ollama Discord Community — Community support and discussions
Community Questions
0No questions yet. Be the first to ask!