
Guide to Local LLMs: Getting Started with Ollama, LM Studio and more

MacFleet Team

With the growing interest in AI privacy and customization, running large language models (LLMs) locally on your own hardware has become increasingly popular. But for beginners, the ecosystem of tools like Ollama, LM Studio, and Open WebUI can be overwhelming. This guide breaks down everything you need to know to get started with local LLMs.

Understanding local LLMs

Running LLMs locally offers several advantages:

  • Complete Privacy: Your data never leaves your machine
  • No Subscription Costs: Use open-source models for free
  • Customization: Fine-tune models for specific use cases
  • Offline Access: Work without an internet connection

Hardware requirements

Your hardware will determine which models you can run effectively:

GPU VRAM Requirements

  • 4GB VRAM: Run Gemma 2B, Phi 3 Mini at Q8 or Llama 3 8B/Gemma 9B at Q4
  • 8GB VRAM: Run Llama 3 8B/Gemma 9B at Q8
  • 16GB VRAM: Run Gemma 27B/Command R 35B at Q4
  • 24GB VRAM: Run Gemma 27B at Q6 or Llama 3 70B at Q2

Quantization (Q2, Q4, and so on) reduces the precision of a model's weights so it can run on less powerful hardware. Q8 offers high quality with minimal intelligence loss, while Q2 is suitable only for large models on non-coding tasks.
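As a rough sanity check on these numbers, a model's weight footprint is approximately parameter count × bits per weight ÷ 8, plus overhead for the KV cache and runtime buffers. The 20% overhead factor in this sketch is an illustrative assumption, not a measurement:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight bytes plus an assumed ~20% for KV cache and buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Example: an 8B-parameter model at Q4 (~4 bits/weight) vs. Q8 (~8 bits/weight)
print(f"8B @ Q4: ~{estimate_vram_gb(8, 4):.1f} GB")
print(f"8B @ Q8: ~{estimate_vram_gb(8, 8):.1f} GB")
```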

Best tools for beginners

LM Studio

LM Studio offers the simplest entry point for beginners:

  • Easy-to-use GUI interface
  • Built-in model library with one-click downloads
  • Automatic quantization options
  • OpenAI-compatible API server
  • Support for embedding models like Nomic Embed v1.5
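If you enable LM Studio's local server with an embedding model loaded, you can call it through its OpenAI-compatible embeddings endpoint. This sketch assumes the server is on LM Studio's default port (1234) and that the model identifier below matches what you loaded; adjust both to your setup:

```python
from openai import OpenAI

# Assumes LM Studio's local server is running on its default port with an
# embedding model loaded; the model name is a placeholder.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.embeddings.create(
    model="nomic-embed-text-v1.5",
    input=["Local LLMs keep your data on your own machine."],
)
print(len(response.data[0].embedding))  # dimensionality of the returned vector
```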

Ollama

Ollama provides a more developer-focused approach:

  • Command-line interface (simple but powerful)
  • Great for programmers and API integration (see the sketch after this list)
  • Excellent performance optimization
  • Works well with various front-ends
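Ollama exposes a small HTTP API on localhost (port 11434 by default), which is what most front-ends talk to. A minimal sketch, assuming the Ollama service is running and a model such as llama3 has already been pulled:

```python
import requests

# Assumes `ollama serve` is running and `ollama pull llama3` has been done.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain quantization in one sentence.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```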

AnythingLLM

AnythingLLM combines document processing with local LLMs:

  • Built-in RAG (Retrieval-Augmented Generation)
  • Document indexing and vectorization
  • User-friendly interface
  • Both local and cloud model support

Open WebUI

A powerful front-end primarily for Ollama:

  • Rich feature set
  • Multi-user support
  • Works over local networks
  • Customization options

Step-by-step setup guide

Getting started with LM Studio

  1. Download and install LM Studio from their website
  2. Browse the model library and download a model that fits your hardware
  3. Select your preferred quantization level
  4. Run the model locally and start chatting
  5. Optionally, enable the API server to connect with other applications
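Once the server is enabled, any OpenAI-compatible client can connect to it. Here is a short sketch using the openai Python package; the port (1234) reflects LM Studio's default, and the model name is a placeholder since the server answers with whichever model you have loaded:

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is not checked.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

completion = client.chat.completions.create(
    model="local-model",  # placeholder; the loaded model is used regardless
    messages=[{"role": "user", "content": "Give me three uses for a local LLM."}],
)
print(completion.choices[0].message.content)
```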

Popular frameworks for running LLMs locally

There are several excellent frameworks for running LLMs on your local machine. Here's a breakdown of the most user-friendly options:

1. GPT4All

GPT4All is one of the most beginner-friendly options for running LLMs locally:

  • Easy setup: Simple installation process with a user-friendly GUI
  • GPU acceleration: Automatically uses CUDA if available
  • OpenAI integration: Can use your OpenAI API key to access GPT-3.5/4
  • Context-aware responses: Connect local folders for document-based queries
  • API server: Enable the API server for integration with other applications

Explore GPT4All →
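Beyond the GUI, GPT4All also ships a Python SDK, which is handy for scripting. A brief sketch, assuming the gpt4all package is installed; the model name is illustrative and is downloaded on first use if it is in GPT4All's catalog:

```python
from gpt4all import GPT4All

# Downloads the model on first run if it is not already cached locally.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")

with model.chat_session():
    reply = model.generate("What are the benefits of running LLMs locally?", max_tokens=200)
    print(reply)
```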

2. LM Studio

LM Studio offers more customization than GPT4All:

  • Rich model library: Easy access to download models from Hugging Face
  • Multiple model sessions: Run and compare different models simultaneously
  • Advanced configuration: Fine-tune model parameters for optimal performance
  • Local inference server: Launch an API server with one click
  • High performance: Optimized for speed with GPU acceleration

Explore LM Studio →

3. AnythingLLM

AnythingLLM combines document processing with local LLMs:

  • Built-in RAG: Integrated Retrieval-Augmented Generation
  • Document indexing: Automatically processes and vectorizes your content
  • User-friendly interface: Clean design for easy interaction
  • Flexible model support: Works with both local and cloud models
  • Multi-user capability: Supports team collaboration

Explore AnythingLLM →

4. Jan

Jan combines speed with an elegant interface:

  • Fast response generation: Generates responses at ~53 tokens/sec
  • Beautiful UI: Clean, ChatGPT-like interface
  • Model importing: Import models from other frameworks
  • Extensions: Install extensions to enhance functionality
  • Proprietary model support: Use models from OpenAI, Mistral AI, and Groq

Explore Jan →

5. llama.cpp

A powerful C/C++ implementation that powers many LLM applications:

  • High efficiency: Written in C/C++ for maximum performance
  • Flexible deployment: Run via command line or web interface
  • GPU acceleration: Install CUDA-enabled version for faster responses
  • Deep customization: Fine-tune all model parameters
  • Developer-friendly: Great for integrating into custom applications (see the sketch below)

Explore llama.cpp →
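For embedding llama.cpp in your own code, the llama-cpp-python bindings are a common route. A sketch under the assumption that the package is installed and a GGUF model file already exists at the (placeholder) path below:

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU when a CUDA/Metal build is
# installed; set it to 0 to run entirely on the CPU.
llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what llama.cpp does."}],
    max_tokens=128,
)
print(output["choices"][0]["message"]["content"])
```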

6. llamafile

Simplifies llama.cpp into a single executable file:

  • Single-file executable: Combines llama.cpp with Cosmopolitan Libc
  • No configuration needed: Automatically uses GPU without setup
  • Multimodal support: Models like LLaVA can process images and text
  • High performance: Much faster than standard llama.cpp (up to 5x)
  • Cross-platform: Works on Windows, macOS, and Linux seamlessly

Explore llamafile →

7. Ollama

Command-line focused tool with wide application support:

  • Terminal-based: Easy to use through command line
  • Wide model support: Access Llama 3, Mistral, Gemma, and more
  • Application integration: Many applications accept Ollama integration
  • Custom model support: Use downloaded models from other frameworks
  • Simple commands: Easy-to-remember commands for model management (see the Python sketch below)

Get started with our Ollama guide →
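If you would rather stay in Python than the terminal, the official ollama package mirrors the CLI. The calls below assume the package is installed and the Ollama service is running:

```python
import ollama

# Fetch a model if it is not already available locally (like `ollama pull llama3`).
ollama.pull("llama3")

# Chat with the pulled model (like `ollama run llama3`).
reply = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "What can I do with a local LLM?"}],
)
print(reply["message"]["content"])
```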

8. NextChat

Perfect for those who want to use proprietary models locally:

  • API integration: Use GPT-3, GPT-4, and Gemini Pro via API keys
  • Web UI available: Also available as a web application
  • One-click deployment: Deploy your own web instance easily
  • Local data storage: User data saved locally for privacy
  • Customization options: Full control over model parameters

Explore NextChat →

Setting up document processing (RAG)

For those looking to chat with their documents:

  1. Choose a solution with RAG capabilities (AnythingLLM, Jan)
  2. Import your documents (PDFs, Word files, code repositories)
  3. The system will automatically index and vectorize your content
  4. Connect to your local LLM or a cloud provider
  5. Start asking questions about your documents
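To make that flow concrete, here is a deliberately minimal sketch of the retrieve-then-generate loop these tools automate for you. It assumes an Ollama server with the nomic-embed-text and llama3 models pulled, and it skips the chunking and vector-database steps a real RAG pipeline would add:

```python
import ollama

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm CET.",
    "Enterprise plans include a dedicated account manager.",
]

def embed(text: str) -> list[float]:
    # Assumes `ollama pull nomic-embed-text` has been run.
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

# 1. Index: embed every document (real tools persist these in a vector store).
index = [(doc, embed(doc)) for doc in documents]

# 2. Retrieve: embed the question and pick the most similar document.
question = "When can I return a product?"
q_vec = embed(question)
best_doc = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# 3. Generate: answer using only the retrieved context.
answer = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": f"Context: {best_doc}\n\nQuestion: {question}"}],
)
print(answer["message"]["content"])
```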

Advanced topics

Understanding model sizes and capabilities

Different model sizes offer various capabilities:

  • Small models (2B-8B parameters): Basic assistance, limited reasoning
  • Medium models (8B-30B parameters): Good reasoning, coding abilities
  • Large models (30B+ parameters): Advanced reasoning, specialized knowledge

Running models on multiple GPUs

For larger models, you can distribute the workload:

  • Use tensor parallelism to split models across GPUs
  • Configure VRAM allocation for optimal performance
  • Balance between GPU and CPU offloading
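How you configure this depends on the runtime. As one example, the llama-cpp-python bindings mentioned above expose tensor_split for dividing weights across GPUs and n_gpu_layers for choosing how many layers stay on the GPU versus the CPU; the values below are illustrative starting points, not tuned recommendations:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-70b-instruct.Q4_K_M.gguf",  # placeholder path
    tensor_split=[0.5, 0.5],  # split weights roughly evenly across two GPUs
    n_gpu_layers=60,          # offload 60 layers to the GPUs, keep the rest on the CPU
    n_ctx=4096,               # context window; larger values grow the KV cache
)
print(llm.create_completion("Hello", max_tokens=8)["choices"][0]["text"])
```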

Ready to start your local LLM journey?

Running local LLMs gives you control, privacy, and customization that cloud services can't match. Start with LM Studio for the easiest entry point, then explore other options as you become more comfortable with the technology.

Whether you're looking to chat privately with AI, process sensitive documents, or build custom applications, local LLMs offer a powerful alternative to cloud-based solutions. The initial learning curve is well worth the freedom and capabilities you'll gain.
