How to Build a Zero-Cost AI Agent Stack with Ollama, n8n, and AnythingLLM
A step-by-step guide to running a private, automated AI agent on your own hardware without spending a dime on API fees or cloud subscriptions.
Last updated: March 2026
Key Takeaways
- Completely free to run. Ollama, n8n (Community Edition), and AnythingLLM are all open-source tools that run on hardware you already own. There are no API fees, no per-token charges, and no monthly subscriptions required.
- Your data never leaves your machine. Unlike cloud-based AI services, this stack processes everything locally. Prompts, documents, and outputs stay on your hardware, which eliminates the risk of third-party data exposure.
- No coding experience required. n8n provides a visual drag-and-drop workflow editor, and AnythingLLM offers a clean desktop interface. You can build a functional AI agent without writing a single line of code.
- Modular and low-risk. Each tool in this stack is independent. If one component does not meet your needs, you can swap it out without dismantling your entire setup. This reduces the cost of experimentation to nearly zero.
- Hardware you already own is likely sufficient. A computer with 16 GB of RAM and a modern processor can comfortably run smaller open-source language models. No specialized GPU is strictly required to get started.
Why Build a Local AI Agent Stack?
Cloud-based AI tools are convenient, but they come with trade-offs that matter if you care about privacy, cost predictability, or simply maintaining control over your own workflows. Every prompt you send to a hosted service travels through someone else's infrastructure. Every document you upload is processed on servers you do not own. And every month, the bill arrives whether you used the service heavily or not.
A local AI agent stack sidesteps all of these concerns. By combining three free, open-source tools, you can build a system that processes natural language, automates multi-step workflows, and answers questions about your own documents, all without any data leaving your machine.
This guide walks you through setting up that stack from scratch. The three tools at its core are Ollama (your local language model runtime), n8n (your workflow automation engine), and AnythingLLM (your document management and chat interface). Each one is free, each one runs on your hardware, and together they form a capable AI agent that rivals many paid alternatives.
What Each Tool Does (and Why It Was Chosen)
Ollama: The Local Language Model Engine
Ollama is a free, open-source application that lets you download and run large language models directly on your computer. It supports models from the Llama, Mistral, Gemma, and Phi families, among others, and it handles all the technical complexity of model management behind the scenes. You install it, pull a model with a single command, and start generating text immediately.
From a safety and privacy standpoint, Ollama is significant because it keeps model inference entirely on your hardware. No prompts are sent to external servers. No responses are logged by a third party. If you are working with sensitive information, proprietary business data, or anything you simply prefer to keep private, Ollama provides that guarantee by design.
Ollama also exposes a local API on port 11434 that other applications can connect to. This is what makes it the backbone of the stack: both n8n and AnythingLLM can send requests to Ollama and receive model responses without any internet connection.
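That API is easy to exercise directly. The sketch below, using only the Python standard library, posts a prompt to Ollama's documented /api/generate endpoint. It assumes Ollama is running on the default port and that the llama3.2 model has already been pulled; treat it as a minimal illustration rather than a production client.

```python
# Minimal sketch of calling Ollama's local HTTP API.
# Assumes Ollama is listening on the default port 11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one complete JSON response
    # instead of a stream of partial tokens.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming response carries the text in "response".
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama instance):
#   print(generate("llama3.2", "Summarize: Ollama serves models locally."))
```

This is the same request shape the other tools in the stack send under the hood when they talk to Ollama.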
n8n (Community Edition): The Workflow Automation Layer
n8n is a workflow automation platform similar in concept to Zapier or Make, but with a critical difference: its Community Edition is free to self-host and comes with no limits on workflows or executions. You install it on your own machine, open a visual editor in your browser, and connect different steps together using a drag-and-drop canvas.
In this stack, n8n serves as the "brain" that coordinates actions. It can receive a trigger (like a chat message, a form submission, or a scheduled timer), send that input to Ollama for processing, and then route the response to another step, such as sending an email, updating a spreadsheet, or storing data. n8n has native support for Ollama through its built-in Ollama Chat Model node, which means connecting the two requires no custom code.
n8n also supports more advanced AI patterns through its agent and chain nodes, which are built on the LangChain framework. This means you can add memory to your agents, give them access to tools, and build multi-step reasoning workflows, all within the visual editor.
AnythingLLM: The Document Interface and Knowledge Base
AnythingLLM is a free, open-source desktop application that lets you chat with your documents using any language model. It supports PDFs, Word documents, CSVs, text files, and more. You upload your files into a workspace, AnythingLLM processes and indexes them using embeddings, and then you can ask questions about those documents in natural language.
This is what turns your stack from a simple chatbot into something genuinely useful. Instead of relying on the language model's general training data, AnythingLLM uses a technique called Retrieval-Augmented Generation (RAG) to ground responses in your actual documents. When you ask a question, it searches your uploaded files for relevant context and feeds that context to the model along with your prompt. This produces answers that are specific to your data rather than generic guesses.
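The retrieval step is easy to picture in miniature. The sketch below is not AnythingLLM's actual code, only an illustration of the idea: document chunks and the question are compared as embedding vectors, the best-matching chunk wins, and a grounded prompt is assembled from it. In practice the vectors would come from a real embedding model (such as one served by Ollama) rather than being supplied by hand.

```python
# Illustrative sketch of the retrieval step behind RAG.
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_chunk(question_vec, chunk_vecs, chunks):
    # Return the document chunk whose embedding best matches the question.
    scores = [cosine(question_vec, v) for v in chunk_vecs]
    return chunks[scores.index(max(scores))]

def grounded_prompt(question, context):
    # Feed the retrieved context to the model alongside the question.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

AnythingLLM performs this search over its built-in vector database automatically on every question you ask.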
AnythingLLM connects directly to Ollama as its language model and embedding provider. It also includes a built-in vector database, so there is no need to set up a separate database service. Everything runs locally, and no signup or account creation is required.
Hardware Requirements
Before starting the installation, confirm that your hardware meets these minimum specifications. Running language models locally is more resource-intensive than typical software, so it is worth checking upfront to avoid performance issues down the road.
Minimum Specifications
RAM: 8 GB is the absolute minimum for running small models (3 billion parameters or fewer). 16 GB is strongly recommended and will allow you to run 7-billion-parameter models comfortably, which offer noticeably better output quality.
CPU: Any modern multi-core processor from the last five years should work. Apple Silicon (M1, M2, M3, M4) performs particularly well for local inference. Intel and AMD processors are also fully supported.
Storage: Plan for at least 10 to 20 GB of free disk space. Individual models range from roughly 2 GB (for small models) to 8 GB or more (for larger, higher-quality models).
GPU: A dedicated GPU is not required, but it will significantly speed up response times if you have one. Ollama supports NVIDIA GPUs (via CUDA), AMD GPUs (via ROCm), and Apple Silicon (via Metal).
Operating system: All three tools support Windows, macOS, and Linux.
Step 1: Install and Configure Ollama
Ollama is the foundation of this stack. Install it first, verify it is working, and then move on to the other tools.
1.1 Download and Install Ollama
Visit ollama.com/download and download the installer for your operating system. On macOS and Windows, run the installer and follow the prompts. On Linux, you can install it with a single terminal command provided on the download page.
Once installed, Ollama runs as a background service. It starts automatically and listens for requests on http://localhost:11434.
1.2 Download a Language Model
Open a terminal (or Command Prompt on Windows) and pull a model. For most users, the Llama 3.2 model at 3 billion parameters is a strong starting point. It runs well on 8 GB of RAM and produces solid general-purpose output:
```shell
ollama pull llama3.2
```
If you have 16 GB of RAM or more and want higher quality output, consider pulling a 7-billion-parameter model such as Mistral:
```shell
ollama pull mistral
```
You can browse the full library of available models at ollama.com/library.
1.3 Verify the Installation
Test that Ollama is working by running a quick chat session directly in the terminal:
```shell
ollama run llama3.2
```
Type a prompt and confirm you receive a response. Type /bye to exit the session. You can also confirm the API is reachable by opening http://localhost:11434 in a web browser, where you should see the message "Ollama is running."
1.4 Security Consideration
By default, Ollama only listens on localhost, which means it only accepts connections from your own machine. This is the safest configuration and should not be changed unless you have a specific reason to expose it to your local network. If you are running all three tools on the same computer, the default configuration is correct.
Step 2: Install and Configure n8n
With Ollama running in the background, the next step is to install n8n so you can build automated workflows that send prompts to your local model and act on the responses.
2.1 Choose Your Installation Method
n8n offers several installation paths. The simplest options for a local setup are:
Option A: Docker (recommended if you have Docker installed). Run n8n in a container with a single command:
```shell
docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n
```
Option B: npm (if you have Node.js installed). Install n8n globally via npm:
```shell
npm install -g n8n
```
Then start it with:
```shell
n8n start
```
Both methods will launch n8n and make it available at http://localhost:5678 in your browser.
2.2 Create Your Account
The first time you access n8n in the browser, it will prompt you to create a local owner account. This account exists only on your machine and is not shared with any external service. Choose a strong password, as this protects access to your automation workflows.
2.3 Connect n8n to Ollama
This is where the stack starts to come together. To connect n8n to your local Ollama instance:
1. Create a new workflow. From the n8n dashboard, click the button to create a new workflow.
2. Add a trigger node. Click "Add first step" and select "When chat message received." This creates a chat interface within n8n that you can use for testing.
3. Add an AI Agent node. Click the plus icon after your trigger node and search for "AI Agent." Add it to your canvas and connect it to the trigger.
4. Attach the Ollama Chat Model. Click on the "Chat Model" port of the AI Agent node and search for "Ollama Chat Model." Select it, then create a new credential. Set the Base URL to http://localhost:11434 if you are running n8n directly on your machine, or http://host.docker.internal:11434 if n8n is running inside Docker. Click "Save" and you should see a success confirmation.
5. Select your model. In the Ollama Chat Model settings, choose the model you downloaded earlier (for example, llama3.2).
6. Test the workflow. Save the workflow, then click the chat icon in the bottom-right corner of the canvas. Type a message and confirm that you receive a response from your local model.
2.4 Security Consideration
The n8n Community Edition runs entirely on your machine. However, if you plan to expose n8n to the internet (for example, to receive webhooks from external services), you should place it behind a reverse proxy with SSL/TLS encryption and restrict access with authentication. For a purely local setup, the default localhost configuration is secure.
Step 3: Install and Configure AnythingLLM
AnythingLLM adds document-aware intelligence to your stack. It gives you a clean chat interface where you can upload files, ask questions about their contents, and get answers grounded in your actual data.
3.1 Download and Install AnythingLLM Desktop
Visit anythingllm.com/desktop and download the installer for your operating system. Run the installer and follow the prompts. No account creation or signup is required.
3.2 Connect AnythingLLM to Ollama
When you first launch AnythingLLM, it will walk you through an onboarding process where you select your LLM provider. Choose "Ollama" from the list of available providers. AnythingLLM will detect your local Ollama instance automatically if it is running on the default port. Select the model you downloaded earlier.
AnythingLLM will also ask you to select an embedding provider. Again, choose Ollama. This ensures that document embeddings (the numerical representations used for search and retrieval) are generated locally as well.
3.3 Create a Workspace and Upload Documents
AnythingLLM organizes your data into workspaces. Think of each workspace as a self-contained project. Documents uploaded to one workspace do not bleed into another, which is useful for keeping different contexts separated.
To get started, create a new workspace and give it a descriptive name. Then upload one or more documents. AnythingLLM supports PDFs, Word documents (.docx), plain text files, CSVs, and several other formats. Once uploaded, the system will process and embed the documents, which may take a few moments depending on file size.
3.4 Test Document-Based Chat
With documents uploaded, type a question in the workspace chat window. AnythingLLM will search your documents for relevant context, pass that context along with your question to the Ollama model, and return a grounded answer. You should see citations or references indicating which parts of your documents informed the response.
3.5 Privacy Consideration
AnythingLLM stores all data, including documents, embeddings, and chat history, locally on your machine by default. The application includes a built-in vector database (LanceDB), so there is no need to configure or connect to an external database service. You can verify and adjust privacy settings within the application, including the option to disable telemetry if you prefer to share no usage data at all.
Step 4: Connect the Full Stack
At this point, you have three independent tools running on your machine. The final step is to make them work together as a unified AI agent. There are several practical ways to do this depending on your use case.
Pattern 1: n8n as the Automation Layer, Ollama as the Brain
This is the most common pattern. You build workflows in n8n that accept input (via chat, webhook, email, or scheduled trigger), send that input to Ollama for processing, and then route the model's output to a downstream action. Examples include:
- Summarizing incoming emails and saving the summaries to a file.
- Classifying support tickets by category and urgency.
- Generating draft responses to frequently asked questions.
- Extracting structured data from unstructured text inputs.
All of this runs locally with no external API calls.
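As an illustration of the structured-extraction use case, the sketch below builds a request for Ollama's /api/generate endpoint using its optional JSON output mode and parses the result defensively. The prompt wording and the "sender"/"topic" field names are examples chosen for this sketch, not a fixed schema.

```python
# Sketch of the "extract structured data" pattern: ask the local
# model for JSON and parse it. Uses Ollama's optional "format": "json"
# mode on the /api/generate endpoint.
import json

def extraction_request(text: str) -> dict:
    prompt = (
        "Extract the sender name and topic from this email as JSON "
        'with keys "sender" and "topic":\n' + text
    )
    return {
        "model": "llama3.2",
        "prompt": prompt,
        "format": "json",   # constrain output to valid JSON
        "stream": False,
    }

def parse_extraction(raw: str) -> dict:
    # Even with JSON mode, fail safely if the output is malformed.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}
```

In n8n, the same pattern maps onto an HTTP Request node (or the Ollama node) followed by a node that consumes the parsed fields.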
Pattern 2: AnythingLLM for Document Q&A, n8n for Task Automation
Use AnythingLLM as your primary interface for asking questions about company documents, research papers, manuals, or any other reference material. Use n8n for everything else: scheduling tasks, connecting different applications, and building multi-step automations that involve your local model.
This separation keeps your workflows clean and your document context isolated. AnythingLLM handles retrieval-augmented generation, while n8n handles procedural automation.
Pattern 3: n8n Calling AnythingLLM via API
If you run AnythingLLM as a Docker-based server instance (rather than the desktop version), it exposes a REST API that n8n can call directly. This allows you to build workflows that query specific AnythingLLM workspaces programmatically, combining document retrieval with workflow automation in a single pipeline.
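A sketch of what such a call looks like, whether issued from a script or mirrored in an n8n HTTP Request node, is shown below. The endpoint path, payload shape, and Bearer-token header follow AnythingLLM's developer API as documented at the time of writing, but they may differ between versions, so verify them against your own instance.

```python
# Hedged sketch of querying an AnythingLLM server workspace.
import json
import urllib.request

def workspace_chat_request(base_url: str, slug: str, api_key: str, message: str):
    # Build the URL, payload, and headers for a workspace chat call.
    url = f"{base_url}/api/v1/workspace/{slug}/chat"
    payload = {"message": message, "mode": "chat"}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return url, payload, headers

def workspace_chat(base_url, slug, api_key, message):
    url, payload, headers = workspace_chat_request(base_url, slug, api_key, message)
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires a running AnythingLLM server and an API key):
#   workspace_chat("http://localhost:3001", "my-workspace", "YOUR_KEY", "What does the Q3 report say?")
```

The API key is generated inside the AnythingLLM server's settings; the desktop edition does not expose this API.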
Recommended Models for This Stack
The quality of your AI agent depends heavily on which model you run. Here are practical recommendations based on available hardware:
For machines with 8 GB of RAM
Use Llama 3.2 (3B) or Phi-3 Mini (3.8B). These models are small enough to run on constrained hardware while still producing coherent, useful output for summarization, classification, and basic question answering.
For machines with 16 GB of RAM
Use Llama 3.1 (8B), Mistral (7B), or Gemma 2 (9B). (Llama 3.2's text models top out at 3 billion parameters, so the step up within the Llama family is Llama 3.1 at 8B.) These offer a meaningful step up in reasoning quality and instruction following. The 7-to-9-billion-parameter range is widely considered the sweet spot for local inference on consumer hardware.
For machines with 32 GB of RAM or a dedicated GPU
Consider larger models in the 13-to-27-billion-parameter class, such as Gemma 2 (27B), typically in quantized form. At this level, you can expect output quality that approaches some commercial APIs for many common tasks.
To download any model, simply run ollama pull [model-name] in your terminal. You can have multiple models installed simultaneously and switch between them as needed.
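You can also check which models are installed programmatically through the same local API, via Ollama's /api/tags endpoint. A small sketch using only the Python standard library:

```python
# List the models currently pulled into a local Ollama instance
# via its /api/tags endpoint.
import json
import urllib.request

def model_names(tags_response: dict) -> list:
    # /api/tags returns {"models": [{"name": "llama3.2:latest", ...}, ...]}
    return [m["name"] for m in tags_response.get("models", [])]

def list_local_models(base_url: str = "http://localhost:11434") -> list:
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))

# Example (requires a running Ollama instance):
#   print(list_local_models())
```

This is handy in n8n workflows that should fail early with a clear error if the model they depend on has not been pulled yet.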
Safety, Privacy, and Risk Considerations
One of the primary motivations for building this stack is to reduce risk. Here is a summary of the specific protections this setup provides and the areas where you should remain cautious.
What This Stack Protects Against
Data leakage to third parties. Because all processing happens on your hardware, your prompts, documents, and outputs are never transmitted to an external server. This is particularly important if you work with proprietary information, client data, or anything subject to confidentiality agreements.
Unpredictable costs. Cloud AI services charge per token, per API call, or per month. Costs can spike unexpectedly with heavy usage. This stack has no variable costs. Your only expense is the electricity to run your computer.
Vendor lock-in. Each component of this stack is open-source and interchangeable. If Ollama were to cease development tomorrow, you could replace it with another local inference tool like LM Studio or vLLM without rebuilding the rest of your setup.
Service outages. Cloud services go down. Local tools do not depend on someone else's uptime. If your computer is running, your AI agent is available.
What This Stack Does Not Protect Against
Model hallucinations. Local language models can and will generate inaccurate information, just like their cloud-hosted counterparts. Always verify critical outputs before acting on them.
Physical device security. If someone gains access to your machine, they gain access to your models, documents, and workflow configurations. Use standard security practices: full-disk encryption, strong passwords, and screen locking.
Outdated model knowledge. Open-source models have training data cutoffs. They will not know about events or information that postdates their training. For time-sensitive tasks, you may need to supplement model responses with current data, which n8n can help automate through web scraping or API integrations.
Troubleshooting Common Issues
Ollama is not responding
Open a terminal and run ollama list to verify the service is running and at least one model is installed. If the service is not running, restart it with ollama serve. Check that port 11434 is not blocked by a firewall.
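A quick programmatic check can save time here. The small helper below, using only the Python standard library, tests whether anything is listening on Ollama's default port before you dig deeper; the host and port are the stack defaults and can be changed if your setup differs.

```python
# Connectivity check: is anything listening on Ollama's default port?
import socket

def port_open(host: str = "localhost", port: int = 11434, timeout: float = 2.0) -> bool:
    try:
        # Succeeds only if a listener accepts the TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False

# Example:
#   print(port_open())  # True if Ollama (or anything) is on 11434
```

If this returns False while Ollama is supposedly running, a firewall or a non-default bind address is the likely culprit.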
n8n cannot connect to Ollama
If n8n is running in Docker and Ollama is installed directly on your machine, use http://host.docker.internal:11434 as the Ollama base URL in your n8n credentials. The standard localhost address does not resolve correctly from inside a Docker container.
AnythingLLM is slow to respond
Large documents take time to embed. If responses are slow during chat, try using a smaller embedding model or reducing the size of uploaded documents. Also confirm that Ollama is not already busy serving a request from n8n at the same time, as concurrent requests can slow down response times on machines without a dedicated GPU.
Model output quality is poor
Smaller models trade quality for speed and lower resource usage. If outputs are consistently unhelpful, try pulling a larger model. Moving from a 3-billion-parameter model to a 7-billion-parameter model typically produces a noticeable improvement in reasoning and instruction following.
Where to Go From Here
Once your stack is running and tested, there are several directions you can take it depending on your needs:
Build specific automation workflows in n8n. Start with a simple use case, such as summarizing text pasted into a form, and gradually add complexity. n8n's template library includes dozens of AI-focused workflows you can import and modify.
Upload your real documents to AnythingLLM. The tool becomes far more useful when it has domain-specific context. Internal documentation, research papers, product specs, and meeting notes are all good candidates.
Experiment with different models. The open-source model ecosystem is large and evolving quickly. Try different models for different tasks. A coding-specialized model like CodeLlama may outperform a general-purpose model for technical workflows, while a smaller, faster model may be better for real-time classification tasks.
Add memory and tools to your n8n agents. n8n's AI Agent node supports adding memory (so the agent remembers previous messages in a conversation) and tools (so the agent can take actions like searching the web, querying a database, or calling an API). These features allow you to build agents that go beyond simple prompt-and-response patterns.
Final Thoughts
Building a local AI agent stack is not just a cost-saving measure. It is a deliberate choice to keep control over your data, your workflows, and your tools. Ollama, n8n, and AnythingLLM are each mature, actively maintained, and free to use. Together, they provide a foundation that is private by default, modular by design, and capable enough to handle a wide range of real-world tasks.
The barrier to entry has never been lower. If you have a reasonably modern computer and an hour to follow this guide, you can have a working AI agent stack running before the end of the day, at zero cost, with full ownership of everything it does.

