Key Takeaways
- Andrej Karpathy's "LLM Wiki" pattern uses a local AI model to compile, interlink, and maintain a structured knowledge base from your raw documents — building a persistent, compounding second brain instead of re-deriving answers from scratch every time you ask a question.
- You can set this up entirely on hardware you already own using Obsidian (free, local-first markdown editor) and Ollama (free, open-source LLM runner) — no cloud services required, no data leaves your network.
- This approach is fundamentally different from RAG pipelines and cloud tools like NotebookLM: your knowledge accumulates locally in plain markdown files you own forever, with zero vendor lock-in and complete data privacy.
What Is an LLM Knowledge Base (And Why Should You Care)?
The Problem With Stateless AI
If you have used ChatGPT, Claude, or any other AI assistant, you have experienced the same frustration: every conversation starts from zero. You spend twenty minutes building context, explaining your project, uploading documents — and then the session ends. Next time, you do it all over again. Nothing carries forward. Nothing accumulates.
Most tools that try to solve this problem use a technique called RAG (Retrieval-Augmented Generation). You upload a collection of documents, and the AI retrieves relevant chunks at query time to generate an answer. Google's NotebookLM, ChatGPT's file uploads, and most enterprise knowledge tools work this way. It is functional, but limited: the AI rediscovers knowledge from scratch on every question. Ask something that requires synthesizing five different documents, and it has to find and piece together the relevant fragments every single time, because no synthesis is ever saved between queries.
Karpathy's "Compilation" Insight
On April 2, 2026, Andrej Karpathy (co-founder of OpenAI, former AI Director at Tesla, and the person who coined the term "vibe coding") posted a detailed breakdown of a different approach he has been using for his own research. The post went viral, drawing more than 325,000 views and widespread coverage within 48 hours.
The core idea is simple but powerful: instead of just retrieving from raw documents at query time, have an LLM incrementally build and maintain a persistent wiki — a structured, interlinked collection of markdown files that sits between you and your raw sources.
When you add a new source, the LLM does not just index it for later retrieval. It reads the document, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you have fed it.
Karpathy reports that at a scale of roughly 100 articles and 400,000 words, the system handles complex queries without needing any vector database or traditional RAG infrastructure at all. The LLM auto-maintains its own index files and summaries, and can navigate the full corpus efficiently using that self-built structure.
He described the shift in one sentence: a large fraction of his recent token throughput now goes into manipulating knowledge rather than manipulating code.
Why Local-First Matters for Your Knowledge Base
The Cloud Problem
NotebookLM processes your documents on Google's servers. ChatGPT file uploads pass through OpenAI's infrastructure. Most RAG-as-a-service platforms store your data in third-party cloud environments you do not control. For generic research on public topics, this may be acceptable. For anything personal, sensitive, or business-critical, it is a real liability.
Consider what a serious personal knowledge base might contain: journal entries, health records, financial documents, business strategy notes, client communications, personal goals, therapy reflections. Every one of these becomes a data point on someone else's server the moment you upload it to a cloud-based AI tool. Terms of service can change. Data retention policies are opaque. Breach disclosures are routine.
The Digital Sovereignty Case
Readers of this site understand the principle: when you control the infrastructure, you control the experience. It is the same reason we recommend owning your own modem instead of renting from your ISP, and it is the same reason we recommend running AI models on your own hardware instead of routing every query through a cloud API.
An LLM knowledge base built on Obsidian and Ollama runs entirely on your local network. Your research notes, your documents, your personal journals — none of it leaves your house. The entire system is plain markdown files on your disk. No proprietary database. No vendor lock-in. No subscription that can be discontinued. If Obsidian disappeared tomorrow, your files would still be readable in any text editor on any operating system. If you want to switch from Ollama to a different LLM runner, the wiki does not care — it is just markdown.
This is digital sovereignty applied to knowledge management.
What You Need to Get Started
Hardware Requirements
The good news: if you already have a reasonably modern computer, you can start with zero additional investment. The table below breaks down three practical tiers.
| Tier | Hardware | Cost | Model Capability | Best For |
|---|---|---|---|---|
| Entry | Any PC/Mac with 16 GB RAM (post-2020) | $0 (use what you have) | 7B–8B parameter models (Llama 3.1 8B, Qwen 2.5 7B) | Trying the workflow, small wikis (under 50 sources) |
| Mid | Mini PC or desktop with 32 GB RAM | $300–$500 (e.g., Beelink SER8) | 14B–32B parameter models (Qwen 2.5 14B, Phi-4 14B) | Dedicated always-on AI server, medium wikis |
| Power | Desktop with NVIDIA GPU (24 GB VRAM) | $650–$750 (used RTX 3090) | Models up to roughly 32B fully on the GPU; 70B and MoE models (Llama 4 Scout) with partial CPU offload | Large-scale wikis (100+ sources), fast inference, complex synthesis |
For detailed GPU comparisons, model benchmarks, and specific buying recommendations, see our full hardware guide for running local AI models.
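As a rough way to read the tiers above, the memory a model's weights need scales with parameter count times bits per weight. The sketch below is a back-of-envelope estimate only: `estimate_memory_gb` is a hypothetical helper, and the 20% overhead factor for context and runtime buffers is an assumption rather than a measured figure.

```python
def estimate_memory_gb(params_billion: float, bits_per_weight: float = 4.0,
                       overhead: float = 1.2) -> float:
    """Rough memory footprint in GB for a quantized model.

    `overhead` is an assumed fudge factor for context and runtime buffers.
    """
    # 1e9 parameters * bits per weight / 8 bits per byte = gigabytes of weights
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * overhead, 1)


for size in (8, 14, 32, 70):
    print(f"{size}B at 4-bit: ~{estimate_memory_gb(size)} GB")
```

By this estimate an 8B model at 4-bit quantization sits comfortably inside 16 GB of RAM, while a 70B model needs more memory than a single 24 GB card provides, so some of it would have to spill into system RAM.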
Software Stack
Every component in this stack is free for personal use:
- Obsidian — A local-first markdown editor that stores your notes as plain files on your disk. Free for personal use. Not open-source, but its file-over-app philosophy means your data is never locked in. Alternatives include Logseq (open-source) and Foam (a VS Code extension).
- Ollama — An open-source local LLM runner. Install it, pull a model, and you have a local AI endpoint that any tool can talk to. Binds to localhost by default, which is the correct security posture.
- Obsidian Web Clipper — A browser extension that converts web articles into clean markdown files with a single click. This is your primary source ingestion tool.
- Claude Code (optional) — Anthropic's command-line coding agent. If you prefer a cloud-assisted workflow, Claude Code can operate on your local file system directly. Note the privacy tradeoff: your prompts and file contents are processed on Anthropic's servers. For purely local operation, stick with Ollama.
- Optional tools: Marp (markdown-based slide decks), Dataview (Obsidian plugin for querying page metadata), qmd (a local search engine for markdown files with hybrid BM25/vector search).
How to Set Up Your LLM Knowledge Base (Step by Step)
Step 1: Create Your Directory Structure
The architecture has three layers, and getting them right from the start matters. Create the following inside a new Obsidian vault:
my-knowledge-base/
├── raw/ # Immutable source documents (articles, papers, data files)
│ └── assets/ # Downloaded images from clipped articles
├── wiki/ # LLM-generated and maintained pages (summaries, concepts, entities)
├── output/ # Query results, slides, charts, reports
├── index.md # Content-oriented catalog of every wiki page
├── log.md # Chronological record of ingests, queries, lint passes
└── SCHEMA.md # Instructions that tell the LLM how to operate
raw/ is your source of truth. The LLM reads from it but never modifies it. Articles, papers, datasets, images — anything you are tracking goes here. These files are immutable.
wiki/ is where the LLM does all its work. Summaries, entity pages, concept pages, comparisons, synthesis documents — the LLM creates, updates, and maintains everything in this directory. You read it; the LLM writes it.
SCHEMA.md is the critical configuration file. It tells the LLM how the wiki is structured, what conventions to follow, and what workflows to execute when ingesting sources, answering questions, or running maintenance. This is the difference between a disciplined wiki maintainer and a generic chatbot. More on this in Step 4.
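If you would rather script the layout than create folders by hand, something like the following could work. It is a minimal sketch: `scaffold_vault` is a hypothetical helper, and the placeholder heading it writes into each file is an assumption, not part of the pattern.

```python
from pathlib import Path


def scaffold_vault(root: str) -> Path:
    """Create the three-layer layout described above; safe to run twice."""
    base = Path(root)
    # raw/assets also creates raw/ and the vault root itself
    for sub in ("raw/assets", "wiki", "output"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    for name in ("index.md", "log.md", "SCHEMA.md"):
        target = base / name
        if not target.exists():  # never overwrite files you have already edited
            target.write_text(f"# {target.stem}\n", encoding="utf-8")
    return base
```

Calling `scaffold_vault("my-knowledge-base")` and then opening that folder as a vault in Obsidian gives you the same structure as the tree above.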
Step 2: Install and Configure Obsidian
1. Download Obsidian from obsidian.md and install it on your computer.
2. Open Obsidian and create a new vault pointed at your my-knowledge-base/ directory.
3. Configure local image storage. Go to Settings, then Files and Links, and set "Attachment folder path" to raw/assets/. This ensures all downloaded images are stored locally so the LLM can reference them directly instead of relying on URLs that may break.
4. Install the Obsidian Web Clipper browser extension from the Obsidian website. Configure it to save clipped articles into your raw/ folder as markdown files.
5. Optional but recommended: In Settings, then Hotkeys, search for "Download" and bind "Download attachments for current file" to a keyboard shortcut (such as Ctrl+Shift+D). After clipping an article, hitting this shortcut pulls all embedded images to local storage.
6. Optional plugins: Install Dataview if you want to run queries over page metadata, and Marp if you want the LLM to generate slide presentations from wiki content.
Step 3: Set Up Your Local LLM
1. Install Ollama from ollama.com. It is available for macOS, Linux, and Windows.
2. Pull a model appropriate for your hardware:
# Entry tier (16 GB RAM, no dedicated GPU)
ollama pull llama3.1:8b
# Mid tier (32 GB RAM or 12+ GB VRAM)
ollama pull qwen2.5:14b
# Power tier (24 GB VRAM; Scout also needs ample system RAM for offload)
ollama pull llama4:scout
3. Verify the model is running:
ollama run llama3.1:8b "Summarize the concept of digital sovereignty in two sentences."
4. Connect Obsidian to your local LLM. Install the Obsidian Copilot community plugin. In Copilot settings, add a new model provider, set the base URL to http://127.0.0.1:11434 (Ollama's default), select your model, and type "local" in the API key field. This gives you an AI chat panel inside Obsidian that runs entirely on your machine.
Important security note: Ollama binds to localhost (127.0.0.1) by default. Do not change this to 0.0.0.0 (for example, through the OLLAMA_HOST environment variable) unless you have specifically segmented your network and understand the implications. Exposing an LLM endpoint to your entire local network, or worse, the internet, is a serious security risk because Ollama's API has no built-in authentication. If you need to access Ollama from another device on your network, put a reverse proxy with authentication in front of it. For DNS-level monitoring of your AI infrastructure, see our Pi-hole setup guide.
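The same local endpoint can also be called from a script, which is handy once you want to batch ingests instead of typing into a chat panel. This is a minimal sketch that assumes Ollama's default address and its `/api/generate` route; `build_request` and `ask` are illustrative names, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires Ollama to be running)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Call `ask(...)` with whichever model tag you pulled in step 2. Because the request goes to 127.0.0.1, nothing leaves the machine.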
Step 4: Write Your Schema File
The schema file is what transforms a generic LLM into a disciplined knowledge base maintainer. Create SCHEMA.md in the root of your vault with instructions covering at minimum the following areas:
Page types and naming conventions. Define what kinds of pages the wiki contains — source summaries, entity pages, concept pages, comparison pages, synthesis pages — and how each should be named and structured. For example: source summaries go in wiki/sources/ and use the format YYYY-MM-DD_source-title.md; concept pages go in wiki/concepts/ and use a plain descriptive name.
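A naming rule is easiest to keep consistent when it can be checked mechanically. As an illustration of the source-summary format above, the hypothetical helper below builds a path from a title and a date; the slug rules (lowercase, hyphens for everything else) are an assumption you would adapt to your own schema.

```python
import re
from datetime import date


def source_summary_path(title: str, when: date) -> str:
    """Return wiki/sources/YYYY-MM-DD_source-title.md for a clipped source."""
    # Collapse every run of non-alphanumeric characters into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"wiki/sources/{when.isoformat()}_{slug}.md"
```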
Metadata standards. Specify YAML frontmatter that every page must include — tags, date created, date updated, source count (how many raw sources inform this page), and confidence level. This metadata powers Dataview queries later.
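To make the metadata rule concrete, here is one possible frontmatter block and a small checker for it. The field names (`tags`, `created`, `updated`, `source_count`, `confidence`) are assumptions chosen to match the list above, and the parser is deliberately naive: it only reads flat `key: value` lines and is not a full YAML implementation.

```python
REQUIRED_KEYS = {"tags", "created", "updated", "source_count", "confidence"}  # assumed names

SAMPLE_PAGE = """---
tags: [concept, networking]
created: 2026-04-02
updated: 2026-04-05
source_count: 3
confidence: medium
---
# Digital sovereignty
"""


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines between the leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat the block as absent


def missing_keys(text: str) -> set[str]:
    """Return the required keys a page's frontmatter does not define."""
    return REQUIRED_KEYS - set(read_frontmatter(text))
```

Running `missing_keys` over every file in wiki/ is a cheap way to catch pages the LLM wrote without the agreed metadata.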
Linking conventions. Tell the LLM to use Obsidian-style wikilinks ([[page-name]]) for internal connections and standard markdown links for external references. Require bidirectional linking: if page A references page B, page B should reference page A.
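The bidirectional rule can be verified without an LLM. The sketch below takes page names mapped to page text and reports links that are not reciprocated; `extract_links` and `one_way_links` are hypothetical helpers, and the regular expression is a simplification that reads the target of a wikilink and ignores any alias or heading suffix.

```python
import re

# Captures the link target, stopping before an optional |alias or #heading
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def extract_links(text: str) -> set[str]:
    """Return the set of wikilink targets found in a page."""
    return {target.strip() for target in WIKILINK.findall(text)}


def one_way_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where target exists but does not link back."""
    links = {name: extract_links(body) for name, body in pages.items()}
    return sorted(
        (src, dst)
        for src, targets in links.items()
        for dst in targets
        if dst in links and dst != src and src not in links[dst]
    )
```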
Ingest workflow. Define the exact steps the LLM should follow when processing a new raw source: read the document, discuss key takeaways with you, write a source summary page, update the index, update all relevant entity and concept pages, and append an entry to the log.
Query workflow. Define how the LLM should answer questions: read the index first, identify relevant pages, read those pages, synthesize an answer with citations to specific wiki pages, and optionally file the answer as a new wiki page.
Lint workflow. Define health-check criteria: find contradictions between pages, identify stale claims superseded by newer sources, flag orphan pages with no inbound links, surface important concepts mentioned but lacking their own page, and suggest new sources to seek out.
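Some of these checks are purely structural and can be scripted, leaving the LLM to handle the judgment calls such as contradictions and stale claims. A minimal sketch, assuming page names are unique file stems under wiki/ (the way Obsidian resolves short wikilinks); `lint_links` is a hypothetical helper.

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # link target, without |alias or #heading


def lint_links(wiki_dir: str) -> dict[str, list[str]]:
    """Find orphan pages (no inbound links) and link targets with no page."""
    pages = {p.stem: p.read_text(encoding="utf-8") for p in Path(wiki_dir).rglob("*.md")}
    inbound = {name: 0 for name in pages}
    missing: set[str] = set()
    for name, body in pages.items():
        for target in {t.strip() for t in WIKILINK.findall(body)}:
            if target not in pages:
                missing.add(target)      # mentioned, but has no page of its own
            elif target != name:
                inbound[target] += 1     # self-links do not count as inbound
    orphans = sorted(name for name, count in inbound.items() if count == 0)
    return {"orphans": orphans, "missing": sorted(missing)}
```

The "missing" list doubles as a set of candidate concept pages: anything linked more than once but never written is probably worth asking the LLM to create.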
You and the LLM will co-evolve this schema over time as you learn what works for your domain. This is the same principle behind Claude Code's MEMORY.md architecture — a lightweight index file that guides AI behavior — scaled up to a full knowledge management system.
Step 5: Ingest Your First Source
Start small. Clip a single article using the Web Clipper and save it to raw/. Then tell your LLM:
I just added a new source to raw/. Please read it, tell me the key takeaways,
then file it into our wiki following the ingest workflow in SCHEMA.md.
Watch what happens. The LLM should create a source summary page in wiki/sources/, update index.md with the new entry, create or update relevant concept and entity pages in wiki/, and append a timestamped entry to log.md. A single source might touch 10 to 15 wiki pages as the LLM creates cross-references and updates existing content.
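One simple way to confirm an ingest actually completed is to compare raw/ against the log. The sketch below assumes each log entry mentions the raw file's name, which is a convention you would have to put in SCHEMA.md yourself; `pending_sources` is a hypothetical helper.

```python
from pathlib import Path


def pending_sources(vault: str) -> list[str]:
    """List raw/*.md files whose filename does not yet appear in log.md."""
    root = Path(vault)
    log_file = root / "log.md"
    log_text = log_file.read_text(encoding="utf-8") if log_file.exists() else ""
    return sorted(
        source.name
        for source in (root / "raw").glob("*.md")
        if source.name not in log_text
    )
```

An empty list means every clipped article has at least been logged; anything returned is a source the LLM has not processed yet.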
Stay involved for the first several ingests. Read the summaries, check the cross-references, and correct the LLM if it miscategorizes something or misses a connection. The model itself does not learn between sessions, so fold each correction back into SCHEMA.md; the schema is where your guidance persists and how the LLM comes to handle your domain well. After 10 to 20 sources, the pattern becomes reliable enough that you can ingest with less supervision.
Step 6: Query, Lint, Repeat
Query your wiki. Ask questions that require synthesizing multiple sources. The LLM reads the index, identifies relevant pages, and synthesizes an answer with citations. If the answer is valuable — a comparison you asked for, a connection you discovered, an analysis that took work — file it back into the wiki as a new page. Your explorations compound into the knowledge base the same way ingested sources do.
Run periodic health checks. Every week or so, tell the LLM to lint the wiki. It will scan for contradictions between pages, identify claims that newer sources have superseded, flag orphan pages with no inbound links, surface important concepts that deserve their own page but do not have one yet, and suggest gaps where a web search or a new source could fill in missing information. This is one of the most valuable operations in the entire workflow — the LLM is actively suggesting what questions to ask next and what sources to seek out.
The compounding loop: You add sources. The LLM compiles them. You ask questions. The answers get filed as new material. The wiki grows more sophisticated. Better material enables better answers to harder questions. The system is self-reinforcing — every interaction makes it richer.
LLM Wiki vs. RAG vs. NotebookLM
The table below clarifies how Karpathy's approach differs from the two most common alternatives.
| Feature | LLM Wiki (Local) | RAG Pipeline | NotebookLM / Cloud AI |
|---|---|---|---|
| Data location | Your hardware, your network | Varies (can be local or cloud-hosted) | Google/OpenAI/third-party servers |
| Knowledge accumulation | Persistent, compounding wiki | None — re-derived on every query | None — re-derived on every query |
| Cross-referencing | Pre-built and maintained by the LLM | None (similarity search only) | Limited to session context |
| Vendor lock-in | Zero — plain markdown files | Moderate (vector DB, embeddings model) | High (proprietary platform) |
| Privacy | Complete — nothing leaves your network | Depends on implementation | Low — all data processed on third-party servers |
| Setup complexity | Moderate (one-time setup, then low maintenance) | High (embeddings, vector DB, retrieval tuning) | Low (upload and go) |
| Contradiction detection | Built-in via lint workflow | Not typically supported | Not typically supported |
| Cost | $0 (existing hardware) to $750 (dedicated GPU) | $20–$200+/month (cloud hosting, API calls) | Free tier with limits, then $20+/month |
| Best for | Long-term research, private knowledge, full ownership | Enterprise search over large document collections | Quick exploration of public/non-sensitive material |
The tradeoff is clear. NotebookLM wins on convenience — upload your files and start asking questions in sixty seconds. RAG wins at enterprise scale with thousands of documents and multiple users. But for anyone building a personal or small-team knowledge base that involves sensitive information and a long time horizon, the local LLM wiki is the strongest option available. You own every byte, the knowledge compounds over time, and there are no recurring costs beyond electricity.
Security Considerations
Network Isolation for Your AI Server
If you run Ollama on a dedicated machine (a mini PC, an old desktop, a NAS), make sure that machine is properly segmented on your network. Ideally, place it on its own VLAN with firewall rules that restrict inbound connections to only the devices that need access. This is the same network isolation principle we recommend for any AI agent deployment like OpenClaw.
At minimum, verify that Ollama is bound to localhost (127.0.0.1) and not exposed on 0.0.0.0. Run ss -tlnp | grep 11434 on Linux to check what address Ollama is listening on. If you see 0.0.0.0:11434, any device on your local network can send it prompts — which is not what you want unless you have deliberately configured access controls.
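If you want to automate that check, the listening address can be classified in a few lines. This sketch assumes the usual `ss -tln` column layout, where the fourth whitespace-separated field is the local address and port; `classify_listener` is a hypothetical helper and the sample lines in the usage note are illustrative, not captured output.

```python
import ipaddress


def classify_listener(ss_line: str) -> str:
    """Classify the local address in one LISTEN line of `ss -tln` output.

    Returns "loopback", "all-interfaces", or "specific-address".
    """
    local = ss_line.split()[3]                   # 4th column: Local Address:Port
    host = local.rsplit(":", 1)[0].strip("[]")   # drop the port and IPv6 brackets
    if host == "*":
        return "all-interfaces"
    addr = ipaddress.ip_address(host.split("%")[0])  # drop any %interface suffix
    if addr.is_loopback:
        return "loopback"
    return "all-interfaces" if addr.is_unspecified else "specific-address"
```

A line such as `LISTEN 0 4096 127.0.0.1:11434 0.0.0.0:*` classifies as loopback, which is the result you want; `0.0.0.0:11434` or `[::]:11434` in the same position classifies as all-interfaces.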
Pair your local AI setup with DNS-level monitoring via Pi-hole. Pi-hole's query log gives you complete visibility into what your AI infrastructure is doing on the network. If a plugin or tool starts making unexpected DNS queries to unfamiliar domains, you will see it immediately.
Model Provenance
Only download models from trusted sources: the official Ollama model library, Hugging Face model pages from verified organizations (Meta, Mistral, Google, Alibaba/Qwen), or direct project repositories. Quantized model files in GGUF format from unknown uploaders could theoretically contain manipulated weights. Treat model downloads with the same caution you would treat any software installation.
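When a publisher lists a SHA-256 checksum next to a model file you download by hand, it is worth comparing before you load it. A minimal sketch using only the standard library; `sha256_of` and `matches_published` are illustrative names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte models do not fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a local file against the checksum shown on the download page."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A matching checksum only shows that the file is the one the uploader published. It says nothing about whether that uploader is trustworthy, so the advice about sticking to verified organizations still applies.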
The MEMORY.md Connection
If Karpathy's LLM wiki architecture sounds familiar to readers who followed our analysis of the Claude Code source leak, it should. Claude Code's leaked architecture revealed a strikingly similar pattern: a lightweight MEMORY.md index file loaded into context, with detailed knowledge stored in separate topic files and a verify-before-acting discipline. The four-phase consolidation cycle (orient, gather, consolidate, prune) maps directly onto the LLM wiki's ingest-compile-query-lint workflow.
Karpathy's approach is essentially the MEMORY.md pattern scaled up from a single project's context into a full personal knowledge system. And as we noted in our Claude Code coverage, the pattern is model-agnostic. You can implement it using Ollama with any open-weight model and basic filesystem scripting. No vendor lock-in required.
Supply Chain Awareness
Every Obsidian plugin you install, every Python package you pull, every CLI tool you add to your workflow extends your attack surface. The LiteLLM supply chain compromise in March 2026 demonstrated what happens when a single trusted dependency is weaponized — stolen SSH keys, cloud credentials, and API keys across thousands of installations in under three hours.
For your knowledge base setup, keep dependencies minimal. Obsidian, Ollama, and the Web Clipper are all you need to get started. If you add community plugins or third-party tools, audit them before installation. Prefer tools with transparent source code and active maintainers. And if you are building custom scripts to automate your wiki workflows, pin your dependencies to exact versions and review what each package does before trusting it with access to your files.
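For custom scripts, the pinning advice is easy to enforce with a small check over a requirements file. The sketch below flags any requirement that is not pinned with `==`; it is a simplification that skips option lines such as `-r other.txt` and does not understand every requirement syntax, and `unpinned_requirements` is a hypothetical helper.

```python
import re

# A package name (optionally with extras) followed by an exact == version
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\],]+==[^=\s]+")


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if line and not line.startswith("-") and not PINNED.match(line):
            loose.append(line)
    return loose
```

An empty result means every dependency resolves to one specific release, so an upstream compromise cannot reach you through a silent upgrade.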
Practical Use Cases
Research deep-dives. The original use case. Drop papers, articles, and reports into raw/ over weeks or months. The LLM compiles them into a structured wiki with an evolving thesis, cross-referenced concepts, and flagged contradictions. By the time you need to write something, the synthesis already exists.
Personal knowledge management. Journal entries, health notes, therapy reflections, self-improvement goals, podcast summaries, book highlights. The LLM builds a structured picture of you over time — one that stays private on your own hardware.
Home network documentation. Your equipment configurations, ISP history, troubleshooting notes, firmware update records. If you have been following our guides for local AI hardware, Pi-hole setup, and OpenClaw configuration, you already have the kind of accumulated knowledge that benefits from structured organization.
Small business and team knowledge. Meeting transcripts, project documents, client notes, Slack thread summaries. One person feeds sources into the system; the LLM maintains the wiki; everyone on the team can query it. The wiki stays current because the LLM handles the maintenance that no one wants to do manually.
Reading companions. Build a wiki as you read a book. The LLM creates pages for characters, themes, plot threads, and connections between them. By the time you finish, you have a personal reference that captures not just what happened but how everything connects.
Frequently Asked Questions
Do I need a powerful GPU to run an LLM knowledge base locally?
No. If you have a computer with 16 GB of RAM built after 2020, you can run 7B–8B parameter models through Ollama with no GPU at all. These smaller models handle wiki maintenance tasks — summarizing, categorizing, linking — surprisingly well. A dedicated GPU (like a used RTX 3090 with 24 GB VRAM for around $700) makes everything faster and lets you run larger, more capable models, but it is not required to get started. See our local AI hardware guide for detailed recommendations at every budget level.
Can I use this with Claude or ChatGPT instead of a local model?
Yes, with a tradeoff. Claude Code can operate directly on your local file system and follows the same schema-driven workflow. However, your prompts and file contents are processed on Anthropic's (or OpenAI's) servers. For non-sensitive research on public topics, this is often fine and you get access to more capable models. For personal journals, health records, business strategy, or anything you would not post publicly, local models keep your data entirely on your network. You can also use a hybrid approach: local models for sensitive material, cloud models for public research.
How is this different from just using NotebookLM or ChatGPT with file uploads?
Three fundamental differences. First, knowledge accumulates in an LLM wiki — the synthesis is persistent and grows richer with every source. Cloud tools re-derive answers from scratch every time. Second, your data stays on your hardware instead of being processed on third-party servers. Third, plain markdown files have zero vendor lock-in — you own them forever and can read them with any text editor. If a cloud service shuts down or changes its terms, your data goes with it.
What happens if I switch LLM providers later?
Nothing breaks. The wiki is plain markdown. The schema file is plain text. The index and log files are plain text. You can switch from Ollama to LM Studio, from Llama to Qwen, from local to cloud and back again. The wiki does not know or care which LLM reads it. This is one of the strongest arguments for the file-over-app approach: your knowledge infrastructure outlasts any specific tool.
How large can the wiki get before I need RAG infrastructure?
Karpathy reports that at roughly 100 articles and 400,000 words, the system works well without any vector database or embedding-based retrieval at all. The LLM navigates the wiki using its own self-maintained index files and summaries. For most personal knowledge bases, this scale is more than sufficient. If you grow beyond this — into thousands of pages — tools like qmd (a local search engine for markdown files with hybrid BM25/vector search) can add search capabilities without requiring any cloud infrastructure.
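To see where your own wiki stands relative to that figure, a plain word count is enough. A minimal sketch; `wiki_word_count` is a hypothetical helper and it counts whitespace-separated words, frontmatter and link syntax included.

```python
from pathlib import Path


def wiki_word_count(wiki_dir: str) -> tuple[int, int]:
    """Return (page_count, total_words) for all markdown files under wiki_dir."""
    pages = list(Path(wiki_dir).rglob("*.md"))
    words = sum(len(page.read_text(encoding="utf-8").split()) for page in pages)
    return len(pages), words
```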
Is Obsidian free to use?
Yes. Obsidian is free for personal use, and since early 2025 its commercial license has been optional, so work use no longer requires payment (organizations can still buy a license to support development). Note that Obsidian is not open-source; it is a proprietary application with a local-first philosophy. If open-source matters to you, Logseq is a fully open-source alternative that also uses local markdown files, and Foam is a lightweight VS Code extension that adds bidirectional linking to any markdown folder. The LLM wiki pattern works with any of these tools, or with no tool at all, just a file browser.
Can multiple people collaborate on the same LLM wiki?
Yes. Since the wiki is just a directory of markdown files, you can put it in a Git repository and get version history, branching, and multi-user collaboration for free. Each collaborator can ingest sources and run queries; Git handles merging, although simultaneous edits to shared files such as index.md and log.md can still produce conflicts that someone has to resolve by hand. For teams, Obsidian CEO Steph Ango recommends keeping a clean "personal vault" separate from the agent-facing wiki: let the LLM work in its own space, and only pull distilled artifacts into your personal vault once they have been reviewed. This "contamination mitigation" strategy keeps LLM-generated content from cluttering your personal notes.

