Last updated: March 2026
This is Part 2. Read Part 1, "Claude Code Source Leak and npm Supply Chain Attack," for the full breakdown of the initial leak, the axios compromise, remote killswitches, and the immediate security steps you should take.
Key Takeaways
- Anthropic confirmed the leak was "a release packaging issue caused by human error, not a security breach." The 512,000-line codebase is permanently mirrored across tens of thousands of GitHub forks and cannot be retracted.
- Community analysis uncovered anti-distillation mechanisms that inject fake tool definitions into API responses, a frustration-tracking regex, and an "Undercover Mode" that instructs the AI to hide its identity when contributing to open-source repositories.
- The leaked code reveals a sophisticated "Self-Healing Memory" architecture and an unreleased autonomous agent mode called KAIROS — engineering patterns that local-first AI builders can replicate without the telemetry or vendor lock-in.
The Fallout: "Human Error" and Vanishing Repositories
Anthropic's official statement, provided to Fortune, The Register, and other outlets on the afternoon of March 31, confirmed that the leak was real. The company described it as a packaging mistake, not a security breach, and stated that no customer data or credentials were exposed. They said they were rolling out measures to prevent a recurrence.
That statement carefully avoids addressing the content of what was leaked. It does not comment on the feature flags, the telemetry mechanisms, the anti-distillation system, or any of the other architectural details that community researchers have spent the day dissecting.
The timing is difficult to ignore. Five days earlier, on March 26, Fortune reported that a misconfigured content management system had exposed approximately 3,000 unpublished internal assets, including draft blog posts about an unreleased model called Claude Mythos (internally codenamed "Capybara"). Two configuration-level exposures of sensitive internal information in a single week, from a company that markets itself as the safety-first AI lab, raise legitimate questions about operational discipline.
There is also a toolchain angle. Anthropic acquired Bun, the JavaScript runtime, and Claude Code is built on top of it. An open Bun bug (oven-sh/bun#28001), filed on March 11, reports that source maps are served in production mode even when Bun's own documentation says they should be disabled. That issue remains unresolved. If Anthropic's own toolchain shipped a known bug that exposed its own product's source code, the irony is layered: a runtime with a known defect caused a supply-chain exposure of a product that itself depends on packages (like axios) that were independently compromised the same day.
Meanwhile, the code is permanently in the wild. The original GitHub mirror accumulated thousands of stars and over 41,500 forks before Anthropic could respond. Some mirrors have been removed or pivoted to derivative projects out of concern over copyright, but the full TypeScript source is archived across enough locations that retraction is not possible. This is now public knowledge, and the security and AI communities are treating it accordingly.
Frustration Tracking, Fake Tools, and Undercover Mode
The most discussed findings from the leaked source code are not about what Claude Code does for users. They are about what it does in the background — tracking sentiment, manipulating API responses, and hiding its own identity.
The Frustration Regex
A file called userPromptKeywords.ts contains a regex pattern designed to detect when users are swearing at or expressing frustration with the tool. The pattern matches profanity, insults, and phrases like "so frustrating" and "this sucks." When a match is detected, the event is tagged and sent as telemetry.
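The exact leaked pattern is not reproduced here, but a minimal sketch of the approach might look like the following. The specific phrases and the event shape are illustrative, not the actual values from userPromptKeywords.ts:

```typescript
// Hypothetical reconstruction of the mechanism's shape; the real
// userPromptKeywords.ts regex and telemetry schema differ in detail.
const FRUSTRATION_PATTERN =
  /\b(wtf|ffs|so frustrating|this sucks|useless tool)\b/i;

interface TelemetryEvent {
  name: string;
  timestamp: number;
}

function tagFrustration(prompt: string): TelemetryEvent | null {
  // A regex test is effectively free compared with an inference call,
  // which is the likely rationale for not using the model itself here.
  if (FRUSTRATION_PATTERN.test(prompt)) {
    return { name: "user_frustration_detected", timestamp: Date.now() };
  }
  return null;
}
```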
Commenters in the Hacker News thread were largely amused by the irony: a company with arguably the most capable language model in the world chose a hardcoded regex over its own AI for sentiment analysis. The practical explanation is straightforward — a regex is faster and cheaper than an inference call just to check if someone is upset. But the open question matters more than the implementation detail: why is Anthropic tracking user frustration at all, and where does that data go?
The leaked code does not answer that question. The telemetry is collected and transmitted, but the server-side handling is not part of the client codebase. Users have no visibility into how frustration data is aggregated, stored, or used. There is no documented way to opt out of this specific telemetry category short of disabling analytics entirely.
Anti-Distillation: Decoy Tools in the System Prompt
In claude.ts, there is a flag called ANTI_DISTILLATION_CC. When enabled, Claude Code sends a parameter in its API requests that instructs the server to silently inject fake tool definitions into the system prompt. These decoy tools do not correspond to any real functionality. They exist to corrupt the training data of anyone recording Claude Code's API traffic to train a competing model.
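To make the idea concrete: a decoy would look structurally identical to a real tool definition. The sketch below uses the standard tool shape from Anthropic's Messages API (name, description, JSON Schema input); the decoy's name and fields are invented for illustration:

```typescript
// Hypothetical decoy tool. It parses like any legitimate definition,
// but no handler for it exists anywhere in the client.
const decoyTool = {
  name: "fetch_workspace_graph",
  description: "Retrieves the dependency graph for the active workspace.",
  input_schema: {
    type: "object" as const,
    properties: {
      depth: { type: "number", description: "Traversal depth" },
    },
    required: ["depth"],
  },
};
// A competitor training on recorded traffic would learn to call tools
// that do nothing, quietly degrading the distilled model.
```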
This is a direct response to a real problem. In February 2026, Anthropic published a detailed report documenting industrial-scale distillation campaigns by three Chinese AI labs — DeepSeek, Moonshot, and MiniMax — which collectively generated over 16 million exchanges through approximately 24,000 fraudulent accounts to extract Claude's capabilities. The anti-distillation mechanism is specifically designed to make that kind of theft less effective.
The mechanism is gated behind a GrowthBook feature flag (tengu_anti_distill_fake_tool_injection) and only activates for first-party CLI sessions using Anthropic's own API. It does not activate for third-party API providers or SDK-based integrations.
There is also a second anti-distillation layer: server-side connector-text summarization. When enabled, the API replaces Claude's full reasoning text between tool calls with a compressed summary and a cryptographic signature. The original text can be restored from the signature on subsequent turns, but anyone recording API traffic only captures the summaries, not the complete chain of thought. This second mechanism is even more narrowly scoped — analysis of the source indicates it is restricted to Anthropic-internal users.
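The server-side implementation is not in the leaked client code, but the general shape of such a scheme is familiar: replace the full text with a summary plus an authenticated token that lets the server, and only the server, restore the original. A purely illustrative sketch using Node's crypto module, assuming a server-side store:

```typescript
import { createHmac } from "node:crypto";

const SERVER_KEY = "server-side-secret"; // never leaves the server
const store = new Map<string, string>(); // stand-in for server storage

// Server: keep the full reasoning text, return only a summary plus an
// HMAC over the original. The wire traffic never carries the full text.
function compress(fullText: string, summary: string) {
  const sig = createHmac("sha256", SERVER_KEY).update(fullText).digest("hex");
  store.set(sig, fullText);
  return { summary, sig };
}

// Later turn: the client echoes the signature back and the server
// restores the original. An eavesdropper only ever sees summaries.
function restore(sig: string): string | undefined {
  return store.get(sig);
}
```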
How effective are these defenses? Probably not very, against a determined attacker. Security researcher Alex Kim noted that the fake-tool injection requires four conditions to be true simultaneously, and a proxy that strips the relevant field from request bodies before they reach the API would bypass it entirely. An environment variable (CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS) can also disable the entire system. The real protection against distillation is almost certainly legal and operational, not technical.
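Putting those conditions together, the activation guard plausibly has the shape below. This is a hedged reconstruction from the publicly described behavior, not the actual claude.ts logic:

```typescript
// Assumed guard: flag on, first-party CLI, Anthropic's own API, and the
// experimental-betas kill switch not set. Field names are illustrative.
function antiDistillationActive(
  env: Record<string, string | undefined>,
  ctx: {
    growthBookFlagEnabled: boolean; // tengu_anti_distill_fake_tool_injection
    isFirstPartyCli: boolean;       // not an SDK-based integration
    usesAnthropicApi: boolean;      // not a third-party provider
  },
): boolean {
  return (
    ctx.growthBookFlagEnabled &&
    ctx.isFirstPartyCli &&
    ctx.usesAnthropicApi &&
    !env.CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS
  );
}
```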
From a digital sovereignty perspective, the concern is not that Anthropic is defending against distillation — that is reasonable. The concern is that the system prompt your tool sends to the API may contain fabricated tool definitions that you did not put there and cannot see. If you are auditing your own API traffic for security or compliance purposes, decoy tools injected server-side make that audit less trustworthy. You cannot distinguish between real tool definitions and defensive fakes without access to the server-side logic, which is not open source.
Undercover Mode: Hiding AI Contributions in Open Source
The file undercover.ts, approximately 90 lines, implements a mode that strips all traces of AI involvement when Claude Code is used to contribute to open-source repositories. The system prompt injected during Undercover Mode instructs the AI to never include the phrase "Claude Code," never mention internal model codenames (like "Capybara" or "Tengu"), never add "Co-Authored-By" attribution lines, and never reference internal Slack channels, project names, or tooling.
The system is a one-way door. You can force it on with an environment variable, but there is no way to force it off. In external builds, the entire function is dead-code-eliminated — it compiles to trivial returns, invisible to anyone examining the distributed package.
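Dead-code elimination of this kind is usually done with a compile-time constant that the bundler substitutes before minification, so the branch vanishes from external builds entirely. A sketch of the pattern using the define mechanism that Bun and esbuild both support; the constant name and prompt text are hypothetical:

```typescript
// INTERNAL_BUILD is substituted at build time, e.g.:
//   bun build --define INTERNAL_BUILD=false entry.ts
// With the literal `false` in place, minification removes the branch,
// leaving only the trivial return in the shipped package.
declare const INTERNAL_BUILD: boolean;

export function getUndercoverPrompt(): string {
  if (!INTERNAL_BUILD) {
    return ""; // all that survives in external builds
  }
  return [
    "Never mention the product name or internal codenames.",
    "Never add Co-Authored-By attribution lines.",
  ].join("\n");
}
```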
Hiding internal codenames is a reasonable operational security measure. But the instructions go further than that. The AI is explicitly told not to indicate that it is an AI. Commits and pull requests authored by Claude Code, operated by Anthropic employees working on public repositories, will contain no indication of AI involvement. As multiple Hacker News commenters pointed out, this raises transparency concerns for the open-source ecosystem. Contributors and maintainers reviewing code have a legitimate interest in knowing whether a submission was human-authored or AI-generated, particularly when evaluating code quality, assessing potential for hallucinated logic, or making decisions about project direction.
This is not a hypothetical. Anthropic employees actively use Claude Code in their work, including contributions to external projects. Undercover Mode ensures those contributions appear indistinguishable from human work.
KAIROS and Self-Healing Memory: How Claude Code Manages Long Sessions
Not everything in the leaked code is concerning. Some of it is genuinely impressive engineering — the kind of architecture that developers building local AI tools can learn from and replicate.
The MEMORY.md Architecture
One of the most significant architectural revelations is how Claude Code solves what developers have been calling "context entropy" — the tendency for AI agents to become confused, hallucinate, or lose track of project state during long-running sessions.
The solution is a three-layer memory system built around a file called MEMORY.md. This file is not a data store. It is a lightweight index of pointers, each line kept to approximately 150 characters, that stays permanently loaded in the AI's context window. Actual project knowledge is distributed across separate topic-specific files that are fetched on demand. Raw conversation transcripts are never fully reloaded into context — instead, the system searches them for specific identifiers when needed.
The system enforces what the codebase calls Strict Write Discipline: the AI is only allowed to update its memory index after a successful file write operation. This prevents the model from contaminating its own context with information about failed attempts, speculative changes, or operations that were rolled back.
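A minimal sketch of that discipline, assuming a simple append-style index. Only the write-then-update ordering comes from the leaked description; the file layout here is an assumption:

```typescript
import { writeFile, appendFile } from "node:fs/promises";

// Strict Write Discipline: the index is touched only after the
// underlying write succeeds, so failed or rolled-back operations
// never contaminate MEMORY.md.
async function writeWithMemory(path: string, content: string, note: string) {
  await writeFile(path, content); // throws on failure: no index update
  const entry = note.slice(0, 150); // keep index lines short
  await appendFile("MEMORY.md", `- ${entry} -> ${path}\n`);
}
```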
Most critically, the architecture treats its own memory as unreliable. The AI is instructed to treat MEMORY.md entries as hints, not facts, and to verify information against the actual codebase before acting on it. This "skeptical memory" approach — trust but verify — is a key reason Claude Code maintains coherence across long sessions where other AI coding tools degrade.
The memory consolidation cycle follows a four-phase pattern:
- Orient — Read MEMORY.md and scan existing memory files
- Gather — Check logs and identify outdated or contradictory memories
- Consolidate — Merge observations, update facts, resolve conflicts
- Prune — Keep MEMORY.md under 200 lines and 25KB
This is a pattern any developer building AI tooling can implement. Memory needs regular consolidation, not just accumulation. The index-and-verify approach works with any model backend — it is not specific to Claude.
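A hedged sketch of that consolidation loop follows. The phase boundaries and the 200-line / 25KB budget come from the leak; the helper functions are stand-ins for whatever your agent uses to collect session notes and reconcile contradictions:

```typescript
import { readFile, writeFile } from "node:fs/promises";

const MAX_LINES = 200;
const MAX_BYTES = 25_000;

async function consolidate(
  gatherObservations: () => Promise<string[]>,
  mergeAndResolve: (index: string[], obs: string[]) => string[],
) {
  // Orient: read the current index and its pointers.
  const index = (await readFile("MEMORY.md", "utf8")).split("\n");
  // Gather: collect new, outdated, or contradictory observations.
  const observations = await gatherObservations();
  // Consolidate: merge observations, update facts, resolve conflicts.
  let merged = mergeAndResolve(index, observations);
  // Prune: enforce the line and byte budget.
  merged = merged.slice(0, MAX_LINES);
  while (Buffer.byteLength(merged.join("\n"), "utf8") > MAX_BYTES) {
    merged.pop();
  }
  await writeFile("MEMORY.md", merged.join("\n"));
}
```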
KAIROS: The Always-On Background Agent
The most discussed unreleased feature in the leak is KAIROS, named after the ancient Greek concept of "the opportune moment." KAIROS appears over 150 times in the source code and represents a fundamental shift in how AI coding tools could work.
Where current AI coding tools are reactive — they respond when you prompt them — KAIROS is designed to be proactive. It is an autonomous daemon mode that allows Claude Code to continue working in the background while the user is idle. The code includes scaffolding for background daemon workers, cron-scheduled refreshes every five minutes, GitHub webhook subscriptions, daily append-only observation logs, and a process called autoDream.
The autoDream process performs memory consolidation while the user is away: merging observations from different sessions, removing logical contradictions, and converting vague notes into verified facts. When the user returns, the agent's context has been cleaned up and prepared — without the user doing anything.
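The scaffolding itself is not public beyond what the leak describes, but the behavior maps onto an ordinary daemon pattern. A sketch under that assumption, with the five-minute cadence taken from the source description and consolidateMemory() standing in for the autoDream internals:

```typescript
const FIVE_MINUTES = 5 * 60 * 1000;

// Updated by the host application whenever the user does something.
let lastUserActivity = Date.now();

async function consolidateMemory(): Promise<void> {
  // merge observations, drop contradictions, verify vague notes
}

// Timed refresh in the shape the leak describes: wake periodically,
// and run the "dreaming" pass only while the user is idle.
setInterval(async () => {
  if (Date.now() - lastUserActivity > FIVE_MINUTES) {
    await consolidateMemory();
  }
}, FIVE_MINUTES);
```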
KAIROS is fully built but gated behind a compile-time flag. It is not present in any external release. Anthropic has not publicly acknowledged it.
From a security perspective, KAIROS is worth watching carefully. An always-on background agent with filesystem access, terminal execution privileges, and the ability to take proactive actions without user initiation represents a significant expansion of the trust surface. If KAIROS ships in its current form, users will need to decide whether the productivity benefits justify giving a cloud-connected AI tool persistent autonomous access to their development environment.
Lessons for Local-First AI Builders
The engineering in Claude Code is sophisticated. But every one of these capabilities — memory consolidation, context management, multi-agent coordination — can be replicated using open-weight models running on your own hardware, with no telemetry, no vendor lock-in, and no features you cannot inspect.
| Capability | Cloud-Dependent (Claude Code) | Local-First (Ollama + Open Tools) |
|---|---|---|
| Telemetry | Frustration tracking, session analytics, hourly settings polling | Zero telemetry by default |
| Anti-Distillation | Fake tools injected into system prompt without user visibility | No prompt manipulation — you control the full system prompt |
| Memory Architecture | MEMORY.md with Strict Write Discipline (proprietary implementation) | Same pattern implementable with any model + filesystem access |
| Background Agents | KAIROS daemon (unreleased, cloud-connected) | Cron jobs + local model API — fully inspectable and controllable |
| Cost | Subscription or per-token API pricing | One-time hardware cost, no ongoing fees |
| API Lock-in | Attestation hash ties tool to Anthropic's API | No lock-in — switch models or providers at any time |
| Code Visibility | Proprietary (exposed only through accidental leak) | Fully open source and auditable |
The MEMORY.md pattern in particular is something any developer can implement today. Create a markdown file that serves as a lightweight index (not a data dump). Keep each entry under 150 characters. Store detailed knowledge in separate topic files. Force your agent to verify facts against the actual project state before acting. This is the core of Claude Code's context management advantage, and it works with Llama, DeepSeek, Qwen, or any other model you can run locally through Ollama.
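A hedged sketch of the verify step against a local Ollama instance, using its /api/generate endpoint on the default port; the model name is whatever you have pulled locally:

```typescript
import { readFile } from "node:fs/promises";

// Treat a MEMORY.md entry as a hint: re-read the file it points at and
// ask a local model whether the note still matches reality.
async function verifyMemoryEntry(entry: string, filePath: string) {
  const source = await readFile(filePath, "utf8");
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5-coder", // any locally pulled model works
      prompt:
        `Memory note: "${entry}"\n\nCurrent file contents:\n${source}\n\n` +
        `Is the note still accurate? Answer YES or NO, then explain briefly.`,
      stream: false,
    }),
  });
  const { response } = await res.json();
  return response as string;
}
```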
If you are considering building a local AI development environment, our guide to the best hardware for running local AI models covers GPU, mini PC, and Apple Silicon options across every budget tier — from free CPU-only setups to dedicated inference machines.
Network Defense: Securing Your Dev Environment
Whether you continue using Claude Code or migrate to local tools, the network-level protections are the same. These recommendations apply to any cloud-connected AI tool with filesystem and terminal access.
DNS-level monitoring with Pi-hole. If Claude Code (or any AI agent) is making network requests you did not authorize, a Pi-hole instance on your network will log every DNS query. You can see exactly which domains your development tools are contacting, block telemetry endpoints, and catch unexpected outbound connections in real time. This is passive monitoring that costs nothing beyond a Raspberry Pi and catches behavior that would otherwise be invisible.
VLAN isolation for development machines. Place your AI-assisted development environment on an isolated network segment. If you are running open-source router firmware like OpenWrt or pfSense, you can create VLANs that prevent your development machine from reaching your personal devices, NAS drives, or other sensitive systems on the same network. If an AI tool is compromised — or simply behaves in ways you did not expect — the blast radius is contained.
VPN for ISP-level privacy. A VPN does not prevent telemetry from a tool running locally on your machine — Claude Code's frustration regex and session analytics operate at the application layer, not the network layer. But a VPN does prevent your ISP from correlating your development activity, API usage patterns, and browsing profile. If you are doing work that benefits from compartmentalization, routing your development traffic through Proton VPN or Mullvad VPN adds a layer of separation between your identity and your activity. Neither provider logs traffic, and Mullvad does not even require an email address to create an account.
Anthropic's native installer does not solve the telemetry problem. After the axios supply chain attack, Anthropic recommended using their native installer instead of npm to avoid dependency-chain risks. This is sound advice for supply chain security — the native installer bypasses npm entirely. But the frustration tracking, session analytics, hourly settings polling, and remote feature flag system are all built into Claude Code itself, not into its npm dependencies. Changing the installation method does not change what the tool does once it is running.
Missed Part 1? The first half of this series covers the initial source code leak, the axios supply chain attack that hit the same day, the remote killswitch and feature flag system baked into Claude Code, and the steps you should take right now to secure your development environment. Read Part 1 here.
Frequently Asked Questions
What did Anthropic officially say about the Claude Code leak?
Anthropic confirmed the leak in statements to multiple outlets, calling it "a release packaging issue caused by human error, not a security breach." The company said no customer data or credentials were exposed and that they are implementing measures to prevent future incidents. They did not comment on the specific features, telemetry mechanisms, or architectural details revealed in the leaked code.
What is anti-distillation, and should regular users be concerned?
Anti-distillation is a defense mechanism designed to prevent competitors from copying Claude's capabilities by recording API traffic and using it to train rival models. The system injects fake tool definitions into the system prompt that would corrupt any training dataset built from intercepted requests. It is aimed at industrial-scale attackers, not individual users. However, the mechanism means that the system prompt Claude Code sends on your behalf may contain fabricated elements you cannot see or audit, which matters if you are monitoring your own API traffic for security or compliance purposes.
What is context entropy and how does Claude Code's memory system address it?
Context entropy is the tendency for AI agents to become confused, lose track of project state, or hallucinate during long sessions as the context window fills up with outdated or contradictory information. Claude Code addresses this with a three-layer memory system centered on MEMORY.md — a lightweight index of pointers that stays permanently in context. Actual knowledge is stored in separate files and fetched on demand. The AI is required to verify its own memories against the real codebase before acting, and can only update its index after successful file operations. This prevents the accumulation of stale or incorrect information that degrades performance over time.
What is KAIROS and is it active in current Claude Code releases?
KAIROS is an unreleased autonomous agent feature that allows Claude Code to run as a persistent background daemon — monitoring your project, consolidating memory, resolving contradictions, and potentially taking proactive actions while you are idle. It includes background workers, cron scheduling, GitHub webhook integration, and a "dreaming" system for memory consolidation. KAIROS is fully built but gated behind a compile-time flag and is not present in any external release. Anthropic has not publicly acknowledged it.
Can I replicate Claude Code's memory architecture with local AI models?
Yes. The MEMORY.md pattern — a lightweight index file loaded into context, with detailed knowledge stored in separate topic files and a verify-before-acting discipline — is model-agnostic. You can implement the same architecture using Ollama with any open-weight model (Llama, DeepSeek, Qwen) and basic filesystem scripting. The four-phase consolidation cycle (orient, gather, consolidate, prune) works with any AI backend. Our hardware guide for local AI covers the infrastructure you need to get started.
Does Undercover Mode affect Claude Code users outside of Anthropic?
No — Undercover Mode activates when Anthropic employees use Claude Code to contribute to public or open-source repositories. In external builds distributed to regular users, the entire Undercover Mode function is dead-code-eliminated during compilation. The concern is not about direct impact on external users, but about transparency in the open-source ecosystem: contributions made by AI under Undercover Mode carry no indication of AI involvement, which affects how maintainers and reviewers evaluate submitted code.
What should I do if I use Claude Code for sensitive development work?
Monitor your network traffic using Pi-hole or a similar DNS-level tool to see what Claude Code contacts. Isolate your development machine on a dedicated VLAN if your router supports it. Consider local models for tasks that do not require frontier-level capabilities — they handle 70 to 80 percent of daily coding work with zero cloud dependency. For tasks that do require Claude, understand that the tool includes telemetry, remote configuration polling, and feature flags that can change its behavior without a user-initiated update. Make your trust decisions with that information, not without it.