
Quick Summary:
- Local & Open-source: Clawdbot runs locally, giving users full data ownership (unlike cloud-based solutions like ChatGPT).
- File-based Memory: Memory consists of pure Markdown files, making it easy to edit and control.
- Hybrid Search: Combines semantic search (Vector) and keyword search (BM25) for precise information retrieval.
- Memory Flush: An automated mechanism that saves critical information to disk before context compaction to prevent data loss.
How Does Clawdbot Retain Information?
Clawdbot utilizes a two-layer, locally stored file-based system. Instead of relying solely on the limited context window of the language model, Clawdbot records events, decisions, and critical information into Markdown files (MEMORY.md and memory/YYYY-MM-DD.md). When retrieval is needed, it employs a Hybrid Search engine that combines Vector Search (for semantic similarity) and BM25 (for exact keywords) via a lightweight SQLite database, enabling accurate and effectively infinite memory.
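The keyword half of this hybrid search can be sketched with SQLite's built-in FTS5 extension, which ships with BM25 ranking via the `bm25()` auxiliary function (available in most Python builds of SQLite). The table and column names below are illustrative, not Clawdbot's actual schema:

```python
import sqlite3

# In-memory DB for the sketch; Clawdbot keeps a small on-disk index instead.
db = sqlite3.connect(":memory:")

# FTS5 virtual table: BM25 ranking comes built in via the bm25() function.
db.execute("CREATE VIRTUAL TABLE mem_chunks USING fts5(path, body)")
db.executemany(
    "INSERT INTO mem_chunks VALUES (?, ?)",
    [
        ("memory/2025-01-10.md", "Decided to migrate the billing service to Postgres."),
        ("memory/2025-01-11.md", "User prefers short answers and dark-mode screenshots."),
        ("MEMORY.md", "Key decision: billing runs on Postgres; owner is Dana."),
    ],
)

def keyword_search(query: str, k: int = 3):
    """Exact-keyword retrieval; bm25() is lower-is-better, so ORDER BY ascending."""
    return db.execute(
        "SELECT path, bm25(mem_chunks) FROM mem_chunks "
        "WHERE mem_chunks MATCH ? ORDER BY bm25(mem_chunks) LIMIT ?",
        (query, k),
    ).fetchall()

hits = keyword_search("Postgres")
```

The vector half would come from an embedding index such as `sqlite-vec` (a loadable extension, omitted here); the two scores are then blended as described in the table further down.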
Distinguishing Context and Memory
To understand Clawdbot, it is essential to distinguish between two concepts that most other AIs merge into one:
- Context:
- Nature: Temporary; exists in the model’s RAM for each specific Request.
- Limit: Bound by the token window (e.g., 200k tokens).
- Components: System Prompt + Current Chat History + Tool Results.
- Memory:
- Nature: Permanent; stored on the hard drive (Disk).
- Limit: Unbounded (Unlimited).
- Components: the MEMORY.md file, the memory/*.md directory, and session transcripts.
Data Storage Structure
Clawdbot organizes memory within the Agent’s working directory (e.g., ~/clawd/) using a tiered structure:
- Layer 1: Daily Logs (memory/YYYY-MM-DD.md)
  - "Append-only" recording format.
  - Stores daily notes and real-time conversations.
- Layer 2: Long-term Memory (MEMORY.md)
  - Stores distilled knowledge, key events, user preferences, and major decisions.
- Context Files (Configuration):
  - AGENTS.md: Instructions for the Agent.
  - SOUL.md: Personality and tone.
  - USER.md: Information about the user.
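The Layer 1 "append-only" behavior is simple enough to sketch directly. This is an illustrative helper, not Clawdbot's actual code; the workspace path and function name are assumptions:

```python
from datetime import date
from pathlib import Path

WORKSPACE = Path.home() / "clawd"  # the agent's working directory (example path)

def append_daily_log(note: str, workspace: Path = WORKSPACE) -> Path:
    """Append one note to today's memory/YYYY-MM-DD.md file, never rewriting old lines."""
    log_dir = workspace / "memory"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"{date.today():%Y-%m-%d}.md"
    with log_file.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")  # append-only: existing entries are immutable
    return log_file
```

Because every write is an append to a dated Markdown file, the log doubles as a human-readable journal that can be edited or audited with any text editor.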
Data Loss Prevention: Memory Flush & Compaction
A common weakness of LLMs is that when the context becomes too long, they must compress it (Compaction), leading to information loss. Clawdbot solves this with a 3-step process:
- Detection: The system detects when the context reaches approximately 75% of its limit (a soft threshold).
- Memory Flush: The system triggers a background “Silent Turn.” The Agent reviews the conversation, extracts key information, and immediately writes it to the Markdown files on the disk.
- Compaction: Only after the information is safely secured on the disk does the system summarize the old chat history to free up tokens.
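The ordering guarantee in these three steps can be sketched as follows. The threshold comes from the text above; the function name and callback interface are hypothetical stand-ins for the real agent actions:

```python
SOFT_THRESHOLD = 0.75  # flush trigger from the detection step above

def maybe_flush_and_compact(used_tokens: int, limit: int, flush, compact) -> str:
    """Flush memory to disk *before* compacting, so nothing is lost.

    `flush` and `compact` are callbacks standing in for the real actions:
    the silent turn that writes Markdown, and the history summarizer.
    """
    if used_tokens / limit < SOFT_THRESHOLD:
        return "noop"      # plenty of headroom: do nothing
    flush()                # step 2: secure key facts on disk first
    compact()              # step 3: only then summarize old history
    return "flushed+compacted"
```

The invariant worth noting is that `compact()` is only ever reachable after `flush()` has returned, which is exactly what prevents compaction from discarding unsaved information.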
Technical Feature Summary
| Feature | Detailed Description | Benefit |
| --- | --- | --- |
| Search Technology | SQLite with sqlite-vec (Vector) + FTS5 (Full-text). | No expensive Vector DB required; runs lightly on local hardware. |
| Search Algorithm | Hybrid Score = (0.7 * Vector) + (0.3 * Text). | Retrieves both semantic meaning (concepts) and exact keywords (IDs, proper names). |
| Pruning | Automatically trims long logs (e.g., npm install outputs) from the context. | Saves tokens and API costs without losing the conversation flow. |
| Cache-TTL | Clears old tool results when the API cache (Anthropic) expires (5 mins). | Optimizes costs when re-caching the prompt. |
| Multi-Agent | Each Agent has its own workspace directory and separate SQLite index. | Complete memory isolation between “Work” and “Personal” agents. |
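The hybrid scoring row above can be sketched in a few lines. This is a minimal illustration assuming both component scores are normalized to [0, 1]; only the 0.7/0.3 weights come from the table, everything else (names, toy vectors) is invented for the example:

```python
import math

W_VECTOR, W_TEXT = 0.7, 0.3  # weights from the table above

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hybrid_score(vec_sim: float, text_score: float) -> float:
    """Blend semantic and keyword relevance; both inputs assumed in [0, 1]."""
    return W_VECTOR * vec_sim + W_TEXT * text_score

# Toy ranking: with these weights, a strong semantic match can outrank
# an exact keyword hit on an otherwise unrelated document.
query_vec = [1.0, 0.0]
docs = {
    "semantic-match": ([0.9, 0.1], 0.0),  # similar meaning, no keyword hit
    "keyword-match":  ([0.1, 0.9], 1.0),  # exact keyword, different topic
}
ranked = sorted(
    docs,
    key=lambda d: hybrid_score(cosine(query_vec, docs[d][0]), docs[d][1]),
    reverse=True,
)
```

The 0.7 weight on the vector side means meaning dominates, while the 0.3 keyword term keeps exact identifiers and proper names from being drowned out entirely.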
Conclusion: Clawdbot is not just a local AI assistant; it is a demonstration of what returning data control to the user can look like. With its transparent, Markdown-based approach and durable memory mechanism, it directly addresses the "goldfish memory" problem common in today's cloud AI models.








