I’ve been digesting the current “filesystem vs database” debate for agent memory. Right now I’m seeing two camps in how we build agent memory:
- On the one side, we have the “file interfaces are all you need” camp.
- On the other side, we have the “filesystems are just bad databases” camp.
“File interfaces are all you need” camp
Leaders like Anthropic, Letta, Langchain & LlamaIndex are leaning towards file interfaces because “files are surprisingly effective as agent memory”.
- Anthropic’s memory tool treats memory as a set of files (the storage implementation is left up to the developer)
- Langsmith’s agent builder also represents memory as a set of files (the data is stored in a database and exposed to the agent as a filesystem)
- Letta found that simple filesystem tools like grep and ls outperformed specialized memory or retrieval tools in their benchmarks
- LlamaIndex argues that for many use cases a well-organized filesystem with semantic search might be all you need
Agents are good at using filesystems because models are optimized for coding tasks (including CLI operations) during post-training.
That’s why we’re seeing a “virtual filesystem” pattern where the agent interface and the storage implementation are decoupled.
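A minimal sketch of what that decoupling can look like, with all names invented for illustration (this is not any vendor’s actual API): the agent sees file-shaped tools, while the backing store is just a dict here and could be swapped for a database table without the agent noticing.

```python
import re

class VirtualFS:
    """File-like tool surface for an agent; storage backend is pluggable."""

    def __init__(self):
        self._store = {}  # path -> contents; swap for any backend

    def write(self, path, text):
        self._store[path] = text

    def read(self, path):
        return self._store[path]

    def ls(self, prefix=""):
        # List paths, mimicking what the agent expects from `ls`
        return sorted(p for p in self._store if p.startswith(prefix))

    def grep(self, pattern):
        # Return paths whose contents match, mimicking `grep -l`
        rx = re.compile(pattern)
        return [p for p, text in self._store.items() if rx.search(text)]

fs = VirtualFS()
fs.write("/memories/user.md", "prefers concise answers")
fs.write("/memories/project.md", "uses TypeScript and pnpm")
print(fs.grep("TypeScript"))  # ['/memories/project.md']
```

The agent-facing interface stays stable while the storage implementation is free to evolve underneath it.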
“Filesystems are just bad databases” camp
But then you have voices like Dax from OpenCode, who rightly points out that “a filesystem is just the worst kind of database”.
swyx and colleagues in the database space warn about accidentally reinventing databases while solving the agent memory problem. Avoid writing worse versions of:
- search indexes,
- transaction logs,
- locking mechanisms.
Trade-offs
It’s important to match the complexity of your system to the complexity of your problem.
Simplicity vs scale
Files are simple and CLI tools can even outperform specialized retrieval tools.
But these CLI tools don’t scale well & can become a bottleneck.
Querying and aggregations
grep can be effective and is a hard baseline to beat. But what if you want to improve retrieval performance with hybrid or semantic search?
Luckily, there are CLI tools available for semantic search (e.g., semtools or mgrep).
The question remains how well they scale, and how effective agents are at using them, given that they are less common in the training data.
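To make the contrast with exact-match grep concrete, here is a toy sketch of semantic-style retrieval over memory files. Real setups would use embeddings (e.g., via the CLI tools above); a bag-of-words cosine similarity stands in here, and all file names and contents are invented.

```python
from collections import Counter
import math

# Invented memory files for illustration
files = {
    "user.md": "the user prefers short, direct answers",
    "deploy.md": "release process: build, test, then ship to production",
}

def vec(text):
    # Crude stand-in for an embedding: a bag-of-words count vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query):
    # Return the best-matching file even with no exact string overlap required
    q = vec(query)
    return max(files, key=lambda f: cosine(q, vec(files[f])))

print(search("how do we ship a release"))  # deploy.md
```

Note that `grep "ship a release"` over these files would match nothing; similarity-based retrieval degrades more gracefully than exact matching.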
Also at some point you might want some aggregations as well.
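To make the aggregation point concrete, here is a sketch with an invented schema: once memories are rows in a table, a group-by is a one-liner that grep and ls cannot express.

```python
import sqlite3

# Hypothetical memory store: rows instead of files
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memory (topic TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO memory VALUES (?, ?)",
    [("api", "rate limit is 60/min"),
     ("api", "auth uses bearer tokens"),
     ("ui", "user prefers dark mode")],
)

# "How many notes per topic?" — trivial in SQL, awkward over a directory of files
for topic, n in conn.execute(
        "SELECT topic, COUNT(*) FROM memory GROUP BY topic ORDER BY topic"):
    print(topic, n)
# api 2
# ui 1
```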
Plain text vs complex data
File interfaces and native CLI tools are great for plain-text files. What happens when memory becomes multimodal?
Concurrency
If you have a single agent accessing one memory file sequentially, there’s no need to think about this.
If you have a multi-agent system, you want a database rather than hand-rolled (and likely buggy) locking mechanisms.
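A sketch of what the database buys you here, using SQLite as a stand-in (table name and agent names are invented): each writer gets transactional isolation for free, with no lock files to implement or get wrong.

```python
import os
import sqlite3
import tempfile

# Shared on-disk store that multiple agent processes could open
path = os.path.join(tempfile.mkdtemp(), "memory.db")

def connect():
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("CREATE TABLE IF NOT EXISTS memory (agent TEXT, note TEXT)")
    conn.commit()
    return conn

writer_a, writer_b = connect(), connect()

# Each agent writes inside a transaction; SQLite serializes them safely
with writer_a:
    writer_a.execute("INSERT INTO memory VALUES (?, ?)",
                     ("planner", "draft plan"))
with writer_b:
    writer_b.execute("INSERT INTO memory VALUES (?, ?)",
                     ("researcher", "found source"))

rows = writer_a.execute("SELECT COUNT(*) FROM memory").fetchone()[0]
print(rows)  # 2
```

With plain files you would be reimplementing exactly this serialization logic yourself, which is the “worse version of a locking mechanism” the second camp warns about.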
We’re just scratching the surface: security concerns, permission management, schema validation, etc. are more arguments for databases over filesystems for agent memory use cases.
I think this is an interesting conversation and I’m curious to see where it goes.