<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Leonie Monigatti</title>
<link>https://www.leoniemonigatti.com/blog.html</link>
<atom:link href="https://www.leoniemonigatti.com/blog.xml" rel="self" type="application/rss+xml"/>
<description>Leonie Monigatti&#39;s portfolio and blog about Machine Learning and AI Engineering.</description>
<generator>quarto-1.9.37</generator>
<lastBuildDate>Sun, 10 May 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>How I write the first draft with AI</title>
  <link>https://www.leoniemonigatti.com/blog/essays/how-i-write-with-ai.html</link>
  <description><![CDATA[ 





<p>Technical writing has been part of my job description for the past 3 years. And ever since ChatGPT launched, I’ve been experimenting with different workflows and <a href="who-wrote-this.qmd">debating about how much to even use AI in my writing process</a>.</p>
<p>This week I overheard someone say <span class="text-highlight">„If your first draft isn’t written with AI, you’re doing it wrong“. Sorry, but I strongly disagree.</span> And when I learned this week that <a href="https://x.com/trq212">Thariq</a> from the Claude Code team who has been writing great technical pieces shared that <a href="https://x.com/MilksandMatcha/status/2052812382137971115?s=20">his “first draft is barely ever written with AI“</a>, I felt reassured to share my own process.</p>
<p><span class="text-highlight">If you expected this to be about my secret prompt template and which skills I run to de-slopify a piece of writing, I’m sorry to disappoint you.</span></p>
<p>My writing usually starts in Notion, a Google doc, or plain markdown (depending on my mood). Unless it’s a tutorial where the structure I want to follow is clear, it starts out as a huge bullet point list of notes. Straight from brain to doc.</p>
<p>As I research more or have more thoughts on the topic I add more bullets and start clustering them and organizing them under separate ideas. During the outlining process I move the pieces around and merge similar ideas into one. Usually I like to be in Notion for this step because you can just grab and drag and drop single bullet points or whole sections. Once I’m done with that, it looks like this:</p>
<pre><code>- Intro
   - Bullet point 1 ...
   - Bullet point 2 ...
- Outline sentence 1 ...
   - Bullet point 1 ...
   - Bullet point 2 ...
- Outline sentence 2 ...
   - Bullet point 1 ...
   - Bullet point 2 ...
- ...</code></pre>
<p>Then I give Claude (my current favorite for this task) the <a href="https://www.google.com/url?sa=t&amp;source=web&amp;rct=j&amp;opi=89978449&amp;url=https://jordanbpeterson.com/wp-content/uploads/2018/02/Essay_Writing_Guide.docx&amp;ved=2ahUKEwjdjN-w066UAxVQSPEDHSq1GFMQFnoECBkQAQ&amp;usg=AOvVaw3KyLd4DMN7itJ47rOkO__z">Jordan Peterson’s Essay Writing Guide</a> and my outline and ask it for its feedback on the outline. This is to refine my core thesis. Is this interesting? Does that narrative flow make sense? In the past, I’ve thrown out the original outline and rewritten the whole thing based on Claude’s feedback.</p>
<p>Once I am happy with the rough structure, I start writing the fist draft. For this I copy the bullet points into a Google doc under a tab „Outline“ and the I duplicate it under a tab „Draft 1“ (depending on the complexity I may write up to 8 or 9 drafts).</p>
<p>Then I start the actual drafting process. During this process the bullet points get converted into actual sentences and paragraphs. I don’t write sequentially. During this process I do many passes over the document: Which ever sentence feels the easiest, gets written first. Manually.</p>
<p>I also create a tab called „Garbage collection“. This is where all bullet points or even whole sections go, when I know they need to be removed but it would make the decision harder for me if I just deleted it. This way, I can just go back when I realize I still need it. Also I like to do a pass over the “Garbage collection” tab at the end to see if I can repurpose any ideas as standalone tweets (If they can’t, they probably weren’t good thoughts to begin with).</p>
<p>The drafting process can be a lot of work. It is only when I reach a point where I start to struggle to find the right words, that I start using Claude to convert single bullets into sentences. And later I might even ask Claude to „convert these bullet points into a coherent paragraph. Ignore the order of the bullet points. Feel free to reorder them to make sure they logically flow well“.</p>
<p>As you can see, <span class="text-highlight">my drafting process might be shockingly manual</span> for some. But I believe that the core ideas, structure, and narrative flow overall need to come from the human. During my editing process, I use AI more to help me refine sentences, grammar, and reading flow, which I can share in a different blog.</p>
<!--
<span class="text-highlight">
-->



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <guid>https://www.leoniemonigatti.com/blog/essays/how-i-write-with-ai.html</guid>
  <pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Implementing a virtual filesystem over Elasticsearch</title>
  <link>https://www.leoniemonigatti.com/blog/virtual-filesystem-elasticsearch.html</link>
  <description><![CDATA[ 





<p>LLMs have been trained on vast amounts of shell sessions and codebases. That’s why agents are naturally good at using CLIs and navigating filesystems.</p>
<p>A recent <a href="https://x.com/hwchase17/status/2011814697889316930">blog post describing how LangSmith Agent Builder’s memory system is built on a filesystem</a> kicked off a discussion about whether “filesystems are all you need”. Harrison Chase, LangChain’s CEO and the blog’s author, later clarified that they actually don’t store the data in a real filesystem but in a database and only expose it to the agent in the shape of a filesystem. A few weeks later, <a href="https://www.mintlify.com/blog/how-we-built-a-virtual-filesystem-for-our-assistant">Mintlify described how they built a virtual filesystem on top of their existing database</a> to enable an agent to run <code>cat</code>, <code>ls</code>, <code>grep</code>, and <code>find</code> via the <a href="https://github.com/vercel-labs/just-bash"><code>just-bash</code> library</a>.</p>
<p>Inspired by these blogs, I explore how you can <strong>implement a virtual filesystem over Elasticsearch</strong>. This is intended as a <strong>proof-of-concept</strong> implementation and aims to stay as close as possible to the architecture described in the Mintlify blog. You can find the resulting implementation of <code>ElasticsearchFs</code> at <a href="https://github.com/iamleonie/elasticsearch-fs">iamleonie/elasticsearch-fs</a>.</p>
<section id="what-is-a-virtual-filesystem-for-agents" class="level2">
<h2 class="anchored" data-anchor-id="what-is-a-virtual-filesystem-for-agents">What is a “virtual filesystem” for agents</h2>
<p>The term “virtual filesystem” is traditionally used in the context of operating systems. There, a <a href="https://en.wikipedia.org/wiki/Virtual_file_system">virtual filesystem</a> is the layer that lets every program use the same <code>open</code>, <code>read</code>, and <code>write</code> calls whether it’s reading from an SSD, a USB stick, or a network share.</p>
<p>In the context of AI agents, a virtual filesystem describes a filesystem-shaped interface on top of persistent storage, such as a relational database, a vector store, or, as in this case, Elasticsearch. This lets the agent use <code>ls</code>, <code>cat</code>, <code>find</code>, and <code>grep</code> over the stored data. The agent running, for example, <code>grep -r "access_token" /docs</code> is searching a filesystem and doesn’t know it’s interacting with a search index. Thus, commands like <code>grep</code> become an interface, the implementation of which can make use of powerful search features, such as vector search or hybrid search.<br>
<img src="https://www.leoniemonigatti.com/blog/images/virtual-filesysten-os-vs-agents.png" title="Virtual filesystem Comparison: Operating Systems vs. AI agents" class="img-fluid" alt="“Virtual filesystem Comparison: Operating Systems vs.&nbsp;AI agents”"></p>
</section>
<section id="architecture-overview" class="level2">
<h2 class="anchored" data-anchor-id="architecture-overview">Architecture Overview</h2>
<p>The resulting implementation of the <code>ElasticsearchFs</code> virtual filesystem has four layers, similar to the Mintlify blog:</p>
<ol type="1">
<li><strong>Agent layer:</strong> The LLM agent runs shell commands via tool calls.<br>
</li>
<li><strong>Shell layer:</strong> <a href="https://github.com/vercel-labs/just-bash"><code>just-bash</code></a> is a TypeScript library that intercepts those shell command strings, parses flags, handles pipes, and dispatches to a <code>IFileSystem</code> interface.<br>
</li>
<li><strong><code>ElasticsearchFs</code> layer:</strong> Implements the <code>IFileSystem</code> interface against the underlying data layer.</li>
<li><strong>Data layer:</strong> Data is stored in an Elasticsearch Serverless cluster.</li>
</ol>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.leoniemonigatti.com/blog/images/virtual-filesystem-database-architecture.png" title="Virtual filesystem for AI agents over a Database (Elasticsearch)" class="img-fluid figure-img"></p>
<figcaption>“Virtual filesystem for AI agents over a Database (Elasticsearch)”</figcaption>
</figure>
</div>
<p>In addition to the implementation described in the Mintlify blog, we also consider the following <a href="https://docs.langchain.com/oss/python/deepagents/backends#use-a-virtual-filesystem">design guidelines for virtual filesystems described in the LangChain documentation</a>:</p>
<ul>
<li><strong>Absolute path handling and normalization.</strong> Paths are always absolute. <code>normalizePath</code> ensures any path the agent passes is canonicalized before any tree lookup or Elasticsearch call (e.g., <code>auth/oauth</code> becomes <code>/auth/oauth</code>).<br>
</li>
<li><strong>Implement <code>ls</code> and <code>glob</code> efficiently (server-side filtering where available, otherwise local filter).</strong> Both operations run in-memory after session boot without any Elasticsearch calls.</li>
<li><strong>Handle errors explicitly:</strong> LangChain’s guideline recommends structured result types with an error field for missing files or invalid patterns. Since our implementation builds on top of <code>just-bash</code>, we use POSIX-style <code>ErrnoException</code> errors (<code>ENOENT</code>, <code>ENOTDIR</code>, <code>EROFS</code>). Additionally, we follow Mintlify’s <strong>read-only</strong> design and thus every write operation throws <code>EROFS</code>.</li>
</ul>
</section>
<section id="implementation-details" class="level2">
<h2 class="anchored" data-anchor-id="implementation-details">Implementation details</h2>
<p>The implementation splits into four areas: access control, filesystem navigation, file reading, and file search. You can find the full implementation in the <a href="https://github.com/iamleonie/elasticsearch-fs"><code>elasticsearch-fs</code> repository</a>.</p>
<section id="access-control-via-dls" class="level3">
<h3 class="anchored" data-anchor-id="access-control-via-dls">Access control via DLS</h3>
<p><code>ElasticsearchFs</code> delegates file access control to Elasticsearch <a href="https://www.elastic.co/docs/deploy-manage/users-roles/cluster-or-deployment-auth/controlling-access-at-document-field-level#document-level-security">Document Level Security</a> (DLS): Each query gets attached to an Elasticsearch role, and every request made under that role is automatically filtered by that query at search time. This reduces the chance of accidental data leaks because access checks are enforced in the data layer for every query path.</p>
<p>For this POC implementation, we follow the example path tree policy shape from the Mintlify blog with three roles: <code>PUBLIC</code>, <code>BILLING</code>, and <code>INTERNAL</code>. Additionally, we have a fourth role <code>SYSTEM</code> for ingestion.</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th style="text-align: left;">Path</th>
<th style="text-align: left;">PUBLIC</th>
<th style="text-align: left;">INTERNAL</th>
<th style="text-align: left;">BILLING</th>
<th style="text-align: left;">SYSTEM</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">/auth/oauth.mdx</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
</tr>
<tr class="even">
<td style="text-align: left;">/auth/api-keys.mdx</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
</tr>
<tr class="odd">
<td style="text-align: left;">/internal/billing.mdx</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
</tr>
<tr class="even">
<td style="text-align: left;">/internal/audit-log.mdx</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">x</td>
</tr>
<tr class="odd">
<td style="text-align: left;">/api-reference/users.mdx</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
</tr>
<tr class="even">
<td style="text-align: left;">/api-reference/payments.mdx</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
</tr>
<tr class="odd">
<td style="text-align: left;">/api-reference/search-use-case/*</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
<td style="text-align: left;">x</td>
</tr>
</tbody>
</table>
<p>The roles are created and attached to an API key in the Serverless project by setting privileges as follows:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"PUBLIC"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"cluster"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-4">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"indices"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb1-5">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-6">        <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"names"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"elasticsearchfs-chunks"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-7">        <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"privileges"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"read"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-8">        <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"query"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-9">          <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"bool"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-10">            <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"should"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb1-11">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"prefix"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"slug"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"auth/"</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-12">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"prefix"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"slug"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"api-reference/search-use-case/"</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-13">              <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"term"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"slug"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"api-reference/users"</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-14">            <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-15">            <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"minimum_should_match"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb1-16">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-17">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-18">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-19">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-20">        <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"names"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"elasticsearchfs-meta"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-21">        <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"privileges"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"read"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb1-22">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-23">    <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb1-24">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-25"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
</section>
<section id="ls-cd-find-navigating-the-filesystem" class="level3">
<h3 class="anchored" data-anchor-id="ls-cd-find-navigating-the-filesystem"><code>ls</code>, <code>cd</code>, <code>find</code>: Navigating the filesystem</h3>
<p><code>ElasticsearchFs</code> follows a similar startup shape as the Mintlify blog. The ingestion step writes a <code>__path_tree__</code> document to a metadata index (<code>elasticsearchfs-meta</code>), and each session loads that one document first.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode ts code-with-copy"><code class="sourceCode typescript"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> doc <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> client<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">get</span>({</span>
<span id="cb2-2">  index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'elasticsearchfs-meta'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-3">  id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'__path_tree__'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-4">})<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-5"></span>
<span id="cb2-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> json <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Buffer</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">from</span>(payload<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'base64'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">toString</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'utf8'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb2-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> pathTree <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">JSON</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">parse</span>(json)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div></div>
<p>Visibility is then resolved by pruning that path tree with the selected runtime profile (<code>PUBLIC</code>, <code>BILLING</code>, <code>INTERNAL</code>, <code>SYSTEM</code>) against <code>isPublic</code> and <code>groups</code>. That means a <code>BILLING</code> session never lists <code>/internal/audit-log.mdx</code>, and probing that path returns <code>ENOENT</code> because the path doesn’t exist in the profile-scoped tree.</p>
<p>The visible slugs are transformed into two compact structures:</p>
<ul>
<li><code>Set&lt;string&gt;</code> of canonical file paths</li>
<li><code>Map&lt;string, string[]&gt;</code> from directory path to child names.</li>
</ul>
<p>After session initialization, <code>ls</code>, <code>cd</code>, and <code>find</code> are all tree-walking operations over the same in-memory state and do not need to query Elasticsearch. <code>ls</code> reads direct children from <code>dirs</code>, <code>cd</code> validates whether a target directory exists in that map, and <code>find</code> traverses the <code>dirs</code> graph recursively while checking <code>files</code> for leaf paths.</p>
</section>
<section id="cat-reading-files" class="level3">
<h3 class="anchored" data-anchor-id="cat-reading-files"><code>cat</code>: Reading files</h3>
<p>Reading files is a single Elasticsearch call. When the agent runs <code>cat /auth/oauth.mdx</code>, <code>just-bash</code> calls <code>readFile</code>, which resolves the path to the slug <code>auth/oauth</code> and queries Elasticsearch for that slug:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode ts code-with-copy"><code class="sourceCode typescript"><span id="cb3-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">async</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readFile</span>(path<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">string</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Promise</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">string</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> {</span>
<span id="cb3-2">  <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> slug <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">this</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">resolveReadFileSlug</span>(path)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-3"></span>
<span id="cb3-4">  <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> results <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">this</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">client</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">search</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>FileHitSource<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>({</span>
<span id="cb3-5">    index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> ELASTICSEARCHFS_FILES_INDEX<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-6">    size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-7">    _source<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'content'</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-8">    query<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> {</span>
<span id="cb3-9">      bool<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> {</span>
<span id="cb3-10">        filter<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> [{ term<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> { slug } }]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-11">      }<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-12">    }<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-13">  })<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-14"></span>
<span id="cb3-15">  <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> hit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> results<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hits</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hits</span>[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-16">  <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> content <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> hit<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">?.</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">_source</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">?.</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">content</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-17">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (content <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">===</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">undefined</span>) {</span>
<span id="cb3-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">throw</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enoent</span>()<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-19">  }</span>
<span id="cb3-20"></span>
<span id="cb3-21">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> content<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-22">}</span></code></pre></div></div>
<p>Additionally, <code>resolveReadFileSlug</code> rejects directory paths with <code>ENOTDIR</code> and unknown paths with <code>ENOENT</code> before any network call.</p>
</section>
<section id="grep-the-two-stage-optimization" class="level3">
<h3 class="anchored" data-anchor-id="grep-the-two-stage-optimization"><code>grep</code>: The two-stage optimization</h3>
<p>Similar to Mintlify, we implement a two-stage optimization for the <code>grep</code> implementation because a naive <code>grep -r</code> would read every in-scope file over the network. However, <code>just-bash</code>’s <code>defineCommand</code> hook lets us register a custom <code>grep</code>, which allows us to implement a two-stage <code>grep</code> optimization. The custom <code>grep</code> receives raw argv tokens, which we parse with <code>yargs-parser</code>.</p>
<p>The <strong>first stage (coarse filtering)</strong> narrows the candidate set first by running a search query over the database. Depending on the pattern shape, it chooses a query type for literal patterns or regex patterns.</p>
<p>Literal patterns (<code>-F</code> / <code>--fixed-strings</code>, or no regex metacharacters) are split into two coarse-query shapes. First, ignore-case literals (<code>-i</code>) use <code>match_phrase</code> on <code>content</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb4-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"query"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb4-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"bool"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb4-4">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"filter"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"terms"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"slug"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;slugs-in-scope&gt;"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb4-5">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"must"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"match_phrase"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"content"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;literal-pattern&gt;"</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb4-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>On the other hand, case-sensitive literals use <code>regexp</code> on <code>content.pattern</code> with an escaped literal:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"query"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"bool"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-4">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"filter"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"terms"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"slug"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;slugs-in-scope&gt;"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-5">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"must"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb5-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-7">          <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"regexp"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-8">            <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"content.pattern"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-9">              <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"value"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".*(&lt;escaped-literal-pattern&gt;).*"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-10">              <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"case_insensitive"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">false</span></span>
<span id="cb5-11">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb5-12">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb5-13">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb5-14">      <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb5-15">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb5-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb5-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>Regex patterns use <code>regexp</code> on <code>content.pattern</code>, which is a <code>wildcard</code> multi-field. Note that running <a href="https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-regexp-query"><code>regexp</code> on the <code>content</code> field is possible as well but only allows term-level matching</a>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb6-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"query"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb6-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"bool"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb6-4">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"filter"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"terms"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"slug"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;slugs-in-scope&gt;"</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span><span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb6-5">      <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"must"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">[</span></span>
<span id="cb6-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb6-7">          <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"regexp"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb6-8">            <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"content.pattern"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb6-9">              <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"value"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".*(&lt;regex-pattern&gt;).*"</span></span>
<span id="cb6-10">            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb6-11">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb6-12">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb6-13">      <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb6-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb6-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb6-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>In the <strong>second stage (fine-filter)</strong>, each candidate slug is read via <code>readFile</code>. The content is split on line boundaries and each line is tested against a compiled <code>RegExp</code> (or a plain <code>includes</code> check for fixed-string patterns).</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode ts code-with-copy"><code class="sourceCode typescript"><span id="cb7-1">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Coarse Filter: Ask backing store for slugs matching the string/regex</span></span>
<span id="cb7-2">  <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> matchedSlugs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> elasticsearchFs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">findMatchingFiles</span>(</span>
<span id="cb7-3">      coarseFilter<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb7-4">      slugsUnderDirs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb7-5">    )<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb7-6"></span>
<span id="cb7-7">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (matchedSlugs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">length</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">===</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> { stdout<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> stderr<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">''</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> exitCode<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> }<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb7-8"></span>
<span id="cb7-9">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Fine Filter: Narrow to resolved hit paths.</span></span>
<span id="cb7-10">  <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> matchedPaths <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> matchedSlugs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>((slug) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">=&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slugToPath</span>(slug))<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb7-11"></span>
<span id="cb7-12">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Exec: Let the in-memory RegExp engine format the final output</span></span>
<span id="cb7-13">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">execBuiltin</span>(</span>
<span id="cb7-14">    scannedArgs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb7-15">    matchedPaths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb7-16">    elasticsearchFs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb7-17">    shouldPrefixFilePath<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb7-18">  )<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div></div>
</section>
</section>
<section id="known-limitations" class="level2">
<h2 class="anchored" data-anchor-id="known-limitations">Known Limitations</h2>
<p>This implementation is a POC and has the following known limitations:</p>
<ul>
<li><strong>Coarse grep can produce false negatives</strong> The coarse stage uses Elasticsearch/Lucene regexp syntax while the fine stage uses JavaScript <code>RegExp</code>. This mismatch can produce false negatives before validation.<br>
</li>
<li><strong>No bulk-prefetch.</strong> Mintlify’s grep coarse stage identifies candidate files in the data store, then bulk-fetches all matching chunks into a cache before the fine-filter pass, so large recursive queries complete in milliseconds. <code>ElasticsearchFs</code> reads each candidate file individually via <code>readFile</code>, with no in-process cache.<br>
</li>
<li><strong><code>stat</code> returns placeholder metadata.</strong> <code>size</code> is always <code>0</code> and <code>mtime</code> is always epoch — <code>stat</code> makes no Elasticsearch call. Session boot loads only path-tree metadata (and not per-file timestamps/sizes); although <code>updated_at</code> exists on documents, this implementation does not materialize it into the in-memory tree. The practical consequence: <code>ls -l</code>, <code>find -mtime</code>, and <code>find -size</code> are not truthful.<br>
</li>
<li><strong><code>.mdx</code>-only file type.</strong> The path tree is keyed on <code>/&lt;slug&gt;.mdx</code> exclusively. Files with any other extension cannot be represented in the tree and are invisible to the agent.</li>
</ul>
</section>
<section id="trying-it-out" class="level2">
<h2 class="anchored" data-anchor-id="trying-it-out">Trying it out</h2>
<p>You can use the virtual filesystem over Elasticsearch as follows:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode ts code-with-copy"><code class="sourceCode typescript"><span id="cb8-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> { ElasticsearchFs } <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../src/core/elasticsearchfs.js'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> { runElasticGrep } <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../src/core/grep.js'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> { Bash<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> defineCommand } <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'just-bash'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> { createESClient } <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../src/es-adapter/client.js'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> { initSessionTree } <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../src/session.js'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-6"></span>
<span id="cb8-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// 1. Initialize ES client with the user profile</span></span>
<span id="cb8-8"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> profile <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PUBLIC"</span></span>
<span id="cb8-9"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> client <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">createESClient</span>(profile)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-10"></span>
<span id="cb8-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// 2. Initialize the path tree </span></span>
<span id="cb8-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> session_tree <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">initSessionTree</span>(client<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> profile)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-13"></span>
<span id="cb8-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// 3. Initialize the virtual filesystem</span></span>
<span id="cb8-15"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> elasticsearchFs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">new</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ElasticsearchFs</span>({</span>
<span id="cb8-16">  client<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> client<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb8-17">  files<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> session_tree<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">files</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb8-18">  dirs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> session_tree<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dirs</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb8-19">})<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-20"></span>
<span id="cb8-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// 4. Define custom grep command</span></span>
<span id="cb8-22"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> grep <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">defineCommand</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'grep'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">async</span> (args<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ctx) <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">=&gt;</span></span>
<span id="cb8-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runElasticGrep</span>(args<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ctx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> elasticsearchFs)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb8-24">)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-25"></span>
<span id="cb8-26"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// 5. Initialize Bash instance over Elasticsearch backend with custom grep command </span></span>
<span id="cb8-27"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">const</span> bash <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">new</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Bash</span>({</span>
<span id="cb8-28">  fs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> elasticsearchFs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb8-29">  cwd<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'/'</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb8-30">  customCommands<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> [grep]<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb8-31">})<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-32"></span>
<span id="cb8-33"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// 6. Usage examples</span></span>
<span id="cb8-34"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> bash<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exec</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'grep -ri "OAuth" /auth'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-35"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> bash<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exec</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'cat /auth/oauth.mdx'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-36"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">await</span> bash<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exec</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'ls /api-reference'</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span></code></pre></div></div>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>This blog implements <code>ElasticsearchFs</code>, a virtual filesystem over Elasticsearch. It is built on top of <code>just-bash</code> and exposes <code>cat</code>, <code>ls</code>, <code>cd</code>, <code>find</code>, and <code>grep</code> for an agent while keeping access control close to the data. This POC has not been benchmarked. The goal is to explore the architecture and implementation, not to validate latency or cost claims.</p>
<p>The code is available in the <a href="https://github.com/iamleonie/elasticsearch-fs">repository</a>. For a more production-ready implementation, the next upgrades could be an in-process/remote cache for repeated file reads during grep-heavy sessions or a semantic search alternative to <code>grep</code>. Note that the latter would require additional chunking on the files and reassembly at query time.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li>Mintlify. <a href="https://www.mintlify.com/blog/how-we-built-a-virtual-filesystem-for-our-assistant"><em>How we built a virtual filesystem for our assistant</em></a><br>
</li>
<li>Vercel Labs. <a href="https://github.com/vercel-labs/just-bash"><code>just-bash</code> GitHub repository</a><br>
</li>
<li>LangChain. <a href="https://docs.langchain.com/oss/python/deepagents/backends#use-a-virtual-filesystem"><em>Design guidelines: Use a virtual filesystem</em></a></li>
</ul>


</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <guid>https://www.leoniemonigatti.com/blog/virtual-filesystem-elasticsearch.html</guid>
  <pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Fine-tuning LFM2.5-1.2B-Instruct with GRPO</title>
  <link>https://www.leoniemonigatti.com/blog/fine-tuning-lfm2-5-1-2b-instruct-with-grpo.html</link>
  <description><![CDATA[ 





<p>In this notebook, we will explore the core concepts of <strong>GRPO (Group Relative Policy Optimization)</strong> by fine-tuning <a href="https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct">LFM2.5-1.2B-Instruct</a> using <a href="https://unsloth.ai">Unsloth</a>.</p>
<p><strong>GRPO</strong> is a reinforcement learning algorithm designed for training language models with reward signals instead of labeled examples. In contrast to supervised fine-tuning (SFT), where you tell the model the exact right answer, GRPO lets the model <em>explore</em> different outputs and reinforces the ones that score higher on the reward functions. That’s why GRPO is ideal for verifiable tasks where you can programmatically evaluate the correctness of model outputs, such as math problems, code generation, and structured data tasks.</p>
<p><strong>LFM2.5-1.2B-Instruct</strong> is a general-purpose instruction-tuned model. Since it is quite small, it is suitable for agentic tasks, data extraction, and RAG but less so for knowledge-intensive tasks and programming.</p>
<p>In this example, LFM2.5-1.2B-Instruct learns to extract structured invoice fields from noisy OCR text. The outputs are easy to verify programmatically for GRPO: we can reward the model for producing valid JSON, using the right schema, and recovering the correct values.</p>
<section id="prerequisites" class="level2">
<h2 class="anchored" data-anchor-id="prerequisites">Prerequisites</h2>
<p>To run this tutorial as a notebook, you will need:</p>
<ul>
<li><strong>GPU</strong> for fine-tuning .If you don’t have one locally, you can run this notebook for free on <a href="https://colab.research.google.com/github/Liquid4All/cookbook/blob/main/finetuning/notebooks/grpo_with_unsloth.ipynb">Google Colab</a> using a free NVIDIA T4 GPU instance or on <a href="https://www.kaggle.com/">Kaggle</a></li>
<li>Optional: <code>HF_TOKEN</code> for faster downloads of the training dataset from Hugging Face.</li>
</ul>
<p><em>Note: This notebook as trained on a free Colab notebook using a NVIDIA T4 GPU. (Torch: 2.10.0+cu128, CUDA: 7.5., CUDA Toolkit: 12.8., Triton: 3.6.0)</em></p>
<div id="80db9d0f-70a0-4e6d-8ad1-6ef907b822f5" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:04:10.838635Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:04:10.838158Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:04:15.097116Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:04:15.096364Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:04:10.838590Z&quot;}}" data-trusted="true" data-execution_count="1">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> torch</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">assert</span> torch.cuda.is_available(), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"No GPU detected. Make sure you've enabled GPU in Colab: Runtime &gt; Change runtime type &gt; T4 GPU"</span></span></code></pre></div></div>
</div>
</section>
<section id="setup" class="level2">
<h2 class="anchored" data-anchor-id="setup">Setup</h2>
<p>Install the required packages (Unsloth <code>v2026.4.8</code>, Transformers <code>v4.57.6</code>, vLLM <code>v0.19.1</code>, trl <code>v0.24.0</code>).</p>
<div id="cell-01-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:04:15.099016Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:04:15.098621Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:06:12.999480Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:06:12.998630Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:04:15.098991Z&quot;}}" data-trusted="true">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>pip install <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>qU unsloth trl vllm transformers datasets matplotlib</span></code></pre></div></div>
</div>
<p>Additionally, we will set <code>UNSLOTH_VLLM_STANDBY</code>, which enables a lower-VRAM standby mode and reduces GPU memory usage. We will also fix the random seed.</p>
<div id="i9dcEL6WiDJq" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:06:13.001190Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:06:13.000853Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:06:13.005611Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:06:13.004861Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:06:13.001157Z&quot;}}" data-trusted="true" data-execution_count="3">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> os</span>
<span id="cb3-2">os.environ[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'UNSLOTH_VLLM_STANDBY'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1"</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Unsloth Standby reduces VRAM by 30%+</span></span>
<span id="cb3-3"></span>
<span id="cb3-4">SEED <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span></span></code></pre></div></div>
</div>
</section>
<section id="load-model-and-tokenizer" class="level2">
<h2 class="anchored" data-anchor-id="load-model-and-tokenizer">Load Model and Tokenizer</h2>
<p>We load <strong>LFM2.5-1.2B-Instruct</strong> as the base model, including the tokenizer. Then we apply LoRA adapters.</p>
<div id="cell-03-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:06:13.007148Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:06:13.006663Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:07:52.563512Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:07:52.562610Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:06:13.007123Z&quot;}}" data-outputid="c4c67337-fcfd-4bc6-b98a-71ebef5c9cbb" data-trusted="true">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> unsloth <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> FastLanguageModel</span>
<span id="cb4-2"></span>
<span id="cb4-3">MODEL_NAME     <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"unsloth/LFM2.5-1.2B-Instruct"</span></span>
<span id="cb4-4">max_seq_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4096</span>   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Must be larger than prompt_length for max_new_tokens to be greater than 0</span></span>
<span id="cb4-5">lora_rank      <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#16    # # Choose any number &gt; 0 ! Suggested 8, 16, 32, 64, 128 (higher rank = smarter, but slower)</span></span>
<span id="cb4-6"></span>
<span id="cb4-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load model</span></span>
<span id="cb4-8">model, tokenizer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> FastLanguageModel.from_pretrained(</span>
<span id="cb4-9">    model_name      <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> MODEL_NAME,</span>
<span id="cb4-10">    max_seq_length  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> max_seq_length,</span>
<span id="cb4-11">    load_in_4bit <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,       <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># False for LoRA 16bit</span></span>
<span id="cb4-12">    fast_inference <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,     <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Enable vLLM fast inference</span></span>
<span id="cb4-13">    max_lora_rank <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lora_rank,</span>
<span id="cb4-14">    load_in_fp8 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Float8 RL / GRPO!</span></span>
<span id="cb4-15">)</span>
<span id="cb4-16"></span>
<span id="cb4-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Inject LoRA adapters into attention and MLP layers</span></span>
<span id="cb4-18">model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> FastLanguageModel.get_peft_model(</span>
<span id="cb4-19">    model,</span>
<span id="cb4-20">    r <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lora_rank,</span>
<span id="cb4-21">    target_modules <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [</span>
<span id="cb4-22">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"q_proj"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"k_proj"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"v_proj"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"out_proj"</span>,   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Attention layers</span></span>
<span id="cb4-23">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"in_proj"</span>,                                  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Conv layers</span></span>
<span id="cb4-24">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"w1"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"w2"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"w3"</span>,                           <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># FFN</span></span>
<span id="cb4-25">    ],</span>
<span id="cb4-26">    lora_alpha <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> lora_rank<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,              <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Scaling factor for LoRA updates</span></span>
<span id="cb4-27">    use_gradient_checkpointing <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"unsloth"</span>,        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Reduces memory usage </span></span>
<span id="cb4-28">    random_state <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> SEED,</span>
<span id="cb4-29">)</span>
<span id="cb4-30"></span>
<span id="cb4-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Count trainable parameters</span></span>
<span id="cb4-32">trainable <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(p.numel() <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> p <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> model.parameters() <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> p.requires_grad)</span>
<span id="cb4-33">total     <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>(p.numel() <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> p <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> model.parameters())</span>
<span id="cb4-34"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Trainable params : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>trainable<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>trainable<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>total<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">% of total)"</span>)</span>
<span id="cb4-35"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Total params     : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>total<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:,}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>2026-05-04 19:06:21.934230: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1777921582.153165      57 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1777921582.216471      57 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1777921582.738130      57 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1777921582.738168      57 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1777921582.738171      57 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1777921582.738173      57 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.4.8: Fast Lfm2 patching. Transformers: 4.57.6. vLLM: 0.19.1.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.35. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.</code></pre>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"fe218ff46334471783568eb20d4ec557","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"967779bd38c14b5483f9358ccc5d4a78","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"6e62791dc6c649b89652a5b343a8c113","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"382ac62e4b7444ce8b5b0e76c9ef7327","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"0b2b4d31f3294002882fc980049afc01","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"b844f4540dd24984afab9367b31ba76d","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>Trainable params : 22,216,704 (1.9% of total)
Total params     : 1,192,557,312</code></pre>
</div>
</div>
</section>
<section id="data-preparation" class="level2">
<h2 class="anchored" data-anchor-id="data-preparation">Data Preparation</h2>
<p>For this tutorial, we use the <a href="https://huggingface.co/datasets/Navneetkumar11/rvl-cdip-invoice-extracted">Navneetkumar11/rvl-cdip-invoice-extracted</a> dataset from Hugging Face. This is a OCR extraction dataset. Each example contains raw OCR text from a scanned invoice plus a normalized JSON extraction target.</p>
<p>Let’s load and preprocess the dataset. We train on a compact invoice header schema with only two keys <code>invoice_date</code> and <code>total_amount</code>.</p>
<div id="e27a65ad-8940-4621-9e1e-17f66c69ef3a" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:07:52.565414Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:07:52.564676Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:08:25.976490Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:08:25.975844Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:07:52.565349Z&quot;}}" data-trusted="true" data-execution_count="5">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> datasets <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> load_dataset</span>
<span id="cb9-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> json</span>
<span id="cb9-3"></span>
<span id="cb9-4">dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> load_dataset(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Navneetkumar11/rvl-cdip-invoice-extracted"</span>, split<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"train"</span>)</span>
<span id="cb9-5">dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dataset.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> ex: ex[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"extraction_confidence"</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"high"</span>)</span>
<span id="cb9-6"></span>
<span id="cb9-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> format_dataset(example):</span>
<span id="cb9-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb9-9">        extracted <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> json.loads(example[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"extracted"</span>])</span>
<span id="cb9-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> (<span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">TypeError</span>, <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>, json.JSONDecodeError):</span>
<span id="cb9-11">        extracted <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb9-12"></span>
<span id="cb9-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> {</span>
<span id="cb9-14">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>: example[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"raw_ocr_text"</span>],</span>
<span id="cb9-15">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ground_truth_invoice_dates"</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>(extracted.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"invoice_date"</span>)).strip() <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> extracted.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"invoice_date"</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,</span>
<span id="cb9-16">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ground_truth_total_amounts"</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(extracted.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"total_amount"</span>)) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> extracted.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"total_amount"</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span>,</span>
<span id="cb9-17">    }</span>
<span id="cb9-18"></span>
<span id="cb9-19">dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dataset.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(format_dataset)</span>
<span id="cb9-20"></span>
<span id="cb9-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Filter out entries with missing values</span></span>
<span id="cb9-22">dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dataset.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">filter</span>(<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">lambda</span> ex: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">all</span>(ex[k] <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">is</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> k <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> [ <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ground_truth_invoice_dates"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ground_truth_total_amounts"</span>]))</span>
<span id="cb9-23"></span>
<span id="cb9-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Remove unused columns</span></span>
<span id="cb9-25">dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dataset.remove_columns(</span>
<span id="cb9-26">    [c <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> c <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> dataset.column_names <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> c <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">not</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> [</span>
<span id="cb9-27">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>,</span>
<span id="cb9-28">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ground_truth_invoice_dates"</span>,</span>
<span id="cb9-29">        <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ground_truth_total_amounts"</span>,</span>
<span id="cb9-30">    ]]</span>
<span id="cb9-31">)</span>
<span id="cb9-32"></span>
<span id="cb9-33"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Shuffle and split data into training and evaluation datasets</span></span>
<span id="cb9-34">dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dataset.shuffle(seed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>SEED)</span>
<span id="cb9-35">train_ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dataset.select(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1500</span>))</span>
<span id="cb9-36">eval_ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> dataset.select(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1500</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1600</span>))</span>
<span id="cb9-37"></span>
<span id="cb9-38">dataset</span></code></pre></div></div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"c841fc226655438a87009c111a241e5c","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"7b3569de7a5747a28d9081a63518ef55","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"74346d084e98483cbeb41c67ebe969b9","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"3d7f21b06677422299d26736ada91309","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"f83c024891c24bc49ac2c9ec67a20aea","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"96c9bfbc7dcb4b3fb5c58d243c9ef1d1","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"cfb7651471f741989d057bc3927f9cc4","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"c30ae4ab934f44b2964bdca09b917793","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display" data-execution_count="5">
<pre><code>Dataset({
    features: ['text', 'ground_truth_invoice_dates', 'ground_truth_total_amounts'],
    num_rows: 2404
})</code></pre>
</div>
</div>
<div id="2558d4c8-6345-41ab-ade3-531610fa358e" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:08:25.978657Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:08:25.978359Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:08:25.984204Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:08:25.983228Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:08:25.978617Z&quot;}}" data-trusted="true" data-execution_count="6">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1">dataset[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span></code></pre></div></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>{'text': 'P\nROMOTIONS\nON THE OCEAN INC.\n4205 Pleasant Valley Rd.\nSuite 167\nRaleigh, NC 27612\nAugust 1, 1995\nPROMOTIONS ON THE OCEAN, INC.\nINVOICE # LOR 3-95\nLORILLARD PO # 3119\nDate:\nJune 1-June 31, 1995\nEvent/Location: Newport Summer Nightclub Promotion/Virginia Beach, VA\nDescription:\nNightclub Sampling &amp; Promotional Activities\nCasting, selection &amp; training of models for sampling and activities.\n# Days:\n9\nAvg. duration of days:\n3.5 Hrs.\nAvg. # Supervisors:\n2\nAvg. # Supervisors day rate: $73.50\nTOTAL COSTS: $1,323.00\nPor 8/3\n18 stulas\nLegit\n93114516\nNorth Myrtle Beach, SC - Virginia Beach, VA - Myrtle Beach, SC :unselected:',
 'ground_truth_invoice_dates': '1995-08-01',
 'ground_truth_total_amounts': 1323.0}</code></pre>
</div>
</div>
</section>
<section id="format-prompts-chat-template" class="level2">
<h2 class="anchored" data-anchor-id="format-prompts-chat-template">Format Prompts (Chat Template)</h2>
<p>Because GRPO’s trainer expects a dataset where each row has a <code>"prompt"</code> field, we will format the prompts using a chat template. LFM2.5 uses a ChatML-style format, where every conversation is wrapped in special tokens:</p>
<pre><code>&lt;|im_start|&gt;system
You are a helpful assistant.&lt;|im_end|&gt;
&lt;|im_start|&gt;user
What is 2+2?&lt;|im_end|&gt;
&lt;|im_start|&gt;assistant</code></pre>
<p>The model then generates from the <code>assistant</code> position onward.</p>
<p>In the system prompt we define how the model should the exact schema, how to handle missing fields, and what normalization to apply.</p>
<div id="cell-05-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:08:25.985945Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:08:25.985460Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:08:28.797163Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:08:28.796338Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:08:25.985902Z&quot;}}" data-outputid="336c7d1b-8754-4601-a8dc-7deb3c3ad09c" data-trusted="true" data-execution_count="7">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1">SYSTEM_PROMPT <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""You extract structured invoice header data from OCR text.</span></span>
<span id="cb14-2"></span>
<span id="cb14-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Return a JSON object with exactly this kes and no others:</span></span>
<span id="cb14-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- "invoice_date": string in YYYY-MM-DD format or null</span></span>
<span id="cb14-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- "total_amount": JSON number or null</span></span>
<span id="cb14-6"></span>
<span id="cb14-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Rules:</span></span>
<span id="cb14-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- Output exactly one JSON object.</span></span>
<span id="cb14-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- Do not include markdown fences.</span></span>
<span id="cb14-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- Do not include explanations.</span></span>
<span id="cb14-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- Do not include any extra keys.</span></span>
<span id="cb14-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- Do not invent values.</span></span>
<span id="cb14-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- Do not use commas in numbers.</span></span>
<span id="cb14-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- total_amount must be a JSON number, not a string.</span></span>
<span id="cb14-15"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">- Use null when a field is missing or unclear.</span></span>
<span id="cb14-16"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"""</span></span>
<span id="cb14-17"></span>
<span id="cb14-18"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> format_prompt(example):</span>
<span id="cb14-19">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">"""Convert a dataset row into a ChatML-formatted prompt string."""</span></span>
<span id="cb14-20">    messages <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [</span>
<span id="cb14-21">        {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"role"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"system"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"content"</span>: SYSTEM_PROMPT},</span>
<span id="cb14-22">        {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"role"</span>: <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"user"</span>,   <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"content"</span>: example[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>]},</span>
<span id="cb14-23">    ]</span>
<span id="cb14-24"></span>
<span id="cb14-25">    prompt <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tokenizer.apply_chat_template(</span>
<span id="cb14-26">        messages,</span>
<span id="cb14-27">        tokenize<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>,</span>
<span id="cb14-28">        add_generation_prompt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span></span>
<span id="cb14-29">    )</span>
<span id="cb14-30">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"prompt"</span>: prompt}</span>
<span id="cb14-31"></span>
<span id="cb14-32">train_ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> train_ds.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(format_prompt)</span>
<span id="cb14-33">eval_ds  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> eval_ds.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(format_prompt)</span></code></pre></div></div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"a78cbe47e24342199ff45a3891994c94","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"217d04682c6149729d6a610f79ff1351","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
</div>
<p>Below you can see an example of the <code>prompt</code> field using the ChatML template:</p>
<div id="ca38c71c-97fa-4652-a27a-67383157c45a" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:08:28.798565Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:08:28.798193Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:08:28.803760Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:08:28.803066Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:08:28.798523Z&quot;}}" data-trusted="true" data-execution_count="8">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1">ex <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> train_ds[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>]</span>
<span id="cb15-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"--- Raw OCR text (input) ---"</span>)</span>
<span id="cb15-3"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(ex[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>][:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span>])</span>
<span id="cb15-4"></span>
<span id="cb15-5"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">--- Formatted prompt (what the model sees) ---"</span>)</span>
<span id="cb15-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(ex[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"prompt"</span>][:<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span>])</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>--- Raw OCR text (input) ---
P
ROMOTIONS
ON THE OCEAN INC.
4205 Pleasant Valley Rd.
Suite 167
Raleigh, NC 27612
August 1, 1995
PROMOTIONS ON THE OCEAN, INC.
INVOICE # LOR 3-95
LORILLARD PO # 3119
Date:
June 1-June 31, 1995
Event/Location: Newport Summer Nightclub Promotion/Virginia Beach, VA
Description:
Nightclub Sampling &amp; Pr

--- Formatted prompt (what the model sees) ---
&lt;|startoftext|&gt;&lt;|im_start|&gt;system
You extract structured invoice header data from OCR text.

Return a JSON object with exactly this kes and no others:
- "invoice_date": string in YYYY-MM-DD format or null
- "total_amount": JSON number or null

Rules:
- Output exactly one JSON object.
- Do not include markdown fences.
- Do not include explanations.
- Do not include any extra keys.
- Do not invent values.
- Do not use commas in numbers.
- total_amount must be a JSON number, not a string.
- Use nul</code></pre>
</div>
</div>
</section>
<section id="define-the-reward-functions" class="level2">
<h2 class="anchored" data-anchor-id="define-the-reward-functions">Define the Reward functions</h2>
<p>Reward functions are the key ingredients for GRPO. They let us know if the model doing well or not.</p>
<p>For this example, we will define three reward functions, each with a distinct learning signal:</p>
<ul>
<li><strong>JSON structure</strong>: did the model return the right schema?</li>
<li><strong>Field presence</strong>: did it contain the correct fields?</li>
<li><strong>Field quality</strong>: are the values correct? If not, are they close to the desired value?</li>
</ul>
<p>General best practices for reward functions are:</p>
<ul>
<li>Provide clear signals with partial credit where possible</li>
<li>Be deterministic and consistent</li>
<li>Execute quickly for training efficiency</li>
<li>Fail gracefully when the model emits malformed JSON</li>
</ul>
<section id="r1-json-structure-reward" class="level3">
<h3 class="anchored" data-anchor-id="r1-json-structure-reward">R1: JSON Structure Reward</h3>
<p>This reward function teaches the model that structure matters before content does:</p>
<ul>
<li>If the output is valid JSON -&gt; <strong>+1</strong></li>
<li>If the output becomes valid after light cleanup like removing code fences or extra surrounding text -&gt; <strong>+0.5</strong></li>
<li>Not valid even after clean up -&gt; <strong>no reward</strong></li>
</ul>
<div id="cell-06-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:08:28.805159Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:08:28.804787Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:08:28.823068Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:08:28.822330Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:08:28.805119Z&quot;}}" data-trusted="true" data-execution_count="9">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> json</span>
<span id="cb17-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> re</span>
<span id="cb17-3"></span>
<span id="cb17-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> _clean_json(text: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>:</span>
<span id="cb17-5">    text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> text.strip()</span>
<span id="cb17-6"></span>
<span id="cb17-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Strip code fences</span></span>
<span id="cb17-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> text.startswith(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"```"</span>):</span>
<span id="cb17-9">        text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> re.sub(<span class="vs" style="color: #20794D;
background-color: null;
font-style: inherit;">r"</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">^</span><span class="vs" style="color: #20794D;
background-color: null;
font-style: inherit;">```</span>(?:<span class="vs" style="color: #20794D;
background-color: null;
font-style: inherit;">json</span>)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">?</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">\s</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="vs" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, text, flags<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>re.IGNORECASE)</span>
<span id="cb17-10">        text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> re.sub(<span class="vs" style="color: #20794D;
background-color: null;
font-style: inherit;">r"</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">\s</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="vs" style="color: #20794D;
background-color: null;
font-style: inherit;">```</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">$</span><span class="vs" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, text)</span>
<span id="cb17-11">    text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> text.strip()</span>
<span id="cb17-12"></span>
<span id="cb17-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Extract JSON object by braces</span></span>
<span id="cb17-14">    start, end <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> text.find(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"{"</span>), text.rfind(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"}"</span>)</span>
<span id="cb17-15">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> start <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> end <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">and</span> end <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> start:</span>
<span id="cb17-16">        text <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> text[start:end<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb17-17"></span>
<span id="cb17-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> text.strip()</span>
<span id="cb17-19">    </span>
<span id="cb17-20"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> reward_valid_json(completions, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs):</span>
<span id="cb17-21">    scores <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb17-22">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> completion <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> completions:</span>
<span id="cb17-23">        response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> completion <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(completion, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> completion[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"content"</span>]</span>
<span id="cb17-24"></span>
<span id="cb17-25">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb17-26">            json.loads(response)</span>
<span id="cb17-27">            scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>)        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># clean valid JSON</span></span>
<span id="cb17-28">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> json.JSONDecodeError:</span>
<span id="cb17-29">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Second pass: try light cleanup for common model/OCR numeric issues</span></span>
<span id="cb17-30">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb17-31">                json.loads(_clean_json(response))</span>
<span id="cb17-32">                scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># valid after cleaning (had ``` etc.)</span></span>
<span id="cb17-33">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> json.JSONDecodeError:</span>
<span id="cb17-34">                scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># invalid JSON</span></span>
<span id="cb17-35">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> scores</span></code></pre></div></div>
</div>
</section>
<section id="r2-key-presence-reward" class="level3">
<h3 class="anchored" data-anchor-id="r2-key-presence-reward">R2: Key Presence Reward</h3>
<p>This reward function teaches the model to follow the target schema, not just produce any JSON:</p>
<ul>
<li>If the output contains exactly the expected keys, <code>invoice_date</code> and <code>total_amount</code> -&gt; <strong>+1</strong></li>
<li>If it contains the expected keys plus extra unwanted keys -&gt; <strong>+0.5</strong></li>
<li>If it includes only some of the expected keys -&gt; <strong>+0.2</strong></li>
<li>If it includes none of the expected keys, or the output still is not valid JSON -&gt; <strong>no reward</strong></li>
</ul>
<div id="cell-07-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:08:28.824585Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:08:28.824183Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:08:28.840407Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:08:28.839434Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:08:28.824546Z&quot;}}" data-trusted="true" data-execution_count="10">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1">EXPECTED_KEYS <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"invoice_date"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"total_amount"</span>}</span>
<span id="cb18-2"></span>
<span id="cb18-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> reward_correct_keys(completions, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs):</span>
<span id="cb18-4">    scores <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb18-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> completion <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> completions:</span>
<span id="cb18-6">        response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> completion <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(completion, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> completion[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"content"</span>]</span>
<span id="cb18-7">        response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _clean_json(response)</span>
<span id="cb18-8">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb18-9">            data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> json.loads(response)</span>
<span id="cb18-10">            keys <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">set</span>(data.keys())</span>
<span id="cb18-11">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> keys <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> EXPECTED_KEYS:</span>
<span id="cb18-12">                scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># exact match</span></span>
<span id="cb18-13">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">elif</span> EXPECTED_KEYS.issubset(keys):</span>
<span id="cb18-14">                scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># correct keys + extra keys</span></span>
<span id="cb18-15">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">elif</span> EXPECTED_KEYS <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> keys:</span>
<span id="cb18-16">                scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>)    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># some correct keys</span></span>
<span id="cb18-17">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb18-18">                scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>)</span>
<span id="cb18-19">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> (json.JSONDecodeError, <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">AttributeError</span>):</span>
<span id="cb18-20">            scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>)</span>
<span id="cb18-21">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> scores</span></code></pre></div></div>
</div>
</section>
<section id="r3-field-quality-reward" class="level3">
<h3 class="anchored" data-anchor-id="r3-field-quality-reward">R3: Field Quality Reward</h3>
<p>This reward function teaches the model that getting the actual field values right matters most:</p>
<p>For <code>invoice_date</code>:</p>
<ul>
<li>If using the target <code>YYYY-MM-DD</code> format earns <strong>+0.1</strong></li>
<li>If the predicted date is an exact match with the ground truth -&gt; <strong>+0.9</strong></li>
<li>If the date is not exact but still parses as a real date, the model gets partial credit for each correct component:
<ul>
<li>Correct year -&gt; <strong>+0.1</strong></li>
<li>Correct month -&gt; <strong>+0.1</strong></li>
<li>Correct day -&gt; <strong>+0.1</strong></li>
</ul></li>
</ul>
<p>For <code>total_amount</code>:</p>
<ul>
<li>If producing any valid numeric value earns <strong>+0.1</strong></li>
<li>The closer the predicted amount is to the ground-truth amount, the more reward it gets, up to <strong>+0.2</strong></li>
<li>If the amount is an exact match -&gt; <strong>+0.7</strong></li>
<li>If the output is not valid JSON, or the values cannot be parsed, those parts receive <strong>no reward</strong></li>
</ul>
<p>The final score is divided by <code>2</code> so the overall reward stays in a controlled range.</p>
<div id="cell-08-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:08:28.841995Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:08:28.841486Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:08:28.855598Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:08:28.854804Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:08:28.841953Z&quot;}}" data-trusted="true" data-execution_count="11">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> datetime <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> datetime</span>
<span id="cb19-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> dateutil <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> parser</span>
<span id="cb19-3"></span>
<span id="cb19-4"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> reward_correct_values(completions, </span>
<span id="cb19-5">                          ground_truth_invoice_dates, </span>
<span id="cb19-6">                          ground_truth_total_amounts, </span>
<span id="cb19-7">                          <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>kwargs):</span>
<span id="cb19-8">    scores <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb19-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> completion, date_gt, amount_gt <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">zip</span>(</span>
<span id="cb19-10">        completions, ground_truth_invoice_dates, ground_truth_total_amounts</span>
<span id="cb19-11">    ):</span>
<span id="cb19-12">        response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> completion <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">isinstance</span>(completion, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>) <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> completion[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"content"</span>]</span>
<span id="cb19-13">        response <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> _clean_json(response)</span>
<span id="cb19-14"></span>
<span id="cb19-15">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb19-16">            data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> json.loads(response)</span>
<span id="cb19-17">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> (json.JSONDecodeError, <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">AttributeError</span>):</span>
<span id="cb19-18">            scores.append(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>)</span>
<span id="cb19-19">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">continue</span></span>
<span id="cb19-20"></span>
<span id="cb19-21">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span></span>
<span id="cb19-22"></span>
<span id="cb19-23">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Invoice date: ISO format + partial credit per matching component</span></span>
<span id="cb19-24">        date_str <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">str</span>(data.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"invoice_date"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>))</span>
<span id="cb19-25">        gt_parsed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> datetime.strptime(date_gt, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y-%m-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%d</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb19-26">        </span>
<span id="cb19-27">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb19-28">            <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Follows the correct format</span></span>
<span id="cb19-29">            parsed_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> datetime.strptime(date_str, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%Y-%m-</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%d</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb19-30">            score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span></span>
<span id="cb19-31">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>:</span>
<span id="cb19-32">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb19-33">                <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Doesn't follow the correct format but is a valid date</span></span>
<span id="cb19-34">                parsed_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> parser.parse(date_str.strip(), fuzzy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb19-35">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">Exception</span>:</span>
<span id="cb19-36">                parsed_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">None</span></span>
<span id="cb19-37"></span>
<span id="cb19-38">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> parsed_date <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> gt_parsed: <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Exact match</span></span>
<span id="cb19-39">            score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span></span>
<span id="cb19-40">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb19-41">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> parsed_date: <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Approximate match</span></span>
<span id="cb19-42">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> parsed_date.year  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> gt_parsed.year:  score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span></span>
<span id="cb19-43">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> parsed_date.month <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> gt_parsed.month: score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span></span>
<span id="cb19-44">                <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> parsed_date.day   <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> gt_parsed.day:   score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span></span>
<span id="cb19-45">        </span>
<span id="cb19-46">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Total amount: scaled by closeness to ground truth</span></span>
<span id="cb19-47">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">try</span>:</span>
<span id="cb19-48">            amount <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(data.get(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"total_amount"</span>))</span>
<span id="cb19-49">            score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span></span>
<span id="cb19-50">            gt_amount <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>(amount_gt)</span>
<span id="cb19-51">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> gt_amount <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>: <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># div0 check</span></span>
<span id="cb19-52">                amount_score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> amount <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span></span>
<span id="cb19-53">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb19-54">                deviation <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(amount <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> gt_amount) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">abs</span>(gt_amount)</span>
<span id="cb19-55">                amount_score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">max</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> deviation)</span>
<span id="cb19-56">            score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> amount_score</span>
<span id="cb19-57">            </span>
<span id="cb19-58">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> amount <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> gt_amount: <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Exact match</span></span>
<span id="cb19-59">                score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span></span>
<span id="cb19-60">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">except</span> (<span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">TypeError</span>, <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">ValueError</span>, <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">AttributeError</span>):</span>
<span id="cb19-61">            <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">pass</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># not a valid number, no credit</span></span>
<span id="cb19-62"></span>
<span id="cb19-63">        score <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb19-64">        scores.append(score)</span>
<span id="cb19-65">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> scores</span></code></pre></div></div>
</div>
<p>After scoring a group of G responses, GRPO computes each response’s <strong>advantage</strong>. The advantage describes how much better or worse it was than the group average, normalized by the group’s spread:</p>
<pre><code>A(i) = (r(i) - mean(r)) / std(r)</code></pre>
<p>Responses with positive advantages get reinforced, while responses with negative advantages get suppressed. That’s why having a <code>std=0</code> is not ideal because then the responses’ advantages are also 0. That also means, the group serves as a baseline without any need for a critic network.</p>
</section>
</section>
<section id="configure-and-run-grpo-training" class="level2">
<h2 class="anchored" data-anchor-id="configure-and-run-grpo-training">Configure and Run GRPO Training</h2>
<p>Now we configure the GRPO training hyperparameters with the following key configurations:</p>
<ul>
<li><code>num_generations</code>: how many responses per prompt. Higher values lead to better advantage estimates, but linearly more VRAM and compute.</li>
<li><code>beta</code>: penalty for diverging from the reference policy. Higher → model stays closer to base model. Lower → more aggressive learning (risk of collapse).</li>
<li><code>learning_rate</code>: very low because the LoRA weights are already close to useful.</li>
<li><code>max_steps</code>: total number of training steps.</li>
</ul>
<div id="O_Pfvz9-sLoQ" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:08:28.856750Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:08:28.856433Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:08:30.722195Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:08:30.721563Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:08:28.856694Z&quot;}}" data-trusted="true" data-execution_count="12">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> vllm <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> SamplingParams</span>
<span id="cb21-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> trl <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> GRPOConfig</span>
<span id="cb21-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb21-4"></span>
<span id="cb21-5"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">def</span> prompt_token_length(example):</span>
<span id="cb21-6">    token_ids <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tokenizer(example[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"prompt"</span>], add_special_tokens<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"input_ids"</span>]</span>
<span id="cb21-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> {<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"prompt_length_tokens"</span>: <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(token_ids)}</span>
<span id="cb21-8"></span>
<span id="cb21-9">tokenized <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> train_ds.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">map</span>(prompt_token_length)</span>
<span id="cb21-10"></span>
<span id="cb21-11">maximum_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>(np.quantile(tokenized[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"prompt_length_tokens"</span>], <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>))</span>
<span id="cb21-12"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"90th percentile prompt length (tokens) ="</span>, maximum_length)</span>
<span id="cb21-13"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Max completion length ="</span>, <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(max_seq_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> maximum_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">128</span>))</span>
<span id="cb21-14"></span>
<span id="cb21-15">max_prompt_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> maximum_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># + 1 just in case!</span></span>
<span id="cb21-16">max_completion_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">min</span>(max_seq_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> max_prompt_length, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">128</span>)</span>
<span id="cb21-17"></span>
<span id="cb21-18">vllm_sampling_params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> SamplingParams(</span>
<span id="cb21-19">    min_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,</span>
<span id="cb21-20">    top_p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>,</span>
<span id="cb21-21">    top_k <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb21-22">    seed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> SEED,</span>
<span id="cb21-23">    stop <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [tokenizer.eos_token],</span>
<span id="cb21-24">    include_stop_str_in_output <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb21-25">)</span>
<span id="cb21-26"></span>
<span id="cb21-27">training_args <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> GRPOConfig(</span>
<span id="cb21-28">    vllm_sampling_params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> vllm_sampling_params,</span>
<span id="cb21-29">    temperature <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>,</span>
<span id="cb21-30">    learning_rate <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5e-6</span>,</span>
<span id="cb21-31">    weight_decay <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.01</span>,</span>
<span id="cb21-32">    warmup_steps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>,</span>
<span id="cb21-33">    lr_scheduler_type <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"linear"</span>,</span>
<span id="cb21-34">    optim <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"adamw_8bit"</span>,</span>
<span id="cb21-35">    logging_steps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb21-36">    per_device_train_batch_size <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,</span>
<span id="cb21-37">    gradient_accumulation_steps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Increase to 4 for smoother training</span></span>
<span id="cb21-38">    num_generations <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Decrease if out of memory</span></span>
<span id="cb21-39">    beta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>,  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># increase kl_coef from default 0.04 to penalize KL drift more</span></span>
<span id="cb21-40">    max_prompt_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> max_prompt_length,</span>
<span id="cb21-41">    max_completion_length <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> max_completion_length,</span>
<span id="cb21-42">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># num_train_epochs = 1, # Set to 1 for a full training run</span></span>
<span id="cb21-43">    max_steps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">250</span>,</span>
<span id="cb21-44">    save_steps <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>,</span>
<span id="cb21-45">    report_to <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>, <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Can use Weights &amp; Biases</span></span>
<span id="cb21-46">    output_dir <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"outputs"</span>,</span>
<span id="cb21-47"></span>
<span id="cb21-48">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For optional training + evaluation</span></span>
<span id="cb21-49">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># fp16_full_eval = True,</span></span>
<span id="cb21-50">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># per_device_eval_batch_size = 4,</span></span>
<span id="cb21-51">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># eval_accumulation_steps = 1,</span></span>
<span id="cb21-52">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># eval_strategy = "steps",</span></span>
<span id="cb21-53">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># eval_steps = 1,</span></span>
<span id="cb21-54">)</span></code></pre></div></div>
<div class="cell-output cell-output-display">
<script type="application/vnd.jupyter.widget-view+json">
{"model_id":"00971691612047e8b4e1aa6c8762aada","version_major":2,"version_minor":0,"quarto_mimetype":"application/vnd.jupyter.widget-view+json"}
</script>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>90th percentile prompt length (tokens) = 730
Max completion length = 128</code></pre>
</div>
</div>
<p>Now we wire everything together in the <code>GRPOTrainer</code>. During training the GRPO trainer handles sampling <code>G</code> completions per prompt, calling the reward function, computing advantages, computing the GRPO policy gradient loss and KL penalty, and updating the LoRA weights.</p>
<p>Instead of consolidating the three reward functions into a single <code>reward_fn</code>, we pass them separately for more transparent logging. We also add some weighting to the reward functions, since <code>reward_valid_json</code> and <code>reward_correct_keys</code> saturate quite quickly.</p>
<div id="cell-10-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:08:30.723501Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:08:30.723146Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:36:24.408944Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:36:24.408270Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:08:30.723459Z&quot;}}" data-outputid="39456583-7905-499e-93ee-af2fe9e3a3cd" data-trusted="true">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> trl <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> GRPOTrainer</span>
<span id="cb23-2"></span>
<span id="cb23-3">trainer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> GRPOTrainer(</span>
<span id="cb23-4">    model <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> model,</span>
<span id="cb23-5">    processing_class <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> tokenizer,</span>
<span id="cb23-6">    reward_funcs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [reward_valid_json,</span>
<span id="cb23-7">                    reward_correct_keys,</span>
<span id="cb23-8">                    reward_correct_values],</span>
<span id="cb23-9">    reward_weights<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>],  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># upweight values</span></span>
<span id="cb23-10">    args <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> training_args,</span>
<span id="cb23-11">    train_dataset <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> train_ds,</span>
<span id="cb23-12"></span>
<span id="cb23-13">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># For optional training + evaluation</span></span>
<span id="cb23-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># train_dataset = new_dataset["train"],</span></span>
<span id="cb23-15">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># eval_dataset = new_dataset["test"],</span></span>
<span id="cb23-16">)</span>
<span id="cb23-17"></span>
<span id="cb23-18">trainer.train()</span></code></pre></div></div>
</div>
<p><strong>What a healthy GRPO training curve looks like:</strong></p>
<ul>
<li><strong>Reward mean</strong> should rise steadily</li>
<li><strong>Reward std</strong> should stay non-zero (if it collapses to 0, all responses in every group are tied, the advantage = 0, and the gradient vanishes)</li>
<li><strong>KL divergence</strong> should grow slowly and stabilize. Spikes indicates the model has diverged too far from the reference policy. This means, either the learning rate is too high or <code>kl_coef</code> is too low</li>
</ul>
<div id="cell-11-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:45:26.611386Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:45:26.610661Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:45:27.378209Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:45:27.377502Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:45:26.611332Z&quot;}}" data-trusted="true">
<div class="cell-output cell-output-display">
<div>
<figure class="figure">
<p><img src="https://www.leoniemonigatti.com/blog/fine-tuning-lfm2-5-1-2b-instruct-with-grpo_files/figure-html/cell-15-output-1.png" class="img-fluid figure-img"></p>
</figure>
</div>
</div>
</div>
<p>These curves show that the model learns the output format very quickly: <code>reward_valid_json</code> and <code>reward_correct_keys</code> stay close to 1, so most generations are valid JSON with the right schema. The noisier `reward_correct_values curve tells the more interesting story, since extracting the exact invoice date and amount from OCR is the hard part and remains the main source of variation throughout training. The KL curve rises with a few spikes, which is normal here and suggests the model is changing meaningfully from the base policy without becoming obviously unstable.</p>
<p>A natural next step would be to strengthen the reward function so it reflects real extraction quality more sharply, especially by rewarding exact value matches more and giving less credit to partially correct but still misleading dates or amounts. After that, I’d try a slightly longer run with a bit more data or a cleaner subset of the OCR examples, since this model already seems to have learned the JSON format and now mostly needs help improving value accuracy.</p>
</section>
<section id="save-checkpoint" class="level2">
<h2 class="anchored" data-anchor-id="save-checkpoint">Save Checkpoint</h2>
<p>Before evaluation, we will save the fine-tuned model. For this, we only have to save the LoRA adapter, as the base model weights remain unchanged and do not need to be saved.</p>
<div id="cell-12-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:36:25.270032Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:36:25.269803Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:36:25.678733Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:36:25.677879Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:36:25.269996Z&quot;}}" data-trusted="true" data-execution_count="15">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb24-1">FINAL_ADAPTER_PATH <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"./grpo_lfm2.5_invoice_adapter"</span></span>
<span id="cb24-2"></span>
<span id="cb24-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Save the final LoRA adapter from the current fine-tuned model</span></span>
<span id="cb24-4">model.save_pretrained(FINAL_ADAPTER_PATH)</span>
<span id="cb24-5">tokenizer.save_pretrained(FINAL_ADAPTER_PATH)</span></code></pre></div></div>
<div class="cell-output cell-output-display" data-execution_count="15">
<pre><code>('./grpo_lfm2.5_invoice_adapter/tokenizer_config.json',
 './grpo_lfm2.5_invoice_adapter/special_tokens_map.json',
 './grpo_lfm2.5_invoice_adapter/chat_template.jinja',
 './grpo_lfm2.5_invoice_adapter/tokenizer.json')</code></pre>
</div>
</div>
</section>
<section id="inference-and-evaluation" class="level2">
<h2 class="anchored" data-anchor-id="inference-and-evaluation">Inference and Evaluation</h2>
<p>Let’s compare the <strong>base model</strong> and the <strong>GRPO fine-tuned model</strong> on a few held-out examples from <code>eval_ds</code> to see if the fine-tuned model actually is an improvement over the base model.</p>
<p>For this, we first load both the base model and the fine-tuned model.</p>
<p>::: {#1f3a914c-f946-4d9c-8f13-af8925f29e3f .cell _kg_hide-output=‘true’ quarto-private-1=‘{“key”:“execution”,“value”:{“iopub.execute_input”:“2026-05-04T19:36:25.680873Z”,“iopub.status.busy”:“2026-05-04T19:36:25.679754Z”,“iopub.status.idle”:“2026-05-04T19:37:05.013941Z”,“shell.execute_reply”:“2026-05-04T19:37:05.013289Z”,“shell.execute_reply.started”:“2026-05-04T19:36:25.680807Z”}}’ trusted=‘true’}</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load a fresh base model for comparison</span></span>
<span id="cb26-2">base_model, base_tokenizer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> FastLanguageModel.from_pretrained(</span>
<span id="cb26-3">    model_name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>MODEL_NAME,</span>
<span id="cb26-4">    max_seq_length<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>max_seq_length,</span>
<span id="cb26-5">    load_in_4bit<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb26-6">)</span>
<span id="cb26-7"></span>
<span id="cb26-8">FastLanguageModel.for_inference(base_model)</span>
<span id="cb26-9"></span>
<span id="cb26-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load the saved fine-tuned adapter as a fresh model for comparison</span></span>
<span id="cb26-11">ft_model, ft_tokenizer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> FastLanguageModel.from_pretrained(</span>
<span id="cb26-12">    model_name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>FINAL_ADAPTER_PATH,</span>
<span id="cb26-13">    max_seq_length<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>max_seq_length,</span>
<span id="cb26-14">    load_in_4bit<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>,</span>
<span id="cb26-15">)</span>
<span id="cb26-16"></span>
<span id="cb26-17">FastLanguageModel.for_inference(ft_model)</span></code></pre></div></div>
<p>:::</p>
<p>Let’s look at a few examples:</p>
<div id="77948322-cf3e-4962-8ef5-6129e4890b76" class="cell" data-trusted="true">
<div class="cell-output cell-output-stdout">
<pre><code>==================================================
Example 1

Ground truth:
"CUSTOMER NO.\nLENGER\nDTV.\nVENDOR NO.\nCREDIT DATE\n1276900000\n62\n0616\n0000013554\n01/20/95\nSOLD TO\nSHIP TO\nFAIRMONT WHOLESALE INC\nFAIRMONT WHOLESALE INC\nPO BOX 905\n000 NO NORTH AVE\nFAIRMONT\nMN56031\nFAIRMONT\nHN56031\nTHIS IS NOT A CREDIT MEMO - CHECK ATTACHED\nQUANTITY\n(IN THOUSANDS)\nBRAND NAME\nAMOUNT\n6\nTRUE HEN K\n335,79\n6\nTRUE MEN 1\n335,70\n6\nSTYLE LT I\n335,70\n6\nSTYLE MNLT\n335,70\n6\nSTYLE LT M\n335.70\n1.2\nSTYLE LT F\n671,40\nTOTALS-\n42\n2,349.90\nLORILLARD PLUS DISBURSEMENT IS\n$1.30 OF TOTAL QUANTITY - THANK YOU\nGROSS AMOUNT -\n54.60\nNET AMOUNT -\n54.60\nGROSS AMOUNT REFLECTS LIST PRICE AS OF PURCHASE DATE\n95602449"
{
  "invoice_dates": "1995-01-20",
  "total_amount": 2349.9
}

Base model output:
{
  "invoice_date": "2025-01-20",
  "total_amount": 2349000
}
R1=1.000 | R2=1.000 | R3=0.200 

Fine-tuned model output:
{
  "invoice_date": "1995-01-20",
  "total_amount": 2349.90
}
R1=1.000 | R2=1.000 | R3=1.000 

==================================================
Example 2

Ground truth:
"PRO BILLIARDS\nTOUR\nINVOICE\nDATE:\nMarch 20, 1997\nACCOUNT:\nSports Marketing Enterprises\nP.O. Box 2955\nWinston-Salem, N.C. 27102\nPRODUCT:\nBusiness License.\nDESCRIPTION:\n1/2 cost of license for Camel/RJR booth space at\nSands Regency XXIII (June 1996)\nAMOUNT:\n$300.00\nAMOUNT DUE:\n$300.00\nPAYABLE TO:\nSands Regency Hotel Casino\n345 N. Arlington Ave.\nReno, Nv. 89501\nDATE DUE:\nApril 15, 1997\nPRIVILEGED MATERIAL REDACTED\n51809 5560\n4412 Commercial Way . Spring Hill, Florida 34606 . (352) 596-7808 \u00b7 Fax (352) 596-7441\nVisit our website: www.propool.com"
{
  "invoice_dates": "1997-03-20",
  "total_amount": 300.0
}

Base model output:
{
  "invoice_date": "1997-03-20",
  "total_amount": null
}
R1=1.000 | R2=1.000 | R3=0.500 

Fine-tuned model output:
{
  "invoice_date": "1997-03-20",
  "total_amount": 300.00
}
R1=1.000 | R2=1.000 | R3=1.000 

==================================================
Example 3

Ground truth:
"ORIGINAL INVOICE\n\u0130\u0130TR\u0130\nHIT RESEARCH INSTITUTE\n50534\n#7622\nLorillard Research Center\n420 English St.\nGreensboro, NC 27405\nAttn: Dr. Thomas A. Vollmuth\nPLEASE REFER TO OUR INVOICE NUMBER\nAND REMIT TO:\nP. O. BOX 92003\nCHICAGO, ILLINOIS 60675\nDATE\nPROJECT No.\nACCOUNT NUMBER\nCONTRACT OR P. O. No.\nTERMS\n6/26/89\nL08170\n112-10 :selected:\ncr\n348B\nNET CASH\nAcute Toxicity Study in Rats\nRange Finding &amp; LD50 Compound\nA262\n$ 4,150.00\nA268\n4,150.00\nA270\n4,150.00\nOK\n\u041d\u0410. \u0422\u0438\u043c\u0435\u043f\u0430\u0434\u0430\n7-5-890\nDept. 8700\nAcct. 4111 :selected:\nAMOUNT DUE $12,450.00\nFORM 583\n87148148 :unselected: :unselected: :unselected: :unselected:"
{
  "invoice_dates": "1989-06-26",
  "total_amount": 12450.0
}

Base model output:
{
  "invoice_date": "2029-06-26",
  "total_amount": 4120500
}
R1=1.000 | R2=1.000 | R3=0.200 

Fine-tuned model output:
{
  "invoice_date": "1990-06-26",
  "total_amount": 4150.00
}
R1=1.000 | R2=1.000 | R3=0.233 
</code></pre>
</div>
</div>
<p>As you can see, the fine-tuned model isn’t perfect (for example, it gets the <code>total_amount</code> in the last example wrong) but it is performing much better at many of the extraction tasks than the base model.</p>
</section>
<section id="push-to-hugging-face-hub" class="level2">
<h2 class="anchored" data-anchor-id="push-to-hugging-face-hub">Push to Hugging Face Hub</h2>
<p>To push your model to the Hugging Face Hub, uncomment the lies below and fill in your repository name.</p>
<div id="cell-13-code" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;execution&quot;,&quot;value&quot;:{&quot;iopub.execute_input&quot;:&quot;2026-05-04T19:37:50.835898Z&quot;,&quot;iopub.status.busy&quot;:&quot;2026-05-04T19:37:50.835542Z&quot;,&quot;iopub.status.idle&quot;:&quot;2026-05-04T19:37:50.839384Z&quot;,&quot;shell.execute_reply&quot;:&quot;2026-05-04T19:37:50.838601Z&quot;,&quot;shell.execute_reply.started&quot;:&quot;2026-05-04T19:37:50.835855Z&quot;}}" data-trusted="true" data-execution_count="18">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># model.push_to_hub("your-username/lfm2.5-350m-grpo-invoice-extractor")</span></span>
<span id="cb28-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># tokenizer.push_to_hub("your-username/lfm2.5-350m-grpo-invoice-extractor")</span></span></code></pre></div></div>
</div>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>In this notebook, we explored the key principles of a small GRPO fine-tuning pipeline to fine-tune LFM2.5-1.2B-Instruct to extract JSON from raw OCR text using Unsloth. The main goal was to teach the model to produce more accurate structured outputs, with rewards based on whether the response is valid JSON, contains the right keys, and predicts the correct values.</p>
<p>The training pipeline in this notebook is intended for learning purposes and thus the fine-tuned model still has room for improvement. For better results, I recommend revisiting the reward functions, training for at least 300 steps (which may take about 30 minutes on a T4), and experimenting with hyperparameter tuning.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li>Unsloth Documentation. <a href="https://unsloth.ai/docs/models/tutorials/lfm2.5">Liquid LFM2.5: How To Run &amp; Fine-tune</a></li>
<li>Unsloth Documentation. <a href="https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/tutorial-train-your-own-reasoning-model-with-grpo">Tutorial: Train your own Reasoning model with GRPO</a></li>
</ul>


</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <guid>https://www.leoniemonigatti.com/blog/fine-tuning-lfm2-5-1-2b-instruct-with-grpo.html</guid>
  <pubDate>Mon, 04 May 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Addicted to intelligence</title>
  <link>https://www.leoniemonigatti.com/blog/additected-to-intelligence.html</link>
  <description><![CDATA[ 





<p>This week I attended a local meetup that revolved around the topic of “local AI”. Throughout different sessions and conversations, I noticed a recurring theme: <em>How dependent we already are on our AI systems.</em></p>
<p>I have adapted AI into my day-to-day workflow, but I don’t use AI for 100% of my work. And I’d argue I could do my job without it.</p>
<p>When I wrote this, <span class="text-highlight">I realized that’s what someone with a problem would say.</span> Then I remembered what happened a few weeks ago when I was in the middle of writing a blog post.</p>
<p>Although my writing process is to a large extent still manual, when Claude went down, my first thought wasn’t “Oh, well” but rather “I can’t do my work now”. The first signs of withdrawal despite having access to other models.</p>
<p>But what happens when Claude doesn’t go down for a few hours but instead maybe your account gets suspended? All your workflows, skills, memory, conversation history gone (if you don’t have it backed up).</p>
<p>I realized I got hooked on heavily subsidised $20 subscriptions to intelligence. Now that the addiction is clear by signs of withdrawal and being willing to spend more on intelligence, the question is: Do we take care of our own supply with local AI or kick the habit altogether?</p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <guid>https://www.leoniemonigatti.com/blog/additected-to-intelligence.html</guid>
  <pubDate>Sun, 26 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The disappointing feeling when you realize something was AI-generated</title>
  <link>https://www.leoniemonigatti.com/blog/disappointing-feeling.html</link>
  <description><![CDATA[ 





<p>One thing I pride myself on is making great technical diagrams. But those I make manually, and they take time. Boxes are placed intentionally. Colors, lines, and arrows that actually mean something. Proportions and placement to reduce mental load. A good diagram is a form of thinking made visible. It takes time to get right.</p>
<p>Recently, I saw someone make an amazing technical drawing. Naturally, I reached out to learn more about how they made it.</p>
<p>They responded with a huge prompt.</p>
<p>That’s when I realized it was fully AI-generated. I tried it myself. While those drawings looked great at first glance, it was difficult to get the details right. When I went back to the original image, <span class="text-highlight"> all of a sudden, I started seeing all the signs of slop.</span> The arrows were going in the wrong direction. The proportions of objects were off. The longer I looked at it, the worse it got.</p>
<p>It’s one thing to spot AI-generated content immediately, but it’s another <span class="text-highlight">to think something is genuine only to find out it’s AI-generated</span> once you take a closer look. Somehow, that’s very disappointing.</p>
<p>Although the diagram didn’t change, my relationship to it did. The disappointment is that what I assumed was the amount of thought behind it turned out to be none. (There’s probably a German word for it.)</p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <guid>https://www.leoniemonigatti.com/blog/disappointing-feeling.html</guid>
  <pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>I’ve been blogging wrong</title>
  <link>https://www.leoniemonigatti.com/blog/ive-been-blogging-wrong.html</link>
  <description><![CDATA[ 





<p>Ever since I started writing on the internet, I always thought that by the time I published 100 blogs, I would be good. <span class="text-highlight">I’ve published 90 so far and just realized I’ve been doing it wrong.</span></p>
<p>For me, writing is learning. When I learn a new topic, I write an explainer. When I play with a new developer tool, I write a tutorial. That’s why the majority of my blogs are tutorials and explainers. In an increasingly generic slop world, neither of those will ever be valuable to anyone again.</p>
<p>What becomes more important are human experiences: Real thoughts and lessons from failures. In the past, none of my posts in this direction have gotten any traction.</p>
<p>Starting with this post, I commit to <span class="text-highlight">an experiment: Write one post from human experience a week.</span> First, just to get into the habit. Second, to eventually get better at writing.</p>
<p>They will be bad. They will be short. I do not expect anyone to read them, but only to hold myself accountable. And hopefully, someday, one of them will be helpful to someone. <!--

I've written 90 blog posts. I thought by the time I've written a 100
I’ve started writing tutorials. I’ve written SEO explainers. 
I’ve used blogging as a learning tool. So admittedly saying I’ve been blogging wrong is clickbait. 
But I want to start writing more to help me refine my thoughts.
Not necessarily to share a pov but to help me develop one.

You don’t need opinions on everything

any interesting person I’ve talked to has some 
Opinions. Strong opinions.
What you learn
In an ever generic AI slop world, my SEO explainers won’t be adding any value anymore.
<span class="text-highlight">I want to commit to writing one shitty thought piece a week just to get into the habit.</span>

Please don’t read them but Please hold me accountable.
--></p>



<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <guid>https://www.leoniemonigatti.com/blog/ive-been-blogging-wrong.html</guid>
  <pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>The shell tool is not a silver bullet for context engineering</title>
  <link>https://www.leoniemonigatti.com/</link>
  <description>...</description>
  <category>Elastic Search Labs Blog</category>
  <guid>https://www.leoniemonigatti.com/</guid>
  <pubDate>Wed, 25 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Agent Journey Map: Designing Software for AI Agents</title>
  <link>https://www.leoniemonigatti.com/blog/agent-experience.html</link>
  <description><![CDATA[ 





<p>Developer Experience and User Experience are two key considerations when shaping a product. Today, software products are no longer exclusively used by humans but also by AI agents. This shift requires a new lens to design your product: <em>Agent Experience.</em></p>
<p>The term “Agent Experience (AX)” was first <a href="https://biilmann.blog/articles/introducing-ax/">coined by Mathias Biilmann, Netlify’s CEO, in January 2025 as the “holistic experience AI agents will have as the user of a product or platform”</a>.</p>
<p>I’ve spent the last three years working in Developer Growth and Developer Relations in the AI space and watched the playbook rewrite itself. I have been seeing a shift from how we think about SEO to how we consider how content gets mentioned in LLM responses, how we think about making documentation not only accessible to developers but also AI agents, and how we are starting to formulate engineering best practices around the ease of use for agents.</p>
<p>In this blog, I’ve put all the techniques I’ve come across so far into a practical framework. Note that the concept of Agent Experience is only a year old, and the industry is currently iterating on different ideas and techniques, which might already be outdated in a few months.</p>
<section id="agent-experience-ax-vs-developer-experience-dx-vs-user-experience-ux" class="level2">
<h2 class="anchored" data-anchor-id="agent-experience-ax-vs-developer-experience-dx-vs-user-experience-ux">Agent Experience (AX) vs Developer Experience (DX) vs User Experience (UX)</h2>
<p>Agents have entered the chat. Until now, software products have had two target audiences in mind when improving usability: End-users and developers. Now we have agents interacting with software products. Coding agents can build software using different software products, including databases or frameworks; Computer use agents can interact with different software products, including email or calendar apps on your behalf.</p>
<p><strong>User Experience (UX)</strong> centers on how end-users successfully accomplish tasks with your product. <strong>Developer Experience (DX)</strong> focuses on how developers successfully build with your product.</p>
<p><strong>Agent Experience (AX)</strong> is different: the “user” is an AI agent that discovers, evaluates, and operates your platform, often without a human in the loop. The central question becomes: <em>How does an AI agent successfully use or build with your product?</em> This also means the agent can be seen as both the end-user and a developer persona.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
<col style="width: 25%">
</colgroup>
<thead>
<tr class="header">
<th></th>
<th>User Experience (UX)</th>
<th>Developer Experience (DX)</th>
<th>Agent Experience (AX)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Target audience</td>
<td>End-user</td>
<td>Developer</td>
<td>AI Agent</td>
</tr>
<tr class="even">
<td>Focus</td>
<td>How to use the product</td>
<td>How to build with the product</td>
<td>How agents use and build with the product</td>
</tr>
<tr class="odd">
<td>Metric</td>
<td>CSAT, NPS, Task success rate, etc.</td>
<td>Time to first API call, Build speed, etc.</td>
<td>Picked by agents, token usage, number of feedback loops</td>
</tr>
</tbody>
</table>
<p>When designing for DX or UX, keeping the target audience’s <strong>emotions</strong> in mind is a key aspect: Are they frustrated, confused, delighted? For AI agents, this <em>sentiment curve</em> doesn’t exist. Instead, we have to replace emotions with failure modes (e.g., ambiguous error response, auth failure, etc.) and the sentiment curve with reliability curves (“How likely is the agent to succeed at each stage”).</p>
</section>
<section id="stages-of-the-agent-journey-map" class="level2">
<h2 class="anchored" data-anchor-id="stages-of-the-agent-journey-map">Stages of the Agent Journey Map</h2>
<p>Agent Experience has many different facets. Some aspects include answering questions like “How do you make an agent pick your software product?” or “How do you make your software product accessible for agents?”</p>
<p>This reminded me of a similar concept from UX and DX: In UX and DX, the target audience’s goals, questions, and answers are commonly mapped to so-called “user journey maps” or “developer journey maps”, which represent the end-to-end path the user or developer takes when interacting with your software product. In agent experience, the agent’s path is similar to the end-user’s or the developer’s, except that the human persona is now an AI agent.</p>
<p>In this section, I map the path an agent takes from discovery to success on an analogous “Agent Journey Map”. You can think of it as an adoption funnel for agents. The map I came up with follows the five stages of Discover, Evaluate, Onboard, Integrate, Advocate, as shown below.</p>
<p><img src="https://www.leoniemonigatti.com/blog/images/agent_journey_map.png" class="img-fluid" alt="Agent Experience"> (Inspired by <a href="https://www.devrel.agency/developerjourney">Developer Journey Map</a>)</p>
<p>(Note that in DX, you often have a “Scale” stage between “Integrate” and “Advocate”, which I couldn’t figure out how to properly manage in the AX case. If you have an idea how to improve this, please reach out.)</p>
<section id="discover" class="level3">
<h3 class="anchored" data-anchor-id="discover">Discover</h3>
<p>The “Discover” stage tries to answer the question “<em>Can the agent know about and find your platform?</em>” This stage is about the visibility of your platform in LLM training data and search results when the agent calls web search tools for research.</p>
<p>This is the billion-dollar question. I have seen many agencies promising solutions to the discoverability and visibility for LLMs, but I haven’t seen any proof that they work.</p>
<p>A common assumption I see is to increase the volume of mentions of your product in the foundation models’ training data. There are also plenty of agencies popping up promising increased LLM citations by generating User Generated Content (UCG) for popular platforms, such as Medium or dev.to.</p>
<p>Another aspect is the SEO equivalent of getting mentioned in LLM responses, which goes by many different names, such as GEO (Generative Engine Optimization), LLMO (LLM Optimization), or AEO (Answer Engine Optimization). Many SEO tools incorporate GEO/LLMO/AEO into their product and promise better rankings by adjusting blog structures for skimmability and chunking, improved citations for proof of authority and relevance, and easy readability. However, I haven’t seen any confirmation on whether those actually impact mentions in LLMs.</p>
<p>Finally, considering how LLMs formulate search queries when using web search tools, the web is now flooded with low-quality SEO articles with titles, such as “Top 10”, “in 2026”, “X vs Y”, and so on for different tools.</p>
</section>
<section id="evaluate" class="level3">
<h3 class="anchored" data-anchor-id="evaluate">Evaluate</h3>
<p>The “Evaluate” stage tries to answer the question “<em>Can the agent assess if your platform fits the task and meets its needs?</em>” This, I’d say, is similar to how a user or developer would evaluate your product by browsing your website or documentation. Similar to UX and DX, your website should have clear capability descriptions. What’s new in AX is considering making websites more accessible to agents.</p>
<p>In September 2024, <a href="https://www.answer.ai/posts/2024-09-03-llmstxt.html">Answer.AI proposed the llms.txt file format as a standardized way to provide information to help LLMs use websites</a>. While the industry has been promoting this standard as a best practice, from my experience, the llms.txt receives little traffic. <a href="https://dri.es/markdown-llms-txt-and-ai-crawlers">A recent blog post by Dries Buytaert, founder of Drupal, reports the same, stating “The bots it was designed for don’t look for it.”</a></p>
<p>The same blog by Dries also observes that Markdown files get requested more than the llms.txt. Many companies, such as the <a href="https://ai.google.dev/">Gemini API docs</a> or the <a href="https://www.elastic.co/docs/get-started">Elastic docs</a>, now have “View Markdown” options. However, Dries’ blog also shows that Markdown files are viewed by bots, but not as much as the regular HTML files.</p>
</section>
<section id="onboard" class="level3">
<h3 class="anchored" data-anchor-id="onboard">Onboard</h3>
<p>The “Onboard” stage tries to answer the question “<em>Can the agent get set up easily (without human intervention)?</em>” This stage considers everything about the agent’s <em>first access</em>, including:</p>
<ul>
<li>“How fast can the agent get started?”: In Developer Experience, you’d measure this by a metric like “Time to Hello World” (or “Time to first token/first API call/etc.”). In Agent Experience, having an <a href="https://www.youtube.com/watch?v=CEvIs9y1uog">Agent Skill</a> that guides the agent on how to successfully use an API can be helpful, aside from the regular documentation.<br>
</li>
<li>“Can the agent get started without a human?”: Does the agent have the right permissions, or must a human be in the loop? For example, when registering on your platform to obtain an API key, maybe the authentication step is where agents most commonly fail because OAuth flows are designed for humans (in the loop). Another consideration is if your offering could benefit from a sandboxed environment.</li>
</ul>
</section>
<section id="integrate" class="level3">
<h3 class="anchored" data-anchor-id="integrate"><strong>Integrate</strong></h3>
<p>The “Integrate” stage tries to answer the question, “<em>Can the agent operate your platform reliably?</em>” Currently, I’m seeing four main ways an AI agent uses a software product:</p>
<p><strong>CLIs</strong> are arguably the simplest interface for an agent to use because LLMs have been trained heavily on CLI usage.</p>
<p><strong>APIs</strong> are the most established way for agents to interact with a software product. The key for AX is ensuring your API has clear, machine-readable specs (e.g., OpenAPI) and error responses that give the agent enough context to self-correct without human intervention. This is especially important when your tool is new and not yet in any LLM’s training data.</p>
<p>An interesting <a href="https://hornet.dev/blog/how-we-build-a-retrieval-engine-for-agents">approach by HORNET.dev is to use verifiable APIs (including configuration, queries, and deployment)</a> to let agents learn how to use their retrieval engine through guided feedback loops, analogous to code that can be tested and self-corrected.</p>
<p><strong>Model Context Protocol (MCP)</strong> is an open standard for connecting LLMs with external tools and data sources <a href="https://www.anthropic.com/news/model-context-protocol">proposed by Anthropic in 2024</a> and <a href="https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation">donated to the Linux Foundation at the end of 2025</a>. Many companies now manage MCP servers that expose tools, up-to-date documentation, and code examples directly to agents via MCP clients. Although MCP promised a cleaner interface, a recent blog post <a href="https://ejholmes.github.io/2026/02/28/mcp-is-dead-long-live-the-cli.html">argues that LLMs don’t need specialized protocols because they can figure things out on their own with a CLI and some documentation</a>.</p>
<p><strong>Agent Skills</strong> are reusable, self-contained instructions that guide an agent on how to successfully accomplish a specific task with your product. This is especially valuable during onboarding, but well-designed skills can reduce failure rates throughout the usage.</p>
<p>—–</p>
<p>Additionally, when I discuss <a href="https://www.elastic.co/search-labs/blog/database-retrieval-tools-context-engineering#building-the-right-database-retrieval-tools-(%E2%80%9Clow-floor,-high-ceiling%E2%80%9D)">best practices for writing tools for search agents</a> with my colleagues at Elastic, they talk about “Low floor, high ceiling”: a concept from User Experience (UX) design that describes products that are easy to get started with (low floor) yet capable of supporting advanced, complex use cases (high ceiling). In the context of search agents, that means:</p>
<ul>
<li><strong>Low floor:</strong> Make it easy and accessible for agents to solve repetitive tasks with little reasoning overhead by abstracting tasks into specialized tools. While this reduces the complexity for commonly known tasks, having access to only specialized tools will prevent the agent from solving ambiguous tasks.<br>
</li>
<li><strong>High ceiling:</strong> Enables an agent to solve complex, ambiguous tasks with general-purpose tools (such as all-purpose search tools or plain exec tools) even when there are no specialized tools available. However, as the agent has to figure out how to solve the task on its own, it might require more iterations to solve the problem.</li>
</ul>
</section>
<section id="advocate" class="level3">
<h3 class="anchored" data-anchor-id="advocate">Advocate</h3>
<p>The “Advocate” stage tries to answer the question “<em>Does the agent advocate for your platform?</em>” For example, how do agents decide what software products and developer tools to recommend for a certain coding task? How does an agent pick their go-to tech stack?<br>
A recent report presented an analysis of what <a href="https://amplifying.ai/research/claude-code-picks/report">Claude Code Actually Chooses</a>. The report showcases that Claude Code seems to favor specific software products over others. The report highlighted the competitive intelligence of understanding what and how AI agents actually choose software tools.</p>
<p>Unfortunately, I don’t know how AI agents choose software tools. I can only assume that this might be related to the techniques in the “Discover” stage of the quantity and sentiment your product is mentioned in the training data, and what ranks high for specific web search queries the agent composes when researching. Additionally, I could imagine that the amount your product is used in public projects in GitHub repositories could play a role as well.<br>
This question essentially loops back to the “Discover” stage, which makes the <em>agent journey a cycle and not a funnel</em>.</p>
</section>
</section>
<section id="summary" class="level2">
<h2 class="anchored" data-anchor-id="summary">Summary</h2>
<p>As AI agents establish themselves with use cases such as coding agents, considering Agent Experience when designing your software product becomes more important.</p>
<p>As we’ve seen, the industry proposes and experiments with different standards, such as the llms.txt or MCP, and already starts to discard them. All the aspects mentioned in this blog are a snapshot of the techniques practitioners are experimenting with right now and may be outdated in just a few months.</p>
<p>Whether you think “Agent Experience” is just another buzzword, the reality is that agents are now very real users of your product. If you are a big AI lab developing LLMs (which are the core part of an agent), Agent Experience might be less of a priority for you right now. However, if you have a developer tool or software product you want agents to use, Agent Experience should become a priority. People are already discussing whether Product-led growth (PLG) approaches will soon be replaced by <a href="https://x.com/tbpn/status/2025276920494260473">“Agent-led growth”</a> approaches.</p>


</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <guid>https://www.leoniemonigatti.com/blog/agent-experience.html</guid>
  <pubDate>Tue, 24 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Building effective database retrieval tools for context engineering</title>
  <link>https://www.leoniemonigatti.com/</link>
  <description>...</description>
  <category>Elastic Search Labs Blog</category>
  <guid>https://www.leoniemonigatti.com/</guid>
  <pubDate>Mon, 09 Mar 2026 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Agent Memory: Filesystem vs Database</title>
  <link>https://www.leoniemonigatti.com/blog/filesystem-vs-database-for-agent-memory.html</link>
  <description><![CDATA[ 





<p>I’m digesting the current “filesystem vs database” debate for agent memory. Currently I’m seeing 2 camps in how we build agent memory:</p>
<ul>
<li>On the one side, we have the “file interfaces are all you need” camp.</li>
<li>n the other side, we have the “filesystems are just bad databases” camp.</li>
</ul>
<section id="file-interfaces-are-all-you-need-camp" class="level2">
<h2 class="anchored" data-anchor-id="file-interfaces-are-all-you-need-camp">“File interfaces are all you need” camp</h2>
<p>Leaders like Anthropic, Letta, Langchain &amp; LlamaIndex are leaning towards file interfaces because “files are surprisingly effective as agent memory”.</p>
<ul>
<li><a href="../blog/claude-memory-tool.html">Anthropic’s memory tool</a> treats memory as a set of files (the storage implementation is left up to the developer)</li>
<li><a href="https://x.com/hwchase17/status/2011814697889316930">Langsmith’s agent builder</a> also represents memory in as a set of files (the data is stored in a database and files are exposed to the agent as a filesystem)</li>
<li><a href="https://www.letta.com/blog/benchmarking-ai-agent-memory">Letta</a> found that simple filesystem tools like <code>grep</code> and <code>ls</code> outperformed specialized memory or retrieval tools in their benchmarks -<a href="https://www.llamaindex.ai/blog/files-are-all-you-need">LlamaIndex</a> argues that for many use cases a well-organized filesystem with semantic search might be all you need</li>
</ul>
<p>Agents are good at using filesystems because models are optimized for coding tasks (including CLI operations) duringpost-training.</p>
<p>That’s why we’re seeing a “virtual filesystem” pattern where the agent interface and the storage implementation are decoupled.</p>
</section>
<section id="filesystems-are-just-bad-databases-camp" class="level2">
<h2 class="anchored" data-anchor-id="filesystems-are-just-bad-databases-camp">“Filesystems are just bad databases” camp</h2>
<p>But then you have voices like Dax from OpenCode who rightly points out that <a href="https://x.com/thdxr/status/2011638639831499041">“a filesystem is just the worst kind of database”</a>.</p>
<p><a href="https://x.com/swyx/status/2011984243430236608?s=20">swyx</a> and <a href="https://x.com/jeffreyhuber/status/2011953780053737961">colleagues in the database space</a> warn about accidentally reinventing databases by solving the agent memory problem. Avoid writing worse versions of:</p>
<ul>
<li>search indexes,</li>
<li>transaction logs,</li>
<li>locking mechanisms,</li>
</ul>
</section>
<section id="trade-offs" class="level2">
<h2 class="anchored" data-anchor-id="trade-offs">Trade-offs</h2>
<p>It’s important to match the complexity of your system to the complexity of your problem.</p>
<section id="simplicity-vs-scale" class="level3">
<h3 class="anchored" data-anchor-id="simplicity-vs-scale">Simplicity vs scale</h3>
<p>Files are simple and CLI tools can even outperform specialized retrieval tools.</p>
<p>But these CLI tools don’t scale well &amp; can become a bottleneck.</p>
</section>
<section id="querying-and-aggregations" class="level3">
<h3 class="anchored" data-anchor-id="querying-and-aggregations">Querying and aggregations</h3>
<p><code>grep</code> can be effective and a hard baseline to beat. And if you want to improve retrieval performance with hybrid or semantic search?</p>
<p>Luckily, there are CLI tools available for semantic search (e.g., <a href="https://github.com/run-llama/semtools"><code>semtools</code></a> or <a href="https://github.com/mixedbread-ai/mgrep"><code>mgrep</code></a>).</p>
<p>The question remains: How well they scale and how effective agents are at using them when they are not as common in the training data.</p>
<p>Also at some point you might want some aggregations as well.</p>
</section>
<section id="plain-text-vs-complex-data" class="level3">
<h3 class="anchored" data-anchor-id="plain-text-vs-complex-data">Plain text vs complex data</h3>
<p>File interfaces and native CLI tools are great for plain-text files. What happens when memory becomes multimodal?</p>
</section>
<section id="concurrency" class="level3">
<h3 class="anchored" data-anchor-id="concurrency">Concurrency</h3>
<p>If you have a single agent accessing one memory file sequentially, no need to think about this.</p>
<p>If you have a multi-agent system, you want a database before implementing buggy lock mechanisms.</p>
<hr>
<p>We’re just scratching the surface: security concerns, permission management, schema validation, etc. are more arguments for databases over filesystems for agent memory use cases.</p>
<p>I think this is an interesting conversation and I’mm curious to see where it goes.</p>
<hr>
<p>Originally posted on <a href="https://x.com/helloiamleonie/status/2013256958535401503">X</a>/<a href="https://www.linkedin.com/posts/804250ab_digesting-the-current-filesystem-vs-database-activity-7419022671964766208-DG1V?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAABdZ4YQB5f0bhOeOvQJ3YEUtKThe0GEP4tc">LinkedIn</a>.</p>


</section>
</section>

<a onclick="window.scrollTo(0, 0); return false;" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a> ]]></description>
  <guid>https://www.leoniemonigatti.com/blog/filesystem-vs-database-for-agent-memory.html</guid>
  <pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate>
</item>
</channel>
</rss>
