What I Built on My Birthday (Instead of Celebrating It at Midnight)
- Samrat Biswas
- Feb 16
I used to celebrate by pushing my physical limits at a pub; this year, I celebrated by pushing my cognitive limits in a terminal.

A few years ago, I would have spent tonight at a rooftop lounge somewhere in Kolkata. Loud music, too many people, lots of drinks and food, and the kind of high-octane celebration where you get intoxicated far beyond reasonable limits. Birthdays meant midnight parties, parties meant groups, groups meant noise, and noise meant you didn’t have to sit with yourself long enough to ask uncomfortable questions about what you were actually building with the year you just finished.
I’m not just that person anymore (though that alter ego still surfaces now and then). I don’t know exactly when it changed - there wasn’t a single moment. It was more like erosion. The slow wearing away of the idea that celebration requires witnesses. That progress requires permission. That building something real has to wait for Monday.
This is the story of what happened between 11:28 PM on February 12th and 6 AM on February 13th - the six hours before the birthday really began. And what I found waiting for me on the other side.
If you run a delivery team, manage estimation at an agency, or lead engineering at a company that outsources development, this is probably about you. If you’re a one-person operator who’s figured out that versatility is a career strategy - this is definitely about you.
The Thought That Wouldn’t Leave: Epic Mapping
During the day, I’d been doing what I do - managing operations, reviewing deliverables, making sure the machinery of a 150-person tech team keeps turning. Routine. Except one task kept pulling at something deeper.
I was reviewing an Epic Mapping document. The standard deliverable we use to scope software engagements before estimation - tabs of epics, stories, descriptions, priorities. I’ve reviewed hundreds of these over the years. I know what good decomposition looks like because I’ve seen what bad decomposition costs.
A quick note on the term: "Epic Mapping" isn't standard agile vocabulary. I've practiced it since 2015 and coined the name around 2016, when I started leading teams and the document needed a reference. The industry has words for the contents (epics, stories, acceptance criteria) and the processes (estimation, refinement, sprint planning), but nobody had named the document itself: the pre-estimation scoping artifact that tells you whether your decomposition is clean enough to estimate against. User Story Mapping is a visualization exercise for user journeys. An Epic Mapping is a deliverable. I needed a name for the thing I was reviewing hundreds of times a year, so I made one.
But this time, instead of just flagging issues, I found myself thinking about the abstract mechanics of what I was doing. The semantic structure underneath the review. What makes a story “under-decomposed” isn’t that it has too few words - it’s that it conceals decision points. What makes an epic “over-decomposed” isn’t that it has too many stories - it’s that the stories don’t represent independent units of value. These aren’t rules you can write in a checklist. They’re patterns that live in operational intuition, shaped by years of watching estimates blow up because someone hid three integration points inside a single story titled “User Login.”
I kept turning it over throughout the afternoon. Not the specific review - that was done. But the structure of the judgment. The question of whether the methodology in my head could be externalized. Encoded. Made to run without me.
By evening, I knew I wouldn’t be going to sleep early.
The hours between dinner and midnight were not productive in any measurable way. I was pacing. Doing that thing where you open the fridge, stare at nothing, close the fridge, and go back to your desk. Scribbling half-schemas on the back of an envelope. The specific kind of restlessness that happens when a problem has gotten its hooks into the part of your brain that doesn’t shut off on command.

I wasn’t planning to build a product. I was trying to settle an argument with myself about whether the methodology in my head was transmissible - whether the judgment calls I make on decomposition quality could be formalized into something a machine could execute, or whether they were irreducibly intuitive. The kind of question that sounds philosophical until you realize the answer determines whether you’re a consultant or a product company.
11:47 PM - The First Commit: Building a Format-Agnostic Spreadsheet Analyzer
The apartment was quiet. Birthday eve. The argument resolved itself the way these things usually do - not through thinking, but through doing. I opened a terminal, spun up an AWS compute instance, and started building.
I need to be precise about what happened next, because the lazy narrative would be “I asked AI to code something, and it did.” That’s not what this was. It wasn’t even close.
I’d been pushing and training my team on Cursor and Claude Code - Anthropic’s terminal-based coding agent - for almost a year, across various projects. I know what it’s good at and where it breaks. It can generate code at machine speed, but it’s garbage in, garbage out. It cannot make architectural decisions. It doesn’t know that a two-pass enrichment pipeline - an optimized model for the structural overview, an expensive model for deep per-story analysis - is the right pattern for this domain. It doesn’t know that Common Items in an epic mapping should scale as a percentage of application hours, not get estimated independently. It doesn’t know that confidence tiers on estimates need to be forced, not suggested, because left to their own devices, language models will call everything “medium confidence” and move on.
I know these things because I’ve spent a decade in software delivery operations. The AI knows how to write a streaming SSE endpoint in Node.js. We need each other.
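One of those domain rules - Common Items scaling as a percentage of application hours rather than being estimated independently - is simple enough to sketch. A minimal illustration; the 15% default below is an assumed placeholder, not the framework's actual figure:

```python
def apply_common_items(application_hours: float, common_pct: float = 0.15) -> dict:
    """Common Items (environment setup, CI/CD, QA overhead, etc.) scale with
    the size of the application work instead of being estimated on their own.
    common_pct = 0.15 is an illustrative assumption, not the real value."""
    common = application_hours * common_pct
    return {
        "application_hours": application_hours,
        "common_items_hours": round(common, 1),
        "total_hours": round(application_hours + common, 1),
    }
```

The point of the rule: if application scope grows, the shared plumbing grows with it, so estimating Common Items as a fixed line item systematically under-scopes large projects.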
So that’s how it went. Not vibe coding. Not “generate me an app.” A conversation - sometimes an argument - where I brought the business logic, the architecture, the schemas, the domain constraints, and the machine brought the implementation velocity. I’d sketch an approach. It would push back on the data model. I’d explain why the estimate schema needed variance bands, not point estimates. It would show me an edge case in my column detection logic that I hadn’t considered. We’d go back and forth until the code was right. Claude Code (a mix of Sonnet 4.5 and Opus 4.6) and OpenAI Codex 5.3 did most of the heavy lifting and reviewed each other’s PRs, with a bit of Cursor (minimal) and Gemini (not for coding) on the side.

There were stretches where I was waiting for it to finish generating. There were stretches where it was waiting for me to decide how the enrichment pipeline should handle partial failures. Neither of us was the bottleneck for long.
12:43 AM - Epic Mapping, The Analysis Engine
The first thing that worked end-to-end was a Python analyzer with fuzzy column detection. Not everyone’s spreadsheet looks the same - Jira exports have different headers than Azure DevOps exports, which look different from Notion databases dumped to Excel. The analyzer uses keyword scoring to figure out which column is the story title, which is the description, which holds the epic name. It handles newlines embedded inside header cells, compound “Epic / Title” columns, #REF! values where formulas broke.
I pushed it to the AWS instance. Ran unit tests. The column detection was failing on a specific pattern - multi-line headers where “Front-end\nEffort Hours” was being treated as two separate cells. We argued about whether to normalize at the parsing layer or the detection layer. I won that argument. Normalize early, detect on clean data.
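That normalize-early, keyword-scoring approach can be sketched in a few lines. A minimal illustration - the keyword lists and function names are assumptions, not the analyzer's actual vocabulary:

```python
import re

# Hypothetical keyword lists - the real analyzer's vocabulary is larger.
COLUMN_KEYWORDS = {
    "title": ["title", "story", "summary", "name"],
    "description": ["description", "details", "notes"],
    "epic": ["epic", "feature", "module"],
    "effort": ["effort", "hours", "estimate"],
}

def normalize_header(raw) -> str:
    """Normalize early: collapse embedded newlines and runs of whitespace so
    'Front-end\\nEffort Hours' is scored as one header, not two cells."""
    return re.sub(r"\s+", " ", str(raw)).strip().lower()

def detect_columns(headers) -> dict:
    """Score each normalized header against each role's keywords;
    the highest-scoring role claims the column, first match wins ties."""
    mapping = {}
    for idx, raw in enumerate(headers):
        header = normalize_header(raw)
        best_role, best_score = None, 0
        for role, keywords in COLUMN_KEYWORDS.items():
            score = sum(1 for kw in keywords if kw in header)
            if score > best_score:
                best_role, best_score = role, score
        if best_role and best_role not in mapping:
            mapping[best_role] = idx
    return mapping
```

Because detection runs on clean headers, a Jira export, an Azure DevOps export, and a Notion dump all map onto the same roles without per-tool parsing code.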
Stress tests. Adversarial inputs - empty tabs, duplicate column names, skeleton sheets with headers but no data. Found bugs. Fixed them. Found more. Fixed those too.
By 1 AM, I had a tested analysis engine that could take any reasonable epic mapping spreadsheet and produce a structural assessment.
1:59 AM - Designing Estimation Tiers Around Data Hygiene
By 1:30 AM, the analyzer was wrapped in an Express server with auth, rate limiting, and file upload handling. The architectural decision that mattered wasn’t the plumbing - it was the tier system, designed around a specific insight: the most expensive problem in software scoping isn’t bad estimates - it’s dirty inputs. If your epic mapping has inconsistent column layouts, unnamed stories, duplicate entries across tabs, and broken formula references, no estimator - human or AI - can give you a reliable number. Data hygiene comes first. Always.
So the free tiers - Basic Reformat and Lite Analysis - exist to answer the most basic question every COO should ask before estimation begins: is this data clean enough to estimate against? Most of the time, the answer is no. Now you know that in thirty seconds instead of finding out three weeks into the project. The paid tiers - Deep Review (Sonnet finds decomposition gaps) and Enrich & Decompose (Opus enriches every story with estimates, confidence tiers, and open questions) - only make sense once the foundation is clean.
The free tiers aren’t charity. They’re the front door. Every person who runs a reformat on their backlog and sees thirty broken references they didn’t know existed has just learned something about their process. That learning creates demand for the deeper tiers.
2:45 AM - Error Recovery
This is where the collaboration got interesting.
I was thinking about what happens when Opus returns malformed JSON on a 15-minute enrichment call. In production, that’s a $3-5 operation that just evaporated. The user gets an error. They lose their credit. They lose trust.
Claude Code suggested a JSON salvage function - count open braces and brackets, close them, try to parse. I pushed back: that works for flat truncation but not for nested structures where the bracket order matters. We went back and forth. Settled on a layered approach: salvage for simple truncation, retry with exponential backoff for transient failures, partial save after each tab so the worst case is losing one tab, not the whole run.
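The simple-truncation layer of that salvage can be sketched like this - an illustrative reconstruction under stated assumptions, not the production code:

```python
import json

def salvage_truncated_json(text: str):
    """Best-effort repair for JSON cut off mid-stream: walk the text,
    tracking open braces/brackets (ignoring any inside string literals),
    then close whatever is still open in reverse order of opening.
    Handles flat truncation; anything weirder returns None so the
    caller can fall through to retry-with-backoff."""
    stack = []           # closers we still owe, innermost last
    in_string = False
    escape = False
    for ch in text:
        if in_string:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    candidate = text.rstrip()
    if in_string:
        candidate += '"'             # close a half-written string
    elif candidate.endswith(","):
        candidate = candidate[:-1]   # drop a dangling comma
    candidate += "".join(reversed(stack))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # salvage failed - let the retry layer handle it
```

The string-state tracking is what keeps a brace inside a description field from corrupting the bracket count - the edge case that makes naive count-and-close unsafe.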
Then I added resume-from-crash. If the server restarts mid-enrichment, the completed tabs are in SQLite. The pipeline checks what’s already done and picks up where it left off.
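A sketch of that partial-save / resume pattern, assuming a simple keyed table - the table and function names here are hypothetical:

```python
import sqlite3

def _connect(db_path: str) -> sqlite3.Connection:
    """Open the DB and ensure the partial-save table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enriched_tabs ("
        " run_id TEXT, tab_name TEXT, payload TEXT,"
        " PRIMARY KEY (run_id, tab_name))"
    )
    return conn

def save_tab(db_path: str, run_id: str, tab_name: str, payload: str) -> None:
    """Partial save after each tab, so a crash loses at most one tab."""
    with _connect(db_path) as conn:
        conn.execute(
            "INSERT OR REPLACE INTO enriched_tabs VALUES (?, ?, ?)",
            (run_id, tab_name, payload),
        )

def pending_tabs(db_path: str, run_id: str, all_tabs: list) -> list:
    """On restart, skip tabs already enriched and resume with the rest."""
    with _connect(db_path) as conn:
        done = {row[0] for row in conn.execute(
            "SELECT tab_name FROM enriched_tabs WHERE run_id = ?", (run_id,)
        )}
    return [t for t in all_tabs if t not in done]
```

The composite primary key makes re-saves idempotent, so a crash between "enrichment finished" and "row written" costs one redundant API call, never a corrupted run.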
These aren’t features you think of when you’re building a demo. They’re features you think of when you’ve been the person on the other end of a production failure at 3 AM, apologizing to a client while tailing logs in a terminal you opened on your phone in a cab. Which, coincidentally, is exactly what time it was - except this time, I was building the apology out of the system entirely.
3:30 AM - Frontend, Pipeline, Feedback Loop
The next two hours were a blur of React components, email gates, and database schemas - the kind of work that’s hard to narrate because it’s twenty small decisions, not one big one. Drag-and-drop file upload. Step wizard with SSE streaming so users watch their enrichment progress live. Authentication middleware.
At some point around 4 AM, I realized I hadn’t eaten since dinner, which I barely remembered finishing.
I made instant noodles - just poured hot water, basically - the kind of meal that exists specifically for people who are too deep in something to cook properly, and ate them at my desk while writing lead capture logic. Every person who uploads an epic mapping spreadsheet has a software project that needs scoping. That’s a qualified lead. The tool does the qualification automatically.
By then, the agent was barely able to keep up with my pace. I grew impatient and played a FIFA Seasons match on my Xbox. Fifteen minutes and seven goals (three conceded) later, I was back.
Fast forward to 5:15 AM, now I was building the part most people would have skipped: the feedback loop. A benchmark upload pipeline where I can feed in completed project data - actual hours versus estimated hours - and the system extracts heuristics that tune future enrichments. Constitution versioning so the rules governing how the AI enriches stories evolve over time, with an audit trail.
I built this because I’ve learned that the hard part of operational tools isn’t the first version. It’s the system that keeps the tool honest after you stop paying attention.
5:40 AM - Why Confidence Tiers Must Be Forced, Not Suggested
Every estimate gets a confidence tier - high, medium, low, speculative. Each tier carries a variance band. High confidence means the estimate is within ±15%. Speculative means the range could be 2x-3x in either direction.
This isn’t the AI guessing at confidence. This is my estimation framework - the one I’ve refined across 130+ projects - encoded into prompt architecture that forces the model to articulate why it’s confident or uncertain. What are the unknowns? What assumptions is the estimate resting on? What questions need answers before this number can be trusted?
The model tried to mark everything “medium” the first time. I actually laughed - out loud, alone, at 5:45 AM - because this is exactly what junior estimators do. They hedge everything to the middle because extremes require conviction, and conviction requires explaining yourself. I restructured the prompt to require specific justification for each tier. We went three rounds before the output matched what I’d produce manually. The AI learned the same lesson my team learns: “medium” isn’t an estimate, it’s an admission that you haven’t thought hard enough.
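The tier-to-band mapping itself is a small lookup. Only the ±15% high band and the 2x-3x speculative spread come from the figures above; the middle bands here are illustrative assumptions:

```python
# Multipliers applied to a point estimate. Only "high" (±15%) and
# "speculative" (2x-3x in either direction) reflect stated figures;
# "medium" and "low" are illustrative placeholders.
VARIANCE_BANDS = {
    "high": (0.85, 1.15),
    "medium": (0.70, 1.40),
    "low": (0.50, 2.00),
    "speculative": (0.33, 3.00),
}

def estimate_range(point_hours: float, tier: str) -> tuple:
    """An estimate is a range plus a forced tier, never a bare number.
    A missing or invented tier raises KeyError instead of defaulting."""
    lo, hi = VARIANCE_BANDS[tier]
    return (round(point_hours * lo, 1), round(point_hours * hi, 1))
```

Raising on an unknown tier is the code-level version of "forced, not suggested": the model cannot skip the classification and still produce output.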
6:22 AM - The Last Commit
Admin panel deployed. Full test suite passing - thirty Python tests on the analyzer, twenty-seven JavaScript tests on the server. Schema validation. Rate limiter verification. Syntax checks across all eight server modules.
I pushed the final archive to the AWS instance. Forty-three files. Four thousand five hundred lines of production code. A product that encodes a methodology I’ve spent a decade developing, running autonomously on infrastructure I can hand to someone else who needs it.
By now, the first thin sliver of sunlight broke over the rapidly evolving skyline. It was the kind of warmth that feels earned, not just given: a quiet validation that while the city slept, something new had been brought into the world.
I closed the terminal, almost in a trance-like state. Stood on the balcony for a few minutes watching Kolkata decide whether it was night or morning - that grey interstitial hour where the street dogs stop barking and the first autorickshaws haven’t started yet. My eyes burned. My soul content.
It felt good. Not triumphant. Not “I just built a startup” good. More like the feeling after a long run - the quiet kind of good that lives in your body, not your ego.
What This Means for People Like Us
I want to talk about this honestly, because I think it matters for a lot more people than just me.
There’s a particular kind of person in the tech industry - the one who never fit neatly into a single job title. Not purely a developer. Not purely a manager. Not purely a strategist. Some combination of all three, held together by the compulsion to understand how systems actually work, end to end.
We used to be called “generalists,” which was code for “doesn’t specialize enough to be taken seriously.” Then “full-stack,” which at least acknowledged that breadth had value.
Now I’d call us something else: one-person operating companies. People who carry the architecture, the business logic, the delivery methodology, and enough technical skill to build the thing - not just spec it.
For most of the last decade, the industry told us we had to choose our lane. That specialization was the only path to scale. That you can’t be the person who designs the system and the person who codes it and the person who sells it. Pick a lane.
AI is changing that math - right now, in the present continuous.
The "individual contributor" ceiling just got blown off. A single person with deep domain expertise, multi domain fluency (not always a polymath) and enough technical fluency to orchestrate AI can now produce what used to require a small team. Not because they're superhuman because the communication overhead between brain and output dropped to near zero. No standups. No ticket grooming. No waiting.

Fred Brooks quantified part of this in The Mythical Man-Month (1975): communication channels on a project grow as n(n-1)/2. A team of 5 has 10 channels. Double that team to 10 and you don't get double the channels - you get 45. Most of the "overhead" we complain about in software delivery isn't bureaucracy. It's physics. Necessary infrastructure for a world where implementation required coordination, and coordination scaled quadratically.
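The quadratic growth is easy to verify:

```python
def channels(n: int) -> int:
    """Pairwise communication channels on a team of n people: n(n-1)/2."""
    return n * (n - 1) // 2

# Doubling the team more than quadruples the coordination surface:
# channels(5) -> 10, channels(10) -> 45, channels(20) -> 190
```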
What happened last night - six hours, one person, ~4,500 lines of production code, four analysis tiers, a tested pipeline running on cloud infrastructure - was not possible two years ago. Not because the ideas didn’t exist. I had the methodology. I had the frameworks. I had the domain expertise. What I didn’t have was an implementation partner that could keep up with the rate at which I could make decisions.
That’s what AI actually does for people like us. It doesn’t replace the thinking. It removes the bottleneck between thinking and building. The gap between “I know exactly how this should work” and “it works” collapsed from weeks to hours. Not because the AI is smarter than a development team - it isn’t. Because the communication overhead disappeared. No standup. No ticket grooming. No waiting for the frontend developer to finish the sprint before the backend developer can integrate. Just a continuous loop: decide, build, test, revise.
This is what the industry isn’t ready to talk about yet. The most valuable person in the next era of software isn’t the specialist. It’s the versatile operator - the one who knows enough about enough things to direct AI effectively across the full stack. The one whose bottleneck was never knowledge but always bandwidth. AI is bandwidth.
That doesn’t mean specialists become irrelevant. It means the ratio changes. Where you once needed a team of eight to build a product, you now need two or three people who each operate like a small team. The generalist who was never an “expert” at any one thing is now devastatingly efficient at doing all the things, because the marginal cost of switching between architecture, backend, frontend, devops, and product design just dropped to near zero.
If you’re one of these people - the ones who always had more ideas than execution capacity, who could see the whole system but couldn’t build it fast enough alone - this is your moment. Not because the tools are magic. Because the tools finally match the way your brain already works: across domains, in parallel, at the speed of decision.
What Birthdays Mean Now
I turned a year older at midnight. I was somewhere between the rate limiter implementation and the format-agnostic column mapper. I didn’t notice.
That’s not a flex. It’s a data point about what happens when priorities shift. The version of me that wanted the rooftop bar wasn’t wrong - he was just solving a different problem. He needed to be seen. To be validated. To fill the room with enough noise to believe the year had mattered.
The version of me sitting at a desk at 4 AM, arguing with an AI about whether a streaming timeout should be 30 seconds or 60, doesn’t need that anymore. Not because I’ve transcended the need for human connection - I haven’t, I couldn’t, and I won’t pretend otherwise. But because I’ve found something that the parties never gave me: the quiet satisfaction of watching something take shape that didn’t exist six hours ago. Something real. Something that works. Something that carries forward everything I know, running on a server that doesn’t care what day it is.
I’ll have the usual birthday afternoon. Cake. Dinner with loved ones. Phone calls. The people who matter. That hasn’t changed. What changed is that celebration no longer feels like the point of the day. It’s the rest between sessions.
Tomorrow is Valentine’s Day. I’ll probably spend part of it adding integration tests against real data, or refining the confidence tier calibration, or figuring out whether the enrichment pipeline should let users retry individual failed tabs from the frontend.

Some people may find that boring or sad. I find it honest.
The work continues. That’s the only birthday present that compounds. And I already got the best gift from my two-year-old son, who came running and hugged me tightly while saying something in his language that sounded like “happy birthday.”
If you’re sitting on a requirement spreadsheet right now, wondering if your stories are the right size or if you have similar ideas - drop me a message.
If this essay hit a nerve, here’s the next step: I can help you learn the patterns, walk the process, and reach repeatable excellence without the midnight heroics. You can find me on LinkedIn, Substack, or at samratbiswas.com.
And if you want anything production-grade, secure, fast, and scalable, I don’t do that alone. My team at Unified Infotech builds the kind of software that the Monday version of me can sign off on.
Code/Repo (MVP, documented debt included): atomikd-requirement_analyzer
P.S. This was never exposed to the public internet and will be patched before any real users find it.
