Code Is Lava: What a 48-Hour Hackathon Taught Us About AI-Native Engineering
TL;DR — In February 2026, KYP ran a three-day internal hackathon with a deliberately provocative premise: five teams, one real production system to rewrite, two days to build it, AI as the primary engineering force. The theme was “Code Is Lava” — the idea that manually written software ages so fast it might as well be molten, and that the ability to regenerate high-quality software with AI is now the most important engineering skill. The winning team used a language none of them had ever written before. The second-place team spent the entire first day planning with agents and not writing a single line of code. Both outcomes were surprises. Neither should have been.
Why We Did This
KYP is not experimenting with AI-assisted development. We have committed to it. The operating model we have been building — spec-driven workflows, BMAD multi-agent frameworks, organizational context as code — is not a pilot. It is the direction.
But commitment is not the same as capability. You cannot read your way to a new mental model of engineering. You have to build something real, under pressure, with feedback that is immediate and unambiguous.
The hackathon was that forcing function. Not a showcase. Not a team-building exercise. An experiment designed to answer a specific question: what does it actually look like when engineers treat AI as the primary implementation force — and what separates the teams that do it well from the ones that struggle?
Thirty-seven people — engineers and engineering leads — formed five teams and spent two days building the same thing: a complete rewrite of a real internal system with real performance requirements and real architectural complexity. Teams chose their own languages, their own architectural approaches, and their own AI workflows. The only constraint was the spec and the deadline.
The Setup: A Real Problem, Not a Toy
The system we chose to rewrite was selected precisely because it is not simple. It evaluates financial assets by orchestrating calls to multiple external data sources — each with different reliability characteristics, different latency profiles (ranging from milliseconds to over ten seconds), and different failure modes. The architecture you choose for that kind of system reveals your instincts about distributed systems design.
We gave each team documented functional and non-functional requirements, a mock API that simulated real production behavior including latency variance, provisioned infrastructure, and a test dataset for validation. The judging criteria were explicit: architecture quality, extensibility, measured performance, and throughput — assessed objectively from test results, not from slides.
One optional bonus criterion was included: configurable evaluation criticality per asset type. It was harder to implement than the core requirements, and delivering it meant planning for it from the start — it is not something you bolt on at the end.
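To make that concrete, here is a minimal sketch of what per-asset-type criticality might look like in code. All names here are illustrative, not the spec's actual schema; the point is that criticality has to be a first-class input to the orchestrator rather than a flag checked at the end.

```go
package evaluation

// Criticality controls how strictly an evaluation treats a data source:
// a Required source failing fails the whole evaluation, an Optional one
// only lowers confidence. (Illustrative names, not the actual spec.)
type Criticality int

const (
	Optional Criticality = iota
	Required
)

// AssetTypeConfig maps each external source to its criticality for one
// asset type, so the orchestrator can decide per asset what a source
// failure means instead of hard-coding one policy for the whole system.
type AssetTypeConfig struct {
	AssetType   string
	Criticality map[string]Criticality // source name -> criticality
}

// FailureIsFatal reports whether a failed call to source should abort
// the evaluation for this asset type.
func (c AssetTypeConfig) FailureIsFatal(source string) bool {
	return c.Criticality[source] == Required
}
```

The architectural consequence is that the orchestrator has to consult this configuration before deciding how to react to any source failure, which is exactly why the capability is hard to retrofit.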
What the Outcomes Revealed
Planning is not the opposite of speed — it is the prerequisite for it
The most counterintuitive result of the event came from the team that spent the entire first day in structured planning with AI agents. Full PRD, epics, sprint breakdown — using the BMAD multi-agent framework before writing a single line of production code. From the outside, it looked like they were falling behind.
They were the only team to deliver the bonus criterion. Fully implemented, correctly scoped, working in the demo.
The mechanism is not mysterious in retrospect. A specification that is precise enough — with well-defined acceptance criteria, explicit constraints, and clear boundaries between components — is something agents can execute against with high fidelity. A vague spec produces confident, well-formatted, wrong code. The team that invested in precision up front did not lose time. They eliminated the rework that imprecision creates.
This is the BMAD insight made concrete: the planning agents are not overhead on the development process. They are the development process. Code generation is the easy part.
Language expertise is no longer a prerequisite for language excellence
The winning team used Go. Not one of them had written Go before the hackathon. In 48 hours, they delivered the most technically mature solution — with dynamic external service routing, circuit breakers, concurrency controls, and production-grade observability — in a language they learned during the event.
This is worth sitting with. We are not saying language expertise is irrelevant. Deep knowledge of a language’s idioms, ecosystem, and performance characteristics still matters. What we are saying is that the cost of acquiring enough fluency to build production-quality software in an unfamiliar language has dropped to 48 hours when AI is doing the implementation.
The implication for how we make technical decisions is significant. Choosing a language based on what the team already knows — rather than what fits the problem — is a weaker argument than it used to be. What the winning team demonstrated is that the constraint is no longer familiarity. It is the quality of the reasoning behind the specification.
Treating external dependencies as untrusted is a production instinct, not an advanced technique
The architectural decision that most clearly separated the top solutions from the rest was how teams handled the external data sources. The sources have wildly variable latency characteristics — some respond in milliseconds, one averages over ten seconds in production. Any architecture that calls them sequentially, or assumes they will behave predictably, fails under real load.
The winning team built dynamic routing with continuous health checking, isolated failure domains, and concurrency controls as first instincts — not as features added after the core was working. They did not need the production failures to teach them this. They reasoned from the spec to the failure modes before writing the code.
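A minimal sketch of the shape that reasoning produces: concurrent calls with a per-source timeout budget, so one slow source degrades gracefully instead of stalling the evaluation. The source names and numbers are illustrative, and the winning team's actual routing layer (health checks, circuit breakers, dynamic weights) was considerably richer than this.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// result carries one source's answer or error back to the aggregator.
type result struct {
	source string
	value  string
	err    error
}

// querySource simulates an external source with a given latency and
// honors the per-call deadline on ctx.
func querySource(ctx context.Context, name string, latency time.Duration) result {
	select {
	case <-time.After(latency):
		return result{source: name, value: "ok"}
	case <-ctx.Done():
		// Deadline hit: this source is degraded, the evaluation is not.
		return result{source: name, err: ctx.Err()}
	}
}

func main() {
	// Illustrative sources: one fast, one with the ten-second profile.
	sources := map[string]time.Duration{
		"fast-registry": 50 * time.Millisecond,
		"slow-bureau":   12 * time.Second,
	}

	results := make(chan result, len(sources))
	for name, latency := range sources {
		go func(name string, latency time.Duration) {
			// Each source gets its own timeout budget instead of
			// inheriting whatever the slowest source needs.
			ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
			defer cancel()
			results <- querySource(ctx, name, latency)
		}(name, latency)
	}

	// Aggregate whatever arrives; a degraded source is a data point,
	// not a crash.
	for range sources {
		r := <-results
		if r.err != nil {
			fmt.Printf("%s: degraded (%v)\n", r.source, r.err)
			continue
		}
		fmt.Printf("%s: %s\n", r.source, r.value)
	}
}
```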
Teams that struggled treated the external sources as reliable internal services. When the slow source degraded the test runs, they had no architectural response.
The gap was not technical knowledge. Both groups knew about circuit breakers. The gap was the habit of designing for failure from the first line — and that habit is what we want to see become universal at KYP.
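Knowing the pattern and reaching for it by default are different things. For reference, the core of a circuit breaker is small. This is a deliberately minimal sketch (consecutive-failure counting plus a cooldown, no half-open state), not a production implementation:

```go
package breaker

import (
	"errors"
	"sync"
	"time"
)

// ErrOpen is returned when the breaker is failing fast.
var ErrOpen = errors.New("circuit open: failing fast")

// Breaker trips after maxFailures consecutive errors and fails fast
// until the cooldown passes, so callers do not queue behind a source
// that is already known to be down.
type Breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

func New(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open. A success closes the
// breaker; a failure past the threshold (re)opens it.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	return nil
}
```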
Product thinking shows up spontaneously when the environment rewards it
One of the most talked-about moments in the final presentations was a debugging flow graph that the winning team had built into their observability setup — a visual, end-to-end trace of how an evaluation request moved through the system, which source calls fired, what they returned, and where time was spent.
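We are not reproducing their implementation here, but the data behind that kind of graph can be as simple as one recorded span per source call. A sketch of the shape, with illustrative field names:

```go
package trace

import "time"

// Span records one step of an evaluation: which source was called,
// what happened, and how long it took. A request's spans, ordered by
// start time, are the raw material for an end-to-end flow graph.
type Span struct {
	RequestID string
	Source    string
	Start     time.Time
	Duration  time.Duration
	Outcome   string // e.g. "ok", "timeout", "circuit-open"
}

// Trace accumulates spans for one evaluation request.
type Trace struct {
	RequestID string
	Spans     []Span
}

// Record wraps a source call and appends its span to the trace.
func (t *Trace) Record(source string, call func() error) error {
	start := time.Now()
	err := call()
	outcome := "ok"
	if err != nil {
		outcome = err.Error()
	}
	t.Spans = append(t.Spans, Span{
		RequestID: t.RequestID,
		Source:    source,
		Start:     start,
		Duration:  time.Since(start),
		Outcome:   outcome,
	})
	return err
}
```

Ordered by start time, those spans are enough to render the end-to-end flow the team showed.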
Nobody asked for it. The judging criteria did not reward it. The team built it during the hackathon because they wanted to understand what was happening inside their own system.
That is the difference between engineering for the demo and engineering for production. It is also what we mean when we say we are building an AI-native organization — not one where AI generates code faster, but one where the engineers directing the AI are thinking about what it means to operate what they are building, not just to ship it.
What We Got Wrong
Domain understanding cannot be delegated to the AI. The team that struggled most was candid in their retrospective: they started writing prompts before they understood the problem. The result was sequential calls to external sources, an architecture optimized for happy-path scenarios, and a system that could not handle the pressure of the actual requirements. AI amplifies the quality of your understanding — it does not substitute for it. Building a precise spec is not a task you skip to get to the “real” work faster. It is the real work.
We did not make load testing a formal evaluation criterion. The team with the cleanest architecture — hexagonal design, clear separation of concerns, well-structured domain model — did not validate it under stress. They may have had the right architecture and not known it. Or they may have had a design that would have cracked under load. We did not find out. Future editions will include objective load test results as a scored criterion, not an optional one.
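Making that criterion objective does not require heavy tooling. A sketch of the minimal kind of probe we have in mind, with a stand-in evaluate function and arbitrary worker counts, hammering the system concurrently and reporting p95 latency:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// evaluate stands in for one call to the system under test.
func evaluate() { time.Sleep(20 * time.Millisecond) }

func main() {
	const workers, requestsPerWorker = 50, 20

	var mu sync.Mutex
	var latencies []time.Duration

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < requestsPerWorker; i++ {
				start := time.Now()
				evaluate()
				elapsed := time.Since(start)
				mu.Lock()
				latencies = append(latencies, elapsed)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()

	// Sort and read off the 95th percentile.
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	p95 := latencies[len(latencies)*95/100]
	fmt.Printf("%d requests, p95 latency: %v\n", len(latencies), p95)
}
```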
The bonus criterion needed to be framed as a signal from day one. Teams that learned about the optional criticality customization late in the process treated it as a stretch goal. The team that delivered it had planned for it from the beginning — it was not an add-on, it was part of their spec. The lesson: in future hackathons, optional criteria will be presented as signals of product completeness, not as extra credit, so teams weigh them at architecture time.
What This Says About How We Work
The hackathon was not an exception to how we build software at KYP. It was an accelerated, observable version of the principles behind our day-to-day engineering model.
We believe the most important engineering skill in 2026 is not proficiency in a specific language or framework. It is the ability to reason clearly about a problem, decompose it into a specification that is precise enough for agents to execute against, and direct that execution with good judgment about architecture, failure modes, and operational reality. That skill compounds. Every well-specified system produces a better knowledge base for the next one. Every agent workflow that delivers correctly tightens the feedback loop that improves the next specification.
The hackathon also demonstrated something about the kind of engineers we are trying to build and attract: people who are curious about the problem before they are confident in the solution, who build observability for themselves and not for the demo, who say “we did not understand the domain well enough” out loud and treat that as the starting point for improvement, not a failure to hide.
This is what AI-native engineering looks like in practice. Not engineers who use AI tools. Engineers who think about how to work with AI agents effectively — as a craft, with rigor, with honest retrospectives about where the approach broke down and why.
What Comes Next
The hackathon produced five working implementations of a system we are going to actually rewrite. That is not incidental — the solutions are now reference implementations for the architectural tradeoffs we will face in the real project. The best decisions across all five will inform the production design.
We are also carrying the methodology forward:
- The BMAD planning-first approach will become a reference workflow for engineering teams beyond the hackathon context
- The smart external service routing patterns from the winning solution will be shared as reusable design templates
- Load testing will be a formal criterion and first-class deliverable in future editions
- We will run a Tech On Tap session specifically on what the planning-first team learned from their BMAD workflow, to make that practice accessible across the organization
The broader goal is not to run better hackathons. It is to reduce the gap between what we demonstrated in 48 hours and what our standard engineering practice looks like on any given Tuesday. That gap is closing. The pace at which it closes depends on how seriously we take the lessons — including the uncomfortable ones.
CERC operates Brazil’s financial market infrastructure for receivables registration. KYP is one of our core product engineering teams, building the AI-native operating model that makes engineering at financial system scale possible. If this kind of environment — high standards, honest retrospectives, agents as first-class engineering participants — sounds like where you want to work, we are hiring.
This post was written by Juliano Pereira — technology leader at KYP/CERC building the infrastructure for AI-native engineering.