Much of the current discussion around AI in engineering is still focused on capability: which model performs best, which coding agent is improving fastest, which benchmark moved again this month.
That is understandable. It is also increasingly the wrong question.
The deeper lesson emerging from agentic software delivery and harness engineering is that structure beats capability. Once models are good enough, the constraint is no longer raw intelligence. It is the operating environment around that intelligence: the workflows, controls, context architecture, verification layers, execution boundaries, and human decision rights that determine whether agents produce reliable outcomes at scale. OpenAI’s own write-up on harness engineering makes this point directly: the bottleneck in its internal Codex work was not model capability but the surrounding tools, abstractions, internal structure, and feedback loops required to make agentic work reliable in practice.
We are seeing the same pattern at Star. As we implement harness engineering and run agentic delivery processes internally, the productivity gains are becoming material: in some workflows, we are seeing 2x to 3x improvements in speed to value and throughput. The operating model is changing as well. What previously required a traditional “two-pizza” sprint team can increasingly be handled by a smaller “one-pizza” team of three to four people working alongside agents. The gain is not simply that agents write more code faster. It is that the right harness allows smaller teams to move with greater precision, stronger feedback loops, and less operational drag.
There is a business lesson here. CTOs should pay attention because harness engineering is exposing something much broader about enterprise transformation: the organizations that will benefit most from agentic AI will not be those with access to the best models alone. They will be those that redesign their operating model to make intelligence executable, governable, and compounding over time. McKinsey has framed the same shift in organizational terms, arguing that in agentic organizations humans will increasingly sit above the loop to steer outcomes and selectively inside the loop where human contact matters most.
What follows are the business lessons I believe leaders should take from agentic SDLC and harness engineering as they rethink enterprise structure, infrastructure, and talent composition.
1. Prioritize infrastructure over model capabilities
The most consistent lesson from harness engineering is that even the most capable agent fails in a badly designed environment.
This is visible both in production engineering and in enterprise AI adoption more broadly. OpenAI found that early Codex progress was constrained not by the model itself but by missing structure around it. Vercel found that simplifying the operating environment around an internal agent dramatically improved results: after removing 80% of the agent’s tools, success improved from 80% to 100%, execution became 3.5 times faster, and token usage dropped by 37%.
The implication for business leaders responsible for their organizations' AI transformation is significant. In multi-step work, small failure rates compound quickly. That is why many AI pilots look promising in isolation but fail under operational load. The issue is rarely the model itself. More often, it is the environment around it that is underspecified. This is also why so many enterprise AI programs stall between pilot and production. Success in production depends on a structured implementation environment. So instead of asking how capable a model is, leaders should ask: what structure will this capability operate inside?
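To see why small failure rates compound, consider an illustrative calculation; the per-step rates below are assumed for the sake of the example, not measured figures. An agent that completes any single step correctly 98% of the time finishes a 20-step workflow without error only about two-thirds of the time.

```python
# Illustrative only: assumed per-step success rates, not measured figures.
def workflow_success(per_step_success: float, steps: int) -> float:
    """Probability that every step in a multi-step workflow succeeds."""
    return per_step_success ** steps

for p in (0.99, 0.98, 0.95):
    print(f"per-step {p:.0%} -> 20-step workflow {workflow_success(p, 20):.0%}")
# per-step 99% -> 20-step workflow 82%
# per-step 98% -> 20-step workflow 67%
# per-step 95% -> 20-step workflow 36%
```

A process that looks reliable step by step can still fail more often than it succeeds once it runs end to end, which is exactly where structure around the model earns its keep.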
2. Humans must move to the highest-leverage points
One of the clearest operating principles to emerge from harness engineering is simple: humans steer, agents execute. The role of the engineer is shifting upward, toward designing environments, specifying intent, and building feedback loops. The same logic applies across the enterprise. In an agentic operating model, humans sit above the loop to direct outcomes, and step inside the loop only where human contact or judgment materially changes the result.
Many organizations are still automating tasks without redesigning jobs. Recent Deloitte research found that 84% of companies have not redesigned jobs to fit AI, even as automation expectations continue to rise. That makes talent redesign a critical part of any serious AI transformation.
Automating tasks without rethinking roles simply inserts AI into an existing operating model rather than creating a new one. If humans remain the default catch-all for approvals, exceptions, and manual rework, then the organization has not actually built an agentic operating model. Leaders need to define much more explicitly where human judgment creates differentiated value: which decisions remain human-owned, which can be delegated to agents, where escalation is required, and what governance mechanisms make that delegation safe.
3. Knowledge is only an asset if it is machine-legible
In an agentic enterprise, a data strategy is no longer enough. You also need a knowledge strategy. A business cannot run effectively on a mix of structured data and unstructured institutional memory. Both have to be made usable by the system itself. That means knowledge must be codified, discoverable, and machine-legible. Without that, agents are operating against only part of the organization, while the rest remains invisible to them.
Harness engineering exposes a more uncomfortable truth behind this. In most enterprises, a large share of institutional knowledge is already operationally invisible. One of the early lessons from agentic engineering is that a single monolithic reference document fails predictably: it overwhelms context, decays quickly, and cannot be validated effectively. The more durable pattern is a structured knowledge system that can be loaded progressively, scoped to the task, and verified automatically. Put simply, what an agent cannot access effectively in context may as well not exist.
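As an illustration of what "loaded progressively and scoped to the task" can mean, the sketch below assumes knowledge kept as small, scoped documents that are pulled into an agent's context only when the task touches their domain. The file layout, scope tags, and context budget are hypothetical; this is a minimal sketch of the pattern, not a description of any specific system.

```python
# Hypothetical sketch: knowledge as small, scoped, individually verifiable documents
# that are loaded into an agent's context per task, rather than one monolithic file.
from pathlib import Path

KNOWLEDGE_ROOT = Path("knowledge")  # assumed layout, e.g. knowledge/payments/refund-policy.md

def load_context(task_tags: set[str], budget_chars: int = 20_000) -> str:
    """Collect only the documents whose scope matches the task, within a context budget."""
    selected: list[str] = []
    used = 0
    for doc in sorted(KNOWLEDGE_ROOT.rglob("*.md")):
        scope = doc.parent.name  # the directory name acts as the document's scope tag
        if scope not in task_tags:
            continue
        text = doc.read_text()
        if used + len(text) > budget_chars:
            break  # stop before overwhelming the context window
        selected.append(f"## {doc.name}\n{text}")
        used += len(text)
    return "\n\n".join(selected)

# Example: a refunds task loads only payments and compliance knowledge.
context = load_context({"payments", "compliance"})
```

The point is less the specific mechanism than the discipline it forces: knowledge has to be broken into scoped, verifiable units before it can be selectively loaded at all.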
That is a serious challenge for most enterprises. Much of the knowledge that actually governs execution still lives in slide decks, email threads, chat messages, undocumented exceptions, and, most critically, in the heads of experienced employees. Tacit knowledge remains valuable, but in an agentic environment it shapes agent execution only once it is captured as context, documentation, and governance. This is why knowledge architecture is becoming infrastructure and must be treated as part of the production system.
4. The future operating model is federated: central boundaries, local autonomy
Another important design pattern in harness engineering is that boundaries are enforced centrally while execution remains flexible locally.
OpenAI describes this explicitly in architectural terms: enforce invariants mechanically and avoid micromanaging implementation details. Their repositories use structural rules, custom linters, and dependency constraints to preserve correctness while still allowing agents to solve problems flexibly within those boundaries.
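To make "enforce invariants mechanically" concrete, here is a minimal sketch of a dependency-boundary check that fails the build whenever code in one domain imports another domain's internals. The domains, rules, and paths are hypothetical, and this is an illustration of the pattern rather than OpenAI's actual tooling.

```python
# Hypothetical dependency-boundary check: a build step that enforces an architectural
# invariant ("domains may not import each other's internals") without dictating how
# agents implement features inside their own domain.
import ast
import sys
from pathlib import Path

FORBIDDEN_PREFIXES = {
    "billing": ("orders.internal",),  # assumed rule: billing may not reach into orders' internals
    "orders": ("billing.internal",),
}

def violations(src_root: Path) -> list[str]:
    found = []
    for path in src_root.rglob("*.py"):
        domain = path.relative_to(src_root).parts[0]
        banned = FORBIDDEN_PREFIXES.get(domain, ())
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            names = []
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            for name in names:
                if any(name.startswith(prefix) for prefix in banned):
                    found.append(f"{path}: imports {name}")
    return found

if __name__ == "__main__":
    problems = violations(Path("src"))
    for p in problems:
        print(p)
    sys.exit(1 if problems else 0)  # non-zero exit fails the pipeline mechanically
```

The check says nothing about how a feature should be built; it only guarantees that whatever the agent builds stays inside the boundary.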
Stripe’s emerging blueprint pattern points in the same direction. Their Minions system combines deterministic nodes with agentic subtasks so that critical steps remain governed while creative or ambiguous work can still be delegated to agents.
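Stripe has not published Minions' internals in detail, so the sketch below is only an illustration of the general pattern it describes: deterministic nodes guard the critical steps, while a bounded agentic subtask handles the ambiguous work in between. The step names and the call_agent helper are hypothetical.

```python
# Illustrative blueprint sketch (not Stripe's API): deterministic nodes guard the
# critical steps, while a bounded agentic subtask handles the ambiguous work.
from dataclasses import dataclass
from typing import Callable

def call_agent(prompt: str) -> str:
    """Hypothetical placeholder for a model call; swap in your agent runtime."""
    return f"[agent draft for: {prompt}]"

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]
    deterministic: bool  # governed, testable step vs. delegated agentic subtask

def validate_input(state: dict) -> dict:       # deterministic: hard precondition
    assert "ticket" in state, "missing ticket"
    return state

def draft_resolution(state: dict) -> dict:     # agentic: ambiguous work, delegated
    state["draft"] = call_agent(f"Propose a resolution for: {state['ticket']}")
    return state

def apply_policy_checks(state: dict) -> dict:  # deterministic: policy gate before anything ships
    if "refund" in state["draft"].lower() and not state.get("refund_approved"):
        raise ValueError("refund requires human approval")
    return state

BLUEPRINT = [
    Step("validate_input", validate_input, deterministic=True),
    Step("draft_resolution", draft_resolution, deterministic=False),
    Step("apply_policy_checks", apply_policy_checks, deterministic=True),
]

def run(state: dict) -> dict:
    for step in BLUEPRINT:
        state = step.run(state)
    return state
```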
For enterprise leaders, this points toward a federated governance model for agentic operations.
You cannot control everything centrally, and you cannot trust everything locally. That means the architecture must be designed to make local freedom safe. Business domains should increasingly own the day-to-day governance of agent-enabled workflows while central teams provide the shared platforms, controls, and oversight.
5. Engineer correction into the system
Correction, learning, and improvement have to be designed into the operating model itself, or they will not happen fast enough to matter. This is one of the most important lessons from harness engineering. Agentic systems do not just accelerate output. They accelerate pattern replication. They learn from what surrounds them and from what gets reinforced in the workflow. That means they compound good practices and bad ones alike.
This changes the economics of control. In a human-centric operating model, organizations can often rely on managerial oversight, experience, and manual review to catch drift over time. In an agentic operating model, by the time a human spots the issue, the pattern may already be embedded across dozens of workflows, outputs, or decisions.
That is why correction has to be engineered into the system itself. Failures need to trigger structured diagnosis, rule updates, validation changes, and stronger enforcement automatically. Feedback has to arrive at the earliest possible point, and improvement has to become part of the mechanism of execution.
OpenAI encountered this directly. Their teams found that high-throughput agentic systems introduced drift and low-quality pattern replication that could not be managed through manual cleanup alone. Their answer was continuous “garbage collection” for the codebase: recurring background tasks that identify deviations from golden principles and open targeted refactoring actions. The principle is broader than software. In any enterprise system, once execution becomes machine-scaled, governance has to become continuous and machine-assisted as well.
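A minimal sketch of what continuous garbage collection can look like in practice, assuming golden principles codified as automated checks and an issue-tracker API. The principles, patterns, and create_ticket helper below are hypothetical, and the details of OpenAI's internal implementation are not public; the sketch only shows the shape of the mechanism.

```python
# Hypothetical recurring "garbage collection" job: scan for deviations from golden
# principles and open targeted refactoring tasks, instead of relying on manual cleanup.
from dataclasses import dataclass
from pathlib import Path
import re

@dataclass
class Principle:
    name: str
    pattern: re.Pattern  # a deviation is anything matching this pattern
    remedy: str

GOLDEN_PRINCIPLES = [  # assumed examples; real principles would be project-specific
    Principle("no-silent-excepts", re.compile(r"except Exception:\s*pass"), "handle or re-raise the error"),
    Principle("no-print-logging", re.compile(r"^\s*print\(", re.MULTILINE), "use the structured logger"),
]

def create_ticket(title: str, body: str) -> None:
    """Hypothetical stand-in for an issue-tracker API call."""
    print(f"TICKET: {title}\n  {body}")

def garbage_collect(repo_root: Path) -> None:
    """Scan the repository and open one targeted refactoring task per deviation."""
    for path in repo_root.rglob("*.py"):
        text = path.read_text()
        for principle in GOLDEN_PRINCIPLES:
            if principle.pattern.search(text):
                create_ticket(
                    title=f"Refactor {path.name}: {principle.remedy}",
                    body=f"{path} deviates from golden principle '{principle.name}'",
                )

# Run on a schedule (nightly CI job or cron) so drift is corrected continuously,
# not cleaned up manually after it has already replicated.
```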
What’s next: Infrastructure and talent strategy must now be designed together
The most important implication is that infrastructure and talent can no longer be treated as separate transformation tracks.
The shape of work itself is changing. Over the next few years, a growing share of execution will be performed by agents operating as specialist domain experts. Within their scope, they will be fast, scalable, and tireless. But they will not resolve ambiguity, weigh competing stakeholder interests, or decide what actually matters. That work remains human. It will increasingly concentrate around those who can exercise judgment, navigate context, and apply the kind of emotional intelligence that no model can replicate.
That is why infrastructure and talent must be designed in the same conversation. You cannot build an agentic operating model by upgrading tooling on one side and reskilling people on the other. The harness (the environment, controls, and knowledge architecture) and the human operating model (who decides, who coordinates, and who owns judgment) shape each other. When they are aligned, every AI investment compounds. When they are not, powerful agents end up operating inside an organization that is not designed to use them.







