Microsoft issues out-of-band patch for critical security flaw in update to ASP.NET Core 22 Apr 2026, 6:45 pm

Developers are advised to check their applications after Microsoft revealed that last week’s ASP.NET Core update inadvertently introduced a serious security flaw into the web framework’s Data Protection Library.

Microsoft describes the issue as a “regression,” coding jargon for an update that breaks something that was previously working correctly.

In this case, what was introduced was a CVSS 9.1-rated critical vulnerability, identified as CVE-2026-40372, that affects ASP.NET Core’s Data Protection library, distributed via the NuGet package manager. It impacts Linux, macOS, and other non-Windows OSes, as well as Windows systems where the developer explicitly opted into managed algorithms via the UseCustomCryptographicAlgorithms API.

A bug in the .NET 10.0.6 package, released as part of the Patch Tuesday updates on April 14, causes the ManagedAuthenticatedEncryptor library to compute the validation tag for the Hash-based Message Authentication Code (HMAC) using an incorrect offset.
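
The failure class is easy to demonstrate in miniature. Below is a minimal Python sketch (purely illustrative; the affected library is .NET code) of how an HMAC computed from the wrong offset leaves part of a payload unauthenticated, so any verifier that makes the same offset mistake will accept bytes the tag never covered:

    import hmac, hashlib, os

    key = os.urandom(32)
    payload = os.urandom(16) + b"protected payload"   # IV + ciphertext

    # Correct: the validation tag covers the full IV + ciphertext span.
    good_tag = hmac.new(key, payload, hashlib.sha256).digest()

    # Buggy: a wrong starting offset means the tag covers the wrong bytes,
    # so the first 16 bytes of the payload are never authenticated at all.
    bad_tag = hmac.new(key, payload[16:], hashlib.sha256).digest()

    # A verifier that recomputes the tag at the same wrong offset will
    # "validate" payloads whose unauthenticated bytes an attacker controls.
    print(hmac.compare_digest(good_tag, bad_tag))  # False: different spans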

Incorrect calculation of the validation hash results in ASP.NET Core application cookies and tokens being validated and trusted when they shouldn’t be.

“In these cases, the broken validation could allow an attacker to forge payloads that pass DataProtection’s authenticity checks, and to decrypt previously-protected payloads in auth cookies, anti-forgery tokens, TempData, OIDC state, etc,” said Microsoft’s GitHub advisory.

When embedded in applications, these long-lived tokens confer the sort of power attackers quickly jump on. “If an attacker used forged payloads to authenticate as a privileged user during the vulnerable window, they may have induced the application to issue legitimately-signed tokens (session refresh, API key, password reset link, etc.) to themselves,” the advisory noted.

This vulnerability arrives only six months after ASP.NET suffered one of its worst-ever flaws, October’s CVSS 9.9-rated CVE-2025-55315 in the Kestrel web server component. But somewhat alarmingly, the current advisory goes on to compare the issue to MS10-070, an emergency patch for CVE-2010-3332, an infamous zero-day vulnerability in the way Windows ASP.NET handled cryptographic errors that caused a degree of panic in 2010.

Not a simple update

Normally, when flaws are uncovered, the drill involves merely applying an update, workaround, or mitigation. In this case, the update itself should have already happened automatically for server builds, taking runtimes to the patched version 10.0.7.

However, for developers using the popular Docker container platform, things are more complicated. For those projects, the Data Protection Library is also embedded in built applications. Addressing this requires updating and rebuilding any ASP.NET Core applications created after the April 14 update.

In addition, those consuming the netstandard2.0 or net462 target framework assets from the flawed 10.0.x NuGet package, typically for compatibility with older runtimes such as .NET Framework on Windows, are also affected.

Detecting affected binaries

How will developers know if a vulnerable binary has been loaded? Microsoft’s security advisory offers the following advice:

“Check application logs. The clearest symptom is users being logged out and repeated ‘The payload was invalid’ errors in your logs after upgrading to 10.0.6. Check your project file. Look for a PackageReference to Microsoft.AspNetCore.DataProtection version 10.0.6 in your .csproj file (or in a package that depends on it). You can also run dotnet list package to see resolved package versions.”
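
That project-file check is easy to script across a large repository. Below is a hedged sketch in Python; it catches only direct PackageReference entries, so transitive dependencies still require dotnet list package --include-transitive:

    from pathlib import Path
    import xml.etree.ElementTree as ET

    PACKAGE = "Microsoft.AspNetCore.DataProtection"
    BAD_VERSION = "10.0.6"

    # Walk the repository and flag any project pinning the vulnerable version.
    for csproj in Path(".").rglob("*.csproj"):
        for ref in ET.parse(csproj).getroot().iter("PackageReference"):
            name, version = ref.get("Include", ""), ref.get("Version", "")
            if name.startswith(PACKAGE) and version == BAD_VERSION:
                print(f"{csproj}: {name} {version} -> rebuild against 10.0.7")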

In summary, developers should rebuild affected applications against the fixed version, expire all affected authentication cookies and tokens to invalidate any forgeries, and rotate ASP.NET Core Data Protection keys so that new tokens are issued.

While there is no evidence that the issue has been exploited by attackers, good security hygiene mandates also checking for unexpected or unusual logins, errors, or authentication failures, Microsoft advised.

This article originally appeared on CSOonline.


SpaceX secures option to acquire AI coding startup Cursor for $60B 22 Apr 2026, 12:03 pm

SpaceX has obtained the right to acquire AI coding startup Cursor for $60 billion later this year, the two companies announced Tuesday.

The aerospace company disclosed the arrangement in a post on X: “SpaceXAI and cursor_ai are now working closely together to create the world’s best coding and knowledge work AI.”

SpaceX added that the deal would pair Cursor’s product with its Colossus AI training infrastructure.

“The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models,” the post said. “Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.”

Deepika Giri, AVP and head of AI research at IDC Asia/Pacific, said the contractual exposure is the immediate concern for enterprise buyers.

“Cursor’s existing zero-data-retention agreements with model providers like OpenAI and Anthropic could be challenged under the new SpaceX ownership, which might quite likely renegotiate or terminate subprocessor relationships,” Giri said. “It is likely that Cursor will cease to maintain model neutrality, which will work in favor of xAI.”

Cursor frames it as a compute deal

Cursor’s developer, San Francisco–based Anysphere, confirmed the tie-up in a short company blog post the same day.

“We’ve wanted to push our training efforts much further, but we’ve been bottlenecked by compute,” the company said. “With this partnership, our team will leverage xAI’s Colossus infrastructure to dramatically scale up the intelligence of our models.”

The blog post said Cursor released Composer less than six months ago as its first agentic coding model, that Composer 1.5 scaled reinforcement learning by more than 20 times, and that Composer 2 reached frontier-level performance at a fraction of the cost of other models.

Cursor co-founder and chief executive Michael Truell addressed the deal in a post on X. He said he was “excited to partner with the SpaceX team to scale up Composer,” calling the arrangement “a meaningful step on our path to build the best place to code with AI.”

Nitish Tyagi, principal analyst at Gartner, said Cursor’s in-house model carries a constraint that the announcement did not address. “Composer is fine-tuned on the Chinese base model Kimi 2.5, making it unsuitable for organizations with restrictive governance policies,” Tyagi said.

Cursor’s enterprise footprint

Cursor says more than half of the Fortune 500 use its product, with customers including Nvidia, Salesforce, Uber, Stripe, and PwC.

SpaceX brings an AI stack of its own to the partnership. Its February acquisition of xAI pulled Musk’s AI lab, the Grok chatbot, and the X social platform under the rocket company.

Cursor’s parent, Anysphere, has been scaling the product through a run of acquisitions and funding rounds. The company agreed to acquire code review startup Graphite in December, adding pull request and debugging capabilities to its enterprise stack.

Cursor is built on a fork of Visual Studio Code and competes with GitHub Copilot, Anthropic’s Claude Code, and OpenAI’s Codex. Anthropic and OpenAI supply frontier models that Cursor resells through its IDE, and both vendors have launched competing coding tools of their own, according to product documentation on their websites.

What it means for enterprise buyers

Cursor’s enterprise contracts include data-handling provisions tied to its current model providers, including a commitment to no training on customer data by Cursor or the LLM providers it routes to, according to the company’s enterprise page.

Giri said enterprise buyers should move on contract language before the option window closes. “CIOs should consider demanding change-of-control clauses with 90 to 180-day notice on any subprocessor or model routing changes,” she said. “For buyers looking for neutrality of stack, this acquisition completely takes away the neutrality that Cursor offers.”

Tyagi said the partnership’s roadmap is the next variable to watch. “While the partnership appears directionally logical, key uncertainties remain — specifically whether the roadmap will prioritize Grok, Composer, both, or an entirely new model,” he said.

He also pointed to an earlier precedent. “Model access restrictions often move faster than innovation,” Tyagi said, citing Anthropic’s decision to restrict Windsurf’s access amid OpenAI acquisition rumors. The current announcement “could backfire for Cursor if major providers like OpenAI or Anthropic limit model access,” he said.

SpaceX and Cursor did not immediately respond to requests for comment.


AI is upending the SaaS game 22 Apr 2026, 9:00 am

It’s quite clear that agentic coding has completely taken over the software development world. Writing code will never be the same. Shoot, it won’t be long before we aren’t writing any code at all because agents can write it better and faster than we humans can. That may already be true today. 

But there is more to software development than merely writing code, and those areas—source control, documentation, CI/CD, project management—are ripe for some serious disruption from AI as well. Those areas may well be hit harder than coding itself. 

I would imagine that if you were in the business of analyzing data and providing dashboard-level insights into that data, then you would be very worried indeed about what AI is going to do to your value proposition. Much of the SaaS industry is in the business of analyzing existing data, and that is exactly what AI agents can do well. When a simple question can get straight to the heart of what a pricey dashboard provides, then companies have to question the value of paying for that kind of service.  

Tools like LinearB, Jellyfish, and Swarmia provide deep and interesting insights into what is going on inside your repository, but if you can say to Claude Code, “What are the DORA metrics for this repository?”, well, then those businesses are definitely ripe for disruption, no? 

Pivoting to AI

Those tools are already reacting by pivoting hard and leaning into the AI revolution. They are doing things like focusing on measuring AI processes instead of providing team insights. These tools are now pitching that they monitor not your development team but your AI development process, which is the kind of thing they have to do when the ground under their feet is shifting. The disruption is real, and they have to change or die. 

Dashboards over existing data need to make a rapid change. But tools that produce underlying data need to change as well. Instead of producing dashboards for human consumption, these tools are turning hard towards providing Model Context Protocol (MCP) implementations that AI agents can consume.  

One meta-coding area where I have found AI provides real value is in log examination. When a problem occurs, the first question that usually gets asked is, “Where is the log of that happening?” Back in the before times, you’d have to pore over the log, line by line, searching for exactly what happened for clues into the source of the problem. But now? Give the log, however large, to an AI agent, and those answers appear in a matter of minutes. 

Producing the log becomes the real value—displaying dashboards over that data becomes less important. A tool like Datadog owns the ingestion pipeline and the time-series production, and it creates valuable data, so its pivot is easier. Datadog need only create a tool that talks to an AI agent instead of a human. Its beachhead is solid. The real value of logs lies in an agent’s ability to peer into them in real time and take action based on what it sees. It won’t be long until, whenever a problem occurs, an MCP server will notify an AI agent and the agent will analyze the problem, fix it, and deploy the fix, all without human intervention.

Producing and owning the data beats being able to interpret the data. Tools that produce the data can lean into the AI revolution. Tools that merely read and display data from a different source—say, an existing repository—will have a much harder time surviving alongside AI agents. 

The soul of a new user

Any provider of a software tool that is part of a development or operations workflow should be working very hard to provide an MCP or a CLI for an AI agent to use, because that is the future. A CI/CD system needs to be able to respond to events without a human being involved at all. Such tools become the data source and will have an entirely different front end. Instead of humans looking at dashboards, it will be AI agents making MCP queries into the tool. 
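
As a concrete illustration of what that agent-facing front end can look like, here is a minimal sketch using the official MCP Python SDK’s FastMCP helper; the server name, tool name, and return payload are hypothetical stand-ins for whatever a real CI/CD system would expose:

    from mcp.server.fastmcp import FastMCP

    # Hypothetical agent-facing front end for a CI/CD system: a typed tool
    # an agent can call over MCP instead of a dashboard a human reads.
    mcp = FastMCP("ci-metrics")

    @mcp.tool()
    def deployment_frequency(repo: str, days: int = 30) -> dict:
        """Return deployments per day for a repository over a time window."""
        # A real server would query the CI system's API here; this is a stub.
        return {"repo": repo, "window_days": days, "deploys_per_day": 4.2}

    if __name__ == "__main__":
        mcp.run()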

This is where the disruption is really happening. One might even say your customer is no longer a software development manager but an AI agent calling your MCP server. How long will it be before we have AI tools making purchasing decisions after running thousands of simulations against a set of potential new tools? Previously, software tool companies put a lot of energy into slick-looking UIs, web pages with solid copy, and all kinds of bells and whistles meant for human consumption.

But does any of that matter if you are actually selling to an AI agent? Does your MCP server actually return data that an agent can consume and use?

Everything that SaaS companies have learned to do to be successful is now being turned on its head. AI agents don’t care one whit about cool-looking websites and clever marketing copy. Selling to a machine that doesn’t care about your pitch, your carefully crafted brand, or your clever logo is a game that no one has ever played before. 


Google’s Gemma 4 shines on local systems – both big and small 22 Apr 2026, 9:00 am

Google’s Gemma 4 comes touted as the latest evolution of the company’s multimodal model offerings. Gemma 4 not only offers reasoning and tool use but also vision and audio functionality, and it’s available in a range of model sizes that target servers and local devices.

What’s striking about Gemma 4 is that even at the higher end of its size range, it’s still decently performant on personal hardware. Google claims this is due to innovations in the architecture of the model, but the proof is in the trying. Gemma 4 is quite responsive.

To that end, I took Gemma 4 for a spin on my own hardware to see how it fared for its advertised tasks.

Gemma 4 model sizes

Gemma 4 comes in four basic sizes or “densities”:

  • E2B: 2.3 billion effective parameters, 5.1 billion total, 128K max context window.
  • E4B: 4.5 billion effective parameters, 8 billion total, 128K max context window.
  • 31B: 31 billion parameters (the “dense” version), 256K max context window. (You will probably not use this one on your own machine — it’s 62GB!)
  • 26B A4B: A “mixture of experts” model with 4 billion “activated” parameters and 26 billion total parameters, 256K max context window.

Each of these model sizes is available in a slew of community-created editions, thanks to Gemma 4’s Apache 2 licensing. For instance, the 26B A4B model comes in a community edition with more compact quantizations (4-bit, 6-bit, etc.), which I used as one of the model mixes for this article.

The models I used:

  • The community-quantized edition of the 26B A4B model
  • The LM Studio Community edition of the E4B model
  • The “unsloth” edition of the E4B model

Test system and prompts

I ran each model using my now-standard test bed: LM Studio 0.4.10 on an AMD Ryzen 5 3600 6-core CPU (32GB RAM) and an Nvidia GeForce RTX 5060 (8GB VRAM).

For each model I ran a set of prompts:

  • A vision-functionality prompt: “Create a caption for the attached image of no more than three sentences.” Another version of the prompt added: “… with no editorialization.” (The attached image is shown in one of the screenshots below.)
  • Prompts intended to provoke web-search tool use and produce either detailed or simplified responses: “What is the copyright status of Franz Kafka’s works? Explain in detail” and “What did William Gibson think of Blade Runner?”
  • A prompt for code generation and problem solving: “Python’s pip tool has a function ScriptMaker (accessed with from pip._vendor.distlib.scripts import ScriptMaker). On Microsoft Windows this is used to create an .exe stub launcher for a Python package’s entry points when it’s installed with pip. However, the icon created for this stub is the same generic icon used for the Python runtime itself. Let’s write a Python utility to allow the user to append their own custom icon to the .exe stub, but also preserve the stub’s appended archive and other metadata. The utility should use only the Python standard library, and should be kept as simple as possible.”
  • Another code-related prompt: “I have attached a Python program that takes Python applications and packages them to run with a standalone instance of the Python runtime. One drawback of the program is that it’s not very modular. Analyze the program and make some suggestions about how to increase its modularity so it can be used as a library with hooks for various advanced behaviors.”
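
For repeatable runs, prompts like these can also be scripted against LM Studio’s local OpenAI-compatible server instead of being typed into the chat UI. A minimal sketch, where the model identifier is an assumption (use whatever name LM Studio shows for the loaded build):

    from openai import OpenAI

    # LM Studio serves loaded models over an OpenAI-compatible endpoint
    # (http://localhost:1234/v1 by default); the API key is a placeholder.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    response = client.chat.completions.create(
        model="gemma-4-e4b",  # assumed identifier for the loaded model
        messages=[{"role": "user",
                   "content": "What did William Gibson think of Blade Runner?"}],
    )
    print(response.choices[0].message.content)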

Gemma 4 in action

The 26B model was at the upper end of what I could run comfortably on my test hardware. I wasn’t able to fit the entire model into GPU memory, but I set the first 12 layers to run on the GPU (7.51GB VRAM), and I set the context length to 16384 tokens (total: 18.76GB RAM).

Getting good performance out of models that don’t fit in VRAM is always a challenge. However, Gemma 4 has, courtesy of its “mixture of experts” design, a feature to boost performance. LM Studio exposes this feature through a setting currently tagged as experimental. You can choose how many layers of the model to “force MoE [Mixture of Experts] weights onto the CPU,” which conserves VRAM and can speed up inference.

The MoE (mixture of experts) experimental setting in LM Studio. For models that use an MoE design, this setting forces the weights for that aspect of the model to be run on the CPU instead of the GPU. With Gemma 4, this resulted in a major speed boost for models too big to fit in memory.

Without the MoE forcing, the overall inference time and token generation speed cratered; the model could barely manage an average of 1.5 tokens per second even for simple queries. With MoE forcing turned on (with the maximum number of layers supported, 30), token generation speed jumped to anywhere from 5 to 13 tokens per second, depending on the rest of the system’s load. That’s still a far cry from the speed of the smaller models, but a lot more workable.

For faster time-to-first-token results, you can disable thinking, at the possible cost of less robust output. For the code-generation query, Gemma 4 spent 6 minutes 26 seconds thinking, and over 8 minutes generating the response (5,013 tokens, 9.55 tokens per second). The resulting code and explanation were not significantly more advanced or detailed than the non-thinking version.

Response from Gemma 4’s 26B parameter model to a query to generate code. This larger version of the model runs less quickly when it can’t fit entirely in memory, but its mixture-of-experts design helped offset that limitation.

When I switched to the LM Studio Community edition of the E4B model, I put all 42 layers on the GPU and kept the context at 16,384, all of which fit comfortably in VRAM with room to spare. The results were a major jump in speed: 72 tokens per second. The smaller model was less specific for certain queries — the code-generation query in particular didn’t generate a comprehensive code example, only a conceptual framework for one — but still did a decent job of analyzing the problem and suggesting constructive approaches. The “unsloth” edition of the E4B model, despite being slightly smaller, was about as performant and useful.

Examples of Gemma 4’s 26B parameter version generating image captions. The smaller versions of the model tended not to editorialize. The larger version sometimes needed specific guidance to be less verbose or florid.

For the “make this program more modular” prompt, I got roughly equivalent results across all incarnations of the model in terms of the advice given. The only major difference was that the smaller models ran far faster — 73.85 and 71.73 tokens per second vs. 9.3 for the big model.

Gemma 4 takeaways

The biggest takeaway from running Gemma 4 locally is how the mixture-of-experts design in the larger incarnation of the model makes it useful even on systems where the model doesn’t fit entirely into VRAM. The smaller incarnations of the model, even at lower quantizations, still work well, too. They also deliver results many times faster and free up much more memory for larger context windows. Thus, the smaller models are well worth experimenting with as the first choice before moving up to their bigger brothers.


Snowflake offers help to users and builders of AI agents 21 Apr 2026, 1:15 pm

Snowflake is enhancing Snowflake Intelligence and Cortex Code to create a unified experience connecting enterprise systems, data sources, and AI models with Snowflake data. It’s part of the company’s vision to become the control plane for the agentic enterprise, enabling enterprises to align data, tools, and workflows with AI agents built on its platform.

With these updates, the company said, Snowflake Intelligence becomes an adaptable personal work agent for business users, and Cortex Code expands as a builder layer for enterprise AI that provides governed, data-native development.

Enhancements to Snowflake Intelligence include automation of routine tasks by describing them in natural language, new Model Context Protocol (MCP) connectors, and reusable artifacts that let users save and share analyses, visualizations, and workflows, all of which will be generally available “soon.” In addition, a new iOS mobile app and multi-step reasoning with deep research that uses agentic architecture to reason across data will soon be in public preview.

The company said that all of these updates grew out of customer feedback, as well as insights gleaned from Project SnowWork, last month’s preview of an autonomous AI layer for its data cloud.

Cortex Code now supports additional external data sources, including AWS Glue, Databricks, and Postgres, connectivity with other AI agents via MCP and Agent Communication Protocol (ACP), a Claude Code plugin, and a new agent software development kit with support for Python and TypeScript. There are also enhancements to Cortex Code in Snowsight, Snowflake’s web interface, including Plan Mode to allow developers to preview and approve workflows, and Snap & Ask to enable interaction with data artifacts such as charts and tables.

Snowflake also announced the private preview of Cortex Code Sandboxes in Snowsight, a dedicated cloud environment where developers can execute code end-to-end with no setup.

Michael Leone, VP & principal analyst at Moor Insights & Strategy, thinks the roadmap is “ambitious,” noting the number of items announced that are “coming soon” or are in public preview. “These announcements are starting to blur together, with almost every vendor claiming their agents can reason, act, and transform the business,” he said, adding, “What makes this one worth slowing down on, at least for me, is that Snowflake is going after both halves of the enterprise at the same time. Intelligence is built for the business users who want answers and actions without writing SQL, and Cortex Code is built for the builders who actually have to put this into production.”

Most vendors pick one target, users or builders, and come back to the other later, he said, but Snowflake is putting both on the same governed data foundation. “[This] is a harder engineering problem, but I’d argue it’s a cleaner answer to the question enterprises are actually asking, which is how to open AI up to more people without losing control of the data underneath,” he said, noting that Snowflake has changed its approach from “let’s do it inside Snowflake,” to realizing that agentic AI only works if it’s interoperable with the rest of the stack.

Igor Ikonnikov, advisory fellow at Info-Tech Research Group, also sees the control plane play as part of an industry trend. “As always, the devil is in the details: what those platforms are composed of and how they offer to control AI agents,” he said. “Most platforms are built the old-fashioned way: All the controls are coded. Snowflake speaks about reusable analytics through saving the whole solution and reusing complete modules or models. It means that common semantics are still buried inside database models and code.”

All AI vendors are motivated by the same demand from the market, he said: “Move from Copilot-based generic chatbots to business-purpose-specific AI agents that understand business logic and can interact with one another.” With these updates, he sees Snowflake as having caught up with the competition, but not yet surpassing it.

Sanjeev Mohan, principal at SanjMo, said, “The good news for customers is the support for Databricks and AWS Glue. What Snowflake is saying is that even if your data lives in a competitor’s system, Snowflake AI coding agent can be used. And vice versa, the VS Code extension and Claude Code plugin can be used on Snowflake data. In other words, it reduces vendor lock-in fears.”

It’s also the right strategic direction, said Sanchit Vir Gogia, chief analyst at Greyhound Research. “Enterprise AI is moving from generation to orchestration to execution, and Snowflake’s focus on governed data as the foundation for action aligns with that shift,” he said.

“However, becoming the execution layer for enterprise AI requires more than integrating agents and expanding tooling,” he said. It also requires consistent semantics, reliable cross-system execution, strong governance, economic viability, and organizational readiness, as well as overcoming a structural constraint. “Control without ownership of the systems where work is executed introduces dependency that is difficult to fully resolve. This is the central tension in Snowflake’s strategy and will define how far it can realistically extend its influence,” he said. “Snowflake has taken a meaningful step in that direction. It has not yet proven that it can deliver this at scale. At this stage, it is one of the most credible contenders in a race that will be defined not by who builds the smartest AI, but by who can make that AI work reliably inside the enterprise.”


Amazon’s $5B Anthropic bet is really about compute, not just cash 21 Apr 2026, 11:37 am

Amazon on Monday said it was investing an additional $5 billion in Anthropic, a move that analysts say is aimed as much at easing the AI startup’s growing infrastructure bottlenecks as at deepening their strategic partnership.

As part of the deal, Anthropic will lock in up to 5 gigawatts of compute capacity across AWS’s Trainium chips, including the new Trainium 3 and upcoming Trainium 4, the companies said in a joint statement.

“Right now, users see limits like throttling and session caps because Anthropic is running out of capacity and must ration usage to avoid crashes. This deal helps fix that,” said Pareekh Jain, principal analyst at Pareekh Consulting.

“Over time, the expanded capacity will let Anthropic support more users at once, build bigger models, and reduce these limits, especially for paid and enterprise users,” Jain added.

The analyst was referring to Anthropic’s move to throttle usage across its Claude subscriptions, especially during peak demand hours, which also coincided with other concerns, such as complaints of degradation in Claude’s reasoning performance across complex tasks.

Scaling compute capacity

A significant portion of Trainium 3 capacity is expected to come online this year, the companies added. Anthropic already uses Trainium 2 via AWS’ Project Rainier, a cluster of nearly half a million chips, to train and run its models.

The agreement between Amazon and Anthropic also includes an expansion of inference capacity in Asia and Europe, which Jain said should improve Claude’s speed and reliability globally. Anthropic will also have the option to buy future generations of Trainium as they become available.

However, Anthropic isn’t alone when it comes to model providers trying to add compute capacity to train and run their models.

In February, rival OpenAI signed a deal with Amazon, Nvidia, and SoftBank to raise around $110 billion to expand its compute capacity.

As part of the arrangement, OpenAI has committed to consuming at least 2GW of AWS Trainium-based compute tied to Amazon’s $50 billion investment, along with 3GW of dedicated inference capacity from Nvidia under its separate $30 billion commitment.

From funding to supply chain financing

In fact, analysts say, deals such as these reflect a broader shift in how AI infrastructure is financed.

“Rather than simple cash-for-equity, these deals bundle equity investment with massive cloud-spend, or GPU spend commitments by locking in customers, securing capex returns, and validating infrastructure buildouts in a single transaction. This isn’t venture capital anymore, it’s supply chain financing,” Jain said.

The pattern present in these deals, Jain noted, is consistent across the ecosystem, giving examples of Microsoft, Oracle, and Nvidia.

“Microsoft invested tens of billions into OpenAI while simultaneously committing Azure capacity for training and inference, with OpenAI’s Azure spend now running at a multi-billion dollar annual rate,” Jain said.

“Oracle, too, signed a $30 billion cloud deal with OpenAI, then followed it with a staggering $300 billion five-year compute commitment starting in 2027. Nvidia took it further still with its $100 billion investment in OpenAI, which was paid in GPUs, not dollars — a model it replicated with xAI,” Jain added.

That framing, however, according to Greyhound Research chief analyst Sanchit Vir Gogia, may miss a deeper shift.

Such deals, Gogia said, are more about securing scarce compute supply ahead of competitors. “What capital does is improve your position. It allows you to commit earlier and at greater scale,” the analyst pointed out, adding that the real advantage lies in locking in infrastructure before others can.

On the flip side, though, long-term capacity commitments tend to anchor companies to specific providers, Gogia cautioned.

While model providers may operate across platforms and hyperscalers, their largest infrastructure commitments ultimately shape where they optimize workloads, build features, and direct spending, the analyst pointed out.

For Anthropic, the Amazon deal comes with equally significant long-term obligations. The company has committed to spending more than $100 billion on AWS over the next decade.

For Amazon, the $5 billion investment builds on its earlier $8 billion bet on Anthropic and comes with the potential to commit up to an additional $20 billion tied to certain commercial milestones, which were not revealed. Anthropic is also looking beyond AWS. The company recently said it plans to add capacity using Google’s TPUs. These chips are expected to come online by next year.

This article originally appeared on Network World.


From the engine room to the bridge: What the modern leadership shift means for architects like me 21 Apr 2026, 10:00 am

We all agree that the role of the technology leader is being rewritten in real time, and if you’re building the systems they depend on, you need to understand what they’re asking for now.

Let me be honest about something. For most of my career, the conversations I had with CIOs followed a pretty predictable script. They’d describe a pain point, I’d map it to a solution and we’d talk timelines and integration. Clean. Transactional. Technical. Very straightforward, right?

That script has been shredded.

Over the past couple of years, working across public sector agencies, global enterprises and mid-market companies in Latin America and now the US, I’ve watched the CIO role transform in a way that genuinely changes how I do my job as a solutions architect. The technology leader who used to care primarily about uptime and cost efficiency now walks into conversations asking about competitive differentiation, cultural change and workforce transformation. And they’re right to ask.

The shift isn’t cosmetic. It’s structural.

The problem hiding in plain sight

Here’s what I kept seeing in failed modernization projects, and I saw a lot of them: The technology worked fine. The architecture was sound. The implementation was clean. And the project still stalled or quietly died six months in. The root cause was seldom technical. It was a decision-making problem upstream of delivery. Strategy that hadn’t been translated into clear operating priorities. Conflicting stakeholder mandates that nobody had formally resolved. Organizational structures that pull in different directions from the infrastructure teams trying to serve them.

What I’ve come to think of as “decision integrity,” the discipline of making sure strategy connects to execution, was missing. And the CIO, historically, wasn’t positioned to own that gap. They were downstream of it.

That’s changed. The CIOs I work with now are increasingly the ones driving that upstream clarity. They’re defining outcome frameworks, arbitrating tradeoffs and forcing the organizational alignment that makes technical delivery land. The architecture conversation I have with them today is as much about governance and organizational design as it is about platforms.

What this means if you’re building for them

From where I sit, designing solutions around open-source platforms, hybrid cloud and AI infrastructure, the practical consequence is this: The technology decisions my customers make are no longer primarily about technology.

A CIO investing in AI-ready infrastructure isn’t just buying a faster platform. They’re making a strategic bet that the organization can operate differently at scale. Which means the infrastructure must support not just the technical requirements (consistent data access, automated policy enforcement, and visibility across hybrid environments) but also the organizational ones.

Can non-technical stakeholders trust the system? Can the governance model hold up as scope expands? Can the platform absorb the messiness of real enterprise change without the whole thing collapsing?

Technical debt is where this gets painfully concrete. I’ve seen environments where 30–40% of engineering capacity is absorbed by legacy maintenance. Not because anyone made a bad choice, but because previous decisions compounded over time without a deliberate modernization strategy. When a CIO tells me they want to move fast on AI adoption, the first conversation we must have is about what’s sitting underneath that ambition. You can’t build a reliable AI pipeline on top of infrastructure you don’t trust.

The CIOs who are winning right now are the ones who dealt with that debt proactively, not by declaring a big-bang rewrite, but by systematically creating the conditions where innovation can happen without adding to the entropy. That’s what I try to help architect.

The cultural piece is the hardest part, and it’s real

I’ll be straight here: When someone says, “cultural transformation,” my instinct is usually to translate it into something more concrete I can design for. Agile delivery models. Feedback loops. Automation that removes friction from the right places. That’s still my instinct.

However, I’ve had to sit with the fact that the cultural piece isn’t just a soft addendum to the technical work. It’s load-bearing.

Here’s the version I’ve watched play out more than once: You build a genuinely excellent automation platform. The tooling is solid. The pipelines work. And then adoption stalls because the teams who are supposed to use it don’t trust it, weren’t involved in defining it or are quietly protecting workflows that the new system would disrupt. The problem isn’t the platform. The problem is that nobody built the social infrastructure around it.

Gartner’s projection that 25% of IT work will be handled autonomously by AI by 2030 isn’t a threat or a promise; it’s a design constraint. If you’re architecting systems today, you have to ask: What does the human role look like in this workflow once AI is doing the routine work? What skills are you developing in the team? Where does judgment still belong to a person?

Those aren’t questions with clean technical answers. But they’re questions an architect must have an opinion on.

Both hands on the wheel

There’s a framing I keep coming back to, which describes the modern technology leader as both the navigator on the bridge and the engineer in the engine room at the same time. That’s exactly the tension I recognize from the field. The CIOs I most want to work with are the ones who haven’t abandoned either role. They’re genuinely curious about how the infrastructure works, not just what it delivers. And they’re genuinely accountable for business outcomes, not just technical ones. That dual orientation is rare and valuable, and when I find it, those tend to be the engagements where we build something worth building. And this is one thing that fascinates me about open source: The people who engage with it tend to be true tech experts.

For those of us on the architecture side, the implication is clear. We can’t show up to these conversations as purely technical resources anymore, either. The best solution I can design is useless if it doesn’t connect to the organizational reality my customer is operating in. Understanding the strategic pressure they’re under, the cultural conditions they’re working with, the decision-making constraints they’re navigating, that context shapes everything about how I recommend we build.

The engine room and the bridge have always been part of the same ship. It just took a while for the org charts to catch up.

This article is published as part of the Foundry Expert Contributor Network.


Enterprises are rethinking Kubernetes 21 Apr 2026, 9:00 am

For years, Kubernetes held an almost mythic place in enterprise IT. It was positioned as the control plane for the future, the standard abstraction for cloud-native systems, and the platform that would finally free enterprises from infrastructure lock-in. To be fair, some of that was true. Kubernetes brought discipline to container orchestration, enabled portable deployment models, and provided architects with a powerful framework for managing distributed applications at scale.

However, the market is changing, and so are enterprise expectations. The question is no longer whether Kubernetes is technically impressive. It clearly is. The question is whether it still represents the best fit for a growing number of mainstream enterprise use cases. In many cases, the answer is increasingly no. What we are seeing is not the death of Kubernetes but the end of its unquestioned dominance as the default strategic choice. Here’s why.

Too operationally expensive

As Kubernetes adoption grew, many organizations hesitated to admit that it introduced operational complexity and needed specialized skills, constant tuning, and strong governance. Running Kubernetes well requires mature engineering, observability, security, networking, and life-cycle management—much more than a side project. Many underestimated this burden.

What looked elegant in architectural diagrams became a real-world tax on operations teams. Clusters multiplied. Toolchains sprawled. Upgrades became risky. Policy enforcement became an engineering discipline in its own right. Enterprises realized they were not just adopting an orchestration platform. They were building and maintaining an internal product that required sustained investment and scarce expertise.

That might be acceptable for digital-native businesses whose scale and complexity justify the effort. It is a much harder sell for enterprises that want reliable deployments, resilient applications, and reasonable cloud costs. In those cases, Kubernetes can feel like overengineering disguised as strategic modernization. When a company spends more time managing the platform than delivering business value on top of it, the novelty wears off quickly.

Portability becomes less important

Kubernetes was marketed as a hedge against lock-in, enabling applications to run across on-premises, cloud, and edge. However, most enterprises faced ecosystem dependencies—storage, networking, security, identity, observability, CI/CD, managed services, and cloud-native databases—creating practical lock-in that Kubernetes didn’t eliminate.

What enterprises gained in workload portability, they often lost in ecosystem complexity. They standardized on Kubernetes while still depending heavily on a particular cloud provider’s managed services and operational conventions. The result was a strange middle ground: all the complexity of a highly abstracted platform without the full simplicity of using opinionated native services end-to-end.

This matters more now because boards and executive teams are less interested in theoretical architectural optionality and more focused on measurable business outcomes. They want speed, resilience, cost control, and lower risk. If a managed application platform, serverless environment, or provider-specific platform-as-a-service offering gets them there faster, many are willing to accept some level of dependency. Enterprises are becoming more candid about the trade-offs. They are realizing that strategic flexibility is valuable, but not at any cost.

This is where Kubernetes starts losing favor. Portability has value, but for many enterprises, it hasn’t justified the operational and organizational burden it entails. The promise exceeded the actual return.

Better abstractions are catching up

Perhaps the most important shift is that enterprises are moving away from buying raw technical primitives and toward consuming higher-level platforms that better align with developer productivity and business outcomes. Platform engineering teams increasingly hide Kubernetes behind internal developer platforms. Public cloud providers continue to improve managed container services, serverless offerings, and integrated application environments that reduce hands-on infrastructure management. Developers, meanwhile, do not want to become part-time cluster operators. They want fast paths to build, deploy, secure, and monitor applications without stitching together a dozen components.

In other words, Kubernetes may still be present under the hood, but it is becoming less visible and less central to strategic buying decisions. That is usually a sign of maturity. Technologies shift from being the headline to being plumbing. Enterprises are not asking, “How do we adopt Kubernetes?” as often as they are asking, “What is the fastest, safest, most cost-effective way to deliver modern applications?” That is a much healthier question.

The answer increasingly points to curated platforms, opinionated developer environments, and managed services that abstract away Kubernetes rather than exposing it. This is not a rejection of cloud-native principles. It is a rejection of unnecessary cognitive load. Enterprises are deciding they do not need to own every layer of complexity to realize the benefits of modern architecture.

Surrendering the spotlight

None of this means Kubernetes is disappearing. It remains important for large-scale, heterogeneous, and highly customized environments. It is still an excellent fit for organizations with strong platform maturity, regulatory constraints, or sophisticated multicloud operational needs. But that is a narrower slice of the market than the hype cycle once suggested.

What is losing popularity is not Kubernetes as a technology, but Kubernetes as the unquestioned standard for enterprises. This difference is important. Companies are becoming more selective about where to accept complexity and where to avoid it. They are less inclined to idealize infrastructure and more eager to choose simplicity when it exists.

That is probably a good thing. The job of enterprise architecture is not to admire elegant technology for its own sake. It is to align technology choices with operational realities, economic constraints, and business outcomes. By that standard, Kubernetes still has a place, but it no longer gets a free pass.


The cookbook for safe, powerful agents 21 Apr 2026, 9:00 am

As companies move from experimenting with AI agents to deploying them in production, one pattern becomes clear: capability without control is a liability.

Agents operate in long-running, stateful environments. They browse the web, read repositories, execute shell commands, call APIs and interact with internal systems. That power is transformative — and it meaningfully expands the attack surface.

In a recent interview, Jonathan Wall, CEO of Runloop, summarized the shift: “By default, agents should have access to very little. They need to do real work, but capabilities have to be layered on in a controlled way.” That framing reflects a broader industry reality: agent infrastructure must be designed around least privilege, explicit isolation and observable execution.

What follows is a practical control architecture for production agents.

The layered control model

A resilient agent deployment combines six explicit layers:

  1. Strong runtime isolation with a microVM
  2. Restrictive network policy with explicit egress allowlists
  3. Centralized credential management through a gateway
  4. Disciplined identity management with short-lived, scoped credentials
  5. Deliberate friction around sensitive actions and high-risk tools
  6. Continuous monitoring, logging and adversarial testing

Each layer addresses a different failure mode. Together, they contain blast radius when — not if — something breaks.

Start with least privilege

A production-grade agent environment begins in a constrained state: isolated runtime boundary, no inbound access, no outbound network access, and no implicit tool permissions.

The runtime boundary itself is part of least privilege. Containers provide efficient isolation for trusted or single-tenant workloads, but they share a host kernel. Real-world escape vulnerabilities have repeatedly shown that this boundary can fail under adversarial pressure. CVE-2019-5736 allowed attackers to overwrite the host runc binary from within a container; CVE-2022-0492 enabled breakout via cgroups misconfiguration; CVE-2024-21626 again exposed runc-based escape paths. These incidents do not render containers unusable — but they clarify the tradeoff. MicroVMs introduce a stronger hardware-level boundary, reducing blast radius when agents execute arbitrary or unvetted code.

Isolation is not a performance decision alone. It is a risk decision.

The modern agent threat model

Traditional SaaS systems process deterministic requests. Agent systems ingest untrusted content and generate probabilistic actions.

Prompt injection has demonstrated how fragile instruction boundaries can be. In 2023, public experiments against Bing Chat showed that hidden instructions embedded in web pages could override system prompts. Academic research from Stanford and others has shown that tool-using agents can be coerced to leak credentials or proprietary data when external content is treated as trusted context.

The danger compounds when agents operate with broad credentials. Service accounts, long-lived API keys and shared internal tokens convert a successful injection from “unexpected output” into repository compromise, database access or SaaS abuse. System prompts that embed internal URLs or configuration data become reusable artifacts once exposed.

Retrieval-augmented systems and MCP-style integrations widen the surface further. When external documents are ingested without segmentation or role separation, attacker-controlled content can redirect behavior or induce data disclosure.

This is the environment the layered model must withstand.

Network policy as containment

Network controls are often treated as compliance checkboxes. In agent systems, they are containment mechanisms.

Agents typically require outbound access for documentation lookup, dependency installation or API interaction. Yet unrestricted egress provides the cleanest path for data exfiltration after injection. Restrictive allowlists — permitting only explicitly approved domains or endpoints — dramatically reduce blast radius.

If a model is tricked into reading a .env file, a strict egress policy can prevent the obvious next step: shipping those secrets to an attacker-controlled domain. Logging outbound traffic establishes behavioral baselines and highlights anomalies early.
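
A sketch of the idea, reduced to its simplest form (in production the check lives in a proxy or network-policy layer rather than application code, and the allowed hosts here are placeholders):

    from urllib.parse import urlparse

    ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org", "docs.internal.example"}

    def check_egress(url: str) -> None:
        """Block any outbound request whose host is not explicitly approved."""
        host = urlparse(url).hostname or ""
        if host not in ALLOWED_HOSTS:
            # An injected "ship the .env file to evil.example" step dies
            # here instead of exfiltrating secrets.
            raise PermissionError(f"egress to {host!r} is not on the allowlist")

    check_egress("https://pypi.org/simple/requests/")   # passes silently
    # check_egress("https://evil.example/collect")      # would raise PermissionError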

Containment turns catastrophic compromise into a recoverable incident.

Ingress as an operational event

Most agent runtimes do not require unsolicited inbound connections. Leaving services exposed by default accumulates unnecessary risk.

When debugging or collaborative inspection is required, exposure should be temporary and scoped — authenticated tunnels opened deliberately and closed promptly. Ingress becomes an operational decision rather than a static configuration state.

Ephemerality is a security control.

Governing model access

Large language models are external systems with cost, compliance and leakage implications. Allowing each runtime to independently manage model credentials fragments oversight.

A centralized gateway restores control. It can restrict approved models, enforce rate ceilings, log prompts and responses, and apply filtering or compliance checks. Agents no longer hold raw provider credentials directly.
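
In miniature, the gateway pattern looks like this (a hedged sketch; the model names, rate ceiling, and provider call are illustrative assumptions, not any particular product’s API):

    import time

    APPROVED_MODELS = {"frontier-large", "internal-summarizer"}
    RATE_LIMIT_PER_MIN = 60
    _audit_log: list[dict] = []

    def _forward_to_provider(model: str, prompt: str) -> str:
        # Stub: only the gateway, never the agent, holds the real credential.
        return f"[{model}] response"

    def gateway_call(agent_id: str, model: str, prompt: str) -> str:
        """Single choke point for model access: allowlist, rate ceiling, audit."""
        if model not in APPROVED_MODELS:
            raise PermissionError(f"model {model!r} is not approved")
        now = time.time()
        recent = [e for e in _audit_log
                  if e["agent"] == agent_id and now - e["ts"] < 60]
        if len(recent) >= RATE_LIMIT_PER_MIN:
            raise RuntimeError(f"rate ceiling exceeded for {agent_id}")
        _audit_log.append({"agent": agent_id, "model": model,
                           "prompt": prompt, "ts": now})
        return _forward_to_provider(model, prompt)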

The lesson from both container escapes and prompt injection incidents is consistent: implicit trust boundaries erode. Centralized governance reinforces them.

Tooling, identity and friction by design

As agents integrate with repositories, CI systems, deployment pipelines and databases, tool governance becomes inseparable from identity discipline.

Dedicated identities per agent, short-lived tokens and strict RBAC or ABAC reduce the impact of compromise. Reusing human or root-level credentials collapses isolation entirely.

Sensitive actions — sending email, modifying production code, accessing secrets, changing authentication — benefit from friction. Policy checks, approval workflows or out-of-band confirmations create deliberate pauses at high-risk boundaries.
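
One way to express that friction in code, as a sketch (the approval transport is an assumption; real systems route it through a ticket, chat approval, or policy engine rather than a console prompt):

    import functools

    def requires_approval(action: str):
        """Pause high-risk tool calls for an explicit human decision."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                answer = input(f"Approve {action!r} with args {args}? [y/N] ")
                if answer.strip().lower() != "y":
                    raise PermissionError(f"{action} was denied")
                return fn(*args, **kwargs)
            return wrapper
        return decorator

    @requires_approval("send_email")
    def send_email(to: str, body: str) -> None:
        print(f"sending mail to {to}")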

Secrets should not live in prompts. System prompts embedded with credentials have been shown to leak under injection pressure. External secret managers and strict separation between model-visible text and credential material materially reduce exposure.

Continuous adversarial testing

Container escape CVEs and public prompt injection demonstrations share a common lesson: systems fail at integration boundaries, not in isolation. Logging tool calls, data access and network egress creates behavioral baselines against which anomalies — unusual domains, atypical file reads, unexpected tool invocation patterns — can be detected early. Red-teaming and adversarial prompt fuzzing help surface injection paths before attackers do, forcing organizations to confront weaknesses under controlled conditions rather than in production.
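
A few lines suffice to show the shape of such a baseline (a toy sketch; real detectors use richer features, time windows, and review queues rather than print statements):

    from collections import Counter

    # Toy behavioral baseline: count egress destinations per agent and
    # flag first-seen domains for review.
    baseline: Counter = Counter()

    def record_egress(agent_id: str, host: str) -> None:
        key = (agent_id, host)
        baseline[key] += 1
        if baseline[key] == 1:
            print(f"anomaly? first egress from {agent_id} to {host}")

    record_egress("builder-1", "pypi.org")       # first sight: flagged
    record_egress("builder-1", "pypi.org")       # now part of the baseline
    record_egress("builder-1", "evil.example")   # flagged for review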

Agents can build, test, browse and execute arbitrary code. That capability is powerful — and dangerous when unconstrained. Production readiness is therefore defined not by what agents can do, but by how precisely their boundaries are defined, enforced and observed. The organizations that scale agents successfully will treat infrastructure as policy, isolation as a design decision and monitoring as a first-class requirement — not an afterthought.

This article is published as part of the Foundry Expert Contributor Network.


Addressing the challenges of unstructured data governance for AI 21 Apr 2026, 9:00 am

Large enterprises in regulated industries, especially in data-rich financial services and insurance, have invested significantly in data governance programs. Other businesses have been catching up as part of their efforts to become more data-driven organizations. Data governance often starts with defining policies, classifying data sources, establishing data catalogs, and communicating non-negotiables.

But look a little closer at the implementations, and you’ll see much of the focus has been on governing data warehouses, relational data, and other structured data sources. AI has elevated the importance of implementing data governance and establishing guardrails on unstructured data sources used to train language models and provide context to AI agents.   

“Unstructured data now makes up the vast majority of enterprise information, and AI is redefining how organizations bring control, accessibility, and security to it,” says Ashish Mohindroo, general manager and senior vice president of the Nutanix Database Service platform. “Leaders should ask themselves, ‘Who needs daily access to this data?’ and ‘How can we keep data safe from unauthorized access or accidental loss?’ ” Those are two key questions for all data sources, but they have historically been harder to answer for unstructured ones. I consulted with several experts on these complexities and on how AI can ease unstructured data governance challenges.

Context as important as content

Joanne Friedman, CEO of ReilAI, says that organizations must ensure safety through governed autonomy, which requires shifting from static access control to contract-based safety. “Routing messages is not the same as reasoning about them, connecting assets is not the same as understanding them, and reactive telemetry is not the same as choreographed intelligence,” says Friedman.

Structured data sources are a mix of transactional and relational data, supported by mature technologies to improve data quality and manage metadata. Document stores and other NoSQL databases provided better data management and search capabilities of unstructured data, but it wasn’t until vector databases and large language models (LLMs) emerged that we had tools to derive meaning from documents at scale. 

“When I look at unstructured documents, I focus on the risk that lives inside the content because sensitive details hide in places people never review,” says Amanda Levay, CEO of Redactable. “I expect controls that stop those documents from entering unsafe workflows because exposure often happens before anyone knows the risk exists. I also push for systems that flag when a file carries information that shouldn’t move forward, so teams catch the problem at the moment it matters most.”

It’s a lot easier to define controls for accessing rows of structured financial transactions and customer records than to define rules for unstructured documents, such as contracts and health records. Friedman points out that the rules for unstructured documents are more dynamic, while Levay notes the scale and real-time complexities in evaluating documents.

Governance across the life cycle

Where should one begin implementing governance policies? There are many considerations for data pipelines, source data sets, consuming applications, AI models, and AI agents. Stéphan Donzé, founder and CEO of AODocs, says organizations need strong plumbing. He recommends a governed system that can perform the following tasks:

  • Routes content to the right models
  • Enforces granular permissions
  • Maps relationships between extracted entities and other taxonomies
  • Tracks implicit versions
  • Calls in humans when the stakes are high

“Without these capabilities, AI becomes another black box. With them, you unlock an auditable, secure, explainable insight layer for data governance, risk, compliance, and mission-critical decisions at enterprise scale,” says Donzé.

Policies need to be implemented consistently across the full data lineage from source through consumption, including the creation of derivative data.

“One of the biggest security challenges with unstructured data is the lack of visibility and lineage as information moves across systems, clouds, and teams,” says Jack Berkowitz, chief data officer at Securiti. “When organizations cannot track where data originated, how it has changed—even what version is active or whether it is still relevant—they increase the risk of exposing sensitive or inaccurate data through genAI applications.”

Using AI to classify and categorize

Extracting knowledge from documents, categorizing them, and then classifying them for user entitlements is complex enough. Add to that the fact that documents are roll-ups of sections and subsections, each of which needs independent analysis that must then be related back to the full document’s context.

Consider building construction specifications, which often follow the CSI MasterFormat document standard. CSI MasterFormat has 50 divisions, such as general specifications, electrical, and plumbing. Now consider access controls for this document, given that security is covered in two separate divisions and may require different classifications than other sections, such as equipment. But even that’s not sufficient context, as a general contractor should have different policies for accessing the specifications for a nuclear power plant than for a small office building.   

Complex classification challenges are being addressed with AI and advanced algorithms. “Enterprises are shifting toward commodity-driven, API-driven governance accelerators, especially in areas like classification, taxonomy management, and domain-specific labeling,” says Nandakumar Sivaraman, senior vice president and chief architect of enterprise data at Bridgenext. “Instead of manually applying categories, rules, and policies across thousands of assets, companies are now using AI-driven classification APIs to auto-tag and categorize data. They use machine learning–based pattern detection to assign taxonomies, product hierarchies, or entity domains, and implement lightweight governance microservices for real-time classification in ingestion pipelines.”

Another approach uses vision language models (VLMs) to analyze the document’s visual structure for additional contextual clues. Harpreet Sahota, hacker-in-residence at Voxel51, says VLMs can classify documents without training data, but the bigger issue is that most organizations don’t have consistent taxonomies to begin with. “A first step is to treat documents as images rather than just extracting text, which preserves layout information that is important for understanding structure,” recommends Sahota.

Managing versions and duplicates

Documents can have hundreds of versions and derivatives scattered across SharePoint sites, cloud storage areas, SaaS platforms, and email attachments. One of the more significant unstructured data governance challenges is identifying the latest, accurate versions to include in AI models, retrieval-augmented generation (RAG) systems, and AI agents. 

“To improve document versioning, measure the semantic similarity between files and cluster documents that are likely versions of the same document,” says Reece Griffiths, field CTO for Collibra. “Once grouped, apply additional signals, such as last-modified date, metadata, or even title patterns to infer which document in each cluster is the most recent version.”
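
The approach Griffiths describes can be sketched in a few lines. The following is illustrative only, assuming an off-the-shelf embedding model via sentence-transformers and scikit-learn’s agglomerative clustering; the documents and distance threshold are invented:

```python
# Illustrative only: invented documents, off-the-shelf embeddings, and an
# arbitrary distance threshold. Requires sentence-transformers and
# scikit-learn >= 1.2 (older versions spell `metric` as `affinity`).
from collections import defaultdict
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

docs = [
    {"text": "Q3 sales contract, draft v1 ...", "modified": "2026-01-10"},
    {"text": "Q3 sales contract, final ...", "modified": "2026-02-02"},
    {"text": "Employee handbook 2026 ...", "modified": "2026-03-15"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([d["text"] for d in docs], normalize_embeddings=True)

# Group documents whose embeddings are close enough to be likely versions.
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.3, metric="cosine", linkage="average"
).fit_predict(embeddings)

groups = defaultdict(list)
for doc, label in zip(docs, labels):
    groups[label].append(doc)

# Within each cluster, infer the latest version from the modified date;
# title patterns or other metadata signals could refine the choice.
latest = {label: max(ds, key=lambda d: d["modified"]) for label, ds in groups.items()}
```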

Determining document versions was once a rules-based system with controls for data owners and tools for handling exceptions. Modern systems now incorporate AI to automate or recommend the latest, most accurate documents and suggest which ones to archive.

“Agents excel at processing unstructured data, reading and analyzing the contents of presentations, videos, emails, and chat logs at scale,” says Dr. Michael Wu, chief AI strategist at PROS. “To manage versions, we must combine search and genAI to enhance the practice of ‘search first, search often’ with ‘read all before creating.’ This fosters continuous document evolution, where outdated or incorrect content is naturally updated or flagged for deprecation.”

Document retention policies

Even after duplication is addressed, a key data governance question remains: how should document retention policies be implemented? “Most organizations have well-defined retention rules for structured data, but applying those same rules to unstructured content has historically been very difficult,” says Griffiths of Collibra. “By performing AI-based tagging of every document according to a retention taxonomy, including record types and subtypes, companies can then query and manage unstructured data with the same precision they apply to structured data sets.”
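
One way to picture that tagging-then-querying pattern is the sketch below. The classify() helper is a toy stand-in for an AI tagging service, and the record types and retention periods are invented:

```python
# classify() is a toy stand-in for an AI tagging service; a real system
# would call an LLM or trained classifier. Types and periods are invented.
RETENTION_RULES = {"contract": "7y", "invoice": "10y", "marketing": "2y"}

def classify(text: str) -> str:
    t = text.lower()
    if "agreement" in t or "party" in t:
        return "contract"
    if "amount due" in t:
        return "invoice"
    return "marketing"

def tag_documents(docs: list[dict]) -> None:
    for doc in docs:
        doc["record_type"] = classify(doc["text"])
        doc["retention"] = RETENTION_RULES[doc["record_type"]]

docs = [{"text": "This agreement is made between party A and party B."}]
tag_documents(docs)
# Tagged documents can now be queried like structured rows, e.g.,
# "all contracts past their retention window."
print(docs[0]["record_type"], docs[0]["retention"])  # contract 7y
```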

Retention policies tend to follow legal guidelines with specific rules. A more difficult challenge is recognizing outdated information in documents that should no longer be used with AI models and agents.

“AI can age documents the way our minds naturally let older memories fade by noticing declining relevance signals, reduced connections to current work, and changing patterns of use,” says Jason Williamson, CEO of MythWorx. “Instead of a hard cutoff, it adapts continuously, helping organizations surface what’s still meaningful while gently retiring what no longer fits the present.”

Data security from start to finish

Three data disciplines are related: data governance protects the business, data privacy protects people, and data security protects the data. Implementing data security starts with considering how people create and manage documents.

“When you’re dealing with documents at scale, security and governance can’t be separate workflows with handoffs between teams; they become the same integrated workflow, with discovery, classification, and enforcement happening as one coordinated response,” says Rohan Sathe, cofounder and CEO at Nightfall. “Modern platforms need to quarantine inappropriately shared messages, emails, and files the moment they’re detected. They need to revoke over-permissioned access to sensitive documents, prevent unauthorized cloud sync operations, block risky CLI commands, and stop file uploads to unsanctioned destinations—all in real time.”

Since documents feed AI models and AI agents, a second data security consideration is which documents to include and how to protect the data embedded in AI. “The primary risk with AI isn’t just a traditional breach; it’s contextual leakage,” says Nico Dupont, founder and CEO of Cyborg. “Once you ground a model in your enterprise data, that model becomes a potential vector for surfacing sensitive information to unauthorized users, and you cannot rely on the model to be its own gatekeeper. True data security requires inference time governance and treating AI as a new tier of infrastructure where the security is built into the architecture and is as automated as the data cleaning itself.”

A third consideration is how data is protected as people interact with LLMs and AI agents. These must adhere to the user’s access policies and the usage context. “The primary security risk in AI document management is inference exposure, where an AI might correctly answer a question by accessing a sensitive document that the user technically shouldn’t see,” says James Urquhart, field CTO and developer evangelist at Kamiwaza AI. “To mitigate this risk, organizations must understand the relationships between different entities in their business ontologies and implement permission-aware indexing that ensures that AI and agentic systems respect the same access controls that a human would be subject to.”

One of the most challenging aspects of unstructured data governance is that regulations are evolving and AI capabilities are improving. Policies must evolve as businesses add more data sets, increase AI literacy across their employee base, and expand their AI use cases. Addressing the challenges of unstructured data governance will generate a growing backlog of work for the foreseeable future. 


GitHub pauses new Copilot sign-ups as agentic AI strains infrastructure 21 Apr 2026, 8:57 am

GitHub has paused new sign-ups for several individual Copilot plans and tightened usage limits, saying newer agentic coding workflows are consuming far more compute than its original pricing and service model was built to handle.

The move is a reminder that as AI coding assistants grow more autonomous, vendors may have to balance developer demand against infrastructure cost and service reliability.

“As Copilot’s agentic capabilities have expanded rapidly, agents are doing more work, and more customers are hitting usage limits designed to maintain service reliability,” GitHub said in a blog post. “Without further action, service quality degrades for everyone.”

Under the changes, GitHub has paused new sign-ups for its Copilot Pro, Pro+, and Student plans, saying the move will help it better serve existing customers.

The company is also tightening usage limits on individual plans, while positioning Pro+ as the higher-capacity tier with more than five times the limits of Pro for users who need heavier usage.

At the same time, GitHub is narrowing model access: Opus models will no longer be available on Pro plans, while Opus 4.7 will remain on Pro+, and Opus 4.5 and 4.6 are also set to be removed from that tier.

GitHub said it will now show usage limits directly in VS Code and Copilot CLI so users can more easily track how close they are to those caps.

The company added that affected Pro and Pro+ users who contact support between April 20 and May 20 can request a refund and will not be charged for April usage if the updated plans do not meet their needs.

GitHub’s move comes as other AI vendors are also adjusting usage policies to manage capacity, with Anthropic last month changing how Claude’s timed limits work during peak hours while keeping weekly limits unchanged.

Charlie Dai, vice president and principal analyst at Forrester, said the move shows how agent-driven coding is shifting workloads toward longer-running and parallel sessions that create higher and less predictable compute demand.

“Cost structures built for lightweight assistance no longer hold, and this puts pressure on GPU capacity, reliability, and unit economics,” Dai said.

Dai added that similar usage restrictions by major model providers suggest capacity rationing is likely to become a structural feature of the industry as agentic development becomes more routine.

Impact for developers

GitHub said Copilot now operates with both session limits and weekly seven-day limits, and that those caps are based on token consumption and model multipliers rather than just raw request counts. Users may still have premium requests left and yet hit a usage limit, because the two systems are separate.
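
A toy calculation shows why token-based caps with model multipliers behave differently from raw request counts. The multipliers and numbers below are invented for illustration and are not GitHub’s actual values:

```python
# Invented multipliers and numbers, purely to illustrate the mechanics:
# usage is weighted token consumption, not a count of requests.
MODEL_MULTIPLIER = {"fast-model": 0.25, "frontier-model": 3.0}

def weekly_usage(sessions: list[dict]) -> float:
    return sum(s["tokens"] * MODEL_MULTIPLIER[s["model"]] for s in sessions)

sessions = [
    {"model": "fast-model", "tokens": 200_000},     # many small completions
    {"model": "frontier-model", "tokens": 80_000},  # one long agentic run
]
print(weekly_usage(sessions))  # 290000.0: the single agentic run dominates
```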

In practice, that means developers using heavier agent-style workflows, especially long-running or parallel sessions, are more likely to hit limits than those using Copilot for simpler tasks.

GitHub is encouraging users nearing their caps to switch to lower-multiplier models, use plan mode in VS Code and Copilot CLI, and cut back on parallel workflows such as /fleet.

Analysts said the move also reflects a familiar pattern in the tech industry.

“First you give users access to a tool with relatively open usage, and then gradually start defining limits as adoption grows,” said Faisal Kawoosa, founder and chief analyst at Techarc. “GitHub has an unavoidable role in the developer world. A developer can live without an email ID, but not a GitHub account. Such is the depth of its integration. But at the same time, the rationalization of AI/Copilot in the ecosystem is inevitable, as resources are constrained.”

Kawoosa added that developers have now seen what Copilot can do, and there is little reason for GitHub to keep offering it without tighter limits. He said the next step is likely to be more differentiated plans that create clearer monetization opportunities among individual users.

For enterprise engineering leaders, Dai said the episode is a reminder to evaluate AI coding tools as metered infrastructure rather than unlimited productivity layers. He said buyers should pay close attention to usage ceilings, downgrade behavior, model entitlements, and how clearly vendors communicate limits and cost controls to developers.


Hackers exploit Vercel’s trust in AI integration 20 Apr 2026, 12:13 pm

Frontend cloud platform Vercel, the creator of Next.js and Turborepo, has warned about a data breach after a compromised third-party AI application abused OAuth to access its internal systems.

A Vercel employee used the third-party app, identified as Context.ai, which allowed the attackers to take over their Google Workspace account and access some environment variables that the company said were not marked as “sensitive.”

“Environment variables marked as ‘sensitive’ in Vercel are stored in a manner that prevents them from being read, and we currently do not have evidence that those values were accessed,” Vercel said in a security post.

The incident compromised what the company described as a “limited subset” of customers whose Vercel credentials were exposed. Vercel said it has contacted those customers and asked them to rotate their credentials.

According to reports surfacing on the internet, a threat actor claiming to be ShinyHunters began attempting to sell the stolen data, which allegedly includes access keys, source code, and a private database, even before Vercel publicly confirmed the breach.

Hacking the access

Vercel’s disclosure confirmed that the initial access vector was Google Workspace OAuth tied to Context.ai. Once the application was compromised, attackers inherited the permissions granted to it, including access to the Vercel employee’s account.

It remains unclear whether Context.ai’s infrastructure was compromised, whether OAuth tokens were stolen, or whether a session/token leak within the AI workspace enabled attackers to abuse authenticated access into Vercel’s environments. Context.ai did not immediately respond to CSO’s request for comments.

“We have engaged Context.ai directly to understand the full scope of the underlying compromise,” Vercel said in the post. “We assess the attacker as highly sophisticated based on their operational velocity and detailed understanding of Vercel’s systems. We are working with Mandiant, additional cybersecurity firms, industry peers, and law enforcement.”

Vercel has urged its customers to review activity logs for suspicious behavior and to rotate environment variables, especially any unprotected secrets that may have been exposed. It also recommended enabling sensitive variable protections, checking recent deployments for anomalies, and strengthening safeguards by updating deployment protection settings and rotating related tokens where needed.

Secrets that were not marked as “sensitive,” including API keys, tokens, database credentials, and signing keys, should be treated as potentially exposed and rotated as a priority, Vercel emphasized.

For worried users, Vercel offered a shortcut. “If you have not been contacted, we do not have reason to believe that your Vercel credentials or personal data have been compromised at this time,” the post reassured.

Allegedly breached by ShinyHunters

According to screenshots circulating on the internet, a threat actor has already claimed the breach on the dark web and is attempting to sell the spoils. “Greetings All, Today I am selling Access Key/ Source Code/ Database from Vercel company,” the actor said in one such post. “Give me a quote if you’re interested. This could be the largest supply chain attack ever if done right.”

The data was put up for $2 million on April 19.

In the screenshot, the threat actor can be seen using a “BreachForums” domain and implicitly claiming to be ShinyHunters, one of the operators of the notorious hacking site. Other giveaways include a Telegram channel “@Shinyc0rpsss” and an email ID “shinysevy@tutamail.com” mentioned in the post.

While recent incidents have hinted at ShinyHunters resurfacing after takedowns and alleged arrests, it remains likely that this is an imposter leveraging the name to lend credibility, something that has precedent.


Making agents dull 20 Apr 2026, 9:00 am

I’ve been arguing for a while now that enterprise AI won’t really take off until it gets boring. Not boring in the sense of uninspired; no, I mean boring in the sense that enterprises can trust it, govern it, observe it, and hand it to rank-and-file employees without undue concern that things will go wrong.

We have no shortage of over-funded startups clamoring to be the next big thing in AI, but not nearly enough that are quietly doing the essential work to make AI safe for enterprise consumption. Enter Stacklok.

On the surface, this might look like yet another startup trying to surf the AI agent wave. It’s not. Stacklok is exciting precisely because its executive team is deeply experienced in being unexciting. Back at Google, Craig McLuckie and Joe Beda were instrumental in the creation of Kubernetes. They took the messy, chaotic world of container orchestration and built an abstraction layer that made it “boring” enough that the largest banks, telcos, and retailers in the world could rely on it with confidence. Now they’re bringing that ability to wring order out of chaos to agentic AI, and they recognize that the real problem in enterprise AI has more to do with operational accountability than model quality.

I interviewed McLuckie and Beda to better understand the opportunity to create a “Kubernetes moment” in agentic AI.

Targeting accountability

McLuckie founded Stacklok in early 2023. Beda, his Kubernetes and later Heptio counterpart, had “semi-retired” in 2022. Beda doesn’t need to make more money, and he’s not joining out of nostalgia. As he tells it, this is “an extraordinary moment in the industry,” with “an opportunity to bring deep expertise in developer platforms and enterprise-grade infrastructure” to solving key enterprise problems.

“The biggest problem,” McLuckie says, “is accountability.” He explains: “An agent, no matter how sophisticated, no matter how capable, no matter how useful, cannot be held accountable for the work it undertakes.” That’s exactly right. A large language model can write code, summarize a contract, file a ticket, or trigger a workflow, but if it mangles customer data, oversteps its permissions, or keeps running after the employee who launched it has left the company, nobody gets to shrug and blame the model. The enterprise still owns the outcome.

Even OpenAI, which has been slower to take the enterprise seriously than Anthropic, now recognizes that enterprises need AI to fit inside workflows, controls, deployment models, and day-to-day operations. It’s no longer just about raw model prowess, as Tom Krazit writes. In other words, the market is slowly rediscovering what infrastructure people have known for a long time: Enterprises may buy capability, but they deploy control.

A related issue, according to Beda, is that AI’s speed changes everything. Tasks that used to take a human days or weeks may soon be completed in minutes by an agent. That doesn’t just create productivity. It creates scale, and scale turns manageable sloppiness into operational disaster. As he puts it, “The volume dial is going to 11 across the board.” I recently said that humans don’t use most of their granted permissions, but agents will. That’s exactly why identity, authorization, and auditability suddenly stop being problems for the security team and become architecture.

This is where the Kubernetes analogy is actually useful, rather than just founder mythmaking.

AI’s Kubernetes moment

Too many people remember Kubernetes as a container story. Enterprises embraced it for a more practical reason: It gave them a common operating model across environments, plus an ecosystem of policy, security, observability, and workflow tools layered on top. Cloud Native Computing Foundation now says 82% of container users run Kubernetes in production, and the organization explicitly frames Kubernetes as the operating system for AI. In our interview, McLuckie describes Kubernetes’ deeper contribution as “self-determination.” That is, it gave enterprises a consistent substrate on premises, at the edge, and in the cloud. That consistency is what helped an ecosystem to flourish around it.

Beda goes one step further: “One of the core ideas in Kubernetes is that you describe what you want to happen, and then you have the system go make it happen.” This, he says, means that Kubernetes is essentially “control theory rendered into software.” Over time, an enterprise’s desired state moves into code, into version control, and into systems traceable back to accountable humans. Nerdy and sort of dull? Sure. But that’s the point. Enterprise AI doesn’t just need smarter models. It needs systems where humans declare intent, machines execute it, and the whole mess remains observable and auditable.

This is why I keep insisting that the biggest strategic question in agentic AI isn’t whether agents are cool. They are—or at least they can be. No, the real question is who owns the control plane. Stacklok matters because it is explicitly aiming at that layer. The company’s bet is that enterprises want to run and manage Model Context Protocol–based agent infrastructure on the Kubernetes they already know. They want policy, identity, isolation, and observability built in, not bolted on afterwards.

That last part matters because MCP is important, but it isn’t enough. Anthropic introduced MCP in November 2024 as an open standard for connecting AI systems to tools and data. Later, they donated it to the Linux Foundation’s Agentic AI Foundation to keep it neutral and community-driven. It worked. Anthropic reports there are now more than 10,000 active public MCP servers and support across ChatGPT, Cursor, Gemini, Microsoft Copilot, and VS Code.

That’s awesome, but it’s also not enough. Why? Because a protocol isn’t a platform. A protocol can help an agent talk to a tool, but it doesn’t, by itself, tell an enterprise who approved that agent, what data it can touch, how its actions are logged, or how to shut it down safely when the human who launched it has left the company.

Meeting users where they are

That’s where Stacklok’s self-hosted, Kubernetes-native bias starts to look smart rather than stodgy. (Though, again, “stodgy” isn’t a bad thing for risk-averse enterprises.) McLuckie is blunt: “If you’re an enterprise connecting agents to sensitive data, you are almost certainly not comfortable with that data egressing your security domain or being sent to a SaaS endpoint that a vendor controls.” We’ve seen this movie before. When your hosting, identity, tool integration, and policy layers all belong to the same vendor, “choice” starts to mean “replatform.”

No one wants that.

This is also where open source matters, though not in the simplistic sense that open source automatically wins. It doesn’t. Enterprises don’t buy ideology: they buy simplicity. But in a young market, they also value leverage. I’ve written before that open source doesn’t magically redistribute market power. What it can do is give customers options and some control over their fate. In AI, where model switching costs are still relatively low, that optionality matters. Talking with McLuckie and Beda, it’s clear they are open source true believers, but not obnoxiously so. That’s good, because enterprises don’t need a sermon on openness; they just need enough neutrality to avoid getting trapped while the market is still changing underneath them.

It’s all about meeting enterprises where they are and helping them to incrementally move to where they’d like to be. As McLuckie stresses, most enterprise AI teams are being asked to deliver more with AI while running with flat or capped headcount. They don’t need and can’t implement a grand theory of some idealized, fully autonomous enterprise. Instead, they need an accretive (golden) path from here to there using things they already understand, such as containers, isolation, OpenTelemetry, Kubernetes, existing identity systems, and existing observability stacks.

Sound boring? Good!

The opposite of “boring” in enterprise AI isn’t innovation. It’s slideware or demoware that looks great in a keynote but dies on contact with procurement, security review, compliance, and the first ugly bit of enterprise data. McLuckie captures this perfectly: “Vibe-coding a platform for two weeks can produce something plausible. It won’t produce something accurate, hardened, or enterprise-grade.”

Will Stacklok be the company that defines this layer? It’s way too early to say. Markets this young are littered with smart people who were directionally right and commercially wrong. But the company is aiming at the right problem, and that already puts it ahead of a depressingly large percentage of the AI industry.

Again, the next era of enterprise AI will be won by whoever makes agents governable, portable, observable, and boring enough to trust. Kubernetes helped do that for cloud-native infrastructure. Stacklok is betting the same playbook can work for agentic infrastructure. That’s not a nostalgic rerun of Kubernetes. It’s a recognition that enterprises still need what they’ve always needed: not more magic, but a way to control it.


Best practices for building agentic systems 20 Apr 2026, 9:00 am

Agentic AI has emerged as the software industry’s latest shiny thing. Beyond smarter chatbots, AI agents operate with increasing autonomy, making them poised to drive efficiency gains across enterprises.

“Agentic refers to AI systems that can take actions on behalf of users, not just generate text or answer questions,” says Andrew McNamara, director of applied machine learning at Shopify. Agentic systems run continuously until a task is complete, he adds, citing Shopify’s Sidekick, a proactive agent for merchants.

Development of agentic AI now spans many business domains. According to Anthropic, a provider of large language models (LLMs), AI agents are most commonly deployed in software engineering, accounting for roughly half of use cases, followed by back-office automation, marketing, sales, finance, and data analysis.

“A concrete example is in IT incident resolution,” says Heath Ramsey, group VP of AI platform outbound product management at ServiceNow. In this context, AI agents surface contextual data across systems, check prior resolutions and policies, issue fixes, update records, and loop in team members, he says.

But agent-centered development demands a new form of systems thinking to avoid pitfalls such as indeterminism and token bloat. There are also pressing LLM-derived security gaps, such as a model’s willingness to lie or fabricate information to achieve a goal, a condition researchers call agentic misalignment.

For teams building agents that integrate with other systems and reason through various options to execute multi-step workflows, the proper upfront planning is table stakes. For these reasons and more, agentic architecture design requires a new playbook. 

“Building agentic systems requires a fundamentally new architecture, one designed for autonomy, not just automation,” says Anurag Gurtu, CEO of AIRRIVED, an agentic AI platform provider. “Agents need a runtime, a brain, hands, memory, and guardrails.”

Although agentic AI shows promise, ROI from AI is a moving target. Less than half of organizations report a measurable impact from agentic AI experiments, according to Alteryx, with less than a third trusting AI for accurate decision-making.

So, what are the ingredients behind successful enterprise-grade agentic systems? Rather than focusing on how to build within a single vendor platform, let’s explore the common traits across agentic systems to surface practical guidance and lessons learned for developers and architects.

Architectural components of an agentic system

Agentic systems are composed of a handful of building blocks that make it all possible. Together, they form an interconnected web of software architecture, with different components serving different purposes. “Building an AI agent is like constructing a nervous system,” says Ari Weil, cloud evangelist at Akamai.

This system spans layers for reasoning, memory, context-gathering, coordination, validation, and human-in-the-loop guardrails. “Agentic systems rely on a combination of AI, workflow automation, and enterprise controls working together,” adds ServiceNow’s Ramsey. 

Reasoning model

First off, if you break down agentic systems into their foundational components, you have to begin with the underlying model.

“A reasoning model sits at the core,” says Frank Kilcommins, head of enterprise architecture at Jentic, builders of an integration layer for AI. This reasoning engine performs the planning based on the user’s prompt, combined with the context-at-hand and available capabilities. 

Some reasoning models are better suited than others. “We look for models that feel agentic,” says Shopify’s McNamara. “They have the right amount of tool calls, and have strong instruction following that’s easy to prompt and steer.”

Context and data

Next, an agent needs context. This may take the form of internal company data, institutional knowledge and policies, system prompts, external data, memory of past chats, and agentic metadata, i.e., the user prompts, reasoning steps, and interactions with tools and data sources that allow you to observe and debug the agent’s behavior.

According to Edgar Kussberg, product director for AI, agents, IDE, and devtools at Sonar, sources for data can include databases and APIs, retrieval-augmented generation (RAG) systems and vector databases, file systems and document stores, internal dashboards, or external systems like Google Drive.

Organizations are actively building agentic knowledge bases to organize such data and streamline the retrieval process. Simultaneously, patterns are emerging behind semantic retrieval processes that power agentic context management systems.

“For memory, most teams combine a vector store like pgvector with something structured like a data catalog or knowledge graph,” says Anusha Kovi, a business intelligence engineer at Amazon. 
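
As a rough sketch of the pattern Kovi describes, the snippet below uses Postgres with the pgvector extension as an agent’s semantic memory. The table layout is an assumption for illustration, and embeddings would come from whatever model the stack already uses:

```python
# Assumes Postgres with the pgvector extension and a table like:
#   CREATE TABLE memories (id bigserial, content text, embedding vector(384));
import psycopg  # psycopg 3

def remember(conn, text: str, embedding: list[float]) -> None:
    conn.execute(
        "INSERT INTO memories (content, embedding) VALUES (%s, %s::vector)",
        (text, str(embedding)),
    )

def recall(conn, query_embedding: list[float], k: int = 5) -> list[str]:
    rows = conn.execute(
        # <-> is pgvector's nearest-neighbor distance operator
        "SELECT content FROM memories ORDER BY embedding <-> %s::vector LIMIT %s",
        (str(query_embedding), k),
    ).fetchall()
    return [r[0] for r in rows]
```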

Tools and discovery

But for agents to be actionable, they need more than just static context — they need read and write access to databases, tools, and APIs.

“Some of the most important work being done to make agents more powerful is happening with the ways we connect AI and existing systems,” says Jackie Brosamer, head of data and AI at Block, the financial services company behind Square and Cash App.

To enable access to such capabilities, the industry has coalesced around the Model Context Protocol (MCP) as a universal connector between agents and systems. MCP registries are emerging to unify and catalog MCP capabilities for agents at scale.

There are numerous public case studies of MCP use within agentic architectures, including Block’s open-source goose agent for LLM-powered software development and Workato’s use of MCP for Claude-powered enterprise workflows.
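
For flavor, here is a minimal MCP server using the official Python SDK’s FastMCP helper. The tool name and lookup logic are hypothetical; a real server would wrap an internal API or database:

```python
# Uses the official MCP Python SDK; the tool itself is hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-lookup")

@mcp.tool()
def get_order_status(order_id: str) -> str:
    """Return the status of an order by ID."""
    # A real server would query an internal API or database here.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # serves the tool to MCP clients over stdio by default
```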

Defined workflows 

Another useful component is having clearly documented workflows for common procedures. These include multi-step actions that are interlinked between MCP servers or direct API calls. 

“What matters is that these agents are coordinated through defined workflows,” says ServiceNow’s Ramsey, “so autonomy scales in a predictable and governed way rather than becoming chaotic.”

Jentic’s Kilcommins describes how this can be achieved using “clear, machine-readable capability definitions,” referencing the Arazzo specification, an industry standard from the OpenAPI Initiative, as a method to document such behaviors.

Multi-agent orchestration

On that note, agents must be equipped to integrate with each other and fit well into a continuous feedback loop.

Multi-agent systems typically become necessary at scale, says AIRRIVED’s Gurtu. “Instead of one generalist agent, you often have teams of specialized agents such as reasoning agents, retrieval agents, action agents, and validation agents.”

This reality necessitates connective tissue. “At the core, you need an orchestration layer for the plan-do-evaluate loop,” says Amazon’s Kovi.

Common components for orchestration, adds Kovi, include LangGraph, a low-level orchestration framework, CrewAI, a Python framework for multi-agent orchestration, and Bedrock Agents, for helping agents automate multi-step tasks.
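
A bare-bones version of that plan-do-evaluate loop can be expressed as a LangGraph state machine. The node bodies below are stubs, not a production design; a real system would call models and tools inside them:

```python
# Node bodies are stubs; a real system would call models and tools here.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    task: str
    plan: str
    result: str
    done: bool

def plan(state: AgentState) -> dict:
    return {"plan": f"steps for: {state['task']}"}

def act(state: AgentState) -> dict:
    return {"result": f"executed: {state['plan']}"}

def evaluate(state: AgentState) -> dict:
    return {"done": True}  # stub: a judge would score the result here

graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("act", act)
graph.add_node("evaluate", evaluate)
graph.set_entry_point("plan")
graph.add_edge("plan", "act")
graph.add_edge("act", "evaluate")
# Loop back to planning until the evaluator is satisfied.
graph.add_conditional_edges("evaluate", lambda s: END if s["done"] else "plan")

app = graph.compile()
print(app.invoke({"task": "summarize incident", "plan": "", "result": "", "done": False}))
```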

Open standards and protocols, like the A2A protocol for agent-to-agent communications, will also be important to enable AI agents to collaborate effectively.

Security and authorization

Given LLMs’ propensity to hallucinate and deviate from expectations, security is perhaps the most important element of building safe agentic systems.

“You’re no longer securing software that suggests, you’re securing software that acts,” says Gurtu. “Once agents can change access, trigger workflows, or remediate incidents, every decision becomes a potential control failure if it isn’t governed.” 

According to Kilcommins, the potential blast radius for agentic actions is huge, especially for uncontrolled, chained executions. He recommends having clearly defined permissions to avoid privilege escalation and sensitive data exposure.

In agentic systems, nuanced security methods are necessary. “An agent decides at run time what to query and what tools to call, so you can’t scope permissions the traditional way,” adds Kovi. Experts say that just-in-time authorization will be crucial to future-proof the non-human internet. 

Kovi adds that safety rules, like “don’t query personal information columns,” shouldn’t live in the prompt window. “Guardrails belong in identity and access management policies and configuration, not just prompt instructions.” 

Human checkpoints

Even with advanced authentication and authorization, sensitive actions will require human approvals.

Shopify defaults to “human-in-the-loop by design,” says McNamara. They’ve adopted approval gates to prevent fully autonomous changes to production systems. This allows merchants to review Sidekick’s AI-generated content before it goes live.

Others take a similar stance, particularly for financial transactions. “Our general rule is that anything touching production systems needs human checkpoints,” says Block’s Brosamer, referring to how user confirmation is a key element of Moneybot, the agent inside Cash App.
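
A minimal illustration of such an approval gate might look like the sketch below, with a hypothetical pending-actions queue standing in for a real review workflow:

```python
# Hypothetical pending-actions queue; a real gate would integrate with a
# ticketing or chat tool and persist state durably.
import uuid

PENDING: dict[str, dict] = {}

def propose_action(action: str, payload: dict) -> str:
    """Called by the agent instead of mutating production directly."""
    approval_id = str(uuid.uuid4())
    PENDING[approval_id] = {"action": action, "payload": payload}
    return approval_id  # surfaced to a human reviewer

def approve(approval_id: str) -> dict:
    """Called from the review UI; only now is the action released."""
    return PENDING.pop(approval_id)
```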

Evaluation capabilities

Building agentic systems also requires a good deal of upfront testing to evaluate whether outcomes match the intended results. 

For instance, Shopify performs rigorous pre-deployment evaluation on agentic outputs using both human testing and user simulation with specialized LLM-based judges. “Once your judge reliably matches human evaluators, you can trust it at scale,” says McNamara.
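
The calibration step McNamara describes can be made concrete with a simple agreement check: score a labeled sample with both humans and the LLM judge, and only trust the judge once agreement clears an agreed bar. The data below is made up:

```python
# Calibration sketch: measure judge/human agreement on a labeled sample
# before trusting the judge at scale. Labels here are invented.
from sklearn.metrics import cohen_kappa_score

human_labels = [1, 1, 0, 1, 0, 0, 1, 1]  # human pass/fail verdicts
judge_labels = [1, 1, 0, 1, 0, 1, 1, 1]  # LLM judge verdicts on same outputs

kappa = cohen_kappa_score(human_labels, judge_labels)
print(f"judge-human agreement (kappa): {kappa:.2f}")
# Gate automation on an agreed bar, e.g., only trust the judge if kappa > 0.8
```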

Others agree that evaluations are critical for enterprise-grade agentic systems. “Treat agents like regulated systems,” says Gurtu. “Sandbox changes, and test agents in simulation.” 

Behavioral observability

Lastly, another core layer is observability. For agentic systems, this must go beyond traditional monitoring or failure detection to capture advanced signals, such as why agents failed, or why they picked certain actions over others.

“Observability must be built in from day one,” says Sonar’s Kussberg. “You need transparency into every step of execution: prompts, tool calls, intermediate decisions, and final outputs.”

With more observable agent behaviors, you can improve the system continuously over time. As Kussberg says, “transparency fuels improvement.” 

Context optimization strategies 

Nearly all experts agree: giving AI agents minimal, relevant data is far better than data overload. This is critical to avoid maxing out context windows and degrading output quality.

“Thoughtful data curation matters far more than data volume,” says Brosamer. “The quality of an agent’s output is directly tied to the quality of its context.” 

At Block, engineers maintain clear README files, apply consistent documentation standards and well-structured project hierarchies, and adhere to other semantic conventions that help agents surface relevant information.

“Agentic systems don’t need more data, they need the right data at the right time,” adds Sonar’s Kussberg. “Effective systems give agents versatile discovery tools and allow them to run retrieval loops until they determine they have sufficient context.” 
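
The retrieval-loop pattern Kussberg describes might be sketched like this, where search() and has_enough_context() are hypothetical helpers backed by the system’s discovery tools and the agent’s own judgment:

```python
# search() queries the system's discovery tools; has_enough_context() asks
# the agent (or a model) to judge sufficiency and optionally refine the
# query. Both are hypothetical stand-ins.
def gather_context(question: str, search, has_enough_context, max_rounds: int = 5):
    context: list[str] = []
    query = question
    for _ in range(max_rounds):
        context.extend(search(query))
        done, query = has_enough_context(question, context)
        if done:
            break
    return context
```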

The prevailing philosophy is to adopt progressive disclosure of information. Shopify takes this to heart, using modular instruction delivery. “Just-in-time context delivery is key,” says McNamara. “Rather than overloading the system prompt, we return relevant context alongside tool data when it’s needed.”

Context should include semantic nuances, too, Kovi points out. “If an agent doesn’t know ‘active users’ means something different in product versus marketing, it’ll give confident wrong answers,” she says. “That’s hard to catch.”

Architectural best practices

There are plenty of additional recommendations regarding agentic systems development. First is the realization that not everything needs to be agentified.

Pairing LLMs and MCP integrations is great for novel situations requiring highly scalable, situationally-aware reasoning and responsiveness. But MCP can be overkill for repetitive, deterministic programmed automation, especially when context is static and security is strict.

As such, Kilcommins recommends determining what behavior is adaptive versus deterministic, and codifying the latter, as this will allow agents to initiate intentionally-defined programmed behaviors, bringing more stability.

Determining the prime areas for agentic processes also comes down to finding reusable use cases. “Organizations that have successfully deployed agentic AI most often start by identifying a high-friction process,” says Ramsey. This could include employee service requests, new-hire onboarding, or customer incident response, he says. 

Gurtu adds that agents perform best when they are given concrete business goals. “Start with decisions, not demos,” he says. “What doesn’t work is treating agents like stateless chatbots or replacing humans overnight.”

Others believe that narrowing an agent’s autonomy yields better results. “Agents work best as specialists, not generalists,” Kussberg says. 

For instance, Shopify sets clear boundaries when scaling tools. “Somewhere between 20 and 50 tools the boundaries start to blur,” says McNamara. While some propose separating role boundaries with distinct task-specific agents, Shopify has opted for a sub-agent architecture with low-level tools.

“Our recommendation is actually to avoid multi-agent architectures early,” McNamara says. “We are now getting into sub-agents with the right approach, and one key principle is to build very low-level tools and teach the system to translate natural language to that low-level language, rather than building out tools scenario by scenario.”

Experts share other wisdom for designing and developing agentic systems:

  • Use open infrastructure: Open agents and vendor-agnostic frameworks allow you to use the best fit-for-purpose models.
  • Think API-first: Good API design and clear, machine-readable definitions better prepare an organization for AI agents.
  • Keep data in sync: Keeping shared data in sync is another challenge. Event-driven architectures can keep data fresh.
  • Balance access with control: Keeping agentic systems secure will require offensive security exercises, comprehensive audit logs, and defensive data validation.
  • Continually improve: To avoid agent drift, agentic systems development will inevitably require ongoing maintenance as the industry and AI technology evolve. 

The future for agentic systems

Agentic AI development has moved forward at a blistering pace. Now, we’re at the point where agentic system patterns are beginning to solidify.

Looking to the future, experts anticipate a turn toward more multi-agent systems development, guiding the need for more complex orchestration patterns and reliance upon open standards. Some forecast a substantial overhaul to knowledge work at large.

“I expect that in 2026, we will see experimentation with frameworks to structure ‘factories’ of agents to coordinate producing complex knowledge work, starting with coding,” says Block’s Brosamer. The most challenging aspect will be optimizing existing information flows for agentic use cases, she adds. 

One aspect of that future could be more emphasis on alternative clouds and edge-based inference to move certain workloads out of centralized cloud architecture to reduce latency.

“The future of competitive AI demands proximity, not just processing power,” says Akamai’s Weil. “Agents need to act in the real world, interacting with users, devices, and data as events unfold.” 

All in all, building agentic systems is a highly complex endeavor, and the practices are still maturing. It will take a combination of novel technologies, microservices-esque design thinking, and security guardrails to take these projects to fruition at scale in a meaningful and sustainable way — all while still granting agents meaningful autonomy.

The future looks agentic. But the smart system design underpinning agentic systems will set apart successful outcomes from failed pilots.


Oracle delivers semantic search without LLMs 17 Apr 2026, 5:07 pm

Oracle says its new Trusted Answer Search can deliver reliable results at scale in the enterprise by scouring a governed set of approved documents using vector search instead of large language models (LLMs) and retrieval-augmented generation (RAG).

Available for download or accessible through APIs, it works by having enterprises define a curated “search space” of approved reports, documents, or application endpoints paired with metadata, and then using vector-based similarity to match a user’s natural language query to the most relevant pre-approved target, said Tirthankar Lahiri, SVP of mission-critical data and AI engines at Oracle.

Instead of retrieving raw text and generating a response, as is typical in RAG systems that rely on LLMs, Trusted Answer Search’s underlying system deterministically maps the query to a specific “match document,” extracts any required parameters, and returns a structured, verifiable outcome such as a report, URL, or action, Lahiri said.
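
The mechanics can be illustrated with a small sketch. This is not Oracle’s implementation, just the general shape of deterministic vector matching: pick the single best pre-approved target, and refuse to answer below a confidence threshold rather than improvising:

```python
# Illustrative only, not Oracle's implementation. Targets and the
# threshold are invented; embeddings come from an external model.
import numpy as np

APPROVED_TARGETS = ["quarterly_revenue_report", "headcount_dashboard"]

def best_match(query_vec: np.ndarray, target_vecs: np.ndarray, threshold: float = 0.75):
    # Cosine similarity between the query and every curated target.
    sims = target_vecs @ query_vec / (
        np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    i = int(np.argmax(sims))
    if sims[i] < threshold:
        return None, float(sims[i])  # no trusted answer; don't improvise
    return APPROVED_TARGETS[i], float(sims[i])
```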

A feedback loop enables users to flag incorrect matches and specify the expected result.

Lahiri sees a growing enterprise need for more deterministic natural language query systems that eliminate inconsistent responses and provide auditability for compliance purposes.

Independent consultant David Linthicum agreed about the potential market for Trusted Answer Search.

“The buyer is any enterprise that values predictability over creativity and wants to lower operational risk, especially in regulated industries, such as finance and healthcare,” he said.

Trade-offs

That said, the approach comes with trade-offs that CIOs need to consider, according to Robert Kramer, managing partner at KramerERP. While Trusted Answer Search can reduce inference costs by avoiding heavy LLM usage, it shifts spending toward data curation, governance, and ongoing maintenance, he said.

Linthicum, too, sees enterprises adopting the technology having to spend on document curation, taxonomy design, approvals, change management, and ongoing tuning.

Scott Bickley, advisory fellow at Info-Tech Research Group, warned of the challenges of keeping curated data current.

“As the source data scales upwards to include externally sourced content such as regulatory updates or supplier certifications or market updates that are updated more frequently and where the documents may number in the many thousands, the risk increases,” he said.

“The issue comes down to the ability to provide precise answers across a massive data set, especially where documents may contradict one another across versions or when similar language appears different in regulatory contexts. The risk of being served up results that are plausible but wrong goes up,” Bickley added.

Oracle’s Lahiri, however, said some of these concerns may be mitigated by how Trusted Answer Search retrieves content.

Rather than relying solely on large volumes of static, curated documents that require constant updating, the system can treat “trusted documents” as parameterized URLs that pull in dynamically rendered content from underlying systems, according to Lahiri.

Live data sources

This enables it to generate answers from live data sources such as enterprise applications, APIs, or regularly updated web endpoints, reducing dependence on manually maintained document repositories, he said.

Linthicum was not fully convinced by Lahiri’s argument, agreeing only that Oracle’s approach could help reduce content churn.

“In fast-moving domains, keeping descriptions, synonyms, and mappings current still needs disciplined owners, approvals, and feedback review. It can scale to thousands of targets, but semantic overlap raises maintenance complexity,” he said.

Trusted Answer Search puts Oracle in contention with offerings from rival hyperscalers. Products such as Amazon Kendra, Azure AI Search, Vertex AI Search, and IBM Watson Discovery already support semantic search over enterprise data, often combined with access controls and hybrid retrieval techniques.

One key distinction between these offerings and Oracle’s, according to Ashish Chaturvedi, leader of executive research at HFS Research, is that the rival products typically layer generative AI capabilities on top to produce answers.

Enterprises can evaluate Trusted Answer Search by downloading a package that includes components such as vector search, an embedding model to process user queries, and APIs for integration into existing applications and user interfaces. They can also run it through APIs or built-in GUI applications, which are included in the package as two APEX-based applications, an administrator interface for managing the system and a portal for end users.


Exciting Python features are on the way 17 Apr 2026, 9:00 am

Transformative new Python features are coming in Python 3.15. In addition to lazy imports and an immutable frozendict type, the new Python release will deliver significant improvements to the native JIT compiler and introduce a more explicit agenda for how Python will support WebAssembly.

Top picks for Python readers on InfoWorld

Speed-boost your Python programs with the new lazy imports feature
Starting with Python 3.15, Python imports can work lazily, deferring the cost of loading big libraries. And you don’t have to rewrite your Python apps to use it.
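
For a taste, here is the proposed syntax, assuming the accepted design ships unchanged in 3.15; details could still shift before release:

```python
# Assumes the accepted lazy-imports syntax lands as proposed; exact
# behavior may differ in the final 3.15 release.
lazy import json  # binding created now, module loaded only on first use

def save(obj, path):
    # json is actually imported the first time this function runs
    with open(path, "w") as f:
        f.write(json.dumps(obj))
```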

How Python is getting serious about Wasm
Python is slowly but surely becoming a first-class citizen in the WebAssembly world. A new Python Enhancement Proposal, PEP 816, describes how that will happen.

Get started with Python’s new frozendict type
A new immutable dictionary type in Python 3.15 fills a long-desired niche in Python — and can be used in more places than ordinary dictionaries.
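
A hypothetical usage sketch follows; it assumes the type is exposed as a built-in named frozendict, which the final 3.15 spelling and import location may not match:

```python
# Hypothetical: assumes a built-in frozendict; the shipped API may differ.
settings = frozendict({"retries": 3, "timeout": 30})

# Mutation is rejected, so settings can be shared safely across modules:
# settings["retries"] = 5  # would raise a TypeError

# Unlike a plain dict, an immutable mapping can be hashable, letting it
# serve as a dictionary key or set member.
cache = {settings: "connection-pool-A"}
```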

How to use Python dataclasses
Python dataclasses work behind the scenes to make your Python classes less verbose and more powerful all at once.
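
A small example of what the decorator buys you: __init__, __repr__, and field-wise __eq__ are generated automatically from the annotations:

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    name: str
    retries: int = 3
    tags: list[str] = field(default_factory=list)  # safe mutable default

a = Job("etl")
b = Job("etl")
print(a)       # Job(name='etl', retries=3, tags=[])
print(a == b)  # True: field-wise equality, not identity
```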

More good reads and Python updates elsewhere

Progress on the “Rust for CPython” project
The plan to enhance the Python interpreter by using the Rust language stirred controversy. Now it’s taking a new shape: use Rust to build components of the Python standard library.

Profiling-explorer: Spelunk data generated by Python’s profilers
Python’s built-in profilers generate reports in the opaque pstats format. This tool turns those binary blobs into interactive, explorable views.

The many failures that led to the LiteLLM compromise
How did a popular Python package for working with multiple LLMs turn into a vector for malware? This article reveals the many weak links that made it possible.

Slightly off-topic: Why open source contributions sit untouched for months on end
CPython has more than 2,200 open pull requests. The fix, according to this blog, isn’t adding more maintainers, but “changing how work flows through the one maintainer you have.”


When cloud giants neglect resilience 17 Apr 2026, 9:00 am

In a recent article chronicling the history of Microsoft Azure and its intensifying woes, we see a narrative that has been building throughout the industry for years. As cloud computing evolved from a buzzword to the backbone of digital infrastructure, major providers like Microsoft, Amazon, and Google have had to make compromises. Their promises of near-perfect uptime shifted from an expectation to “good enough,” influenced by economic pressures that have seen the cloud giants prioritize cost cuts and staff reductions over previously non-negotiable service reliability.

Frankly, many who follow the cloud space closely, including myself, have been warning about this situation for some time. Cloud outages are no longer rare, freak events. They are ingrained in the model as accepted collateral for the rapid growth and relentless cost-cutting that define this era of cloud computing. The story of Azure, as discussed in the referenced Register piece, is simply the latest and most prominent example of a much larger, industrywide trend.

This is not to say that cloud computing is inherently unstable or that its advantages—agility, scalability, rapid deployment—are a mirage. Enterprises aren’t abandoning the cloud. Far from it. Adoption continues at pace, even as these high-profile outages occur. The question is not whether the cloud is worth it, but rather, how much unreliability is acceptable for all that innovation and efficiency?

The price of cost optimization

If you trace the decisions of major public cloud players, a clear theme emerges. Competitive pressure from rivals translates to constant cost control, rushing services to market, shaving operational budgets, automating wherever possible, and reducing (or outright eliminating) teams of deeply experienced engineering talent who once ensured continuity and institutional knowledge. The comments from a former Azure engineer clearly illustrate how an exodus of talent, paired with an almost single-minded focus on AI and automation, is having downstream effects on the platform’s stability and support.

The irony is sharp: As cloud providers trumpet their AI prowess and machine-driven automation, the human expertise that built and reliably ran these platforms is no longer considered mission-critical. Automation isn’t a cure-all; companies still need experienced architects and operators who understand system limits, manage dependencies, handle failures, and respond deftly to unpredictable failures. Recent major outages reflect the slow but sure loss of that critically embedded human knowledge. Meanwhile, engineering decisions are increasingly made by those tasked with juggling ever-larger portfolios, new feature launches, and cost-reduction mandates, rather than contributing a methodical focus on resilience and craftsmanship.

Azure faces growing pains at scale, with tens of thousands of AI-generated lines of code created, tested, and deployed daily—sometimes by other AI agents—creating a self-reinforcing cycle of complexity and opacity. The resulting “compute crunch” puts even more strain on infrastructure, which, despite its sophistication, now handles heavier loads with fewer people providing oversight.

Outages aren’t driving users away

A natural question emerges: With reliability clearly taking a back seat, why aren’t enterprises reconsidering cloud altogether? I’ve argued for years that the game has changed. The benefits of cloud centralization, automation, and connectivity have become so fundamental to operations that the industry has quietly recalibrated its tolerance for outages. Public cloud is so deeply embedded into the business and digital operations that stepping back would mean undoing years, and often decades, of progress.

Headline-grabbing outages are dramatic but usually survivable. Disaster recovery plans, multi-region deployments, and architectural workarounds are now essentials for all major cloud-based companies. Building with failure in mind is a standard cost, not an avoidable exception. For most CIOs, the persistent risk of downtime is a manageable variable, balanced against the unmatchable benefits of cloud agility and in-house scale.

Providers know this well, and their actions reflect it. Outages may sting a bit in the press, but the real-world consequences have yet to outweigh the benefits to companies that push further into the cloud. As such, the providers’ logic is simple: As long as customers accept outages, however grudgingly, there’s little incentive to switch to costlier, less scalable systems.

How enterprises can adapt

With outages now the price of admission, enterprises should recognize that neither staff cuts nor the blind pursuit of automation will stop anytime soon. Cloud providers may promise improvements, but their incentives will remain focused on cost control over reliability. Organizations must adapt to this new normal, but they can still make choices that reduce their risk.

First, enterprises should prioritize fault-resistant cloud architecture. Adopting multicloud and hybrid cloud strategies, while complex, reduces the technical risk associated with reliance on a single provider.

Second, it’s crucial to invest in in-house expertise that understands both the workloads and the nuances of cloud service behavior. While the providers may treat their operations talent as expendable, nothing will replace the value of an enterprise’s in-house team to independently monitor, test, and prepare for the unexpected.

Finally, enterprises must enforce strict vendor management. This means holding providers accountable for promised service-level agreements, monitoring transparency in communication and incident reporting, and leveraging contracted services to their fullest extent, especially as the cloud market matures and customer influence grows.

The era of the infallible cloud is over. As public cloud providers pursue operational efficiency and AI dominance, resilience has taken a hit, and both providers and users must adapt. The challenge for today’s enterprises is to strategically mitigate the most likely consequences before the next outage strikes.


Anthropic’s latest model is deliberately less powerful than Mythos (and that’s the point) 17 Apr 2026, 2:33 am

Anthropic has today released a new, improved Claude model, Opus 4.7, but has deliberately built it to be less capable than the highly-anticipated Claude Mythos.

Anthropic calls Opus 4.7 a “notable improvement” over Opus 4.6, offering advanced software engineering capabilities and improved vision, memory, instruction-following, and financial analysis.

However, the yet-to-be-released (and inadvertently leaked) Mythos seems to overshadow the Opus 4.7 release. Interestingly, Anthropic itself is downplaying Opus 4.7 to an extent, calling it “not as advanced” and “less broadly capable” than the Claude Mythos Preview.

The Opus upgrade also comes on the heels of the launch of Project Glasswing, Anthropic’s security initiative that uses Claude Mythos Preview to identify and fix cybersecurity vulnerabilities.

“For once in technological history, a product is being released with a marketing message that is focused more on what it does not do than on what it does,” said technology analyst Carmi Levy. “Anthropic’s messaging makes it clear that Opus 4.7 is a safer model, with capabilities that are deliberately dialed down compared to Mythos.”

‘Not fully ideal’ in some safety scenarios

Anthropic touts Opus 4.7’s “substantially better” instruction-following compared to Opus 4.6, its ability to handle complex, long-running tasks, and the “precise attention” it pays to instructions. Users report that they’re able to hand off their “hardest coding work” to the model, whose memory is better than that of prior versions. It can remember notes across long, multi-session work and apply them to new tasks, thus requiring less up-front context.

Opus 4.7 offers three times the vision capability of prior models, Anthropic said, accepting high-resolution images of up to 2,576 pixels. This allows the model to support multimodal tasks requiring fine visual detail, such as computer-use agents analyzing dense screenshots or extracting data from complex diagrams.

Further, the company reported that Opus 4.7 is a more effective financial analyst, producing “rigorous analyses and models” and more professional presentations.

Opus 4.7 is relatively on par with its predecessor in safety, Anthropic said, showing low rates of concerning behavior such as “deception, sycophancy, and cooperation with misuse.” However, the company pointed out, while it improves in areas like honesty and resistance to malicious prompt injection, it is “modestly weaker” than Opus 4.6 elsewhere, such as in responding to harmful prompts, and is “not fully ideal in its behavior.”

Opus 4.7 comes amidst intense anticipation of the release of Claude Mythos, a general-purpose frontier model that Anthropic calls the “best-aligned” of all the models it has trained. Interestingly, in its release blog today, the company revealed that Mythos Preview scored better than Opus 4.7 on a few major benchmarks, in some cases by more than ten percentage points.

The Mythos Preview boasted higher scores on SWE-Bench Pro and SWE-Bench Verified (agentic coding); Humanity’s Last Exam (multidisciplinary reasoning); and agentic search (BrowseComp), while the two scored roughly the same for agentic computer use, graduate-level reasoning, and visual reasoning.

Opus 4.7 is available in all Claude products and in its API, as well as in Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Pricing remains the same as Opus 4.6: $5 per million input tokens, and $25 per million output tokens.

What sets Opus 4.7 apart

Claude Opus is being branded in the industry as a “practical frontier” model, and represents Anthropic’s “most capable intelligent and multifaceted automation model,” said Yaz Palanichamy, senior advisory analyst at Info-Tech Research Group. Its core use cases include complex coding, deep research, and comprehensive agentic workflows.

The model’s core product differentiators have to do with how well-coordinated and composable its embedded algorithms are at scaling up various operational use case scenarios, he explained.

Claude Opus 4.7 is a “technically inclined” platform requiring a fair amount of deep personalization to fine-tune prompts and generate work outputs, he noted. It retains a strong lead over rival Google Gemini in terms of applied engineering use cases, even though Gemini 3.1 Pro has a larger context window (2M tokens versus Claude’s 1M tokens), although, he said, “certain [comparable] models do tend to converge on raw reasoning.”

The 4.7 update moves Opus beyond basic chatbot workflows, and positions it as more of “a copilot for complex, technical roles,” Levy noted. “It’s more capable than ever, and an even better copilot for knowledge workers.” At the same time, it poses less risk, making it a “carefully calculated compromise.”

He also pointed out that the Opus 4.7 release comes just two months after Opus 4.6 was introduced. That itself is “a signal of just how overheated the AI development cycle has become, and how brutally competitive the market now is.”

A guinea pig for Mythos?

Last week, Anthropic also announced Project Glasswing, which applies Mythos Preview to defensive security. The company is working with enterprises like AWS and Google, as well as with 30-plus cybersecurity organizations, on the initiative, and claims that Glasswing has already discovered “thousands” of high-severity vulnerabilities, including some in every major operating system and web browser.

Anthropic is intentionally keeping Claude Mythos Preview’s release limited, first testing new cyber safeguards on “less capable models.” This includes Opus 4.7, whose cyber capabilities are not as advanced as those in Mythos. In fact, during training, Anthropic experimented to “differentially reduce” these capabilities, the company acknowledged.

Opus 4.7 has safeguards that automatically detect and block requests that suggest “prohibited or high-risk” cybersecurity uses, Anthropic explained. Lessons learned will be applied to Mythos models.

This is “an admission of sorts that the new model is somewhat intentionally dumber than its higher-end stablemate,” Levy observed, “all in an attempt to reinforce its cyber risk detection and blocking bona fides.”

From a marketing perspective, this allows Anthropic to position Opus 4.7 as an ideal balance between capability and risk, he noted, but without all the “cybersecurity baggage” of the limited-availability, higher-end model.

Mythos may very well be the “ultimate sacrificial lamb” at the root of broader Opus 4.7 mass adoption, Levy said. Even in the “increasing likelihood” that Mythos is never publicly released, it will serve as “an ideal means of glorifying Opus as the one model that strikes the ideal compromise for most enterprise decision-makers.”

Palanichamy agreed, noting that Opus 4.7 could serve as a public-facing guinea pig to live-test and fine-tune the automated cybersecurity safeguards that will ultimately “become a mandatory precursory requirement for an eventual broader release of Mythos-class frontier models.”

This article originally appeared on Computerworld.


Ease into Azure Kubernetes Application Network 16 Apr 2026, 9:00 am

If you’re using Kubernetes, especially a managed version like Azure Kubernetes Service (AKS), you don’t need to think about the underlying hardware. All you need to do is build your application and it should run, its containers managed by the service’s orchestrator.

At least that’s the theory. However, implementing a platform that abstracts your code from the servers and network that support it brings its own problems, and a whole new discipline. Platform engineers fill the gap between software and hardware, supporting security and networking, as well as managing storage and other key services.

Kubernetes is part of an ecosystem of cloud-native services that provide the supporting framework for running and managing scalable distributed systems, including the tools needed to package and deploy applications, as well as components that extend the functionality of Kubernetes’ own nodes and pods.

Key components of this growing ecosystem are the various service meshes. These offer a way to manage connectivity between nodes and between your applications and the outside network, with tools for handling basic network security. Often implemented as “sidecar” containers running alongside Kubernetes pods, these network proxies can consume added resources as your applications scale. That means more configuration and management overhead: keeping configurations up to date and keeping secrets secure.

Istio goes ambient

One of the key service mesh implementations, Istio, has developed an alternate way of operating, what the project calls “ambient mode”. Here, instead of having individual sidecars for each pod, your service mesh is implemented as per-node proxies or as a single proxy that supports an entire Kubernetes namespace. It’s an approach that allows you to start implementing a service mesh without increasing the complexity of your platform, making it easy to go from a basic development Kubernetes implementation to a production environment without having to change your application pods.

It’s called ambient mode because there’s no need to add new service mesh elements as your application scales. Instead, the service mesh is always there, and your pods simply join it and take advantage of the existing configuration. The resulting implementation is both easier to use and easier to understand.
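To make that concrete: in upstream Istio, joining the ambient mesh is usually nothing more than a namespace label, after which the node-level proxies transparently pick up every pod in that namespace. A minimal sketch, assuming a namespace called my-app (Application Network automates this enrollment for you, so treat the command as an illustration of the underlying mechanism rather than a required step):

    # Enroll all pods in the namespace into Istio's ambient mesh;
    # no sidecar injection and no pod restarts are required.
    kubectl label namespace my-app istio.io/dataplane-mode=ambient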

Microsoft has used Istio as part of Azure Kubernetes Service for many years. Istio is one of a suite of open-source tools that provide the backbone of Azure’s cloud-native computing platform.

Introducing Azure Kubernetes Application Network

So, it’s not surprising to learn that Microsoft is using Istio’s ambient mesh as the basis of Azure Kubernetes Application Network. The new service (available in preview) allows application developers to add managed network services to their applications without needing the support of a platform engineering team to implement a service mesh. It will even help you migrate away from the now-deprecated ingress-nginx: you get access to the recommended Kubernetes Gateway API without adding more sidecars, and you can keep using your existing ingress-nginx configurations while you complete the migration.

Microsoft describes the preview of Azure Kubernetes Application Network as “a fully managed, ambient-based service network solution for Azure Kubernetes Service (AKS).” The underlying data and control planes are managed by AKS, so all you need to do is connect your AKS clusters to an Application Network and AKS will then manage the service mesh for you, without any changes to your applications.

Like other implementations of Istio’s ambient mesh, there are two levels to Application Network: a core set of node-level application proxies that handle connectivity and security for application services, and an optional set of higher-level, Layer 7 proxies that support richer routing and apply network policies, together acting as a software-defined network inside your Kubernetes environment.

This approach lets you build and test a Kubernetes application on your local development hardware without using Application Network features, then deploy it to AKS along with the required network configuration — simplifying both development and deployment. It also reduces development overheads, both in compute and developer resources.

Using Azure Kubernetes Application Network

Once deployed, Application Network connects the services in your application securely, automatically handling encrypted connections and the required certificates. It can also support unencrypted connections for when you aren’t sending confidential data and don’t need the associated overhead. As the service is managed by AKS, new pods automatically join the mesh as they are deployed, with the ambient mesh supporting both scale-up and scale-down operations.

The architecture of Application Network is much like that of an Istio ambient mesh. The main difference is that the service’s management and control planes are run by Azure, with application owners limited to working with the service’s data plane, configuring operations and setting policies for their application workloads. Azure’s control of the management plane automates certificate management using the tools built into Azure Key Vault, ensuring that connections stay secure and that there is little risk of certificate expiration.

The Application Network data plane holds the proxies and gateways used by the service mesh, and these are deployed when the service is launched, along with the required Kubernetes configurations. The key to its operation is ztunnel, a proxy that intercepts inter-service requests, secures the connection, and routes requests to the ztunnel running alongside the destination service. A gateway oversees connections between ztunnels running in remote clusters, allowing your service mesh to scale out with demand.

Building your first ambient service mesh in AKS

Getting started with Azure Kubernetes Application Network requires the Azure CLI. If you’re working with an existing AKS cluster, then you will need to enable integration with Microsoft Entra and enable OpenID Connect.

As the Application Network service is in preview, start by registering it in your account. This can take some time, but once it’s registered you can install the AppNet CLI extension that’s used to manage and control Application Network for your AKS clusters. You can now start to set up the ambient service mesh, either creating new clusters to use it, or adding the service mesh to existing AKS deployments.

Starting from scratch is the easiest way, as it ensures that you’re running in the same tenant. AKS clusters and Application Network can be in the same resource group if you want, but it’s not necessary. You’re free to use separate resource groups for management.

The appnet command makes it easy to create an Application Network from the command line; all you need is a name for the network, a resource group, a location, and an identity type. Once you’ve run the command to create your ambient mesh, wait for the mesh to be provisioned before joining a cluster to your network. Joining again simply needs a name for the member, plus the cluster’s own name and resource group. At the same time, you define how the network will be managed, i.e., whether you manage upgrades yourself or leave Azure to manage them for you. Additional clusters can be added to the network the same way.
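Microsoft’s documentation has the authoritative syntax, and preview CLIs change, but based on the parameters described above, the flow looks roughly like the following sketch. Every name here is made up, and the exact command and flag spellings are assumptions rather than the literal CLI reference:

    # Hypothetical sketch -- check the appnet extension's help for real syntax.
    # Create the Application Network (the managed ambient mesh).
    az appnet create \
        --name my-appnet \
        --resource-group rg-appnet \
        --location eastus \
        --identity-type SystemAssigned

    # Join an existing AKS cluster as a member, choosing who manages upgrades.
    az appnet member create \
        --name my-member \
        --resource-group rg-appnet \
        --cluster-name my-aks-cluster \
        --cluster-resource-group rg-aks \
        --upgrade-mode Managed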

With an Application Network and member clusters in place, the next step is to use Kubernetes’ own tooling to add support for the ambient mesh to your applications. Microsoft provides a useful example that shows how to use Application Network with the Kubernetes Gateway API to manage ingress. You need to use kubectl and istioctl commands to enable gateways and verify their operation, adding services and ensuring that they are visible to each other through their respective ztunnels.
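Microsoft’s walkthrough is the reference for the exact steps, but the Gateway API resources involved are standard Kubernetes kinds. A minimal sketch of a gateway plus a route follows; the names are illustrative, and the gatewayClassName that Application Network expects is an assumption based on its Istio underpinnings:

    # Standard Kubernetes Gateway API resources (gateway.networking.k8s.io/v1).
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: store-gateway
      namespace: store
    spec:
      gatewayClassName: istio   # assumption: the service's Istio-based class
      listeners:
      - name: http
        port: 80
        protocol: HTTP
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: catalog-route
      namespace: store
    spec:
      parentRefs:
      - name: store-gateway
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /catalog
        backendRefs:
        - name: catalog
          port: 8080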

Securing applications with policies

Policies can be used to control access from the application ingress to specific services, as well as between services, reducing the risk of breaches and ensuring that you control how traffic is routed in your application. These policies can lock services down to specific HTTP methods: for example, allowing only GET operations on a read-only service, and POST only where data needs to be delivered. Other options can enforce OpenID Connect authorization at the mesh level.
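Because Application Network is built on Istio, the natural way to express such a rule is an Istio AuthorizationPolicy. A minimal sketch, assuming the managed service accepts the same resource as upstream Istio (the namespace, labels, and service names are illustrative):

    # Allow only read operations against the catalog service;
    # requests using any other HTTP method are denied.
    apiVersion: security.istio.io/v1
    kind: AuthorizationPolicy
    metadata:
      name: catalog-read-only
      namespace: store
    spec:
      selector:
        matchLabels:
          app: catalog
      action: ALLOW
      rules:
      - to:
        - operation:
            methods: ["GET"]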

Not all Azure Kubernetes clusters are supported in the preview, which is only available in Azure’s largest regions. For now, Application Network won’t work with private clusters or with Windows node pools. Once running, you can’t switch upgrade modes, and as the service is based on Istio, you can’t also enable your own Istio service mesh in the cluster. None of these restrictions is a showstopper, though, and you can start experimenting with the service while it’s still in preview.

AKS Application Network is a powerful tool that helps simplify and secure the process of building and running inter-cluster networks in an AKS application. As an ambient service, it scales as necessary and can provide secure bridges between clusters. Because it works at the Kubernetes level, you can use Application Network to apply policy-driven production network rules, allowing developers to build and test code in unrestricted environments before moving to test and production clusters.

As Application Network uses familiar Kubernetes and Istio constructions, it’s possible to build configurations into Helm charts and other deployment tools, ensuring configurations are part of your build artifacts and that network configurations and policies are delivered with your code every time you push a new build – without needing platform engineering support.


The two-pass compiler is back – this time, it’s fixing AI code generation 16 Apr 2026, 9:00 am

If you came up building software in the 1990s or early 2000s, you remember the visceral satisfaction of determinism. You wrote code. The compiler analyzed it, optimized it, and emitted precisely the machine instructions you expected. Same input, same output. Every single time. There was an engineering rigor to it that shaped how an entire generation of developers thought about building systems.

Then large language models (LLMs) arrived and, almost overnight, code generation became a stochastic process. Prompt an AI model twice with identical inputs and you’ll get structurally different outputs—sometimes brilliant, sometimes subtly broken, occasionally hallucinated beyond repair. For quick prototyping that’s fine. For enterprise-grade software—the kind where a misplaced null check costs you a production outage at 2am—it’s a non-starter.

We stared at this problem for a while. And then something clicked. It felt familiar, like a pattern we’d encountered before, buried somewhere in our CS fundamentals. Then it hit us: the two-pass compiler.

A quick refresher

Early compilers were single-pass: read source, emit machine code, hope for the best. They were fast but brittle—limited optimization, poor error handling, fragile output. The industry’s answer was the multi-pass compiler, and it fundamentally changed how we build languages. The first pass analyzes, parses, and produces an intermediate representation (IR). The second pass optimizes and generates the final target code. This separation of concerns is what gave us C, C++, Java—and frankly, modern software engineering as we know it.

[Figure: 2-pass architecture. The structural parallel between classical two-pass compilation and AI-driven code generation. Image: WaveMaker]

The analogy to AI code generation is almost eerily direct. Today’s LLM-based tools are, architecturally, single-pass compilers. You feed in a prompt, the model generates code, and you get whatever comes out the other end. The quality ceiling is the model itself. There’s no intermediate analysis, no optimization pass, no structural validation. It’s 1970s compiler design with 2020s marketing.

Applying the two-pass model to AI code generation

Here’s where it gets interesting. What if, instead of asking an LLM to go from prompt to production code in one shot, you split the process into two architecturally distinct passes—just like the compilers that built our industry?

Pass 1 is where the LLM does what LLMs are genuinely good at: understanding intent, decomposing design, and reasoning about structure. The model analyzes the design spec, identifies components, maps APIs, resolves layout semantics—and emits an intermediate representation, an IR. Not HTML. Not Angular or React. A well-defined meta-language markup that captures what needs to be built without committing to how.

This is critical. By constraining the LLM’s output to a structured meta-language rather than raw framework code, you eliminate entire categories of failure. The model can’t inject malformed script tags if it’s not emitting HTML. It can’t hallucinate nonexistent React hooks if it’s outputting component descriptors. You’ve reduced the stochastic surface area dramatically.
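What that meta-language might look like is easiest to show with a sketch. WaveMaker’s actual IR isn’t reproduced here; this is a hypothetical component descriptor, written in TypeScript for readability, that illustrates capturing the what without the how:

    // Hypothetical IR: a constrained component descriptor, not HTML or React.
    // An LLM that can only emit values fitting this schema cannot produce
    // malformed markup or invented framework APIs -- they are unrepresentable.
    type ComponentIR = {
      kind: "form" | "table" | "button" | "text-input";
      id: string;
      label?: string;
      bind?: string;              // data-binding expression, validated later
      children?: ComponentIR[];
    };

    const loginForm: ComponentIR = {
      kind: "form",
      id: "login",
      children: [
        { kind: "text-input", id: "email", label: "Email", bind: "user.email" },
        { kind: "button", id: "submit", label: "Sign in" },
      ],
    };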

Pass 2 is entirely deterministic. A platform-level code generator—no LLM involved—takes that validated intermediate markup and emits production-grade Angular, React, or React Native code. This is the pass that plugs in battle-tested libraries, enforces security patterns, and applies framework-specific optimizations. Same IR in, same code out. Every time.
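A deterministic second pass is then just ordinary, testable code. A toy sketch, assuming the hypothetical ComponentIR type above (a real generator would emit whole framework projects, not single elements):

    // Toy deterministic generator: same IR in, same source out, every time.
    function emitReact(node: ComponentIR): string {
      switch (node.kind) {
        case "text-input":
          return `<input id="${node.id}" aria-label="${node.label ?? ""}" />`;
        case "button":
          return `<button id="${node.id}">${node.label ?? ""}</button>`;
        case "form":
          return `<form id="${node.id}">${(node.children ?? [])
            .map(emitReact)
            .join("")}</form>`;
        default:
          // Unknown kinds -- including hallucinated ones that slipped past
          // validation -- are rejected outright, never guessed at.
          throw new Error(`Unsupported IR kind: ${node.kind}`);
      }
    }

Feed loginForm through emitReact and you get the identical markup string on every run; anything outside the schema fails loudly instead of shipping.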

First pass gives you speed. Second pass gives you reliability. The separation of concerns is what makes it work.

Why this matters now

The advantages of this architecture compound in exactly the ways that matter for enterprise development. The meta-language IR becomes your durable context for iterative development—you’re not re-prompting the LLM from scratch every time you refine a component. Security concerns like script injection and SQL injection are structurally eliminated, not patched after the fact. Hallucinated properties and tokens get caught and stripped at the IR boundary before they ever reach generated code. And because Pass 2 is deterministic, you get reproducible, auditable, deployable output.

Pass 1 — LLM-powered

• Translates design/spec to structured components and design tokens
• Enables iterative dev with meta-markup as persistent context
• Eliminates script/SQL injection by design

Pass 2 — Deterministic

• Generates optimized, secure, performant framework code
• Validates and strips hallucinated markup and tokens
• Plugs in battle-tested libraries for reliability

If you’ve spent your career building systems where correctness isn’t optional, this should resonate. The industry spent decades learning that single-pass compilation couldn’t produce reliable software at scale. The two-pass architecture wasn’t just an optimization, but an engineering philosophy: separate understanding from generation, validate before you emit, and never let a single phase carry the entire burden of correctness.

We’re at the same inflection point with AI code generation right now. The models are powerful. The architecture around them has been naive. The fix isn’t to wait for a smarter model. It’s to apply the engineering discipline we’ve always known, and build systems where stochastic brilliance and deterministic reliability each do what they do best—in the right pass, at the right time.

Deterministic software engineering is cool again. Turns out it never really left.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

