AWS targets AI agent sprawl with new Bedrock Agent Registry 10 Apr 2026, 12:22 pm

AWS is expanding its Amazon Bedrock AgentCore portfolio with a new managed service, Agent Registry, designed to help enterprises catalog, manage, and govern their growing fleets of AI agents and their associated tools.

The service provides a unified directory of agents, capturing metadata such as capabilities, identities, and integrations, and is designed to support agents built with different models and frameworks, AWS wrote in the offering’s documentation.

The move addresses AI agent sprawl, a structural gap that has emerged as enterprises try to move pilots into production, analysts say.

“Agent Sprawl is an emerging structural problem. What we see consistently across enterprises is that agents proliferate much faster than traditional applications because they are easier to build. As a result, ownership becomes ambiguous as they move into production while increasing costs, risks, and duplication until finance, security, or related incidents force attention,” said Gaurav Dewan, research director at Avasant.

With the addition of a centralized Agent Registry-like offering, discovery, orchestration, governance, lifecycle management, and standardization of agents become easier, Dewan added, noting that in effect, a registry transforms agents from isolated artifacts into managed, composable enterprise assets.

A control plane play with trade-offs?

The service comes with what analysts describe as a “strategic” limitation: While it can track agents interacting with external systems, the registry itself operates within AWS.  

This approach, according to Forrester principal analyst Charlie Dai, reinforces the company’s push to position Bedrock as the control plane for enterprise AI agent deployment and oversight.

All the more so because other cloud providers are taking a similar approach. 

Google Cloud, for instance, is extending Vertex AI with capabilities to orchestrate and monitor agents, including a governance layer within Vertex AI Agent Builder and integrations with its Agent Registry via Apigee.

Microsoft, meanwhile, is positioning Azure AI and Copilot Studio as a unified platform for building and governing enterprise AI agents, complemented by Agent 365 and Entra Agent ID for discovery and identity management.

In fact, Dewan cautioned enterprise teams planning to embrace AWS’ Agent Registry about its close integration with AWS-native services, particularly in areas such as identity and runtime.

“As a result, while the service will natively index and manage agents deployed within AWS environments, integration with external or on-prem agents will likely require manual registration. Cross-cloud or federated discovery capabilities are not yet clearly established,” Dewan said.

This limitation, in itself, could introduce a new risk: registry sprawl across hyperscalers, Dewan noted, adding that enterprises adopting AWS, Microsoft, and Google registries in parallel could end up recreating the very fragmentation these tools are meant to solve.

Accessing the Agent Registry and adding agents

The registry is accessible through multiple entry points, including the AgentCore console, APIs, SDKs, and even as a Model Context Protocol (MCP) server, allowing compatible clients and developer tools to query it directly.

This multi-access design, according to an AWS blog post, is deliberate as it allows teams to integrate the registry into existing development environments or build custom discovery interfaces using OAuth-based authentication, without being tied strictly to AWS-native tooling.

When it comes to adding agents and related resources, AWS provides two primary approaches.

The first is a manual registration path, where developers or platform teams create records via the console, APIs, or SDKs by supplying structured metadata such as capabilities, ownership, compliance attributes, and usage documentation.
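For illustration only, a registration record carrying this kind of structured metadata might look something like the sketch below. The field names are hypothetical, not the actual Agent Registry schema, which AWS documents separately:

```python
# Hypothetical record; the real Agent Registry schema is defined in AWS's docs.
agent_record = {
    'name': 'invoice-triage-agent',
    'capabilities': ['classify-invoices', 'route-exceptions'],
    'owner': 'finance-platform-team',
    'compliance': {'pii': False, 'last_review': '2026-03-01'},
    'usage_docs': 'https://wiki.example.com/agents/invoice-triage',
}
print(sorted(agent_record))
```

Whatever the exact schema, the point of the manual path is that a human supplies this metadata up front, rather than the registry inferring it from an endpoint.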

The second is a more automated ingestion route, where teams can point the registry to an MCP or Agent2Agent (A2A) endpoint, allowing it to pull in agent details automatically, AWS wrote in the blog post.

Agent Registry is currently available in preview in five AWS Regions: US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Sydney), Europe (Ireland), and US East (N. Virginia).

The service is expected to start supporting external registries soon, AWS said.

That, the hyperscaler added, will help enterprises connect multiple registries and search across them as one.

“You will be able to define categories and taxonomies that match how your organization thinks about agents, backed by structured metadata schemas capturing ownership, compliance status, cost center, and whatever else your governance model requires,” it further detailed in the blog post.


AI agents aren’t failing. The coordination layer is failing 10 Apr 2026, 9:00 am

Our multi-agent AI system was impressive in demos. One agent handled customer inquiries. Another managed scheduling. A third processed documents. Each worked beautifully in isolation.

In production, they fought each other. The scheduling agent would book appointments while the inquiry agent was still gathering requirements. The document agent processed files using outdated context from a conversation that had moved on two turns ago. End-to-end latency crept from roughly 200 milliseconds to nearly 2.4 seconds as agents waited on each other through ad-hoc API calls that nobody had designed for scale.

The problem was not the agents. Every individual agent performed well within its domain. The problem was the missing coordination infrastructure between them, what I now call the “Event Spine” that enables agents to work as a system rather than a collection of individuals competing for the same resources.

Among my peers running production AI at enterprise scale across telecommunications and healthcare, the same pattern keeps surfacing. Agent proliferation is real. The tooling to coordinate those agents is not keeping up.

Figure 1: Multi-Agent Coordination — Before (N² connections) vs After (Event Spine) (Image: Sreenivasa Reddy Hulebeedu Reddy)

Why direct agent-to-agent communication breaks down

The intuitive approach to multi-agent systems is direct communication. Agent A calls Agent B’s API. Agent B responds. Simple point-to-point integration. It mirrors how most teams build microservices initially, and it works fine when you have two or three agents.

The math stops working quickly. As agent count grows, connection count grows quadratically. Five agents need 10 connections. Ten agents need 45. Twenty agents need 190. Each connection is a potential failure point, a latency source and a coordination challenge that someone must maintain.
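The arithmetic here is the standard handshake count, n(n-1)/2 point-to-point links; a quick sketch makes the growth concrete:

```python
def direct_connections(n_agents: int) -> int:
    """Point-to-point links needed so every agent can reach every other: n(n-1)/2."""
    return n_agents * (n_agents - 1) // 2

for n in (5, 10, 20):
    print(n, direct_connections(n))   # 10, 45, 190 links respectively

# A central coordination layer needs only n links: one from each agent to it.
```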

Worse, direct communication creates hidden dependencies. When Agent A calls Agent B, A needs to understand B’s state, availability and current workload. That knowledge couples the agents tightly, defeating the entire purpose of distributing capabilities across specialized components. Change B’s API contract, and every agent that calls B needs updating.

We have seen this movie before. Microservices went through the same evolution — from direct service-to-service calls to message buses, to service meshes. AI agents are following the same trajectory, just compressed into months instead of years.

The Event Spine pattern

The Event Spine is a centralized coordination layer with three properties designed specifically for multi-agent AI systems.

First, ordered event streams. Every agent action produces an event with a global sequence number. Any agent can reconstruct the current system state by reading the event stream. This eliminates the need for agents to query each other directly, which is where the latency was hiding in our system.

Second, context propagation. Each event carries a context envelope that includes the originating user request, current session state and any constraints or deadlines. When an agent receives an event, it has the full picture without making additional calls. In our previous architecture, agents were making three to five round-trip calls just to assemble enough context to act on a single request.

Third, coordination primitives. The spine provides built-in support for common patterns: sequential handoffs between agents, parallel fan-out with aggregation, conditional routing based on confidence scores and priority preemption when urgent requests arrive. These patterns would otherwise need to be implemented independently by each agent pair, duplicating logic and introducing inconsistency.

from collections import defaultdict
from dataclasses import dataclass
from typing import Any
import time

@dataclass
class Event:
    """Immutable record of one agent action on the spine."""
    seq: int
    type: str
    payload: Any
    context: dict
    timestamp: float

class EventSpine:
    def __init__(self):
        self.sequence = 0                      # global sequence number
        self.subscribers = defaultdict(list)   # event_type -> list of handlers

    def publish(self, event_type, payload, context):
        self.sequence += 1
        event = Event(
            seq=self.sequence,
            type=event_type,
            payload=payload,
            context=context,
            timestamp=time.time()
        )
        for handler in self.subscribers[event_type]:
            handler(event)
        return event

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)
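One of the coordination primitives mentioned above, parallel fan-out with aggregation, can be sketched on top of a stripped-down publish/subscribe bus. The class and topic names below are illustrative, not the production system:

```python
from collections import defaultdict

class MiniSpine:
    """Minimal in-process pub/sub bus, for illustration only."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.subscribers[event_type]:
            handler(payload)

spine = MiniSpine()

# Two illustrative worker agents that reply synchronously on a shared topic.
spine.subscribe('summarize.request', lambda req: spine.publish('worker.reply', f'summary of {req}'))
spine.subscribe('classify.request', lambda req: spine.publish('worker.reply', f'label for {req}'))

aggregated = []
EXPECTED_REPLIES = 2

def collect(result):
    # Aggregation: fire the completion event once every branch has replied.
    aggregated.append(result)
    if len(aggregated) == EXPECTED_REPLIES:
        spine.publish('request.complete', list(aggregated))

spine.subscribe('worker.reply', collect)
spine.subscribe('request.complete', lambda results: print(results))

# Fan-out: one incoming request triggers both worker branches.
spine.publish('summarize.request', 'doc-42')
spine.publish('classify.request', 'doc-42')
# prints ['summary of doc-42', 'label for doc-42']
```

A real spine would track replies per request ID and handle stragglers with timeouts; the sketch only shows the fan-out/aggregate shape.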
Figure 2: Event Spine Architecture — Request Flow with Ordered Events and Context Propagation (Image: Sreenivasa Reddy Hulebeedu Reddy)

3 problems the Event Spine solves

Problem one: race conditions between agents. Without coordination, our scheduling agent would book meetings before the inquiry agent had finished collecting requirements. Customers received calendar invitations for appointments that were missing critical details. The Event Spine solved this by enforcing sequential processing for dependent operations. The scheduling agent subscribes to requirement-complete events and only acts after receiving confirmation that the inquiry agent has gathered everything needed.

Problem two: context staleness. Agents making decisions based on outdated information was our second most common failure mode. A customer would correct their phone number during a conversation, but the document agent — which had pulled context three turns earlier — would generate paperwork with the old number. The Event Spine solved this by attaching the current context envelope to every event. When the inquiry agent publishes an update, the attached context reflects the latest state. Downstream agents never operate on stale data.
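The context-envelope idea can be sketched in a few lines: snapshot the session state into each event at publish time, so downstream handlers read the envelope rather than state cached turns earlier. Names and values here are illustrative:

```python
session_context = {'customer': 'C-1001', 'phone': '555-0100'}
events = []

def publish(event_type, payload, context):
    # Snapshot the context at publish time so the envelope is self-contained.
    events.append({'type': event_type, 'payload': payload, 'context': dict(context)})

publish('inquiry.update', 'collecting requirements', session_context)

# The customer corrects their phone number two turns later...
session_context['phone'] = '555-0199'
publish('document.request', 'generate paperwork', session_context)

# The document agent reads the envelope on *its* event, never cached state:
print(events[-1]['context']['phone'])  # 555-0199, the corrected number
```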

Problem three: cascading failures. When one agent in a direct-call chain fails, downstream agents either hang waiting for a response or fail themselves. A single document processing timeout would cascade into scheduling failures and inquiry timeouts. The Event Spine introduced dead-letter queues, timeout policies and fallback routing. When the document processing agent experienced latency spikes, events automatically rerouted to a simplified fallback handler that queued work for later processing instead of failing the entire pipeline.

Figure 3: Three Problems the Event Spine Solves — Before and After Comparison (Image: Sreenivasa Reddy Hulebeedu Reddy)

Results

The combined impact reshaped our system’s performance profile. End-to-end latency dropped from approximately 2.4 seconds back to roughly 180 milliseconds. The improvement came primarily from eliminating the cascading round-trip calls between agents. Instead of five agents making point-to-point requests to build context, each agent receives exactly what it needs through the event stream.

Agent-related production incidents dropped 71 percent in the first quarter after deployment. Most of the eliminated incidents were race conditions and stale context bugs that are structurally impossible with event-based coordination.

Agent CPU utilization decreased approximately 36 percent because agents stopped performing redundant work. In the old architecture, multiple agents would independently fetch the same customer data from shared services. With context propagation through the spine, that data is fetched once and shared through the event envelope.

Duplicate processing was eliminated entirely. Our previous architecture had no reliable way to detect when two agents were acting on the same request simultaneously. The Event Spine’s global sequence numbering provides natural deduplication. Developer productivity improved as well, though this is harder to quantify. Adding a new agent to the system now requires subscribing to relevant events rather than integrating point-to-point with every existing agent. Our most recent agent addition took two days from prototype to production, compared to the two weeks our previous additions required.
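The deduplication claim rests on each event carrying a global sequence number: a consumer that remembers the highest sequence it has processed can simply skip redeliveries. A minimal sketch, with illustrative names:

```python
import itertools

seq_counter = itertools.count(1)   # stands in for the spine's global sequence

def make_event(event_type, payload):
    return {'seq': next(seq_counter), 'type': event_type, 'payload': payload}

class DedupingAgent:
    """Skips any event whose sequence number it has already seen."""
    def __init__(self):
        self.last_seen = 0
        self.processed = []

    def handle(self, event):
        if event['seq'] <= self.last_seen:
            return                      # duplicate delivery: drop it
        self.last_seen = event['seq']
        self.processed.append(event['payload'])

agent = DedupingAgent()
e1 = make_event('inquiry.update', 'step 1')
e2 = make_event('inquiry.update', 'step 2')
for ev in (e1, e2, e1):                 # e1 is redelivered
    agent.handle(ev)
print(agent.processed)                  # ['step 1', 'step 2'], each handled once
```

Note the `<=` comparison also drops out-of-order older events, which is the usual trade-off of watermark-style dedup.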

# Example: Sequential handoff with fallback
class AgentCoordinator:
    def __init__(self, spine: EventSpine):
        self.spine = spine
        spine.subscribe('inquiry.complete', self.on_inquiry_complete)
        spine.subscribe('scheduling.complete', self.on_scheduling_complete)
        spine.subscribe('agent.timeout', self.on_agent_timeout)

    def on_inquiry_complete(self, event):
        # Sequential: scheduling only starts after inquiry
        self.spine.publish(
            'scheduling.request',
            payload=event.payload,
            context=event.context  # Fresh context propagated
        )

    def on_scheduling_complete(self, event):
        # Next handoff in the chain, e.g. document generation
        self.spine.publish(
            'document.request',
            payload=event.payload,
            context=event.context
        )

    def on_agent_timeout(self, event):
        # Fallback: route to dead-letter queue
        self.spine.publish(
            'fallback.process',
            payload=event.payload,
            context={**event.context, 'fallback_reason': 'timeout'}
        )

What this means for enterprises scaling AI agents

Multi-agent AI is not a future concept. It is a present reality in any enterprise running more than two AI-powered capabilities. If you have a chatbot, a recommendation engine and a document processor, you have a multi-agent system, whether you designed it as one or not.

The coordination challenge will only grow as agent counts increase. Every enterprise I speak with is adding AI capabilities faster than they are adding coordination infrastructure. That gap is where production incidents live.

The Event Spine pattern provides the architectural foundation that prevents agent proliferation from becoming agent chaos. It is the same lesson the industry learned with microservices a decade ago: distributed systems need explicit coordination infrastructure, not just well-intentioned API contracts between teams.

The enterprises that will scale AI successfully are the ones investing in coordination architecture now, before the complexity becomes unmanageable. The ones that wait will eventually build it anyway — just under more pressure, with more incidents and at higher cost.

This article is published as part of the Foundry Expert Contributor Network.
Want to join?


Cloud degrees are moving online 10 Apr 2026, 9:00 am

A recent article on transfer-friendly online cloud computing degree programs highlights a shift in how cloud professionals are educated, credentialed, and hired. The trend moves well beyond a few schools experimenting with digital delivery: an impressive number of accredited colleges and universities now offer online cloud-focused degrees, allowing students to enter the profession without attending traditional campus programs. They earn valid degrees, develop real skills, and gain recognized credentials, and the process is becoming more efficient, more focused, and far less costly.

A move that mirrors cloud itself

Cloud computing is not a profession tied to physical locations. It relies on digital platforms, distributed systems, virtual infrastructure, remote management, automation, and architecture that exist beyond the confines of any single building. The work is done in the cloud, through the cloud, and increasingly for businesses that operate in hybrid or fully distributed models.

In fact, the online format is typically better aligned with the job itself. Students in these programs learn in environments that more closely resemble where they will eventually work. They gain exposure to cloud consoles, lab simulations, collaborative tools, remote problem-solving, and digital workflows that mirror real enterprise practice. That exposure matters. Education is always stronger when the delivery model reinforces the substance of what is being taught, and cloud computing may be one of the clearest examples of that principle.

Lower costs mean broader access

The most immediate advantage is expense, and this is where the old model is increasingly difficult to defend. A traditional college education includes lecture halls, dormitories, housing, transportation, facilities, and indirect costs that often don’t relate to cloud skills. While some find value in this experience, many, especially working adults and career-focused learners, see it as a financial burden they don’t need.

Online cloud programs offer a different model. Students can often work while studying, avoid relocating, cut incidental costs, and transfer credits to prevent paying again for completed coursework. This good news enables more people to join the cloud computing field without incurring the heavy student debt that has long been associated with higher education.

It also allows students to be more intentional. They can pursue a college education focused on what they actually want to do. If their goal is cloud engineering, cloud architecture, cloud operations, or cloud security, they no longer have to buy the entire traditional college package to earn a credible degree. It’s a better use of both time and money.

Career-relevant curriculum

The more effective online programs are aligned with the skills that businesses need. For years, one of the biggest frustrations in enterprise technology was the gap between academic training and real-world operations. Graduates lacked enough familiarity with the tools, disciplines, and patterns that define modern IT. Cloud computing quickly exposed this weakness because the market moved faster than many institutions could adapt.

That is beginning to change. Online cloud degree programs are now often designed with a clearer understanding of the profession itself. Students learn about cloud architecture, networking, security, governance, automation, systems design, and platform-specific thinking in ways that relate more directly to actual job roles. Training is integrated into the curriculum—exactly how it should be.

That integration creates value on multiple levels. In addition to their degree, students also gain preparation that is more immediately practical. They develop familiarity with the technologies and operating models that define real cloud environments. They start thinking less like passive learners and more like future practitioners. This makes the transition from school to work smoother and much more effective.

The benefits to employers

Businesses should see this development as positive news, too. Many organizations already invest in some form of employee education, but not all educational spending offers the same value. When a company funds an employee’s accredited online cloud degree, the business will benefit in a tangible way. That’s important because cloud skills are expensive to hire on the open market. It’s still hard to find enough experienced professionals in architecture, operations, migration, security, and platform management.

A smart approach is to develop that expertise internally. When a company helps a systems administrator, network engineer, developer, or support professional pursue a cloud-focused degree, the benefits to the business can be substantial. The employee continues working and can apply their new knowledge to real projects even before the degree is completed. This makes education funding appear less like a perk and more like a strategic investment in modernization, retention, and workforce development. At a time when businesses are pressured to do more with the talent they already have, those are powerful outcomes.

A market with real options

This trend would be less meaningful if students had only one or two isolated options. That is no longer the case. Purdue Global, Franklin University, and Thomas Edison State University all offer a Bachelor of Science degree in cloud computing. Strayer University offers a cloud computing concentration within its online information technology degree. Western Governors University has a cloud and network engineering program that reflects how closely cloud and network infrastructure now intersect in the real world.

These programs collectively mark the rise of a new category within higher education. Accredited institutions are recognizing that students want flexible, online, career-oriented degrees connected to cloud computing, and they are developing programs to meet that demand. This indicates that the education market is maturing. For many students, this path may soon become the preferred route to their careers. The economics are too favorable, the employer demand is too persistent, and the subject matter is too naturally suited to online delivery. Students want lower-cost pathways that lead directly to careers. Employers want workers who can contribute practical cloud knowledge sooner rather than later. Colleges and universities, however slowly they move, eventually respond to such market pressure.

Cloud computing sits at the center of all these forces. It is a modern profession built on digital access, continual learning, and applied knowledge. Accredited online education is the logical evolution.


Microsoft’s reauthentication snafu cuts off developers globally 10 Apr 2026, 1:10 am

Microsoft officials have confirmed, and are trying to correct, a reauthentication snafu with developers in its Windows Hardware Program that has blocked an unknown number of independent software vendors (ISVs) from access to Microsoft systems. That in turn has interrupted operations for their customers globally.

The process started in October, when Microsoft began account verification for its Windows Hardware Program. Notices were sent to corporate email accounts, or at least they were supposed to have been, and account holders were suspended if they didn’t respond to the request by the deadline. Suspended accounts included a mix of businesses that never received the Microsoft notices, those that received the email but either didn’t notice it or didn’t act on it, and some ISVs who claim they were fully reauthenticated but had services cut off anyway.

Microsoft executives communicating with customers on the X social media platform were quick to confirm that glitches had occurred, but noted that the company wasn’t entirely at fault. 

Scott Hanselman, a Microsoft VP overseeing GitHub, posted on X: “Hey, I love dumping on my company as much as the next guy, because Microsoft does some dumb stuff, but sometimes it’s just ‘check emails and verify your accounts.’ Not every ‘WTF micro$oft’ moment is a slam dunk. I’ve emailed [one major ISV] personally and we’ll get him unblocked. Not everything is a conspiracy. Sometimes it’s literally paperwork.”

At one point in the discussion, Hanselman seemed frustrated with users complaining that Microsoft enforced the deadline it had been telling people about since October. “It’s almost like deadlines are date based,” he said. 

Hanselman also said the flood of urgent requests made the reinstatement process seem to move more slowly. 

“In all these scenarios, [the ISVs] either didn’t see emails or didn’t take action on emails going back to October of last year and until now. Spam folder, didn’t see them, lots of valid reasons that can be worked on. Then they open tickets and the tickets don’t move fast enough–days or weeks, not hours,” Hanselman said. “Once the deadlines hit, then folks complain on social and then folks have to manually unblock accounts with urgency. Things become urgent, but were not always urgent.”

A more senior Microsoft executive, Pavan Davuluri, the EVP overseeing Windows and Devices, also weighed in on X. “We worked hard to make sure partners understood this was coming, from emails, banners, reminders. And we know that sometimes things still get missed,” Davuluri said. “We’re taking this as an opportunity to review how we communicate changes like this and make sure we’re doing it better. If anyone needs help with reinstatement, they can request support here.”

Making the problem worse was the cascading effect on global businesses. As developer companies were locked out, their customers felt the pain too, with their own operations disrupted because of their reliance on those vendors.

Developers also complained about the limited Microsoft support available to unravel the mess. The company told visitors on X that they could message it through the platform to request reinstatement.

Onus on both vendors and ISVs

Consultant Brian Levine, executive director of FormerGov, said some of the onus has to fall on the ISVs.

“Developers should treat vendor recertification as a mission‑critical dependency and implement redundant monitoring, such as multiple emails, portal checks, and automated reminders, to avoid silent lockouts,” Levine said. “This poses real operational risk because a sudden vendor lockout can break integrations, halt workflows, and create cascading outages that look like internal failures rather than upstream policy triggers.”

He noted that vendors should surface critical compliance alerts directly inside their portals and consoles, where developers actually work, “so no one’s business hinges on whether a single automated email landed in [the] spam [folder].”

Carmi Levy, an independent technology analyst, said enterprises often give insufficient attention to their suppliers’ software suppliers. Enterprise IT and developers “need to be asking the hard questions” about vendor dependencies. “Ideally, vendor relations capabilities would be far more proactive,” he noted.

Asked if that means that enterprise IT should be asking their suppliers’ suppliers questions such as “Have you recertified with Microsoft yet? The deadline is almost here,” Levy said that might be asking too much. “Most organizations do not communicate at that level, unfortunately,” Levy said. 

“Summarily having an account terminated after years of regular and proper use is an unthinkable outcome for a developer whose very lifeblood relies on access to that very same account,” Levy said. “Likewise, the countless customers of this developer, who rely on [their ISV] for their own careers and businesses, are potentially left in the dark because Microsoft either can’t or won’t implement better development management technologies and protocols. This case reinforces the power imbalance between major tech platformers like Microsoft and the independent developers who rely on them to keep their own lights on.”

Implicit trust

Another complicating factor is the increasing reliance that systems have on other systems and executables, said Flavio Villanustre, CISO for the LexisNexis Risk Solutions Group. That is what forces Microsoft to be so strict in re-authenticating the players that control these software elements. 

There is “implicit trust put on those organizations providing computing components that must be executed before the operating system loads. Since all anti-malware controls are part of and start with the loading of the operating system, anything that executes before [them] could potentially jeopardize the integrity of the entire system,” Villanustre said. “To do this, UEFI requires those components executed at boot time, including the operating systems, to be cryptographically signed with private keys whose certificates are known and can be validated by the UEFI system.”

This is what puts so much power in the hands of the OS vendor, he noted. “Unfortunately, developers have little recourse. If their software component relies on pre-boot execution, they will need a key signature, and that’s tightly controlled by the UEFI/OEM manufacturers and Microsoft,” Villanustre said. “Even Linux distributions rely on Microsoft for key signature. This situation effectively creates a monopoly, where Microsoft controls what runs at boot time through their Certificate Authority.”

However, he observed, “it would probably require regulatory pressure to force that responsibility to be split among more organizations, but you could argue that doing so could potentially weaken the security of the overall system.” 


Anthropic rolls out Claude Managed Agents 9 Apr 2026, 9:33 pm

Anthropic has announced Claude Managed Agents, a suite of composable APIs for building and running cloud-hosted agents. The intent is to give any company building on the Claude Platform the full production stack for shipping AI agents at scale.

Launched April 8 in a public beta on the Claude Platform, Claude Managed Agents is purpose-built for Claude, enabling better agent outcomes with less effort by providing sandboxed code execution, checkpointing, credential management, scoped permissions, and end-to-end tracing, according to Anthropic. Users define an agent’s tasks, tools, and guardrails and deploy it on Anthropic infrastructure. A built-in orchestration harness decides when to call tools, how to manage context, and how to recover from errors, the company said.

Claude Managed Agents includes the following:

  • Production-grade agents with secure sandboxing, authentication, and tool execution handled for the user.
  • Long-running sessions that operate autonomously, with progress and outputs that persist through disconnections.
  • Multi-agent coordination that allows agents to spin up and direct other agents to parallelize complex work.
  • Trusted governance that gives agents access to real systems with scoped permissions, identity management, and execution tracing built in.

Notion, Rakuten, and Sentry are already building on Claude Managed Agents, Anthropic said. In internal testing on structured file generation, Claude Managed Agents improved task success by as much as 10 points over a standard prompting loop, with the largest gains on the hardest problems, the company said.


Meta’s Muse Spark: a smaller, faster AI model for broad app deployment 9 Apr 2026, 4:10 pm

Meta’s new “small and fast” AI model, Muse Spark, is an acknowledgement that as enterprises scale AI systems beyond millions of users and for use on a greater variety of devices, they must make things more efficient and more application-specific.

Muse Spark now powers the Meta AI assistant on the web and in the Meta AI app, and the company plans to roll it out across WhatsApp, Instagram, Facebook, Messenger, and the company’s smart glasses. It will also offer select partners access to the underlying technology through an API, initially as a private preview.  “We hope to open-source future versions of the model,” it said in a blog post announcing Muse Spark.

While Meta did not disclose the model’s size or much about its architecture, it described Muse Spark as being capable of balancing capability with speed.

That positioning, even without explicit enterprise deployment guidance, aligns with priorities CIOs and developers are increasingly grappling with as they move generative AI from pilots to production, focusing on efficiency, responsiveness, and seamless integration into user-facing software.

The model’s other capabilities, including support for multimodal inputs, multiple reasoning modes, and parallel sub-agents for complex queries, could help enterprises build faster, task-focused AI for customer support, automation, and internal copilots without relying on heavier models.

Meta said it has worked with physicians to improve responses to common health-related questions, underscoring the model’s applicability across a range of use cases, including reasoning tasks in science, math, and healthcare.

It said it had conducted extensive pre-deployment safety evaluations, with particular attention to higher-risk domains such as health and scientific reasoning. The company also said it had made improvements in refusal behavior and response reliability, aimed at reducing harmful or unsupported outputs.

It published the results of 20 AI benchmarks for Muse Spark, positioning it as competitive in several areas while not claiming across-the-board leadership. In particular, it highlighted strong performance on health-related assessments, reflecting its focus on improving responses in that domain through targeted training and evaluation.

The model also scored well on multimodal and reasoning-oriented benchmarks, sometimes a little ahead of rivals such as Claude Opus 4.6, Gemini 3.1 Pro, GPT 5.4 or Grok 4.2, sometimes a little behind.

Meta frames the model as part of a broader roadmap, with future models expected to extend capabilities further, suggesting a staged approach rather than a single model designed to lead on all benchmarks.


How Agile practices ensure quality in GenAI-assisted development 9 Apr 2026, 9:00 am

Generative AI has transformed software development: developers can now write code at unprecedented speed. Tools such as GitHub Copilot, Amazon CodeWhisperer and ChatGPT have become a normal part of how engineers work. I have experienced this firsthand, in roles ranging from leading engineering teams at Amazon to building large-scale platforms for invoicing and compliance: both the huge productivity boosts and the equally great risks that come with GenAI-assisted development.

The productivity promise of GenAI is compelling. Developers who use AI coding assistants report productivity gains of 15% to 55%. But that speed often comes with hidden dangers. Without good guardrails, AI-generated software can open up security issues, accumulate technical debt and introduce bugs that are difficult to detect through traditional code reviews. McKinsey research similarly notes that while GenAI tools make developers markedly more productive, they also require rethinking software development practices to maintain code quality and security.

The answer is not to abandon these tools. It is to combine them with reliable engineering practices that teams already know and trust. Applied properly, traditional Agile methodologies provide precisely the guidelines that let you benefit from GenAI while controlling its hazards. In this article, I consider five core Agile practices: test-driven development (TDD), behavior-driven development (BDD), acceptance test-driven development (ATDD), pair programming and continuous integration. Together they provide the guardrails that make GenAI development not just quicker, but of higher quality.

The GenAI code quality crisis: Real-world issues

Before we jump into solutions, it is worth naming the problem. The issues with AI-generated code aren’t theoretical. They’re appearing in production systems across the industry:

  • Security vulnerabilities: In 2023, researchers at Stanford found that developers using AI assistants were more likely to introduce security vulnerabilities into their code, particularly injection flaws and insecure authentication mechanisms. A study published in IEEE Security & Privacy demonstrated that GitHub Copilot suggested vulnerable code approximately 40% of the time across common security-critical scenarios. At one major financial institution, an AI-generated SQL query bypassed parameterization, creating a critical injection vulnerability that wasn’t caught until penetration testing.
  • Hallucinated dependencies: AI models sometimes suggest libraries, functions or APIs that don’t exist. A team at a healthcare company spent three days chasing a compilation failure in a microservice, only to learn that the AI had suggested a nonexistent AWS SDK method. The code looked legitimate and passed the first review, but the method signature was completely made up.
  • Subtly incorrect business logic: Most misleading of all are business-logic mistakes that look good on the surface but hide subtle defects. For example, we came across an AI-generated line-item tax calculation on an invoice that looked perfect, but on close inspection applied rounding at the level of each item rather than at the subtotal. A brief reading of the logic suggested it was correct, yet the difference in rounding order would have produced final invoice totals that diverged from legal tax-reporting requirements, leading to compliance risks and reconciliation errors across millions of transactions.
  • Technical debt accumulation: AI tools focus on producing working code rather than maintainable code. They frequently recommend deeply nested conditional logic, duplicated code patterns and excessively complex solutions when simpler alternatives are available. Gartner research warned that without strong governance, early GenAI adopters can accumulate cost, complexity and technical debt.
  • Compliance and licensing issues: AI models trained on public code repositories can generate code that closely reproduces source released under incompatible licenses. For heavily regulated industries such as healthcare and finance, this poses serious noncompliance risks. A pharmaceutical company, for example, discovered AI-generated code that closely resembled GPL-licensed open-source software; relying on it in their platform would have created legal exposure.

The root cause: Speed without clear specification

These problems share a root cause: the AI produces code based on patterns it has seen, without real understanding of requirements, context or whether the code is correct. It optimizes for probability, asking which code pattern is most likely given the prompt, rather than for correctness or suitability for the particular case.

Traditional code review, although essential, is not enough to protect against errors in AI-generated code. Human reviewers struggle to spot subtle errors in code that looks legitimate, and the volume of AI-generated code can easily overwhelm review capacity. We need automated, systematic methods that verify correctness rather than rely on quick visual inspection.

Agile practices as GenAI guardrails

The answer lies in methods that have been around since long before GenAI, yet are remarkably good at fixing its flaws. Each of these methods provides a different type of safety net:

1. Test-driven development (TDD): The correctness validator

The TDD cycle (Red, Green, Refactor) provides the most direct protection against incorrect AI-generated code. By writing tests first, you create an executable specification of correctness before the AI generates any implementation.

How it works with GenAI:

  • Red: Write a failing test that specifies the exact behavior you need. This test becomes your requirement specification in executable form.
  • Green: Ask the AI to generate code that makes the test pass. The AI now has a clear, unambiguous target.
  • Refactor: Use AI to suggest improvements to the working code while ensuring tests still pass.

Real-world impact: We applied strict TDD alongside GenAI-assisted development: before accepting any AI suggestion, developers write extensive unit tests that spell out every requirement. This caught a critical line-item tax calculation error. The AI suggested a simple multiplication that “looked” correct, but the test specifically checked the legal rounding requirement (rounding at the subtotal level rather than the line level), so the AI’s initial code failed immediately. Without TDD, the discrepancy would have reached production, resulting in significant compliance risks and revenue reconciliation failures.

TDD also solves the problem of hallucinated dependencies. If the AI suggests a method or library that does not exist, the test fails to compile or run, providing immediate feedback instead of a multi-day hunt for the issue.
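A compressed sketch of the rounding issue described above (amounts and function names are invented for illustration) shows the two rounding orders diverging; a test written first against the subtotal rule would reject the per-line version immediately:

```typescript
// Hypothetical invoice line items whose fractional cents expose the bug.
const items = [{ amount: 10.004 }, { amount: 10.004 }, { amount: 10.004 }];

// AI-suggested version: rounds each line item individually.
function totalPerLine(lines: { amount: number }[]): number {
  return lines.reduce((sum, l) => sum + Math.round(l.amount * 100) / 100, 0);
}

// Required behavior: sum first, round once at the subtotal level.
function totalAtSubtotal(lines: { amount: number }[]): number {
  const subtotal = lines.reduce((sum, l) => sum + l.amount, 0);
  return Math.round(subtotal * 100) / 100;
}

console.log(totalPerLine(items).toFixed(2));    // "30.00"
console.log(totalAtSubtotal(items).toFixed(2)); // "30.01"
```

The one-cent divergence is exactly the kind of discrepancy a test-first specification pins down before any implementation is accepted.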

2. Behavior-driven development (BDD): The business logic guardian

BDD extends TDD by focusing on system behavior from the user’s perspective using Given-When-Then scenarios. This is particularly powerful for GenAI-assisted development because it creates human-readable specifications that bridge the gap between business requirements and code.

BDD scenarios serve two critical functions with AI-generated code:

First, they provide context-rich prompts for the AI. Instead of asking “write a function to calculate tax,” you provide a complete scenario: “Given a customer in California, when they purchase a $100 item, then the tax should be $9.25.” The AI has more context to generate correct code.

Second, they create executable business logic tests that catch subtle errors humans might miss. The scenarios are written in plain language by product owners and domain experts, then automated using frameworks like Cucumber or Cypress.
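As a minimal illustration, the California scenario above reduces to an executable check once step definitions bind it to application code; the rate table and function name here are assumptions, not a real tax engine:

```typescript
// Assumed rate table for the illustrative Given-When-Then scenario.
const TAX_RATES: Record<string, number> = { CA: 0.0925 };

// Given a customer's state and a purchase amount, compute the tax,
// rounded to whole cents.
function taxFor(state: string, amount: number): number {
  const rate = TAX_RATES[state] ?? 0;
  return Math.round(amount * rate * 100) / 100;
}

// "Given a customer in California, when they purchase a $100 item,
//  then the tax should be $9.25."
console.log(taxFor("CA", 100)); // 9.25
```

The scenario text stays readable by domain experts, while the bound check runs on every commit.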

Real-world impact: Our compliance platform processes invoices across multiple tax jurisdictions. When we started using AI assistance, we first created comprehensive BDD scenarios covering all tax rules, edge cases and regulatory requirements. These scenarios, written by our tax compliance specialists, became the specification for AI code generation. AI-generated code that passed all BDD scenarios was correct 95% of the time, far higher than code generated from vague prompts.

3. Acceptance test-driven development (ATDD): The stakeholder alignment tool

ATDD involves customers and stakeholders early in defining automated acceptance tests before development begins. This practice is crucial when using GenAI because it ensures the AI is solving the right problem, not just generating plausible-looking code.

The ATDD workflow with GenAI:

  • Specification Workshop: Product owners, developers and testers collaborate to define acceptance criteria in a testable format. This creates a shared understanding of “done.”
  • Test Automation: Convert acceptance criteria into automated tests before writing implementation code. These tests represent the customer’s definition of success.
  • AI Assisted Implementation: Use GenAI to implement features that satisfy the acceptance tests. The tests prevent the AI from drifting away from actual requirements.

Real-world impact: For a volume-based discount feature, we held ATDD workshops to define a specific requirement: “Buy 10, Get 10% Off” must apply only to the qualifying line items, not the entire invoice total. These became our automated acceptance tests. When developers used GenAI to implement the logic, the AI suggested a simple, global discount function that subtracted 10% from the final balance, a common coding pattern for retail, but incorrect for our B2B contractual rules. Because the ATDD test validated the discount at the line-item level, the AI’s “perfect-looking” code failed immediately. This prevented a logic error that would have resulted in significant over-discounting and lost revenue across thousands of bulk orders.
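The line-item rule from that workshop can be pinned down in a handful of lines (types, quantities and prices are invented for illustration):

```typescript
interface Line { sku: string; qty: number; unitPrice: number }

// "Buy 10, Get 10% Off" applies only to the qualifying line item,
// never to the invoice total.
function lineTotal(line: Line): number {
  const gross = line.qty * line.unitPrice;
  return line.qty >= 10 ? gross * 0.9 : gross;
}

function invoiceTotal(lines: Line[]): number {
  return lines.reduce((sum, l) => sum + lineTotal(l), 0);
}

const order = [
  { sku: "A", qty: 10, unitPrice: 5 }, // qualifies: 50 -> 45
  { sku: "B", qty: 2,  unitPrice: 5 }, // does not qualify: 10
];
console.log(invoiceTotal(order)); // 55 (a global 10% off would give 54)
```

An acceptance test asserting 55 rejects the "global discount" implementation, which yields 54, before it ever reaches review.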

4. Pair programming: The human-AI collaboration model

Traditional pair programming involves two developers working together, often one writing tests and the other writing implementation. With GenAI, this model evolves into a powerful three-way collaboration: Developer A writes tests, Developer B reviews AI-generated code and the AI serves as a rapid implementation assistant.

The enhanced pair programming workflow:

  • Navigator Role: One developer focuses on writing comprehensive tests and thinking about edge cases, security implications and architectural fit. They are not distracted by implementation details.
  • Driver Role: The other developer works with the AI to generate implementation code, critically evaluating each suggestion. They serve as the quality filter for AI output.
  • AI Assistant: Generates implementation suggestions based on tests and context, accelerating the coding process while the human pair ensures quality.

Real-world impact: A recent study by GitClear found that code quality metrics declined when developers used AI tools in isolation but improved when used in pair programming contexts. We recommend pair programming for any AI-assisted development of critical systems. The navigator catches security issues and architectural mismatches that the driver, focused on AI output, might miss. We have seen a 60% reduction in post-deployment bugs compared to solo AI-assisted development.

5. Continuous integration (CI): The automated safety net

Continuous integration runs automated test suites every time code is merged. It becomes even more critical with GenAI-assisted development. CI provides the final safety net that catches issues before they reach production.

Enhanced CI pipeline for GenAI code:

  • Comprehensive test execution: Run all unit tests, integration tests, BDD scenarios and acceptance tests on every commit. AI-generated code must pass the entire suite.
  • Static analysis: Include additional static analysis tools that check for common AI-generated code issues like security vulnerabilities, code complexity metrics and licensing compliance.
  • Performance benchmarks: Automated performance tests catch AI-generated code that works correctly but performs poorly at scale.

Real-world impact: Our CI pipeline is configured with specialized checks designed to catch the unique risks of AI-assisted coding. For the invoicing platform, we integrated automated business-rule validators that specifically verify logic like tax rounding and discount applications.

The synergistic effect: Practices working together

The real power emerges when these practices work together. Each creates a different layer of protection:

  • TDD ensures the code works correctly for specified inputs.
  • BDD ensures it implements the right business behavior.
  • ATDD ensures it meets stakeholder expectations.
  • Pair programming ensures human oversight and critical thinking.
  • CI ensures all these checks run automatically and consistently.

Consider a typical user story for an e-commerce platform: “As a customer, I want to apply discount codes so that I can save money on purchases.”

Without Agile practices, a developer might prompt an AI: “Write a function to apply discount codes to shopping carts.” The AI generates plausible-looking code, the developer briefly reviews it and it ships. Hidden issues might include: discount stacking vulnerabilities, floating-point rounding errors, failure to validate expiration dates or SQL injection in the discount code lookup.

With Agile practices:

  • ATDD: Product owner, developer and tester define acceptance criteria: “Given a valid 10% discount code, When applied to a $100 cart, Then the total should be $90.”
  • BDD: Business analyst writes scenarios covering edge cases: expired codes, invalid codes, maximum discount limits, combination rules.
  • TDD: Developer pair writes unit tests first, including security tests for injection attacks, tests for decimal precision and tests for all edge cases.
  • Pair programming: One developer writes tests, the other works with AI to implement, both review the generated code critically.
  • CI: All tests run automatically on commit, plus static analysis for security issues, performance benchmarks and compliance checks.

This multi-layered approach catches issues at different stages: tests catch functional errors, pair programming catches architectural mismatches, CI catches regressions and security issues.
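Two of the edge cases from the discount-code story, expiration handling and decimal precision, can be sketched as simple checks; the function and its shape are hypothetical, and amounts are kept in integer cents to sidestep float drift:

```typescript
// Hypothetical discount application; amounts in integer cents.
interface DiscountCode { pct: number; expires: Date }

function applyDiscount(totalCents: number, code: DiscountCode): number {
  if (code.expires < new Date()) return totalCents;       // expired: no-op
  return Math.round(totalCents * (100 - code.pct) / 100); // whole cents only
}

const valid   = { pct: 10, expires: new Date("2999-01-01") };
const expired = { pct: 10, expires: new Date("2000-01-01") };

console.log(applyDiscount(10000, valid));   // 9000  ($100 -> $90)
console.log(applyDiscount(10000, expired)); // 10000 (unchanged)
```

Each hidden issue listed earlier (stacking, rounding, expiration, injection) maps to a test like these, written before the AI generates the implementation.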

Implementation recommendations

Based on our experience implementing these practices across multiple teams, here are practical recommendations for organizations adopting GenAI development tools:

  • Start with TDD as the foundation: Make test-first development non-negotiable when using AI assistance. This single practice prevents the majority of AI-generated code issues. Invest in training developers on TDD if they’re not already proficient.
  • Enhance code review processes: Traditional code review checklists need updating for AI-generated code. Add specific review criteria: Does the code handle edge cases? Are there obvious security vulnerabilities? Does it match our architectural patterns? Is the complexity appropriate for the problem?
  • Invest in test infrastructure: Strong CI pipelines become even more important. Ensure your pipeline can run comprehensive test suites quickly. Slow test execution discourages frequent commits and reduces the effectiveness of CI as a safety net.
  • Create AI usage guidelines: Document when and how to use AI assistance. Some scenarios might be high risk (security-critical code, financial calculations) and require extra scrutiny. Others might be low-risk (boilerplate code, standard CRUD operations) and benefit most from AI acceleration.
  • Measure and monitor: Track metrics specific to AI-assisted development, such as defect rates in AI-generated vs. human-written code, test coverage trends, time-to-production and post-deployment issues. Use data to refine your practices.


Conclusion: Speed with safety

Generative AI represents a fundamental change in how we write software, comparable to the introduction of high-level programming languages or integrated development environments. It brings real and substantial productivity gains. But speed without quality is not progress; it is technical debt accumulating at an accelerated rate.

The good news is that we don’t need to invent new methods to use GenAI safely. Agile practices that have been in use for decades, such as TDD, BDD, ATDD, pair programming and CI, are exactly the safety measures we need. These practices build quality in through automation, collaboration and continuous verification. They become even more valuable with AI assistance because they provide objective, automated checks free of the pattern-matching biases that make humans poor reviewers of AI-generated code.

Companies that adopt GenAI tools while maintaining disciplined software development practices will get the best results: faster development without sacrificing quality, reduced defect rates despite increased velocity and sustainable productivity gains that don’t create future maintenance burdens.

The future of software development is not humans or AI alone. It is humans and AI cooperating within established quality assurance, protected by proven frameworks. Combining AI’s speed with the safety of Agile practices lets us build software that is both highly efficient and of high quality at scale.

This article is published as part of the Foundry Expert Contributor Network.


Rethinking Angular forms: A state-first perspective 9 Apr 2026, 9:00 am

Forms remain one of the most important interaction surfaces in modern web applications. Nearly every product relies on them to capture user input, validate data, and coordinate workflows between users and back-end systems. Yet despite their importance, forms are also one of the areas where front-end complexity tends to accumulate quietly over time.

For simple scenarios, Angular forms feel straightforward to work with. A handful of controls, a few validators, and a submission handler can be implemented quickly and confidently. But the situation changes as applications grow. Nested form groups, dynamic controls, conditional validation, and cross-field dependencies gradually introduce layers of behavior that are difficult to visualize as a single coherent system.

Developers often reach a point where a form technically works but becomes difficult to explain. Adding a new rule or modifying a validation condition can require tracing through observables, validators, and control states that are spread across multiple components. The challenge is rarely a missing feature. Instead, it is the growing difficulty of reasoning about how the system behaves as a whole.

This is not a criticism of Angular forms themselves. The framework has evolved powerful abstractions that solve real problems: keeping view and model synchronized, enforcing validation rules, coordinating asynchronous operations, and maintaining accessibility. These capabilities are essential for production-scale applications.

The more interesting question is architectural rather than technical. What mental model should developers use when reasoning about form behavior in modern Angular applications?

In this article, we step away from specific APIs and instead examine forms from first principles. By looking at forms primarily as state systems rather than event pipelines, we can better understand where complexity originates and why newer reactive primitives such as Angular Signals align naturally with the underlying structure of form logic.

Over the past decade, Angular forms have been shaped primarily by event-driven abstractions and observable streams. As the framework evolves toward signal-based reactivity, it is worth reconsidering whether forms should continue to be modeled primarily around events at all.

Forms are not fundamentally event systems. They are state systems that happen to receive events. This distinction becomes clearer as front-end systems grow larger and validation logic becomes increasingly intertwined with application state.

Many front-end systems have gradually adopted an event-centric mental model, where application behavior is primarily expressed through chains of reactions and emissions. As discussed in my recent InfoWorld article, “We mistook event handling for architecture”, this approach can blur the distinction between reacting to change and representing the underlying state of an application. Forms are one of the areas where that distinction becomes particularly visible.

Why forms became complicated (and why that was reasonable)

To understand why a signal-first approach matters, it is worth briefly revisiting how Angular forms evolved and why complexity was an unavoidable outcome.

Early Angular forms were primarily about synchronization. Input elements need to remain synchronized with the model, and updates must flow in both directions. Template-driven forms relied heavily on two-way binding to achieve this. For small forms, this approach felt intuitive and productive. However, as forms grew larger and more complex, the need for structure became apparent. Validation rules, cross-field dependencies, conditional UI logic, and testability all pushed developers toward a more explicit model.

Reactive forms addressed this need by modeling forms as trees of controls. Each control encapsulated its own value, validation state, and metadata. RxJS observables provided a declarative way to respond to changes over time. Validators, both synchronous and asynchronous, could be attached to controls, and Angular automatically tracked interaction state, such as whether a control was dirty, touched, or pending.

This architecture solved many real problems. It also shifted the dominant mental model from state to events. That shift was reasonable at the time, but it also encouraged developers to think of form behavior primarily as a sequence of reactions rather than as a system defined by state. Developers began reasoning about forms in terms of streams: when a value emits, when a status changes, when a validator runs, and when subscriptions are triggered. In simple cases, this was manageable. In larger forms, it often became difficult to trace why a particular piece of logic executed or why a control entered a specific state.

The deeper issue is not that reactive forms rely on RxJS, but that they often conflate state with coordination. RxJS excels at coordinating asynchronous workflows and reacting to events. It is less well-suited to serve as a primary representation of the state. Forms, however, are overwhelmingly state-driven. At any given moment, a form has a well-defined set of values, validation rules, derived errors, and UI flags. Much of this information can be computed deterministically, without reference to time or event ordering.

As form logic grows, the cost of mixing state representation with event coordination increases. Debugging requires tracing emissions across multiple observables. Understanding behavior requires knowing not only what the state is but also how it arrived at that state. This is the context in which Angular Signals becomes interesting, not as a replacement for RxJS, but as a better fit for modeling form state itself.

Defining form state from first principles

Before introducing any APIs or framework constructs, it is useful to strip the problem down to its essentials and ask a basic question: what is form state?

At its core, a form exists to collect data. This data is typically represented as a plain object composed of strings, numbers, booleans, or nested structures. These values form the canonical source of truth for everything else the form does. Without values, there is no form.

Validation rules operate on those values. They define constraints such as whether a field is required, whether a value conforms to a particular format, or whether multiple fields satisfy a cross-field condition. Importantly, validation rules do not store state. Given the same input values, they always produce the same outcome. They are pure functions of state, not state themselves.

From values and validation rules, we derive validity and error information. A field is either valid or invalid, and specific error messages may apply. At the form level, validity is typically derived by aggregating field-level results. This information is deterministic and can be recalculated at any time from the underlying values.
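That aggregation can be expressed as a pure computation over field-level results; the field names and rules below are illustrative, not a real form model:

```typescript
// Field-level errors derived as pure functions of the values.
type Errors = Record<string, string | null>;

const values = { name: "Ada", age: -1 };

const errors: Errors = {
  name: values.name.trim() ? null : "Name is required",
  age: values.age >= 0 ? null : "Age must be non-negative",
};

// Form-level validity is an aggregation of field-level results,
// recomputable at any time from the values alone.
const formValid = Object.values(errors).every((e) => e === null);
console.log(formValid); // false (age is invalid)
```

Nothing here is stored apart from the values; validity is always a deterministic consequence of them.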

Forms also track interaction metadata. Whether a field has been touched or modified influences when feedback is shown to the user, but it does not affect the correctness of the data. This metadata exists to improve user experience, not to define business logic.

Finally, there are side effects. Submitting data to a server, persisting drafts, performing asynchronous validation, or navigating to another view are all reactions to state changes. These actions matter, but they are not the state. They are consequences of the state.

Seen through this lens, most of what we consider “form complexity” is not inherent complexity. It is organizational complexity. Derived information is often stored as mutable state. Validation logic is scattered across imperative callbacks. UI flags are toggled in response to events rather than derived from underlying conditions.

Signals encourage a different organization. They make it natural to treat values as the only mutable input, to express validity and UI state as derived data, and to isolate side effects as explicit reactions. This separation does not introduce new ideas, but it makes existing best practices easier to apply consistently.
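As a framework-free illustration of that organization, the sketch below uses tiny stand-ins for writable and derived state. It is not Angular’s Signals API, only the shape of the idea: values as the sole mutable input, errors computed on demand.

```typescript
// Tiny stand-ins for writable and derived state (NOT Angular's API);
// real signals cache results and track dependencies automatically.
function signal<T>(initial: T) {
  let value = initial;
  return { get: () => value, set: (v: T) => { value = v; } };
}
function computed<T>(fn: () => T) {
  return { get: fn }; // recomputed on every read in this simplified sketch
}

// Values are the only mutable input.
const email = signal("");

// Errors and validity are derived, never stored.
const emailError = computed(() =>
  /^[^@\s]+@[^@\s]+$/.test(email.get()) ? null : "Invalid email"
);

console.log(emailError.get()); // "Invalid email"
email.set("user@example.com");
console.log(emailError.get()); // null
```

The error never has to be toggled imperatively; it simply follows from whatever the value currently is.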

Understanding this distinction is essential before adopting any signal-based form API. Without it, signals risk becoming just another abstraction layered on top of existing complexity. With it, they become a tool for simplifying how form behavior is expressed and understood.

The cost of treating state as events

As reactive forms evolved, more and more form logic came to revolve around event coordination. Value changes emitted events. Validation status emitted events. Asynchronous validators emitted events. Subscriptions responded to these emissions, producing additional side effects. This model is powerful, but it subtly shifts how developers reason about form behavior.

When form logic is expressed primarily through events, understanding behavior requires temporal reasoning. Developers must ask not only what the current state of the form is, but also how the form arrived at that state. Questions such as “Which emission triggered this validator?” or “Why did this error appear now?” become common. The answers often depend on subscription order, life-cycle timing, or intermediate states that no longer exist.

This event orientation creates an asymmetry in how form behavior can be inspected. Current state, values, errors, and validity can be logged or displayed. The sequence of events that produced that state cannot. Once an emission has passed, it leaves no trace beyond its effects. Debugging becomes an exercise in reconstruction rather than observation.

Over time, the focus on events leads to a common anti-pattern: derived information is promoted to a mutable state. Validation results are stored rather than computed. UI flags are toggled imperatively rather than derived from underlying conditions. These shortcuts reduce immediate friction but increase long-term complexity. The form begins to carry not only its current state but also the historical residue of its manipulation.

The problem becomes more pronounced as forms grow. Cross-field validation introduces dependencies that span multiple controls. Conditional logic ties UI behavior to combinations of values and interaction states. At this scale, the cost of reasoning in terms of events compounds. Understanding behavior requires tracing emissions across multiple observables, each representing a partial view of the system.

This is not a failure of RxJS or reactive forms. RxJS excels at coordinating asynchronous workflows and reacting to external data streams. The issue arises when event-driven coordination is used as the primary representation of state. Forms, by their nature, are overwhelmingly state-driven. At any given moment, a form has a well-defined configuration of values, rules, and derived outcomes.

Recognizing this mismatch is an important step. It allows us to separate coordination concerns from state representation, and to ask whether some of the complexity we experience is inherent or simply a consequence of the mental model we apply.

Gaining a state-first perspective

Many of the challenges developers encounter when building complex forms are not the result of missing framework features. They arise from how form behavior is structured and reasoned about. When validation rules, UI state, and side effects are coordinated primarily through event flows, understanding the system often requires reconstructing the sequence of events that produced the current state.

A state-first perspective approaches the problem differently. Form values become the central source of truth. Validation rules operate deterministically on that state. Error messages, validity flags, and UI behavior emerge as derived information rather than independently managed pieces of mutable state.

This shift does not invalidate existing Angular Forms patterns, nor does it diminish the usefulness of RxJS where coordination of asynchronous workflows is required. Instead, it clarifies the distinction between two different concerns: representing the state and reacting to events.

Teams that model forms explicitly around state tend to build systems that are easier to inspect, easier to refactor, and easier to reason about as they grow. Angular’s evolving reactivity model opens the door to expressing these ideas more directly.

In the next article in this series, we will examine Angular Signals themselves—what they are, how they differ from observable-driven reactivity, and why their design aligns naturally with the way form state behaves in real applications. From there, the series will explore how signal-driven models can simplify validation, derived state, and large-scale form architecture.

(image/jpeg; 10.03 MB)

Bringing databases and Kubernetes together 9 Apr 2026, 9:00 am

Running databases on Kubernetes is popular. For cloud-native organizations, Kubernetes is the de facto standard approach to running databases. According to Datadog, databases are the most popular workload to deploy in containers, with 45 percent of container-using organizations doing so. The Data on Kubernetes Community found that production deployments are now common, with the most advanced teams running more than 75 percent of their data workloads in containers.

Kubernetes was not originally built for stateful workloads—the project had to add capabilities such as StatefulSets, which became stable in Kubernetes 1.9, and later Operator support for integrating with databases. With that work done over the first 10 years of Kubernetes, you might think that all the hard problems around databases on Kubernetes have been solved. However, that is not the case.

Embracing database as a service with Kubernetes

Today we can run databases in Kubernetes successfully, placing those database workloads alongside the application components that also run in containers. This makes the application development side easier, as all the infrastructure is in one place and can be controlled from one point. While that approach eases the “Day 1” issues around application development, it does not deal with many of the “Day 2” issues that still have to be addressed.

Day 2 issues include all the tasks that you need to have running so your application operates effectively over time. That includes looking at resiliency, security, operational management, and business continuity. For developers looking at databases, that means tasks like backup, availability, and failover. Some of these elements are easier in containers. Kubernetes was built to monitor containers in pods and restart images if a problem took place. However, stateful databases require more planning than stateless applications.

Kubernetes Operators can automate some of these processes for you, allowing Kubernetes to work through a database to trigger a cluster to carry out a backup task automatically. But that doesn’t go through the whole process, and it relies on the developer making decisions around how best to follow that process. If you are not an expert in backup or availability, you might be tempted to hand all these concerns over to a third-party provider for them to take care of.

That approach works. Cloud-based database as a service (DBaaS) offerings grew at nearly twice the rate of on-premises deployments according to Gartner — 64% to 36% — as developers went for the easy option. However, it locked them into that particular cloud provider or service. Even when developers chose an open source database to base their application on, they would still be tied to the provider and its way of doing things. Factor in that loss of mobility, and running a DBaaS can carry a serious cost premium over doing it yourself.

The future for Kubernetes and database as a service

Automating Kubernetes workloads with Operators can provide the same level of functionality as DBaaS, while still avoiding lock-in to a specific provider. This should fit into how teams want to run internal developer platforms or platform engineering for their developers. However, getting that to work for all databases in a consistent way is its own challenge.

At Percona, we did some of this work around our own project, Everest. But this only supported the databases that we are interested in — namely, MySQL, PostgreSQL and MongoDB. What about other databases that have Operators? How about other systems for managing and observing databases? While the idea of a fully open source database as a service option is great in theory, in practice it needs a community that is willing to get involved and support how that gets built out.

If you really love something, sometimes you have to set it free. Like Woody in Toy Story 3, you just have to say, “So long, partner” with the hope that things go “To Infinity and Beyond” a la Buzz Lightyear. That is what we have done with Everest — or to use its new name, OpenEverest. OpenEverest is now a fully open source project that anyone can get involved in. With the project donated and accepted by the Cloud Native Computing Foundation, this will make running databases on Kubernetes easier for everyone. Over time, OpenEverest will support more databases based on what the community wants to see and where they help with more contributions or support.

For developers, bringing Kubernetes and databases together helps them be more productive. But for those running infrastructure or dealing with Day 2 problems around Kubernetes, databases remain challenging to manage. Dealing with edge cases, automation, and resilience at scale is still a significant hurdle for databases on Kubernetes, yet this approach remains essential if you want to implement platform engineering or internal developer platforms without lock-in. This new open source project is a strong starting point for delivering on that requirement. Making it a community open source software project under a foundation, rather than the preserve of one company, will help this approach to succeed.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

(image/jpeg; 12.74 MB)

Minimus Welcomes Yael Nardi as CBO to Facilitate Strategic Growth 9 Apr 2026, 4:25 am

New York, NY: Secure container image startup Minimus today announced the appointment of Yael Nardi as Chief Business Officer (CBO). In this newly established role, Nardi will lead the company’s next stage of scaling, overseeing growth strategy, operations, and corporate development.

As the market landscape evolves and AI impacts customer acquisition, Minimus is introducing an operational model to scale marketing and strategic alliances, which Nardi will manage.

“We are entering a phase of aggressive expansion that requires rigorous execution and a completely new playbook. Traditional marketing strategies are no longer enough in today’s fast-moving environment. We need an operational powerhouse at the helm. Yael is a world-class operator accustomed to zero-error environments and high-stakes execution. We are choosing intelligence, speed, and strategic alignment, and there is no one I trust more to run this machine.” – Ben Bernstein, CEO at Minimus

Nardi joins Minimus with over 15 years of experience advising startups, global investors, and technology corporations. Most recently, she served as Director at Meitar NY Inc. and Partner at Meitar Law Offices. Nardi was the lead corporate lawyer behind several significant M&A transactions, including Twistlock’s acquisition by Palo Alto Networks (PANW), a deal in the container image hardening and runtime security space, as well as deals involving companies such as Wiz, JFrog, and Salesforce.

“I have worked with the Minimus team through some of their most critical milestones, and I know firsthand the massive potential of their technology. The demand for near-zero CVE container images and minimal container images with built-in security is only accelerating. Scaling a company in today’s environment requires the same 24/7 rigor, vendor accountability, and strategic precision as closing a major M&A deal. I am thrilled to step into this operational role and build the growth engine that will drive Minimus’s next chapter.” – Yael Nardi, Chief Business Officer, Minimus

Nardi, a Bachelor of Laws (LLB) graduate from Tel Aviv University, will be based at Minimus’s New York City headquarters. In her capacity as CBO, she will work alongside the executive leadership team to achieve the company’s growth targets.

About Minimus

Minimus provides hardened container images and hardened Docker images engineered to achieve near-zero CVE exposure. Built continuously from source with the latest patches and security updates, Minimus images undergo rigorous container image hardening and attack surface reduction, delivering secure container images with seamless supply chain security and built-in compliance for FedRAMP, FIPS 140-3, CIS, and STIG standards. Through automatically generated SBOMs and real-time threat intelligence, Minimus empowers teams to prioritize remediation and avoid over 97% of container vulnerabilities – making it a compelling Chainguard alternative for teams seeking production-hardened, distroless container images at scale. 

For more information, visit minimus.io.

Media Contact

Minimus Public Relations

contact@minimus.io

minimus.io

(image/png; 0.41 MB)

Visual Studio Code 1.115 introduces VS Code Agents app 9 Apr 2026, 2:49 am

Visual Studio Code 1.115, the latest release of Microsoft’s extensible code editor, previews a companion app called Visual Studio Code Agents, optimized for agent-native development. Additionally, the agent experience in the editor is improved for running terminal commands in the background, according to Microsoft.

Introduced April 8, Visual Studio Code 1.115 can be downloaded from the Visual Studio Code website for Windows, Mac, or Linux.

Available as an early-access capability in Visual Studio Code Insiders, the VS Code Agents app allows developers to run agentic tasks across projects by kicking off multiple agent sessions across multiple repos in parallel. Developers can track session progress, view diffs inline, leave feedback for agents, and create pull requests without leaving the app, Microsoft said. Additionally, custom instructions, prompt files, custom agents, Model Context Protocol (MCP) servers, hooks, and plugins all work in the Agents app, along with VS Code customizations such as themes.

VS Code 1.115 also introduces two changes designed to improve the agent experience for running terminal commands in the background. First, a new send_to_terminal tool lets an agent continue interacting with background terminals. For example, if an SSH session times out while waiting for a password prompt, the agent can still send the required input to complete the connection. Previously, background terminals were read-only, with only the get_terminal_output tool available to the agent to check the terminal’s status. This was particularly limiting when a foreground terminal timed out and moved to the background, because the agent could no longer interact with it.

Second, a new experimental setting, chat.tools.terminal.backgroundNotifications, allows an agent to automatically be notified when a background terminal command finishes or requires user input. This also applies to foreground terminals that time out and are moved to the background. The agent then can take appropriate action, such as reviewing the output or providing input via the send_to_terminal tool. Previously, when a terminal command was running in the background, the agent had to manually call get_terminal_output to check the status. There was no way to know when the command completed or needed input.

Also in VS Code 1.115, when an agent invokes the browser tool, the tool calls now have a more descriptive label and a link to go directly to the target browser tab, Microsoft said. Plus, the Run Playwright Code tool has improved support for long-running scripts. Scripts that take longer than five seconds to run (by default) now return a deferred result for the agent to poll.

VS Code 1.115 follows VS Code 1.114 by a week, with that release featuring streamlined AI chat. Updates to VS Code now arrive weekly instead of monthly, a change in cadence that Microsoft introduced with the VS Code 1.111 release on March 9.

(image/jpeg; 1.44 MB)

Microsoft announces end of support for ASP.NET Core 2.3 8 Apr 2026, 7:56 pm

Microsoft’s ASP.NET Core 2.3, a version of the company’s open source web development framework for .NET and C#, will reach end of support on April 7, 2027.

Following that date, Microsoft will no longer provide bug fixes, technical support, or security patches for ASP.NET Core 2.3, the company announced on April 7, exactly a year before the cessation date. ASP.NET Core 2.3 packages—the latest patched versions only—are currently supported on .NET Framework, following the support cycle for those .NET Framework versions. After April 7, 2027, this support will end regardless of the .NET Framework version in use, according to Microsoft. Support for ASP.NET Core 2.3 packages, including the Entity Framework 2.3 packages, will end on the same date.

Microsoft recommends upgrading to a currently supported version of .NET, such as .NET 10 LTS. To help with the upgrade process, Microsoft recommends using GitHub Copilot modernization, which provides AI-powered assistance in planning and executing migrations to a modern .NET version.

Microsoft detailed the release of ASP.NET Core 2.3 in February 2025. The company lists the following impacts of the end of support:

  • Applications will continue to run; end of support does not break existing applications.
  • No new security updates will be issued for ASP.NET Core 2.3.
  • Continuing to use an unsupported version may expose applications to security vulnerabilities.
  • Technical support will no longer be available for ASP.NET Core 2.3.
  • The ASP.NET Core 2.3 packages will be deprecated.

ASP.NET Core is the open-source version of ASP.NET that runs on macOS, Linux, and Windows. ASP.NET Core first was released in 2016 and is a re-design of earlier Windows-only versions of ASP.NET.

(image/jpeg; 0.64 MB)

AWS turns its S3 storage service into a file system for AI agents 8 Apr 2026, 4:49 pm

Amazon Web Services is making its S3 object storage service easier for AI agents to access with the introduction of a native file system interface. The new interface, S3 Files, will eliminate a longstanding tradeoff between the low cost of S3 and the interactivity of a traditional file system or of Amazon’s Elastic File System (EFS).

“The file system presents S3 objects as files and directories, supporting all Network File System (NFS) v4.1+ operations like creating, reading, updating, and deleting files,” AWS principal developer advocate Sébastien Stormacq wrote in a blog post.

The file system can be accessed directly from any AWS compute instance, container, or function, spanning use cases from production applications to machine learning training and agentic AI systems, Stormacq said.

Analysts saw the change in accessibility as a strategic move by AWS to position S3 as a primary data layer for AI agents and modern applications, moving beyond its traditional use cases in data lakes and batch analytics.

“AWS is aligning S3 with AI, analytics, and distributed application needs where shared, low-latency file access is required on object-resident data. This addresses growing demand from machine learning training, agentic systems, and multi-node workloads that require concurrent read/write access without moving data out of S3,” said Kaustubh K, practice director at Everest Group.

Without a file system in S3, enterprises developing and deploying agentic systems and other modern applications typically had to either use a separate storage system or copy, synchronize, and stage data stored in S3, introducing latency, inconsistency, and operational overhead, said Pareekh Jain, principal analyst at Pareekh Consulting.

Some developers, said Kaustubh, turned to FUSE-based tools such as s3fs or Mountpoint to simulate file systems on top of S3, but these often lacked proper locking, consistency guarantees, and efficient update mechanisms.

In contrast, S3 Files addresses those limitations through native support for file operations, including permissions, locking, and incremental updates, Jain said.

This reduces friction for developers, he said, as they will no longer need to rewrite applications for object storage: existing file-based tools will just work. “Agents also become easier to build, as they can directly read and write files, store memory, and share data. Overall, it reduces the need for extra glue code like sync jobs, caching layers, and file adapters,” Jain said.

This also has implications for CIOs, as it simplifies data architecture by bringing everything, including data lakes, file systems, and staging layers, into Amazon S3.

“This approach lowers costs by removing duplication, reducing pipelines, and cutting operational overhead, while also improving governance with a single source of truth and no scattered copies,” Jain said.

S3 Files is now generally available and can be accessed through the AWS Management Console or the Command Line Interface (CLI), where users can create, mount, and deploy file systems.

(image/jpeg; 2.55 MB)

Z.ai unveils GLM-5.1, enabling AI coding agents to run autonomously for hours 8 Apr 2026, 10:27 am

Chinese AI company Z.ai has launched GLM-5.1, an open-source coding model it says is built for agentic software engineering. The release comes as AI vendors move beyond autocomplete-style coding tools toward systems that can handle software tasks over longer periods with less human input.

Z.ai said GLM-5.1 can sustain performance over hundreds of iterations, an ability it argues sets it apart from models that lose effectiveness in longer sessions.

As one example, the company said GLM-5.1 improved a vector database optimization task over more than 600 iterations and 6,000 tool calls, reaching 21,500 queries per second, about six times the best result achieved in a single 50-turn session.

In a research note, Z.ai said GLM-5.1 outperformed its predecessor, GLM-5, on several software engineering benchmarks and showed particular strength in repo generation, terminal-based problem solving, and repeated code optimization. The company said the model scored 58.4 on SWE-Bench Pro, compared with 55.1 for GLM-5, and above the scores it listed for OpenAI’s GPT-5.4, Anthropic’s Opus 4.6, and Google’s Gemini 3.1 Pro on that benchmark.

GLM-5.1 has been released under the MIT License and is available through its developer platforms, with model weights also published for local deployment, the company said. That may appeal to enterprises looking for more control over how such tools are deployed.

Longer-running coding agents

Z.ai says long-running performance is a key differentiator, setting its model apart from those that lose effectiveness in extended sessions.

Analysts say this is because many current models still plateau or drift after a relatively small number of turns, limiting their usefulness on extended, multi-step software tasks.

Pareekh Jain, CEO of Pareekh Consulting, said the industry is now moving beyond tools that can answer prompts toward systems that can carry out longer assignments with less supervision.

The question, Jain said, is no longer, “What can I ask this AI?” but, “What can I assign to it for the next eight hours?”

For enterprises, that raises the prospect of assigning an agent a ticket in the morning and receiving an optimized solution by day’s end, after it has run hundreds of experiments and profiled the code.

“This capability aligns with real needs such as large refactors, migration programs, and continuous incident resolution,” said Charlie Dai, VP and principal analyst at Forrester. “It suggests that long‑running autonomous agents are becoming more practical, provided enterprises layer in governance, monitoring, and escalation mechanisms to manage risk.”

Open-source appeal grows

GLM-5.1’s release under the MIT License could be significant, especially for companies in regulated or security-sensitive sectors.

“This matters in four key ways,” Jain said. “First, cost. Pricing is much lower than for premium models, and self-hosting lets companies control expenses instead of paying per use. Second, data governance. Sensitive code and data do not have to be sent to external APIs, which is critical in sectors such as finance, healthcare, and defense. Third, customization. Companies can adapt the model to their own codebases and internal tools without restrictions.”

The fourth factor, according to Jain, is geopolitical risk. Although the model is open source, its links to Chinese infrastructure and entities could still raise compliance concerns for some US companies.

Dai said the MIT license makes it easier for companies to run the model on their own systems while adapting it to internal requirements and governance policies. “For many buyers, this makes GLM‑5.1 a viable strategic option alongside commercial models, especially where regulatory constraints, IP sensitivity, or long‑term platform control matter most,” Dai said.

Benchmark credibility

Z.ai cited three benchmarks: SWE-Bench Pro, which tests complex software engineering tasks; NL2Repo, which measures repository generation; and Terminal-Bench 2.0, which evaluates real-world terminal-based problem solving.

“These benchmarks are designed to test coding agents’ advanced coding capabilities, so topping those benchmarks reflects strong coding performance, such as reliability in planning-to-execution, less prompt rework, and faster delivery,” said Lian Jye Su, chief analyst at Omdia. “However, they are still detached from typical enterprise realities.”

Su said public benchmarks still do not capture the messiness of proprietary codebases, legacy systems, and code review workflows. He added that benchmark results come from controlled settings that differ from production, though the gap is closing as more teams adopt agentic setups.

This article originally appeared in Computerworld.

(image/jpeg; 4.47 MB)

Microsoft’s new Agent Governance Toolkit targets top OWASP risks for AI agents 8 Apr 2026, 9:38 am

Microsoft has quietly introduced the Agent Governance Toolkit, an open source project designed to monitor and control AI agents during execution as enterprises try to move them into production workflows.

The toolkit, which is a response to the Open Worldwide Application Security Project’s (OWASP) emerging focus on AI and LLM security risks, adds a runtime security layer that enforces policies to mitigate issues such as prompt injection, and improves visibility into agent behavior across complex, multi-step workflows, Imran Siddique, principal group engineering manager at Microsoft wrote in a blog post.

More specifically, the toolkit maps to OWASP’s top 10 risks for agentic systems, including goal hijacking, tool misuse, identity abuse, supply chain risks, code execution, memory poisoning, insecure communications, cascading failures, human-agent trust exploitation, and rogue agents.

The rationale behind the toolkit, Siddique wrote, stems from how AI systems increasingly resemble loosely governed distributed environments, where multiple untrusted components share resources, make decisions, and interact externally with minimal oversight.

That prompted Microsoft to apply proven design patterns from operating systems, service meshes, and site reliability engineering to bring structure, isolation, and control to these environments, Siddique added.

As a result, the Redmond-headquartered giant packaged these principles into a toolkit comprising seven components, available in Python, TypeScript, Rust, Go, and .NET.

The cross-language approach, Siddique explained, is aimed at meeting developers where they are and enabling integration across heterogeneous enterprise stacks.

As for the components, the toolkit includes modules such as a policy enforcement layer named Agent OS, a secure communication and identity framework named Agent Mesh, an execution control environment named Agent Runtime, and additional components, such as Agent SRE, Agent Compliance, and Agent Lightning, covering reliability, compliance, marketplace governance, and reinforcement learning oversight.

Beyond its modular design, Siddique further wrote that the toolkit is built to work with existing development ecosystems: “We designed the toolkit to be framework-agnostic from day one. Each integration hooks into a framework’s native extension points, LangChain’s callback handlers, CrewAI’s task decorators, Google ADK’s plugin system, Microsoft Agent Framework’s middleware pipeline, so adding governance doesn’t require rewriting agent code.”

This approach, the senior executive explained, would reduce integration overhead and risk, allowing developers to introduce governance controls into production systems without disrupting existing workflows or incurring the cost and complexity of rearchitecting applications.

Siddique also pointed to several framework integrations already deployed in production workloads, including LlamaIndex’s TrustedAgentWorker integration.

For those wishing to explore the toolkit, which is currently in public preview, it is available under an MIT license and structured as a monorepo with independently installable components.

Microsoft, in the future, plans to transition the project to a foundation-led model and is already engaging with the OWASP agentic AI community to support broader governance and stewardship, Siddique wrote.

(image/jpeg; 1.57 MB)

The winners and losers of AI coding 8 Apr 2026, 9:00 am

I don’t need to tell you that agentic coding is changing the world of software development. Things are happening so quickly that it’s hard to keep up. Internet years seem like eons compared to agentic coding years.  It seemed like just a few short weeks ago that everyone very suddenly stopped writing code and let Claude Code do all the work because, well, it was a few short weeks ago that it happened. 

It seems like new ideas, tools, and frameworks are popping up every day.

Despite things moving like a cheetah sprinting across the savanna, I am going to make a few predictions about where the cheetah will end up and what will happen when it gets there.

So long, legacy software

First, legacy software is going to become a thing of the past. You know what I’m talking about—those big balls of mud that have accreted over the last 30 years. The one started by your cousin’s friend who wrote that software for your dad’s laundromat and is now the software recommended by the Coin Laundry Association. The one with seven million lines of hopeless spaghetti code that no one person actually understands, that uses ancient, long-outdated technology, that is impossible to maintain but somehow still works. The one that depends on an entire team of developers and support people to keep running.

Well, someone is going to come along and write a completely fresh, new, unmuddy version of that ball of mud with a coding agent. The perfect example of this is happening in open source with Cloudflare’s EmDash project. Now don’t get me wrong. I have a deep respect for WordPress, the CMS that basically runs the internet. It’s venerable and battle-tested—and bloated and insecure and written in PHP.

EmDash is a “spiritual successor” to WordPress. Cloudflare basically asked, “What would WordPress look like if we started building it today?” Then they started building it using agentic coding, and basically did in a couple of months what WordPress took 24 years to do. Sure, they had WordPress as a template, but it was only because of agentic coding that they were even willing to attempt it. It’s long been thought foolish to say “Let’s rebuild the whole thing from scratch.” Now, with agentic coding, it seems foolish not to.

This is not the last creaky, old-school project that will be re-imagined in the coming days. If your business relies on a big ball of mud, it’s time to start looking at rebuilding it from the ground up before someone else beats you to it.

Ideas, implemented

Second, all those great application ideas you’ve been thinking about but could never find the time to do? Well, now you and millions of other developers can actually do them. I myself am nearing completion on six — six! — of the ideas I’ve been kicking around for years and never found the time to do. Yep, I’m building them all in parallel, with six different agents running at once. (Thank you, Garry Tan and gstack!)

Now, will there be a lot of slop that comes out of that? Sure. But will there be a huge supply of cool new software that will change the world? Yes, definitely. 

That project you’ve always wanted to do? You can do it now. 

Third, bespoke software will become the norm. Today, a business that needs accounting software will buy a product like Quickbooks or some other off-the-shelf solution, and adapt it to their way of doing things. But going forward, those businesses can create their own accounting package designed specifically for the way they do business. No one knows their domain better than the small business owner themselves. Instead of relying on someone who doesn’t understand the nuances of running your particular plumbing business, you can just talk to Claude Code and build your own solution. 

This is happening today (the head of finance wrote the solution!). If you aren’t considering becoming more efficient via agentic coding, then you might find yourself dealing with competitors that are.

Legacy apps need rewriting. Those side projects need building. That app you need for your business isn’t going to build itself. Three months ago, it all seemed foolish and impossible. Today? You are either the cheetah or the gazelle.


Get started with Python’s new frozendict type 8 Apr 2026, 9:00 am

Only very rarely does Python add a new standard data type. Python 3.15, when it’s released later this year, will come with one—an immutable dictionary, frozendict.

Dictionaries in Python correspond to hash maps in languages like Java. They are a way to associate keys with values. The Python dict, as it’s called, is tremendously powerful and versatile. In fact, the dict structure is used by the CPython interpreter to handle many things internally.

But a dict has a big limitation: it’s not hashable. A hashable type in Python has a hash value that never changes during its lifetime. Strings, numerical values (integers and floats), and tuples are all hashable because they are immutable. Container types, like lists, sets, and, yes, dicts, are mutable, so they can’t guarantee their contents stay the same over time.
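The hashability distinction is easy to verify on any current Python:

```python
# Immutable built-ins are hashable and so can serve as dict keys
# or set elements.
print(hash("hello"))     # some integer
print(hash((1, 2.5)))    # tuples work too

# Mutable containers are not hashable.
try:
    hash({"x": 1})
except TypeError as exc:
    print(exc)           # unhashable type: 'dict'
```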

Python has long included a frozenset type—a version of a set that doesn’t change over its lifetime and is hashable. Because sets are basically dictionaries with keys and no values, why not also have a frozendict type? Well, after much debate, we finally got just that. If you download Python 3.15 alpha 7 or later, you’ll be able to try it out.

The basics of a frozendict

In many respects, a frozendict behaves exactly like a regular dictionary. The main difference is that you can’t use the literal {} syntax to make one. You must use the frozendict() constructor:

my_frozendict = frozendict(
    x=1, y=True, z="Hello"
)

You can also take an existing dictionary and give it to the constructor:

my_frozendict = frozendict(
    {"x": 1, "y": True, "z": "Hello", "A string": "Another string"}
)

One big advantage of using a dict as the source is that you have more control over what the keys can be. In the above example, we can’t use "A string" as a key in the first constructor, because that’s not a valid argument name. But we can use any string we like as a dict key.

The new frozendict bears some resemblance to an existing type in the collections module, collections.frozenmap. But frozendict differs in several key ways:

  • frozendict is built-in, so doesn’t need to be imported from a module.
  • frozenmap does not preserve insertion order.
  • Lookups for keys in a frozenmap are potentially slower (O(log n)) than in a frozendict (O(1)).

Working with frozendicts

A frozendict behaves exactly like a regular dict as long as all you’re doing is reading values from it.

For instance, if you want to get a value using a key, it’s the same: use the syntax the_frozendict[the_key]. If you want to iterate through a frozendict, that works the same way as with a regular dict: for key in the_frozendict:. Likewise for key/value pairs: for key, value in the_frozendict.items(): will work as expected.

Another convenient aspect of frozendicts is that they preserve insertion order. This feature was added to regular dictionaries relatively recently, and can be used to do things like build FIFO queues. That frozendict preserves the same behavior is very useful; it means you can iterate through a frozendict created from a regular dictionary and get the same items in the same sequence.
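You can see the underlying guarantee with a plain dict today (regular dicts have preserved insertion order since Python 3.7, and frozendict carries the same guarantee):

```python
# Insertion order is part of the dict contract, so iteration is
# predictable; a frozendict built from this dict would iterate
# in the same sequence.
d = {"first": 1, "second": 2, "third": 3}
print(list(d))  # ['first', 'second', 'third']
```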

What frozendicts don’t let you do

The one big thing you can’t do with a frozendict is change its contents in any way. You can’t add keys, reassign their values, or remove keys. That means all of the following code would be invalid:

# adding a new key "x"
my_frozendict["x"] = 1
# reassigning the existing key "q"
my_frozendict["q"] = 2
# removing an item
my_frozendict.pop("q")

Each of these would raise an exception. In the case of my_frozendict.pop(), note that the method .pop() doesn’t even exist on a frozendict.
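Trying frozendict itself requires a 3.15 alpha build, but you can observe analogous behavior today with types.MappingProxyType, the standard library’s read-only dict view (it isn’t hashable like frozendict, but it rejects mutation in the same spirit):

```python
from types import MappingProxyType

# A mappingproxy is a frozen *view* of a dict: reads work, writes fail.
mp = MappingProxyType({"x": 1})
try:
    mp["x"] = 2
except TypeError as exc:
    print(exc)  # 'mappingproxy' object does not support item assignment

# As with frozendict, mutating methods like .pop() simply don't exist.
print(hasattr(mp, "pop"))  # False
```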

While you can use merge and update operators on a frozendict, the way they work is a little deceptive. They don’t actually change anything; instead, they create a new frozendict object that contains the results of the merge or update. It’s similar to how “changing” a string or tuple really just means constructing a new instance of those types with the changes you want.

# Merge operation
my_frozendict = frozendict(x=1)
my_other_frozendict = frozendict(y=1)
new_fz = my_frozendict | my_other_frozendict

# Update operation (rebinds new_fz to a brand-new frozendict)
new_fz |= frozendict(x=2)
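The same | and |= operators work on regular dicts (added in Python 3.9), which makes the contrast easy to try today. One subtle difference: on a plain dict, |= mutates in place, while on a frozendict it rebinds the name to a new object.

```python
# Merging two plain dicts produces a new dict; neither operand changes.
a = {"x": 1}
b = {"y": 1}
merged = a | b        # {'x': 1, 'y': 1}

# On a plain dict, |= updates in place (on a frozendict, it would
# instead rebind `merged` to a freshly built frozendict).
merged |= {"x": 2}
print(merged)         # {'x': 2, 'y': 1}
print(a)              # {'x': 1} - the operands are untouched
```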

Use cases for frozendicts

Since a frozendict can’t be changed, it obviously isn’t a substitute for a regular dictionary, and it isn’t meant to be. The frozendict will come in handy when you want to do things like:

  • Store key/value data that is meant to be immutable. For instance, if you collect key/value data from command-line options, you could store them in a frozendict to signal that they should not be altered over the lifetime of the program.
  • Use a dictionary in some circumstance where you need a hashable type. For instance, if you want to use a dictionary as a key in a dictionary, or as an element in a set, a frozendict fits the bill.
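Until frozendict is widely available, a common hashable stand-in for a mapping is a frozenset of its items, which illustrates the second use case above (this only works when all values are themselves hashable):

```python
# Use a mapping's contents as a dict key by freezing its items.
config = {"host": "localhost", "port": 8080}
key = frozenset(config.items())

cache = {key: "connection-A"}  # mapping-shaped data used as a dict key

# Lookup succeeds regardless of the original key order.
lookup = frozenset({"port": 8080, "host": "localhost"}.items())
print(cache[lookup])  # connection-A
```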

It might be tempting to think a frozendict will provide better performance than a regular dict, considering it’s read-only. It’s possible, but not guaranteed, that eventual improvements in Python will enable better performance with immutable types. However, right now, that’s far from being a reason to use them.


GitHub Copilot CLI adds Rubber Duck review agent 7 Apr 2026, 11:17 pm

GitHub has introduced an experimental Rubber Duck mode in the GitHub Copilot CLI. The latest addition to the AI-powered coding tool uses a second model from a different AI family to provide a second opinion before enacting the agent’s plan.

The new feature was announced April 6. Introduced in experimental mode, Rubber Duck acts as an independent reviewer, assessing plans and work at the moments where feedback matters most, according to GitHub. It is a focused review agent, powered by a model from a family complementary to the one driving the primary Copilot session. Its job is to check the agent’s work and present a short, focused list of high-value concerns, including details the primary agent may have missed, assumptions worth questioning, and edge cases to consider.

Developers can use /experimental in the Copilot CLI to access Rubber Duck alongside other experimental features.

Evaluating Rubber Duck on SWE-Bench Pro, a benchmark of real-world coding problems drawn from open-source repositories, GitHub found that Claude Sonnet 4.6 paired with Rubber Duck running GPT-5.4 achieved a resolution rate approaching Claude Opus 4.6 running alone, closing 74.7% of the performance gap between Sonnet and Opus. GitHub said Rubber Duck tends to help more with difficult problems, ones that span three-plus files and would normally take 70-plus steps. On these problems, Sonnet plus Rubber Duck scores 3.8% higher than the Sonnet baseline and 4.8% higher on the hardest problems identified across three trials.

GitHub cited these examples of the kinds of problems Rubber Duck finds:

  • Architectural catch (OpenLibrary/async scheduler): Rubber Duck caught that the proposed scheduler would start and immediately exit, running zero jobs—and that even if fixed, one of the scheduled tasks was itself an infinite loop.
  • One-liner bug (OpenLibrary/Solr): Rubber Duck caught a loop that silently overwrote the same dict key on every iteration. Three of four Solr facet categories were being dropped from every search query, with no error thrown.
  • Cross-file conflict (NodeBB/email confirmation): Rubber Duck caught three files that all read from a Redis key which the new code stopped writing. The confirmation UI and cleanup paths would have been silently broken on deploy.


The Terraform scaling problem: When infrastructure-as-code becomes infrastructure-as-complexity 7 Apr 2026, 12:41 pm

Terraform promised us a better world. Define your infrastructure in code, version it, review it, and deploy it with confidence. For small teams running a handful of services, that promise holds up beautifully.

Then your organization grows. Teams multiply. Modules branch and fork. State files balloon. And suddenly, that clean declarative vision starts looking a lot like a sprawling monolith that nobody fully understands and everyone is afraid to touch.

If you’ve ever watched a Terraform plan run for 20 minutes, encountered a corrupted state file at 2 a.m., or inherited a Terraform codebase where half the resources are undocumented and a quarter are unmanaged, you know exactly what we’re talking about. This is the Terraform scaling problem, and it’s affecting engineering organizations of every size.

The numbers confirm it isn’t a niche concern. The 2023 State of IaC Report found that 90% of cloud users are already using infrastructure-as-code, with Terraform commanding 76% market share according to the CNCF 2024 Annual Survey. Yet the HashiCorp State of Cloud Strategy Survey 2024 showed that 64% of organizations report a shortage of skilled cloud and automation staff, creating a dangerous gap between Terraform’s adoption and the expertise required to operate it well at scale.

In this post, we examine where Terraform breaks down, why traditional solutions fall short, and how AI-assisted IaC management is offering a credible path forward.

The root causes of Terraform complexity at scale

Terraform’s design philosophy is fundamentally sound: Declarative infrastructure, idempotent operations and a provider ecosystem that covers nearly every cloud service imaginable. The problem isn’t the tool; it’s the gap between how Terraform was designed to work and how large engineering organizations actually operate.

State management becomes a full-time job

Terraform’s state file is both its greatest strength and its biggest liability at scale. State gives Terraform the ability to track what it has deployed and calculate diffs — but as infrastructure grows, that state file becomes a critical shared resource with no native support for distributed access patterns.

Teams running a monolithic state end up with a single point of contention. Engineers queue up to run plans and apply. Locking mechanisms in backends like S3 with DynamoDB help, but they don’t solve the underlying architectural issue: Everyone is competing for the same resource.
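The locking setup mentioned above is typically declared in the backend block; a minimal sketch follows (the bucket, key, and table names here are placeholders, not values from any real environment):

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"        # placeholder bucket
    key            = "network/prod/terraform.tfstate" # one state per domain
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"          # lock table (placeholder)
    encrypt        = true
  }
}
```

Even with locking in place, every engineer targeting this backend still serializes on the same state, which is the contention the paragraph above describes.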

The HashiCorp State of Cloud Strategy Survey consistently places state management issues, corruption, drift and locking failures among the top pain points for Terraform users in organizations with more than 50 engineers. When a state file gets corrupted mid-apply, recovery can take hours and require deep expertise. The problem compounds as infrastructure grows: Organizations running more than 500 managed resources in a single workspace routinely report 15–30 minute plan times, turning what should be a fast feedback loop into a deployment bottleneck.

Module sprawl and dependency hell

Terraform modules are the right answer to code reuse. They’re also the source of some of the most painful debugging sessions in platform engineering.

As organizations scale, module libraries grow organically. Teams fork modules to meet specific requirements. Version pinning gets inconsistent. A security patch in a root module requires coordinated updates across dozens of dependent modules — a task that sounds simple until you’re dealing with circular dependencies, incompatible provider versions and module registries that weren’t designed for enterprise governance.

Adopting semantic versioning for Terraform modules has a measurable impact: According to a Moldstud IaC case study (June 2025), approximately 60% of organizations that enforce semantic versioning on module releases report a decrease in deployment failures over six months. Yet most teams don’t adopt this practice until after they’ve experienced the failure modes firsthand. The same research found that teams using peer reviews for Terraform code experience a 30% improvement in code quality, but this requires process investment that most fast-moving platform teams skip in the early stages.

The pattern is consistent: What starts as a tidy module hierarchy becomes a tangled dependency graph that requires tribal knowledge to navigate.

Plan times and blast radius

At a certain scale, the Terraform plan stops being a quick feedback loop and starts being a liability. Teams managing thousands of resources in a single workspace can wait 15–30 minutes for a plan to complete. More critically, the blast radius of a single apply expands proportionally.

A misconfigured security group rule in a small workspace affects a handful of resources. The same mistake in a large monolithic workspace can cascade across hundreds of resources before anyone can intervene. Terraform’s own declarative model means that configuration errors can trigger resource destruction, a risk that grows with workspace size. This reality pushes teams toward increasingly conservative change management processes, which defeats the core value proposition of IaC in the first place.

There’s a meaningful ROI case for solving this. The Moldstud IaC case study indicates that implementing automated IaC solutions can lead to a 70% reduction in deployment times. But capturing that return requires architectural decisions that prevent plan-time bottlenecks before they compound.

Drift: The silent killer

Infrastructure drift — where the actual state of your cloud environment diverges from what Terraform believes it to be — is among the most insidious challenges at scale. It accumulates slowly, through emergency console changes, partially applied runs and resources created outside of Terraform entirely.

The causes are well-documented: An on-call engineer hotfixes a security group at 3 a.m. and forgets to update the code; an autoscaling event modifies a resource configuration that Terraform manages; a third-party integration quietly changes a setting that Terraform has no visibility into. Each of these is a small divergence. Collectively, they erode the reliability of your entire IaC foundation. The Terraform Drift Detection Guide documents how teams across industries are consistently caught off guard by drift accumulation in environments they believed were fully under IaC control.

By the time drift becomes visible, it’s often embedded deep enough to make remediation genuinely risky. The DORA 2023 State of DevOps Report found that teams dealing with frequent configuration drift had 2.3× higher change failure rates than teams maintaining consistent IaC hygiene. The compounding effect is significant: Drift erodes confidence in your IaC, which leads to more manual changes, which causes more drift.

Why traditional approaches fall short

The conventional responses to Terraform scaling challenges are well-documented: Workspace decomposition, remote state backends, CI/CD pipelines with policy enforcement and module registries with semantic versioning. These are all necessary practices. They’re also insufficient on their own.

  • Workspace decomposition reduces blast radius but multiplies operational overhead. You’re trading one large problem for many smaller ones, each requiring its own state management, access controls and pipeline configuration. Managing 200 workspaces is a full-time engineering effort.
  • CI/CD enforcement catches policy violations after the fact. By the time a plan hits your pipeline, an engineer has already spent time writing code that may get rejected. Feedback loops are slow, and the root cause — the complexity of authoring correct IaC at scale — remains unsolved.
  • Manual code reviews don’t scale. Platform teams can become bottlenecks when every Terraform change requires expert review to validate correctness, security posture and compliance. The cognitive load required to review infrastructure changes accurately is substantial, and reviewers burn out. This bottleneck is only sharpened by the talent shortage: With 64% of organizations reporting a shortage of skilled cloud and automation staff, the supply of qualified reviewers isn’t growing fast enough to match Terraform’s adoption curve.

The honest assessment: These solutions manage Terraform complexity rather than resolving it. They require ongoing investment in tooling, process and expertise that many organizations struggle to maintain.

This is exactly the friction that StackGen’s Intent-to-Infrastructure Platform was designed to address. Rather than adding more manual process overhead, it introduces an intelligent layer that helps teams author, validate and govern Terraform configurations from the point of intent before complexity accumulates.

Emerging solutions: Where the industry is moving

The Terraform ecosystem is evolving rapidly in response to these challenges. The global IaC market reflects this urgency: Valued at $847 million in 2023, it’s projected to reach $3.76 billion by 2030 at a 24.4% compound annual growth rate, according to Grand View Research’s IaC Market Report. That growth isn’t just adoption — it’s investment in solving the complexity problems that widespread adoption creates.

Workspace automation and orchestration

Tools like Atlantis, StackGen, and Terraform Cloud are moving toward intelligent workspace orchestration: automatically managing dependencies between workspaces, ordering applies correctly, and providing better visibility into cross-workspace impact. This reduces the manual coordination overhead that plagues large-scale Terraform operations.

The key shift is treating your collection of workspaces as a managed system rather than a set of independent units. When a shared networking module changes, an orchestration layer should automatically identify affected workspaces, calculate the propagation order and manage the apply sequence — rather than requiring a human to track and coordinate each dependency manually.

Policy-as-code with earlier enforcement

Open Policy Agent (OPA) and HashiCorp Sentinel have matured significantly. More importantly, teams are learning to push policy enforcement left — validating Terraform plans against organizational policies before they hit a CI/CD pipeline, and ideally before they’re even submitted for review.

HashiCorp has reported that teams using Sentinel with pre-plan validation see a 45% reduction in policy violation-related build failures compared to teams running post-plan enforcement only. Earlier feedback means faster iteration and lower engineer frustration.

AI-assisted IaC management: The emerging frontier

This is where the most significant innovation is happening. AI-assisted infrastructure management addresses the problems that automation alone can’t solve: The cognitive complexity of understanding large IaC codebases, identifying drift patterns before they become critical and translating high-level intent into correct, compliant Terraform code.

Platforms like StackGen’s Intent-to-Infrastructure Platform represent a new paradigm here. Rather than requiring platform engineers to manually author and review every Terraform resource definition, StackGen interprets infrastructure intent — expressed in natural language or high-level policy — and generates compliant Terraform configurations, validates them against organizational standards and surfaces potential issues before they reach production. This directly addresses the bottleneck where expert review becomes a constraint on velocity.

The practical applications are concrete:

  • Drift detection and remediation: AI models trained on infrastructure patterns can identify anomalous drift, distinguishing between expected configuration changes and unauthorized modifications, and surface remediation recommendations with context about impact and risk. This is particularly powerful for teams managing hundreds of workspaces where manual drift monitoring isn’t practical.
  • Intelligent module recommendations: Rather than requiring engineers to navigate sprawling module registries manually, AI-assisted tooling can analyze an infrastructure request, identify the most appropriate existing modules and flag where new module development is needed. This reduces the “reinvent the wheel” pattern that causes module sprawl.
  • Natural language to IaC: For platform teams managing self-service infrastructure portals, AI translation layers allow development teams to request infrastructure in natural language and receive validated Terraform configurations that conform to organizational standards — without requiring deep Terraform expertise from every team consuming platform services.
  • Proactive complexity warnings: AI analysis of Terraform codebases can identify emerging complexity patterns before they become critical — detecting circular dependencies forming, state files approaching problematic size thresholds or module versioning patterns that suggest future compatibility issues.

Gartner predicts that by 2026, more than 40% of organizations will be using AI-augmented IaC tooling for some portion of their infrastructure management workflow — up from under 10% in 2023. The trajectory is clear, and the window for early-mover advantage is still open.

Practical guidance: Scaling Terraform without losing your mind

While AI-assisted tooling continues to mature, there are concrete architectural and process changes your team can adopt today.

  • Decompose by domain, not by team. Workspace boundaries should reflect infrastructure domains (networking, compute, data) rather than organizational team boundaries. Teams change; infrastructure domains are more stable. This reduces the reorganization tax you pay when teams restructure.
  • Treat state as infrastructure. Your state backend deserves the same reliability engineering as production systems. Remote state with versioning, automated backup verification and clear recovery runbooks should be non-negotiable before you’re managing more than a few dozen resources. The HashiCorp State of Cloud Strategy Survey shows that over 80% of enterprises already integrate IaC into their CI/CD pipelines — but pipeline integration doesn’t substitute for state backend reliability.
  • Invest in a private module registry early. Whether you use Terraform Cloud’s built-in registry or a self-hosted solution, a structured module registry with enforced semantic versioning pays compounding dividends as your module library grows. The cost of retrofitting governance onto an ungoverned module library is significantly higher than building in governance from the start.
  • Automate drift detection, not just drift remediation. Drift remediation is expensive; drift detection is cheap. Scheduled Terraform plan runs in CI/CD, combined with alerting on detected drift, give you an early warning system that prevents drift from compounding silently. For teams managing large environments where manual detection becomes impractical, automated drift tooling, whether native to HCP Terraform or third-party solutions, becomes essential infrastructure in its own right.
  • Build a paved road for Terraform consumers. If every application team needs to become a Terraform expert to consume platform services, your platform won’t scale. Build opinionated, simplified interfaces, whether that’s a service catalogue, a self-service portal or an AI-assisted request layer that allows development teams to get the infrastructure they need without requiring deep IaC expertise.
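The scheduled drift-detection bullet above can be sketched as a CI job; this is one illustration (the workflow name and cron schedule are placeholders), relying on the fact that `terraform plan -detailed-exitcode` exits with code 2 when pending changes — i.e., drift — are detected:

```yaml
name: drift-detection
on:
  schedule:
    - cron: "0 6 * * *"   # once a day (adjust to taste)

jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3   # wrapper exposes the exit code
      - run: terraform init -input=false
      - id: plan
        run: terraform plan -detailed-exitcode -input=false
        continue-on-error: true              # exit code 2 is signal, not failure
      - name: Alert on drift
        if: steps.plan.outputs.exitcode == '2'
        run: echo "Drift detected - wire up your alerting channel here"
```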

The strategic inflection point

We’re at an inflection point in how the industry thinks about infrastructure-as-code. The original vision of IaC — infrastructure defined, versioned and managed like software — was correct. The execution, for large-scale organizations, has accumulated significant complexity debt.

The next wave of IaC tooling isn’t about replacing Terraform. Terraform’s declarative model, provider ecosystem and community are genuine strengths that won’t be supplanted quickly. The opportunity is in the layer above Terraform: Intelligent orchestration, AI-assisted authoring, proactive complexity management and intent-driven infrastructure interfaces that make IaC accessible to the full organization rather than just a specialized subset of platform engineers.

Teams that invest in this layer now, whether through emerging platforms, internal tooling or AI-assisted workflows, will build a meaningful operational advantage. Teams that continue fighting Terraform complexity with more Terraform will find themselves spending an increasing proportion of engineering capacity on infrastructure maintenance rather than product development.

The IaC market’s 24.4% CAGR reflects growing awareness that the tools and processes managing this complexity need to evolve as fast as the infrastructure they govern.

Key takeaways

The Terraform scaling problem is real, but it’s solvable. The path forward involves three parallel tracks: Architectural decisions that manage blast radius and reduce state contention; process investments in policy-as-code and module governance; and tooling that uses AI to address the cognitive complexity that has always been the hardest part of IaC at scale.

Your infrastructure code should accelerate your engineering organization, not constrain it. If it’s doing the latter, the problem isn’t your engineers; it’s the layer of tooling and process sitting between intent and deployed infrastructure.

Ready to explore how AI-assisted IaC management can reduce the complexity overhead in your Terraform workflows?

This article is published as part of the Foundry Expert Contributor Network.
Want to join?
