New JetBrains platform manages AI coding agents | InfoWorld
New JetBrains platform manages AI coding agents 24 Mar 2026, 11:12 pm
Seeking to help developers control growing fleets of AI coding agents, JetBrains is introducing JetBrains Central, an agentic development platform for teams to manage and maintain visibility over these agents.
An early access program for JetBrains Central is set to begin in the second quarter of 2026 with a limited number of design partners participating. JetBrains describes the platform as the control and execution plane for agent-driven software production. JetBrains Central is intended to address the difficulties developers face in dealing with the growing number of agents. Developers are increasingly running into challenges with oversight, consistency, and control across these environments, according to JetBrains.
Announced March 24, JetBrains Central acts as a control layer across agentic workflows alongside tools such as JetBrains’ Air agentic development environment and the LLM-agnostic (large language model) Junie coding agent. JetBrains Central connects developer tools, agents, and development infrastructure into a unified system where automated work can be executed and governed across teams and tools, JetBrains said. Developers can interact with agent workflows from JetBrains IDEs, third-party IDEs, CLI tools, web interfaces, or automated systems. Agents themselves can come from JetBrains or external ecosystems, including Codex, Gemini CLI, or custom agents.
JetBrains Central connects agents with the context they need, including repositories, documentation, and APIs. At the same time, agents operate within real delivery pipelines and infrastructure, interacting with Git repositories, CI/CD systems, cloud environments, and other delivery infrastructure. When agents need guidance or complete a task, they interact with human teammates through the tools teams already use, such as Slack or Atlassian. This allows agent workflows to operate inside the same systems used by development teams today, rather than in isolated AI tools, according to JetBrains. Specific core capabilities include:
- Governance and control, including policy enforcement, identity and access management, observability, auditability, and cost attribution for agent-driven work. Some of these functionalities are already available via the JetBrains Central Console.
- Agent execution infrastructure, including cloud agent runtimes and compute provisioning, which allows agents to run reliably across development environments.
- Agent optimization and context, providing shared semantic context across repositories and projects. This enables agents to access relevant knowledge and route tasks to the most appropriate models or tools.
New ‘StoatWaffle’ malware auto‑executes attacks on developers 24 Mar 2026, 12:01 pm
A newly disclosed malware strain dubbed “StoatWaffle” is giving fresh teeth to the notorious, developer-targeting “Contagious Interview” threat campaign.
According to NTT Security findings, the malware marks an evolution from the long-running campaign’s user-triggered execution to a near-frictionless compromise embedded directly in developer workflows. Attackers are using blockchain-themed project repositories as decoys, embedding a malicious VS Code configuration file that triggers code execution when the folder is opened and trusted by the victim.
“StoatWaffle is a modular malware implemented by Node.js and it has Stealer and RAT modules,” NTT researchers said in a blog post, adding that the campaign operator “WaterPlum” is “continuously developing new malware and updating existing ones.”
This means tracking Contagious Interview activity may now require widening the scope of detection efforts to include weaponized dev environments, not just malicious packages and interview lures.
Opening a folder is all it takes
StoatWaffle abuses developer trust within Visual Studio Code environments. Instead of relying on users to execute suspicious scripts, as earlier attacks did, attackers are embedding malicious configurations inside legitimate-looking project repositories, often themed around blockchain development, a lure that has been consistent across Contagious Interview campaigns.
The trick relies on a “.vscode/tasks.json” file whose task is configured with the “runOn: folderOpen” setting. Once a developer opens the project and grants trust, the payload executes automatically without any further clicks. The executed StoatWaffle malware operates as a modular, Node.js-based framework that typically unfolds in stages: a loader, credential-harvesting components, and then a remote access trojan (RAT) planted for persistence and for pivoting across systems.
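Teams can hunt for this pattern in cloned repositories before opening them by scanning for tasks that auto-run on folder open. Below is a minimal Python sketch; it assumes VS Code's documented schema, where the trigger lives under a task's `runOptions` block, and the function name and return shape are illustrative:

```python
import json
from pathlib import Path

def find_auto_run_tasks(repo_root: str) -> list[str]:
    """Return labels of tasks in .vscode/tasks.json that are configured
    to run automatically when the folder is opened and trusted."""
    tasks_file = Path(repo_root) / ".vscode" / "tasks.json"
    if not tasks_file.is_file():
        return []
    try:
        # Note: real tasks.json files may contain JSONC comments, which
        # json.loads rejects; a production scanner should strip them first.
        config = json.loads(tasks_file.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError):
        return []
    flagged = []
    for task in config.get("tasks", []):
        if task.get("runOptions", {}).get("runOn") == "folderOpen":
            flagged.append(task.get("label", "<unlabeled>"))
    return flagged
```

Any repository where this returns a non-empty list deserves a manual look at the task's command before granting workspace trust.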
The RAT module maintains regular communication with an attacker-controlled C2 server, executing commands to terminate its own process, change the working directory, list files and directories, navigate to the application directory, retrieve directory details, upload a file, execute Node.js code, and run arbitrary shell commands, among others.
StoatWaffle also exhibits custom behavior depending on the victim’s browser. “If the victim browser was Chromium family, it steals browser extension data besides stored credentials,” the researchers said. “If the victim browser was Firefox, it steals browser extension data besides stored credentials. It reads extensions.json and gets the list of browser extension names, then checks whether the designated keyword is included.”
For victims running macOS, the malware also targets Keychain databases, they added.
Contagious Interview, revisited
StoatWaffle isn’t an isolated campaign. It’s the latest chapter in the Contagious Interview attacks, widely attributed to North Korea-linked threat actors tracked as WaterPlum.
Historically, this campaign has targeted developers and job seekers through fake interview processes, luring them into running malicious code under the guise of technical assessments. Previously, the campaign weaponized npm packages and staged loaders like XORIndex and HexEval, often distributing dozens of malicious packages to infiltrate developer ecosystems at scale.
Team 8, one of the group’s sub-clusters, previously relied on malware such as OtterCookie, shifting to StoatWaffle around December 2025, the researchers said.
The disclosure also shared a set of IP-based indicators of compromise (IOCs), likely tied to C2 infrastructure observed during analysis, to support detection efforts.
The article originally appeared in CSO.
VS Code now updates weekly 24 Mar 2026, 9:00 am
With Microsoft now releasing stable updates to its Visual Studio Code editor weekly instead of monthly, VS Code Versions 1.112 and 1.111 have recently been released, featuring capabilities such as agent troubleshooting, integrated browser debugging, and Copilot CLI permissions. Also featured is the deprecation of VS Code’s Edit Mode.
VS Code 1.112 was released March 18, while VS Code 1.111 arrived on March 9. Both follow what was a monthly update, VS Code 1.110, released March 4. Download instructions for the editor can be found on the Visual Studio Code website.
Integrated browser debugging on VS Code 1.112 means developers can open web apps directly within VS Code and can start debugging sessions with the integrated browser. This allows interaction with the web app, setting of breakpoints, stepping through code, and inspecting variables without leaving VS Code.
With VS Code 1.111, Edit Mode was officially deprecated. Users can temporarily re-enable Edit Mode via the VS Code setting chat.editMode.hidden. This setting will remain supported until Version 1.125; beginning with that release, Edit Mode will be fully removed, and it will no longer be possible to enable it via settings.
For Copilot CLI sessions in VS Code 1.112, meanwhile, developers can configure permissions for local agent sessions in chat to give agents more autonomy in their actions and to reduce the number of approval requests. Developers can choose between permission levels, including default permissions, bypass approvals, and autopilot.
To reduce the risks of locally running Model Context Protocol (MCP) servers, developers with VS Code 1.112 now can run locally configured stdio MCP servers in a sandboxed environment on macOS and Linux. Sandboxed servers have restricted file system and network access.
Also in VS Code 1.112, agents can now read image files from disk and binary files natively. This allows developers to use agents for a wider variety of tasks, such as analyzing screenshots, reading data from binary files, and more. Binary files are presented to the agent in a hexdump format.
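The hexdump presentation mentioned above is straightforward to picture. Here is a minimal Python sketch of such a rendering; the release notes do not specify the exact column layout VS Code uses, so this format is illustrative:

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes in a classic hexdump layout: offset, hex bytes, ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII passes through; everything else becomes a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)
```

A text representation like this lets a language model reason about magic numbers, embedded strings, and structure in files it could not otherwise ingest.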
VS Code 1.111, meanwhile, emphasizes agent capabilities. To help developers understand and troubleshoot agent behavior, the release lets them attach a snapshot of agent debug events as context in chat by using #debugEventsSnapshot. This can be used to ask the agent about loaded customizations or token consumption, or to troubleshoot agent behavior. Developers also can select the sparkle chat icon in the top-right corner of the Agent Debug panel to add the debug events snapshot as an attachment to the chat composer. Selecting the attachment opens the Agent Debug panel logs, filtered to the timestamp when the snapshot was taken.
Also in the agent vein, VS Code 1.111 adds a new permissions picker in the Chat view for controlling how much autonomy the agent has. The permission level applies only to the current session. Developers can change it at any time during a session by selecting a different level from the permissions picker.
Further in the agent space, the custom agent frontmatter in VS Code 1.111 adds support for agent-scoped hooks that are only run when a specific agent is selected or when it is invoked via runSubagent. This enables attachment of pre- and post-processing logic to specific agents without affecting other chat interactions.
VS Code 1.111 also featured a preview of an autopilot capability. This lets agents iterate autonomously until they complete their task.
When Windows 11 sneezes, Azure catches cold 24 Mar 2026, 9:00 am
If you look at Microsoft as a collection of product lines, it is easy to conclude that Windows 11 and Azure occupy different universes. One is a client operating system that has irritated its users, confused administrators, and pushed hardware refresh cycles in ways many customers did not want. The other is a hyperscale cloud platform selling compute, storage, data services, and AI infrastructure to enterprises. On paper, these are different businesses. In practice, they are part of the same trust system.
That is why the real question is not whether every unhappy Windows 11 user immediately stops buying Azure. They do not. The short-term connection is too indirect for that. The real issue is whether Microsoft is weakening the strategic gravity that has historically pulled enterprises toward the Microsoft stack. If Windows becomes less loved, less trusted, and less central, then Azure loses one of its quiet but important advantages: the assumption that Microsoft remains the default operating environment from endpoint to identity to server to cloud.
A cascade of Windows 11 problems
Windows 11 did not fail because of one mistake. It became controversial because Microsoft stacked friction on top of friction. The first issue was hardware eligibility. By tightening CPU support and enforcing TPM 2.0 and Secure Boot requirements, Microsoft effectively told a large installed base that perfectly usable machines were no longer good enough for the future of Windows. For many users and businesses, that translated into an involuntary hardware refresh rather than an upgrade. That remains one of the most damaging perception problems around Windows 11 because it turned operating system modernization into a capital expense conversation.
The second issue has been the aggressive insertion of AI features, especially Copilot, into the Windows experience. Recent reporting indicates Microsoft has been reassessing how deeply to push Copilot into Windows 11 after broad criticism that AI was being forced into core workflows rather than offered as a clearly optional capability. That matters because enterprise customers tend to reward optionality and punish coercion. When users believe the operating system is being used as a delivery vehicle for features they did not request, trust erodes quickly.
The third issue is cumulative quality perception. Even where individual complaints differ, the common narrative has been remarkably consistent: too much UX churn, too much product agenda, and not enough attention to core stability and utility. Once that story takes hold, it is no longer just about Windows 11. It becomes about Microsoft’s judgment.
The short-term impact on Azure
In the near term, I do not think the Windows 11 backlash materially dents Azure revenue in a dramatic, visible way. Azure buying decisions are still driven by enterprise agreements, migration road maps, data gravity, AI demand, regulatory requirements, and the practical realities of application modernization. A company does not walk away from its Azure footprint because employees dislike a desktop rollout.
There is also a structural reason the short-term effect is muted. Most Azure customers run a mixed environment already. Even in Microsoft-heavy enterprises, cloud workloads are often Linux-based, containerized, or managed through cross-platform tools. The Azure strategy today is less “run Windows everywhere” and more “meet customers where they are.” That makes the desktop operating system less immediately determinative than it was a decade ago.
However, that should not be confused with immunity. In the short run, Windows 11 can damage Microsoft’s credibility and affect adjacent buying decisions. If CIOs and architects see Microsoft overreaching on the client, they may become more skeptical of broader Microsoft platform bets. Skepticism does not always kill a deal, but it can slow expansion, increase competitive reviews, and make alternatives look more reasonable.
The risk of ecosystem decoupling
This is where the story gets serious. Microsoft’s power historically came from stack continuity. Windows on the desktop led to Windows Server, Active Directory, Microsoft management tools, Microsoft productivity software, Microsoft developers, and eventually Microsoft cloud. The company benefited from a kind of architectural momentum. Even when customers complained, they often stayed because the ecosystem fit together.
If Windows 11 reduces the footprint or strategic relevance of Windows on end-user devices, that continuity weakens. Lenovo is already shipping some lines of business laptops with both Windows and Linux options, a sign that major manufacturers see practical demand for more operating system flexibility. More broadly, mainstream business laptop coverage now treats Linux-capable systems from Lenovo and Dell as credible enterprise choices rather than edge cases. That shift matters. Once manufacturers normalize OS choice, Microsoft loses part of its distribution advantage.
A reduced Windows footprint does not automatically mean Azure declines, but it does make non-Microsoft infrastructure easier to justify. If the endpoint is no longer assumed to be Windows, then the organization becomes more comfortable with Linux-first operations, browser-based productivity, identity abstraction, cross-platform management, and container-native development. At that point, AWS and Google Cloud gain more than competitive parity. They gain narrative momentum.
Who benefits from Microsoft’s missteps
AWS has long benefited from being seen as the neutral default for cloud infrastructure. Google Cloud benefits from strength in data, AI, Kubernetes, and open source. Both providers become more attractive when enterprises want to avoid deeper entanglement with a single vendor’s ecosystem. If Microsoft weakens the emotional and operational case for staying inside that ecosystem, competitors have less resistance to overcome.
Then there is the rise of sovereign clouds and neo clouds. Sovereign cloud offerings are increasingly attractive to governments, regulated industries, and companies navigating regional data control requirements. Neo clouds, especially GPU-centric specialists, are capturing interest from organizations that want AI infrastructure without buying into a full legacy enterprise stack. These providers are not necessarily replacing Azure across the board, but they are fragmenting the market and redefining what “best fit” looks like.
That fragmentation becomes more dangerous for Microsoft if Windows no longer functions as an ecosystem anchor. Once customers accept heterogeneity at the edge, they become more comfortable buying heterogeneity in the cloud.
Microsoft still has time to stop this from spreading. The fix is not complicated, although it may be culturally difficult. Microsoft has to make Windows useful before it makes Windows strategic. That means reducing forced experiences, making Copilot clearly optional, restoring confidence in the value of core OS improvements, and acknowledging that hardware gating created real resentment. It also means understanding that endpoint trust is not a side issue. It is part of the company’s larger cloud positioning.
If Microsoft treats Windows 11 as merely a noisy consumer controversy, it will miss the enterprise lesson. Platforms are built on confidence. Confidence on the desktop influences confidence in the data center and the cloud. The short-term Azure impact may be modest, but the long-term risk is real: If Windows stops being the front door to the Microsoft universe, Azure stops being the default destination.
That is how desktop mistakes become cloud problems. Not all at once, but gradually and then faster than expected.
Designing self-healing microservices with recovery-aware redrive frameworks 24 Mar 2026, 9:00 am
Cloud-native microservices are built for resilience, but true fault tolerance requires more than automatic retries. In complex distributed systems, a single failure can cascade across multiple services, databases, caches or third-party APIs, causing widespread disruptions. Traditional retry mechanisms, if applied blindly, can exacerbate failures and create what is known as a retry storm, an exponential amplification of failed requests across dependent services.
This article presents a recovery-aware redrive framework, a design approach that enables self-healing microservices. By capturing failed requests, continuously monitoring service health and replaying requests only after recovery is confirmed, systems can achieve controlled, reliable recovery without manual intervention.
Challenges with traditional retry mechanisms
Retry storms occur when multiple services retry failed requests independently without knowledge of downstream system health. Consider the following scenario:
- Service A calls Service B, which is experiencing high latency.
- Both services implement automatic retries.
- Each failed request is retried multiple times across layers.
In complex systems where services depend on multiple layers of other services, a single failed request can be retried multiple times at each layer. This can quickly multiply the number of requests across the system, overwhelming downstream services, delaying recovery, increasing latency and potentially triggering cascading failures even in components that were otherwise healthy.
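The arithmetic behind a retry storm is stark: if each of n dependent layers independently makes r attempts per failure, the deepest service can see up to r^n requests for one logical call. A small illustration (the numbers are hypothetical):

```python
def amplification(retries_per_layer: int, layers: int) -> int:
    """Worst-case attempts hitting the deepest service when each of
    `layers` upstream services independently retries every failure."""
    return retries_per_layer ** layers

# Three attempts at each of three dependent layers turn one failed
# request into 3 ** 3 = 27 attempts against the struggling service.
```

This exponential growth is why blind retries delay recovery: the sicker the downstream service gets, the more traffic the retry logic sends it.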
Recovery-aware redrive framework: System design
The recovery-aware redrive framework is designed to prevent retry storms while ensuring all failed requests are eventually processed. Its core design principles include:
- Failure capture: All failed requests are persisted in a durable queue (e.g., Amazon SQS) along with their payloads, timestamps, retry metadata and failure type. This guarantees exact replay semantics.
- Service health monitoring: A serverless monitoring function (e.g., AWS Lambda) evaluates downstream service metrics, including error rates, latency and circuit breaker states. Requests remain queued until recovery is confirmed.
- Controlled replay: Once system health indicates recovery, queued requests are replayed at a controlled rate. Failed requests during replay are re-enqueued, enabling multi-cycle recovery while avoiding retry storms. Replay throughput can be dynamically adjusted to match service capacity.
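The failure-capture principle above can be sketched in a few lines. This is a runnable illustration rather than a production design: an in-memory deque stands in for the durable queue (Amazon SQS in the text), and the field names are illustrative:

```python
import time
from collections import deque
from dataclasses import dataclass, field

# An in-memory deque stands in for a durable queue such as Amazon SQS
# so the sketch is runnable on its own.
failure_queue: deque = deque()

@dataclass
class FailedRequest:
    """Everything needed for exact replay: payload, timestamp,
    retry metadata, and failure type."""
    service: str
    payload: dict
    failure_type: str
    timestamp: float = field(default_factory=time.time)
    attempts: int = 0

def capture_failure(service: str, payload: dict, exc: Exception) -> None:
    """Persist a failed request instead of retrying it immediately."""
    failure_queue.append(FailedRequest(
        service=service,
        payload=payload,
        failure_type=type(exc).__name__,
    ))
```

Because the original payload is stored verbatim, replay later preserves the request's semantics exactly, which is what distinguishes redrive from a fresh retry.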

(Diagram: recovery-aware redrive framework. Credit: Anshul Gupta)
Operational flow
The framework operates in three stages:
- Failure detection: Requests failing at any service are captured with full metadata in the durable queue.
- Monitoring and recovery detection: Health metrics are continuously analyzed. Recovery is considered achieved when all monitored metrics fall within predefined thresholds.
- Replay execution: Requests are replayed safely after recovery, with throughput limited to prevent overload. Failures during replay are returned to the queue for subsequent attempts.
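The three stages can be wired together in a short loop. A hedged sketch, assuming a metrics callback and thresholds of the implementer's choosing; the function names and threshold values are illustrative:

```python
import time
from collections import deque

def is_healthy(metrics: dict, max_error_rate: float = 0.05,
               max_p99_ms: float = 500.0) -> bool:
    """Recovery gate: every monitored metric must be inside its threshold."""
    return (metrics["error_rate"] <= max_error_rate
            and metrics["p99_latency_ms"] <= max_p99_ms)

def replay(queue: deque, send, get_metrics, rate_per_sec: float = 10.0) -> None:
    """Drain the failure queue at a controlled rate, gating on health and
    re-enqueuing anything that fails again so later cycles can retry it."""
    while queue:
        if not is_healthy(get_metrics()):
            time.sleep(1.0)  # downstream still degraded: wait, don't amplify
            continue
        request = queue.popleft()
        try:
            send(request)
        except Exception:
            queue.append(request)  # multi-cycle recovery: try again later
        time.sleep(1.0 / rate_per_sec)  # throttle to match service capacity
```

The key property is that nothing leaves the queue while the health gate is closed, so a recovering service never faces the backlog all at once.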
This design ensures safe, predictable retries without amplifying failures. By decoupling failure capture from replay and gating retries based on real-time service health, the system prevents premature retries that could overwhelm recovering services. It also maintains end-to-end request integrity, guaranteeing that all failed requests are eventually processed while preserving the original payload and semantics. This approach reduces operational risk, avoids cascading failures and supports observability, allowing engineers to track failures, recovery events and replay activity in a controlled and auditable manner.
Implementation in cloud-native environments
A practical implementation involves:
- Failure capture function: Intercepts failed API calls and writes them to a queue.
- Monitoring function: Evaluates downstream service health continuously.
- Replay function: Dequeues messages at a controlled rate after recovery, re-queuing failures as necessary.
This decoupling of failure capture from replay enables true self-healing microservices, reducing the need for human intervention during outages.
Benefits of recovery-aware redrive
Implementing a recovery-aware redrive framework offers several operational advantages that directly impact system reliability and resilience. By intelligently managing failed requests and controlling replay based on actual service health, this design not only prevents uncontrolled traffic amplification but also ensures that every request is eventually processed without manual intervention. In addition, it enhances visibility into system behavior, providing actionable insights for troubleshooting and capacity planning. These benefits make the framework particularly well-suited for modern cloud-native environments where stability, observability and cross-platform compatibility are critical.
- Prevents retry storms: Ensures request amplification is bounded.
- Maintains reliability: Guarantees that all failed requests are eventually processed.
- Supports observability: Logs all failures, replay attempts and system metrics for auditing and troubleshooting.
- Platform agnostic: Compatible with Kubernetes, serverless or hybrid cloud environments.
Best practices
- Design requests to be idempotent or safely deduplicated.
- Base monitoring on real system metrics rather than static timers.
- Throttle replay throughput dynamically according to system capacity.
- Maintain audit logs of failures and replay activities for operational transparency.
Conclusion
Self-healing microservices require more than traditional retries. A recovery-aware redrive framework provides a structured approach to capture failed requests, monitor downstream service health and replay them safely after recovery. This framework prevents retry storms, improves observability and enables cloud-native systems to recover autonomously from outages, delivering resilient and reliable services in complex distributed environments.
This article is published as part of the Foundry Expert Contributor Network.
7 safeguards for observable AI agents 24 Mar 2026, 9:00 am
Many organizations are under pressure to take their AI agent experiments and proofs of concept out of pilots and into production. Devops teams may have limited time to ensure these AI agents meet the non-negotiable requirements for production deployments, including implementing observability, monitoring, and other agenticops practices.
One question devops teams must answer is what their minimum requirements are to ensure AI agents are observable. Teams can start by extracting fundamentals from devops observability practices and layering in dataops observability for data pipelines and modelops for AI models.
But organizations also must extend their observability standards, especially as AI agents take over role-based tasks, integrate with MCP servers for more complex workflows, and support both human-in-the-middle and autonomous operations.
A key observability question is: Who did what, when, why, and with what information, from where? The challenging part is centralizing this information and having an observability data standard that works regardless of whether the decision or action came from an AI agent or a person.
“Devops should apply the same content and quality processes to AI agents as they do for people by leveraging AI-powered solutions that monitor 100% of interactions from both humans and AI agents,” suggests Rob Scudiere, CTO at Verint. “The next step is observing, managing, and monitoring AI and human agents together because performance oversight and continuous improvement are equally critical.”
I asked experts to share key concepts and their best practices for implementing observable AI agents.
1. Define success criteria and operational governance
Observability is a bottom-up process for capturing data on an AI agent’s inputs, decisions, and operations. Before delving into non-functional requirements for AI agents and defining observability standards, teams should first review top-down goals, operational objectives, and compliance requirements.
Kurt Muehmel, head of AI strategy at Dataiku, says observable agents require three disciplines that many teams treat as afterthoughts:
- Define success criteria because engineers can’t determine what “good” looks like alone. Domain experts need to help build evaluation datasets that capture edge cases only they would recognize.
- Centralize visibility because agents are being built everywhere, including data platforms, cloud services, and across teams.
- Establish technical operational governance before deployment, including evaluation criteria, guardrails, and monitoring.
Observability standards should cover proprietary AI agents, those from top-tier SaaS and security companies, and those from growing startups. Regarding technical operational governance:
- Evaluation criteria can incorporate site reliability concepts around service-level objectives, but should include clear boundaries for poor, unacceptable, or dangerous performance.
- Guardrails should include deployment standards and release-readiness criteria.
- Monitoring should include clear communication and escalation procedures.
2. Define the information to track
Observability of AI agents is non-trivial for a handful of reasons:
- AI agents are not only stateful but have memory and feedback loops to improve decision-making.
- Actions may be triggered by people, autonomously by the AI agent, or orchestrated by another agent via an MCP server.
- Tracking the agent’s behavior requires versioning and change tracking for the underlying datasets, AI models, APIs, infrastructure components, and compliance requirements.
- Observability must account for additional context, including identities, locations, time considerations, and other conditions that can influence an agent’s recommendations.
Given the complexity, it’s not surprising that experts had many suggestions regarding what information to track.
“Teams should treat every agent interaction like a distributed trace with instrumentation at the various decision-making boundaries and capture the prompt, model response, the latency, and the resulting action in order to spot drift, latency issues, or unsafe behaviors in real time,” says Logan Rohloff, tech lead of cloud and observability at RapDev. “Combining these metrics with model-aware signals, such as token usage, confidence scores, policy violations, and MCP interactions enables you to detect when an agent is compromised or acting outside its defined scope.”
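A lightweight version of that instrumentation can be sketched as a decorator that records a span per interaction. In production these spans would flow to a tracing backend such as an OpenTelemetry collector; here an in-memory list stands in, and all field names are illustrative:

```python
import functools
import time
import uuid

trace_log: list[dict] = []

def traced(agent_step):
    """Record each agent interaction like a distributed-trace span:
    the prompt going in, the response coming out, and the latency."""
    @functools.wraps(agent_step)
    def wrapper(prompt: str, **kwargs):
        span = {"span_id": uuid.uuid4().hex, "prompt": prompt,
                "start": time.time()}
        try:
            response = agent_step(prompt, **kwargs)
            span["response"] = response
            return response
        except Exception as exc:
            span["error"] = repr(exc)  # failed steps are spans too
            raise
        finally:
            span["latency_ms"] = (time.time() - span.pop("start")) * 1000
            trace_log.append(span)
    return wrapper
```

Richer model-aware signals such as token usage or confidence scores would attach to the same span, so drift and unsafe behavior can be correlated with the exact prompt that produced them.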
Devops teams will need to extend microservice observability principles to support AI agents’ stateful, contextual interactions.
“Don’t overlook the bits around session, context, and workflow identifiers as AI agents are stateful, communicate with each other, and can store and rehydrate sessions,” says Christian Posta, global field CTO at Solo.io. “We need to be able to track causality and flows across this stateful environment, and with microservices, there was always a big challenge getting distributed tracing in place at an organization. Observability is not optional, and without it, there’s no way you can run AI agents and be compliant.”
Agim Emruli, CEO of Flowable, adds that “teams need to establish identity-based access controls, including unique agent credentials and defined permissions, because in multi-agent systems, traceability drives accountability.”
3. Identify errors, hallucinations, and dangerous recommendations
Instrumenting observable APIs and applications helps engineers address errors, identify problem root causes, improve resiliency, and research security and operational issues. The same is true for AI agents that autonomously complete tasks or make recommendations to human operators.
“When an AI agent hallucinates or makes a questionable decision, teams need visibility into the full trajectory, including system prompts, contexts, tool definitions, and all message exchanges,” says Andrew Filev, CEO and founder of Zencoder. “But if that’s your only line of defense, you’re already exposed because agentic systems are open-ended and operate in dynamic environments, requiring real-time verification. This shift started with humans reviewing every result and is now moving toward built-in self- and parallel verification.”
Autonomous verification will be needed as organizations add agents, integrate with MCP servers, and allow agents to connect to sensitive data sources.
“Observing AI agents requires visibility not only into model calls but into the full chain of reasoning, tools, and code paths they activate, so devops can quickly identify hallucinations, broken steps, or unsafe actions,” says Shahar Azulay, CEO and co-founder of Groundcover. “Real-time performance metrics like token usage, latency, and throughput must sit alongside traditional telemetry to detect degradation early and manage the real cost profile of AI in production. And because agents increasingly execute code and access sensitive data, teams need security-focused observability that inspects payloads, validates integrations like MCP, and confirms that every action an agent takes is both authorized and expected.”
4. Ensure AI agent observability addresses risk management
Organizations will recognize greater business value and ROI as they scale AI agents to operational workflows. The implication is that AI agent observability becomes a fundamental part of the organization’s risk management strategy.
“Make sure that observability of agents extends into tool use: what data sources they access, and how they interact with APIs,” says Graham Neray, co-founder and CEO, Oso. “You should not only be monitoring the actions agents are taking, but also categorizing risk levels of different actions and alerting on any anomalies in agentic actions.”
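Neray's suggestion to categorize action risk and alert on anomalies might look something like the following sketch. The risk tiers, action names, and sensitivity labels are hypothetical; the point is that unknown actions default to the highest risk and sensitive resources bump the tier.

```python
# Illustrative risk tiers for agent actions (not a standard taxonomy).
RISK_LEVELS = {
    "read": 0,
    "write": 1,
    "delete": 2,
    "export": 2,
}

LEVEL_NAMES = ["low", "medium", "high", "critical"]

def classify_action(action: str, resource_sensitivity: str) -> str:
    """Map an agent action to a risk level; sensitive resources raise it one tier."""
    base = RISK_LEVELS.get(action, 2)  # unknown actions treated as high risk
    if resource_sensitivity == "sensitive":
        base += 1
    return LEVEL_NAMES[min(base, 3)]

def should_alert(action: str, resource_sensitivity: str,
                 threshold: str = "high") -> bool:
    """Alert whenever the classified risk meets or exceeds the threshold."""
    level = classify_action(action, resource_sensitivity)
    return LEVEL_NAMES.index(level) >= LEVEL_NAMES.index(threshold)
```

With a scheme like this in place, an anomaly detector only has to watch a small stream of high-risk events rather than every action an agent takes.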
Risk management leaders will be concerned about rogue agents, data issues, and other IT and security risks that can impact AI agents. Auditors and regulators will expect enterprises to implement robust observability into AI agents and have remediation processes to address unexpected behaviors and other security threats.
5. Extend observability to security monitoring and threat detection
Another consumer of observability data will be security operation centers (SOCs) and security analysts. They will connect the information to data security posture management (DSPM) and other security monitoring tools used for threat detection.
“I expect real insight into how the agent reacts when it connects to external systems because integrations create blind spots that attackers target,” says Amanda Levay, CEO of Redactable. “Leaders need this level of observability because it shows where the agent strains under load, where it misreads context, and where it opens a path that threatens security.”
CISOs will need to extend their operational playbooks as threats from AI actors grow in scale and sophistication.
“Infosec and devops teams need clear visibility into the data transferred to agents, their actions on data and systems, and the requests made of them by users to look for signs of compromise, remediate issues, and perform root-cause analysis,” says Mike Rinehart, VP of AI at Securiti AI. “As AI and AI agents become part of important data pipelines, teams must fold governance into prompts, integrations, and deployments so security, privacy, and engineering leaders act from a shared view of the data landscape and the risks that come with it.”
6. Evaluate AI agent performance
Addressing risk management and security concerns is one reason to implement observability in AI agents. Observability also helps answer another key question: how well an AI agent is performing, and when improvements are needed.
“When I evaluate AI agents, I expect visibility into how the agent forms its decisions because teams need a clear signal when it drifts from expected behavior,” says Levay of Redactable. “I watch for moments when the agent ignores its normal sources or reaches for shortcuts because those shifts reveal errors that slip past general observability tools.”
To evaluate performance, Tim Armandpour, CTO of PagerDuty, says technology leaders must prepare for AI agents that fail subtly rather than catastrophically. He recommends, “Instrument the full decision chain from prompt to output and treat reasoning quality and decision patterns as first-class metrics alongside traditional performance indicators. The teams succeeding at this treat every agent interaction as a security boundary and build observability contracts that make agent behavior auditable and explainable in production.”
7. Prepare for observability AI agents that take action
The natural evolution of observability is when devops organizations turn signals into actions using AI observability agents.
“Observability shouldn’t stop at recording; you should be able to take action if an agent is going astray easily,” says Neray of Oso. “Make sure you can easily restrict agentic actions by tightening access permissions, removing a particular tool, or even fully quarantining an agent to stop rogue behavior.”
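The controls Neray lists, revoking a tool, tightening permissions, or fully quarantining an agent, can be modeled as a small runtime policy object. This is a conceptual sketch with invented names, not Oso's product or any specific authorization framework.

```python
from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    """Mutable runtime policy for one agent: an allowed-tool set plus a kill switch."""
    agent_id: str
    allowed_tools: set = field(default_factory=set)
    quarantined: bool = False

    def authorize(self, tool: str) -> bool:
        """Every tool call is checked; a quarantined agent fails all checks."""
        return not self.quarantined and tool in self.allowed_tools

    def revoke_tool(self, tool: str) -> None:
        """Remove one capability without stopping the agent entirely."""
        self.allowed_tools.discard(tool)

    def quarantine(self) -> None:
        """Stop rogue behavior outright: all subsequent authorizations fail."""
        self.quarantined = True
```

The key design choice is that the policy is checked at call time rather than baked into the agent, so operators can act on observability signals without redeploying anything.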
Observability data will fuel the next generation of IT and security operational AI agents that will need to monitor a business’s agentic AI operations. The question is whether devops teams will have enough time to implement observability standards, or whether business demand to deploy agents will drive a new era of AI technical debt.
An architecture for engineering AI context 24 Mar 2026, 9:00 am
Ensuring reliable and scalable context management in production environments is one of the most persistent challenges in applied AI systems. As organizations move from experimenting with large language models (LLMs) to embedding them deeply into real applications, context has become the dominant bottleneck. Accuracy, reliability, and trust all depend on whether an AI system can consistently reason over the right information at the right time without overwhelming itself or the underlying model.
Two core architectural components of Empromptu’s end-to-end production AI system, Infinite Memory and the Adaptive Context Engine, were designed to solve this problem, not by expanding raw context windows but by rethinking how context is represented, stored, retrieved, and optimized over time.
The core problem: Context as a system constraint
Empromptu is designed as a full-stack system for building and operating AI applications in real-world environments. Within that system, Infinite Memory and Adaptive Context Engine work together to solve one specific but critical problem: how AI systems retain, select, and apply context reliably as complexity grows.
Infinite Memory provides the persistent memory layer of the system. It is responsible for retaining interactions, decisions, and historical context over time without being constrained by traditional context window limits.
The Adaptive Context Engine provides the attention and selection layer. It determines which parts of that memory, along with current data and code, should be surfaced for any given interaction so the AI can act accurately without being overwhelmed.
Together, these components sit beneath the application layer and above the underlying models. They do not replace foundation models or require custom training. Instead, they orchestrate how information flows into those models, making large, messy, real-world systems usable in production.
In practical terms, Infinite Memory answers the question: What can the system remember? The Adaptive Context Engine answers the question: What should the system pay attention to right now?
Both are designed as infrastructure primitives that plug into Empromptu’s broader platform, which includes evaluation, optimization, governance, and integration with existing codebases. This is what allows the system to support long-running sessions, large codebases, and evolving workflows without degrading accuracy over time.
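The division of labor between the two layers can be illustrated with a toy sketch: a persistent store that answers "what can the system remember?" and a budgeted selector that answers "what should it pay attention to?" All names and the tag-overlap scoring here are illustrative assumptions, not Empromptu's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    """One retained unit of context: a decision, correction, or constraint."""
    key: str
    text: str
    tags: set

class PersistentMemory:
    """Storage layer: append-only, unconstrained by any context window."""
    def __init__(self) -> None:
        self.items: list[MemoryItem] = []

    def remember(self, item: MemoryItem) -> None:
        self.items.append(item)

class ContextSelector:
    """Attention layer: surface only the items relevant to the current task,
    up to a fixed budget, rather than replaying everything."""
    def __init__(self, budget: int) -> None:
        self.budget = budget  # max items surfaced per request

    def select(self, memory: PersistentMemory, task_tags: set) -> list[MemoryItem]:
        scored = [(len(item.tags & task_tags), item) for item in memory.items]
        relevant = [(s, i) for s, i in scored if s > 0]
        relevant.sort(key=lambda pair: -pair[0])
        return [item for _, item in relevant[: self.budget]]
```

A production system would replace the tag-overlap score with learned relevance signals, but the architectural split, unbounded storage behind a bounded selector, is the same.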
Most modern AI systems operate within strict context limits imposed by the underlying foundation models. These limits force difficult trade-offs:
- Retain full interaction history and suffer from escalating latency, cost, and performance degradation.
- Periodically summarize past interactions and accept the loss of nuance, intent, and critical decision history.
- Reset context entirely between sessions and rely on users to restate information repeatedly.
These approaches may be acceptable in demos or chatbots, but they break down quickly in production systems that must operate over long time horizons, large document sets, or complex codebases.
In real applications, context is not a linear conversation. It includes prior decisions, system state, user intent, historical failures, domain constraints, and evolving requirements. Treating context as a flat text buffer inevitably leads to hallucinations, regressions, and brittle behavior.
The challenge is not how much context an AI system can hold at once, but how intelligently it can decide what context matters for any given action.
Infinite Memory: Moving beyond context windows
Infinite Memory represents a shift away from treating context as something that must fit inside a single prompt. Instead, it introduces a persistent memory layer that exists independently of the model’s immediate context window.
This memory layer captures all interactions, decisions, corrections, and system state over time. Importantly, Infinite Memory does not attempt to inject all of this information into every request. Instead, it stores information in structured, retrievable forms that can be selectively reintroduced when relevant.
From an architectural perspective, Infinite Memory functions more like a knowledge substrate than a conversation log. Each interaction contributes to a growing memory graph that records:
- User intent and preferences
- Historical decisions and their outcomes
- Corrections and failure modes
- Domain-specific constraints
- Structural information about code, data, or workflows
This allows the system to support conversations and workflows of effectively unlimited length without overwhelming the underlying model. The result is an AI system that never forgets, but also never blindly recalls everything.
Adaptive Context Engine: Attention as infrastructure
If Infinite Memory is the storage layer, the Adaptive Context Engine is the reasoning layer that decides what to surface and when to do so.
Internally, the Adaptive Context Engine is best understood as an attention management system. Its role is to continuously evaluate available memory and determine which elements are necessary for a specific request, task, or decision.
Unlike static prompt engineering approaches, the Adaptive Context Engine is dynamic and self-optimizing. It learns from usage patterns, outcomes, and feedback to improve its context selection over time. Rather than relying on predefined rules, it treats context selection as an evolving optimization problem.
Multi-level context management
The Adaptive Context Engine operates across multiple layers of abstraction, allowing it to manage both conversational and structural context.
Request harmonization
One of the most common failure modes in AI systems is request fragmentation. Users ask for changes, clarifications, and additions across multiple interactions, often referencing previous requests implicitly rather than explicitly.
Request harmonization addresses this by maintaining a continuously updated representation of the user’s cumulative intent. Each new request is merged into a harmonized request object that reflects everything the user has asked for so far, including constraints and dependencies.
This prevents the system from treating each interaction as an isolated command and allows it to reason over intent holistically rather than sequentially.
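A minimal sketch of request harmonization, assuming a simple dict-based intent object: each new request is folded into the cumulative intent, with later values overriding earlier ones and constraints accumulating. The field names are illustrative, not Empromptu's actual schema.

```python
def harmonize(current: dict, new_request: dict) -> dict:
    """Merge a new user request into the cumulative harmonized intent.

    Later values override earlier ones per field; the 'constraints' field
    accumulates instead of being replaced, so earlier requirements survive.
    """
    merged = dict(current)
    for key, value in new_request.items():
        if key == "constraints":
            merged["constraints"] = sorted(
                set(merged.get("constraints", [])) | set(value)
            )
        else:
            merged[key] = value
    return merged
```

Because the harmonized object reflects everything asked so far, the system can act on "make it match what I said earlier" without replaying the full conversation.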
Synthetic history generation
Rather than replaying full interaction histories, the system generates what we refer to as synthetic histories. A synthetic history is a distilled representation of past interactions that preserves intent, decisions, and constraints while removing redundant or irrelevant conversational detail.
From the model’s perspective, it appears as though there has been a single coherent exchange that already incorporates everything learned so far. This dramatically reduces token usage while also maintaining reasoning continuity. Synthetic histories are regenerated dynamically, allowing the system to evolve its understanding as new information arrives.
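The distillation step can be sketched as follows. Here a simple per-turn "keep" flag stands in for whatever learned relevance signal the real system uses; the turn format is an assumption for illustration.

```python
def synthesize_history(turns: list[dict]) -> list[dict]:
    """Collapse a multi-turn log into a single synthetic exchange.

    Keeps only turns flagged as carrying decisions or constraints and folds
    them into one context message. The 'keep' flag is an illustrative
    stand-in for a learned relevance signal.
    """
    kept = [t["text"] for t in turns if t.get("keep")]
    return [{"role": "user", "text": "Context so far: " + "; ".join(kept)}]
```

The model then sees one coherent message instead of the full transcript, which is where the token savings and the preserved reasoning continuity both come from.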
Secondary agent control
For complex tasks, particularly those involving large codebases or document collections, a single monolithic context is inefficient and error-prone. The Adaptive Context Engine employs secondary agents that operate as context selectors.
These secondary agents analyze the task at hand and determine which files, functions, or documents require full expansion and which can remain summarized or abstracted. This selective expansion allows the system to reason deeply about specific components without loading entire systems into context unnecessarily.
CORE Memory: Recursive context expansion at scale
The most advanced component of the Adaptive Context Engine is what we call Centrally-Operated Recursively-Expanded Memory (CORE Memory). This system addresses the challenge of working with large codebases or complex systems by creating associative trees of information.
CORE Memory automatically analyzes functions, files, and documentation to create hierarchical tags and associations. When the AI needs specific functionality, it can recursively search through these tagged associations rather than loading entire codebases into context. This allows for expansion on classes of files by tag or hierarchy, enabling manipulation of specific parts of code without context overload.
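The recursive, tag-driven expansion described above can be sketched as a walk over an associative tree that collects only matching nodes. The node structure and matching rule are illustrative assumptions, not CORE Memory's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A tagged unit of code or documentation: a file, class, or function."""
    name: str
    tags: set
    children: list = field(default_factory=list)
    body: str = ""

def expand(node: Node, wanted: set) -> list[str]:
    """Recursively collect the bodies of nodes whose tags intersect the
    requested set, without loading unmatched siblings into context."""
    out = []
    if node.tags & wanted:
        out.append(node.body)
    for child in node.children:
        out.extend(expand(child, wanted))
    return out
```

Asking for the `auth` tag, say, pulls in the auth module and its functions while leaving unrelated parts of the tree untouched, which is the "expansion by class of files" behavior the text describes.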
A production-grade system
Infinite Memory and the Adaptive Context Engine were built specifically for production environments, not research demos. Several design principles differentiate them from experimental context management approaches.
Self-managing context
The system is capable of operating across hundreds of documents or files while maintaining high accuracy. In production deployments, it consistently handles more than 250 documents without degradation while still achieving accuracy levels approaching 98%. This is accomplished through selective expansion, continuous pruning, and adaptive optimization rather than brute-force context injection.
Continuous optimization
The Adaptive Context Engine learns from real-world usage. It tracks which context selections lead to successful outcomes and which lead to errors or inefficiencies. Over time, this feedback loop allows the system to refine its attention strategies automatically, reducing hallucinations and improving relevance without manual intervention.
Integration flexibility
The architecture is designed to integrate with existing codebases, data stores, and foundation models. It does not require retraining models or rewriting systems. Instead, it acts as an orchestration layer that enhances reliability and performance across diverse environments.
Real-world applications
Together, Infinite Memory and the Adaptive Context Engine enable capabilities that are difficult or impossible with traditional context management approaches.
Extended conversations
There are no artificial limits on conversation length or complexity. Context persists indefinitely, supporting long-running workflows and evolving requirements without loss of continuity.
Deep code understanding
The system can reason over large, complex codebases while maintaining awareness of architectural intent, historical decisions, and prior modifications.
Learning from failure
Failures are not discarded. The system retains memory of past errors, corrections, and edge cases, allowing it to avoid repeating mistakes and to improve over time.
Cross-session continuity
Context persists across sessions, users, and environments. This allows AI systems to behave consistently and predictably even as usage patterns evolve.
Architectural benefits
Empromptu’s approach with Infinite Memory and the Adaptive Context Engine offers several advantages over traditional context management techniques.
- Scalability without linear cost growth
- Improved reasoning accuracy under real-world constraints
- Adaptability based on actual usage rather than static rules
- Compatibility with existing AI infrastructure
Most importantly, it reframes context not as a hard constraint, but as an intelligent resource that can be managed, optimized, and leveraged strategically.
As AI systems move deeper into production environments, context management has become the defining challenge for reliability and trust. Infinite Memory and the Adaptive Context Engine represent a shift away from brittle prompt-based approaches toward a more resilient, system-level solution. By treating memory, attention, and context selection as first-class infrastructure, it becomes possible to build AI applications that scale in complexity without sacrificing accuracy.
The future of applied AI will not be defined by larger context windows alone, but by architectures that understand what matters and when.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
The agent security mess 23 Mar 2026, 9:00 am
Persistent weak layers (PWLs) have plagued my backcountry skiing for the past 10 years. They’re about to mess up the industry’s IT security, too.
For those who don’t spend their early mornings skinning up mountains in Utah’s backcountry, a persistent weak layer, or PWL, is exactly what it sounds like. It’s a fragile layer of snow, often faceted crystals that form during cold and dry spells, which gets buried by subsequent storms. That PWL lies in wait for a trigger: Perhaps a skier hitting a shallow rock band, a sudden spike in spring temperatures, or a heavy snowfall. At that moment, the entire slab above it shatters, slides, and, all too often, kills people.
Enterprise access control is built on its own version of a colossal PWL. For years, we’ve piled new roles, temporary privileges, and overly broad static profiles on top of an unmanaged foundation of dormant access. The structure has held up because people are relatively gentle triggers: We’re slow, easily distracted, and generally prefer to keep our jobs.
But AI agents aren’t human skiers moving carefully down a slope. They’re a massive, rapid loading event, a trigger primed to spark an “avalanche” in your data center.
OK, computer?
This is the core takeaway from new research published by Oso and Cyera, which finally puts hard numbers to a problem that’s been visible but ignored for years. Their research analyzed 2.4 million workers and 3.6 billion application permissions, and the results should concern us. According to the Oso blind spot report, corporate workers completely ignore 96% of their granted permissions. Over a 90-day window, only 4% of granted permissions were ever actually exercised. With sensitive enterprise data, it’s even worse: Workers touch only 9% of the sensitive data they can actually reach, and nearly one-third of users have the power to modify or delete sensitive data.
Seems ok, right? I mean, the fact that they’re not exercising their rights to certain applications or data isn’t a big deal, is it? So long as they don’t use what they have access to, we’re good. Right?
Nope. Maybe this isn’t an issue in a world where people plod about, ignoring their access rights. But when we add autonomous agents to the mix, things get problematic very, very fast. As I’ve argued, the enterprise AI problem isn’t just a matter of hallucinations. It’s really about permissions. Humans act as a natural governor on permission sprawl. A marketing employee might technically have the right to view a million customer records but will only ever look at the 30 they need to finish their campaign for the quarter. The risk (the “persistent weak layer”) remains entirely dormant.
Agents remove that governor entirely.
When an AI agent inherits a human user account, it inherits the entire permission surface, not just the tiny fraction the human actually used. Because agents operate continuously, chain actions across various systems, and execute whatever privileges they possess without hesitation, they turn latent permission debt into active operational risk. If an agent is told to clean up stale records and it happens to hold the dormant permission to modify the entire database, it will attempt to do exactly that.
Fixing permissions
This aligns perfectly with a drum I’ve been beating for years. Back in 2021, I wrote that authorization was rapidly becoming the most critical unresolved challenge in modern software architecture. A year later, I argued that identity and trust must be baked into the development life cycle, not bolted on by a separate security team right before launch. More recently, I’ve pointed out that large language models demand a totally new approach to authorization, that boring governance is the only path to real AI adoption, and that the true challenge in agentic systems is building a robust AI control plane.
The smartest players in the space are already treating this as table stakes. In its framework for trustworthy agents, Anthropic explicitly notes that systems like Claude Code default to read-only access and require human approval before modifying code or infrastructure. Microsoft offers similar guidance, warning against overprivileged applications and demanding tightly scoped service accounts. They understand that in the age of autonomous software, the old assumption that an application probably won’t use a dormant permission is foolish.
The problem won’t stay neatly confined to a single SaaS application, either. We’re already dealing with a world where nonhuman identities are proliferating rapidly. A 2024 industry report from CyberArk notes that machine identities now outnumber human identities by massive margins, often 80 to 1 or higher. A huge chunk of those machine identities have privileged or sensitive access, and most organizations completely lack identity security controls for AI.
Read-only as a default
So, how do we fix the PWL before the avalanche hits? This isn’t something you solve with a clever prompt, a larger context window, or a new foundational model. It’s an architecture problem.
Putting aside the overprovisioned humans (that’s a separate blog post), we can curtail agentic misuse of permissions by building golden paths where the default state for any new AI agent is strictly read-only. We have to stop the reckless, albeit convenient, practice of letting an agent inherit a broad employee account just to make a pilot project work faster for a sprint demo.
Agents require purpose-built identities with aggressively minimal permissions. If 96% of a human user’s access goes unused anyway, we can’t grant that excess access to a machine. We need environments where the ability to draft an action and the ability to execute it are entirely separate permissions. We need explicit approvals for any destructive actions, and we need every single automated action logged and fully reversible.
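The pattern argued for here, read-only by default, drafting separated from execution, and explicit logged approvals for anything destructive, can be sketched as a small identity object. This is a conceptual illustration with invented names, not Anthropic's or Microsoft's guidance rendered as code.

```python
from dataclasses import dataclass, field

# Actions an agent may take without approval: read-only plus drafting.
READ_ACTIONS = {"read", "list", "draft"}

@dataclass
class AgentIdentity:
    """Purpose-built agent identity: read-only by default.

    Executing a write or destructive action requires a prior, per-action
    human approval, and every request is logged for audit. Illustrative
    sketch, not a production authorization system.
    """
    name: str
    approvals: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def request(self, action: str, target: str) -> bool:
        """Check an action; log the attempt and the outcome either way."""
        allowed = action in READ_ACTIONS or (action, target) in self.approvals
        self.audit_log.append((action, target, allowed))
        return allowed

    def approve(self, action: str, target: str) -> None:
        """A human grants exactly one specific destructive action."""
        self.approvals.add((action, target))
```

Note the separation: the agent can always `draft` a deletion for review, but `delete` itself fails until a human approves that specific action on that specific target, and the audit log captures the refused attempt.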
We spend so much time debating the intelligence of these new models while we ignore the ground they walk on. AI agents aren’t creating a brand-new authorization crisis. They’re simply exposing the persistent weak layer we’ve been ignoring for years. We tolerated bloated roles and static profiles because humans were slow enough to keep the damage theoretical. Agents make it concrete. Hopefully, they’ll also make us pay attention to authorization in ways we largely haven’t.
How to land a software development job in an AI-focused world 23 Mar 2026, 9:00 am
It seems we are in a very perplexing and somewhat worrisome time in the technology job market. Artificial intelligence is disrupting workflows and changing job descriptions, while many companies are shedding staff due to years of “overhiring.” Some companies are freezing hiring due to market uncertainty.
Whether you are seeking a full-time position or contracting work, AI—for better or worse—has really changed the game for software developers.
AI is not eliminating software developers, but it is changing what it means to be a good one, according to Loren Absher, director and Americas lead for applied AI advisory at ISG, a research and advisory firm. “Organizations are moving away from hiring developers for how fast they can write code and toward hiring for how well they understand the problem they are solving.”
As AI takes over more routine coding, the differentiator shifts to judgment, system design, and context, Absher says. “That context increasingly includes deep understanding of the industry the software is being built for.”
With fewer jobs and more candidates on the market, “searching for a developer role looks different in 2026,” says Kyle Elliott, a career and executive coach specializing in the technology industry.
“Previously, developers could send out 10 applications and expect to land interviews for a significant portion of them,” Elliott says. “Today, they must be much more strategic with the companies and roles they target, how they craft their resumes and position themselves in the market, and even how they follow up after applying.”
“We’re in a confusing transition period, [but] what I’m seeing is not a collapse; it’s a recalibration,” says Sonu Kapoor, an independent software engineer. “AI hasn’t removed the need for software developers. It has raised expectations around how developers work and the value they bring.”
How are successful job seekers landing software development jobs and contract gigs in this turbulent environment? We asked experts for their tips.
Present a balanced resume
Hiring managers want to see a diversity of attributes in candidates, including having the right skills, relevant certifications, and on-the-job experience. Highlighting AI-related skills, certifications, and experience could provide an extra boost.
“Landing a role today still requires a strong foundation,” says Natalia Rodriguez, vice president of talent acquisition at BairesDev, which reviews more than 2.5 million applications per year. “Engineers who succeed often combine strong core skills with newer capabilities, such as machine learning frameworks and emerging AI-specific techniques. Developers should take the time to really sharpen their fundamentals in programming, data structures, and system design.”
While the right skills and certifications are important, it might be having just the right kind of experience that lands the job. “Software developers must be able to demonstrate they’ve built something effectively, not just talk about it,” says Sheldon Arora, CEO of healthcare staffing agency StaffDNA. Companies and employers are looking for successful implementation on real projects, he says.
While technical skills are important, hiring managers are looking for other skills aimed at supporting organizational goals.
“The developers who thrive in this environment will not be those who simply code faster, but those who combine technical depth with industry knowledge, systems thinking, and sound judgment,” Absher says. “Those are deeply human skills, and they are becoming more valuable as AI becomes more capable.”
Specialize in a valuable niche
If you have specialized in a certain discipline within software development and that specialization is in demand, that could be a big advantage in finding a new job or acquiring freelance assignments.
Kapoor has worked in front-end development with Angular for more than 10 years, and for more than 20 years in software development overall. He thinks that long-term focus is why Google awarded him the Google Developer Expert (GDE) designation for Angular.
“In today’s market, deep specialization builds trust, and trust drives hiring decisions,” Kapoor says.
Build your reputation and network
Developers who demonstrate skills through open source, conference talks, podcasts, and in-depth writing stand out immediately to hiring professionals, Kapoor says.
“I’ve landed several high-paying contracts simply because companies already knew my work,” Kapoor says. “Jobs found me rather than the other way around.”
In a confusing and sometimes crowded technical job market, standing out is more important than ever. Personal branding gives professionals an opportunity to showcase their expertise.
“Developers need to think of themselves as products,” Kapoor says. “Building a visible brand, growing a strong network, collaborating with leaders in your field, and choosing the right platforms, such as niche podcasts or respected publications, compounds over time and creates opportunities that traditional job searches often miss.”
Most developers assume the job market lives on LinkedIn and job boards, says Kolby Goodman, career coach at career site The Job Huntr. “What I see is the best opportunities are being talked about in Slack channels, sprint planning, and leadership meetings,” he says. “The developers who get hired quickly are the ones building relationships with team leads and product managers before a job ever exists.”
Become an AI prompt master
You can also make yourself attractive to hiring managers by becoming proficient at the AI skills they are looking for, such as writing quality prompts.
“Developers who know how to use AI tools effectively, especially how to write clear, precise prompts, are dramatically more productive,” Kapoor says. “Prompting isn’t about replacing engineering skill; it’s about amplifying it. Engineers who can translate requirements into high-quality prompts deliver faster without sacrificing quality.” AI shifts the emphasis toward developers who understand systems, context, and long-term impact, Kapoor says. “AI handles speed, experienced engineers handle judgment,” he says.
Consider contract work
Developers can do quite well with freelancing. Contract work lets you choose your clients and assignments, decide your own work hours, and work from home, among other benefits. Contract work also can lead to full-time employment, either with an existing client or another organization.
“Developers should treat contracts as a doorway, not a downgrade,” Arora says. “Contract roles convert to full-time frequently. Developers should be open to contract and full-time work, especially in a market where companies want to prove you’re a fit before making a permanent hire.”
Customize your resume
Sending out a resume that has no relevance to a particular job is a waste of time, especially in a competitive field where hiring managers are looking for specific skill sets and experiences. Customizing your resume helps you stand out and shows that your skills and experiences align with specific job requirements.
“Take the time to tailor your resume to each role,” Elliott says. “Since companies are receiving hundreds, if not thousands, of applications for a single position, you need to clearly demonstrate how you’re aligned. Set a timer for 20 to 30 minutes and use that time to strategically weave keywords from the job description throughout your resume.”
Applying this technique landed a recent technology client seven interviews, notes Elliott, and six of them came from cold online applications with no contacts at the company.
Highlight project deliverables
Make sure your resume includes the tangible results of your projects and assignments, not just a laundry list of skills.
“Nowadays, there are plenty of candidates who write ‘I know 10 frameworks’ on their resumes,” says Anastasiya Levantsevich, head of people and culture at software development company Pynest. “However, finding a specialist who lists ‘delivering results’ as their strength is not so easy.”
The simplest way for a candidate to stand out is to create a portfolio in the before/after style, Levantsevich says. “For example: it was slow, now it’s faster; there was a lot of manual routine work, now there’s less; there was chaos in the logs, now everything is structured and consistent,” she says.
Follow up after you apply
Don’t be afraid to follow up on your application. This can help you stand out in a crowded field of applicants.
“This can feel particularly foreign for developers who may feel uncomfortable cold messaging a recruiter or contact at their target company,” Elliott says. “But it can significantly increase your chances of landing an interview. I’ve had multiple clients secure interviews because they were politely persistent, took the time to find the recruiter or hiring manager on LinkedIn, and forwarded their resume directly.”
Practice your presentation
How you present yourself during the interview stage can make all the difference in landing a job. Hiring managers want to see how your mind works and what kind of effort you put forth in solving problems.
“Demonstrate your thought process, the types of questions you ask, how you verify your work, and in what situations this is necessary, and how and when you use AI as an assistant,” Levantsevich says. “At Pynest, we have hired people after a short dialogue like, ‘I see you have X, I’ve done something similar with Y, I can show you,’” she says. “This sounds professional and saves time for both sides.”
Focus on the right industries
Some sectors are growing faster than others, and therefore might require more software development expertise.
“Healthcare and health tech are among the most durable hiring markets in the U.S. right now,” Arora says. “There are chronic labor shortages, and APIs, integrations, and data pipeline development are skills companies and medical organizations need. It’s also good to have in-depth working knowledge of workflow and operations software.”
“We see growing opportunities for tech talent across healthcare, fintech, logistics, and ecommerce,” Rodriguez says. “Prepare yourself by understanding the domain constraints shaping technical decisions, whether that’s regulatory requirements, data sensitivity, scalability, or reliability expectations. To stand out, make sure to address how you’ve applied your knowledge within industry contexts.”
OpenAI’s desktop superapp: The end of ChatGPT as we know it? 20 Mar 2026, 6:02 pm
OpenAI is reportedly planning to fold its ChatGPT application, Codex coding platform, and AI-powered browser into a single desktop ‘superapp’, a move that signals a shift toward enterprise and developer audiences and away from the consumer market that made the company a household name.
The unified product will merge the ChatGPT interface, the Codex coding tool, and OpenAI’s browser known internally as Atlas into a single desktop application, the Wall Street Journal reported Thursday. The mobile version of ChatGPT is not part of the consolidation and will remain unchanged. OpenAI President Greg Brockman will temporarily oversee the product overhaul and associated organizational changes, while Chief of Applications Fidji Simo leads the commercial effort to bring the new app to market, the report added.
Simo confirmed the plan the same day in a post on X. “Companies go through phases of exploration and phases of refocus; both are critical,” she wrote. “But when new bets start to work, like we’re seeing now with Codex, it’s very important to double down on them and avoid distractions.”
The superapp announcement follows an all-hands meeting on March 16, in which Simo told employees the company needed to stop being distracted by “side quests” and orient aggressively toward coding and business users.
“We realized we were spreading our efforts across too many apps and stacks, and that we need to simplify our efforts,” the Journal reported that day, citing Simo’s address to the employees. “That fragmentation has been slowing us down and making it harder to hit the quality bar we want.” At the same meeting, Simo outlined the commercial imperative plainly: “Our opportunity now is to take those 900 million users and turn them into high-compute users. We’ll do that by transforming ChatGPT into a productivity tool.”
More than a product refresh
The superapp is being designed around agentic AI, systems capable of autonomously executing multi-step tasks such as writing and debugging software, analyzing data, and completing complex workflows without continuous human instruction, the Journal reported. That positions it less as a consumer chatbot and more as an AI-powered work environment aimed at developers and enterprise knowledge workers.
Sanchit Vir Gogia, chief analyst at Greyhound Research, said the move goes beyond product consolidation. “This is not a clean enterprise pivot — it is a forced convergence driven by internal fragmentation, competitive pressure, and the need to monetize where value is actually realized,” he said. “The real value is shifting to where intent becomes action. That is workflows, not conversations.”
The announcement is the latest in a series of enterprise-facing moves. In February, OpenAI launched Frontier, an agent orchestration platform, and announced partnerships with Accenture, BCG, Capgemini, and McKinsey to embed its technology into business workflows.
The numbers behind the pivot
The urgency behind these moves becomes clear when the competitive data is examined. According to enterprise spend management software vendor Ramp, a year ago only one in 25 businesses on its platform paid for Anthropic; today that figure has jumped to nearly one in four. In new enterprise deals, Anthropic is now winning approximately 70% of head-to-head matchups against OpenAI, it said.
Gogia, however, flagged a structural risk. ChatGPT’s dominance was built on simplicity and universal accessibility, qualities a workflow-centric superapp trades away. “In trying to serve consumers, developers, and enterprises within a single interface, OpenAI risks diluting the very clarity that made ChatGPT dominant,” he said.
That risk is compounded by a governance challenge that enterprise IT leaders are only beginning to reckon with.
The governance gap
For IT leaders evaluating OpenAI tooling, Gogia pointed to a deeper challenge the superapp introduces. “The biggest constraint on agentic AI is not capability. It is control,” he said. “Identity management is not designed for non-human actors. Audit trails are incomplete. And there is no mature control plane that governs how agents act, what they access, and how those actions can be reversed or contained.”
Microsoft and Google hold a structural advantage here: Their AI is embedded within platforms that already manage identity, access, and compliance at enterprise scale, a gap enterprise buyers have repeatedly flagged as a persistent concern with OpenAI’s approach. It is precisely that trust deficit that has given Anthropic its opening.
“The battle is no longer about who builds the best chatbot. It is about who owns how work gets done,” Gogia said. “Enterprises are making platform decisions now — and those decisions will not be based on who is most advanced. They will be based on who is most dependable.”
OpenAI did not immediately respond to a request for comment.
This article first appeared on Computerworld.
Google’s Stitch UI design tool is now AI-powered 20 Mar 2026, 5:19 pm
Google is introducing AI into its Stitch UI design tool, enabling anyone to create user-interface designs by describing them in natural language or using markdown.
It can also be used to copy the design of an existing web page — or “easily extract a design system from any URL,” as Google put it in a blog post describing the new feature.
The thinking behind the development is that users will often have a variety of ideas in the initial part of the design process. Businesses will now be able to see a visual representation of those ideas, whether they be generated by text, image or code.
Google has said that Stitch will also be paired with a new design agent that can reason across the entire project’s evolution. In addition, it has introduced an Agent manager that helps users to track their progress as well as allowing them to work on multiple ideas in parallel.
Stop using AI to submit bug reports, says Google 20 Mar 2026, 4:48 pm
Google will no longer accept AI-generated submissions to a program it funded to find bugs in open-source software. However, it is contributing to a separate program that uses AI to strengthen security in open-source code.
The Google Open Source Software Vulnerability Reward Program team is increasingly concerned about the low quality of some AI-generated bug submissions, with many including hallucinations about how a vulnerability can be triggered or reporting bugs with little security impact.
“To ensure our triage teams can focus on the most critical threats, we will now require higher-quality proof (like OSS-Fuzz reproduction or a merged patch) for certain tiers to filter out low-quality reports and allow us to focus on real-world impact,” Google wrote in a blog post.
The Linux Foundation too is finding the volume of AI-generated bug submissions overwhelming and has sought financial help from AI companies including Google, Anthropic, AWS, Microsoft, and OpenAI to deal with the problem. Together, they are contributing $12.5 million to the foundation to improve the security of open-source software.
“Grant funding alone is not going to help solve the problem that AI tools are causing today on open-source security teams,” said Greg Kroah-Hartman of the Linux kernel project in a blog post. “OpenSSF has the active resources needed to support numerous projects that will help these overworked maintainers with the triage and processing of the increased AI-generated security reports they are currently receiving.”
The funding will be managed by open source security project Alpha-Omega and the Open Source Security Foundation (OpenSSF) and will be used to provide AI tools to help maintainers deal with the volume of AI-generated submissions.
“We are excited to bring maintainer-centric AI security assistance to the hundreds of thousands of projects that power our world,” said Alpha-Omega co-founder Michael Winser.
The ‘toggle-away’ efficiencies: Cutting AI costs inside the training loop 20 Mar 2026, 10:00 am
“A single training run can emit as much CO₂ as five cars do in a year.”
That finding from the University of Massachusetts, Amherst, has become the defining statistic of the generative AI era. But for the engineers and data scientists staring at a terminal, the problem isn’t just carbon; it’s the cloud bill.
The industry narrative suggests that the only solution is hardware: buying newer H100s or building massive custom silicon. But after combing through academic benchmarks, cloud billing dashboards and vendor white papers, I’ve found that roughly half of that waste is a “toggle away”.
Training efficiency isn’t about squeezing GPUs harder; it’s about spending smarter for the same accuracy. The following methods focus on training-time cost levers, changes inside the loop that cut waste without touching your model architecture.
(Note: All code examples below are available in the accompanying Green AI Optimization Toolkit repository.)
The compute levers: Taking weight off the chassis
The easiest way to speed up a race car is to take weight off the chassis. In deep learning, that weight is precision.
For years, 32-bit floating point (FP32) was the default. But today, switching to mixed-precision math (FP16/INT8) is the highest-ROI change a practitioner can make. On hardware with dedicated tensor units, like NVIDIA Ampere/Hopper, AMD RDNA 3 or Intel Gaudi 2, mixed precision can increase throughput by 3x or more.
However, this isn’t a magic wand for everyone. If you are running on pre-2019 GPUs (like the Pascal architecture) that lack Tensor Cores, you might see almost no speed gain while risking numerical instability. Similarly, compliance workloads in finance or healthcare that require bit-exact reproducibility may need to stick to FP32.
But for the 90% of use cases involving memory-bound models (ResNet-50, GPT-2, Stable Diffusion), the shift is essential. It also unlocks gradient accumulation, allowing you to train massive models on smaller, cheaper cards by simulating larger batch sizes.
The implementation: Here is how to implement mixed precision and gradient accumulation in PyTorch. This setup allows you to simulate a batch size of 64 on a GPU that can only fit 8 samples.
```python
# From 'green-ai-optimization-toolkit/01_mixed_precision.py'
import torch
from torch.cuda.amp import autocast, GradScaler

# Assumes model, criterion, optimizer, and loader are defined elsewhere.
# Simulate a batch size of 64 using a micro-batch of 8.
eff_batch_size = 64
micro_batch = 8
accum_steps = eff_batch_size // micro_batch

scaler = GradScaler()  # Prevents gradient underflow in FP16
optimizer.zero_grad()  # Start from clean gradients

for i, (data, target) in enumerate(loader):
    # 1. The toggle: run the forward pass in FP16
    with autocast():
        output = model(data)
        loss = criterion(output, target)
        loss = loss / accum_steps  # Normalize loss across micro-batches

    # 2. Scale gradients and accumulate
    scaler.scale(loss).backward()

    # 3. Step only after N micro-batches
    if (i + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```
The data levers: Feeding the beast
If your GPU utilization is hovering around 40%, you aren’t training a model; you are burning cash. The bottleneck is almost always the data loader.
A common mistake is treating data preprocessing as a per-epoch tax. If you use expensive text tokenizers (like Byte-Pair Encoding) or complex image transforms, cache pre-processed data. Tokenize or resize once, store the result and feed it directly.
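A minimal sketch of that caching pattern follows; the tokenizer here is a hypothetical stand-in for something expensive like BPE, and the cache path is illustrative:

```python
import json
import os
import tempfile

def expensive_tokenize(text):
    # Stand-in for a costly tokenizer (e.g., BPE); hypothetical, for illustration.
    return text.lower().split()

def load_or_build_cache(texts, cache_path):
    """Tokenize once, persist the result, and reuse it on every later epoch."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)  # Cache hit: skip tokenization entirely
    tokens = [expensive_tokenize(t) for t in texts]
    with open(cache_path, "w") as f:
        json.dump(tokens, f)  # Pay the preprocessing cost exactly once
    return tokens

cache_file = os.path.join(tempfile.gettempdir(), "tokens_cache_demo.json")
if os.path.exists(cache_file):
    os.remove(cache_file)  # Start the demo from scratch

corpus = ["Feed the GPU", "keep it fed"]
first = load_or_build_cache(corpus, cache_file)   # Builds and writes the cache
second = load_or_build_cache(corpus, cache_file)  # Served straight from disk
```

The trade-off named below applies here too: the cache is extra storage, bought to avoid re-spending compute every epoch.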
Furthermore, look at your file formats. Reading millions of small JPEG or CSV files over a network file system kills I/O throughput due to metadata overhead. Instead, stream data via archives. Sharding your dataset into POSIX tar files or binary formats like Parquet/Avro allows the OS to read ahead, keeping the GPU fed.
Watch out for:
- Storage ballooning: Caching pre-processed data can triple your storage footprint. You are trading storage cost (cheap) for compute time (expensive).
- Over-pruning: While data deduplication is excellent for web scrapes, be careful with curated medical or legal datasets. Aggressive filtering might discard rare edge cases that are critical for model robustness.
The operational levers: Safety and scheduling
The most expensive training run is the one that crashes 99% of the way through and has to be restarted.
In the cloud, spot instances (or pre-emptible VMs) offer discounts of up to 90%. To use them safely, you must implement robust checkpointing. Save the model state frequently (every epoch or N steps) so that if a node is reclaimed, you lose minutes of work, not days.
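A framework-agnostic sketch of that resume-from-checkpoint loop; the file names and training state here are illustrative, and a real PyTorch run would persist model and optimizer state dicts instead:

```python
import json
import os
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "train_ckpt_demo.json")
if os.path.exists(CKPT):
    os.remove(CKPT)  # Start the demo from scratch

def save_checkpoint(step, state):
    # Write atomically so a reclaimed node can't leave a half-written file.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"step": 0, "state": None}  # Fresh start

# Resumable loop: if the node is reclaimed, a restart resumes from the last save.
ckpt = load_checkpoint()
for step in range(ckpt["step"], 10):
    state = {"loss": 1.0 / (step + 1)}  # Stand-in for real model/optimizer state
    if (step + 1) % 2 == 0:  # Checkpoint every N steps (here N=2)
        save_checkpoint(step + 1, state)
```

The atomic `os.replace` matters on spot instances: a node reclaimed mid-write should never leave a corrupt checkpoint behind.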
Open-source orchestration frameworks like SkyPilot have become essential here. SkyPilot abstracts away the complexity of Spot Instances, automatically handling the recovery of reclaimed nodes and allowing engineers to treat disparate clouds (AWS, GCP, Azure) as a single, cost-optimized resource pool.
You should also implement early stopping. There is no ROI in “polishing noise”. If your validation loss plateaus for 3 epochs, kill the run. This is especially potent for fine-tuning tasks, where most gains arrive in the first few epochs. However, be cautious if you are using curriculum learning, where loss might naturally rise before falling again as harder examples are introduced.
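A plateau-based early-stopping rule can be sketched as follows; the loss values are invented for illustration:

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """True when validation loss hasn't improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

# Simulated validation curve: improvement stalls after the 0.54 epoch.
history = []
for epoch, loss in enumerate([0.90, 0.70, 0.55, 0.54, 0.55, 0.56, 0.55, 0.40]):
    history.append(loss)
    if should_stop(history, patience=3):
        break  # Kill the run instead of polishing noise
```

Note the trailing 0.40 the loop never reaches: that is the curriculum-learning caveat in miniature, where loss can fall again after a plateau.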
The “smoke test” protocol
Finally, never launch a multi-node job without a dry run. A simple script that runs two batches on a CPU can catch shape mismatches and OOM bugs for pennies.
```python
# From 'green-ai-optimization-toolkit/03_smoke_test.py'
def smoke_test(model, loader, device='cpu', steps=2):
    """
    Runs a dry run on CPU to catch shape mismatches
    and OOM bugs before the real run starts.
    """
    print(f"💨 Running Smoke Test on {device}...")
    model.to(device)
    model.train()
    try:
        for i, (data, target) in enumerate(loader):
            if i >= steps:
                break
            data, target = data.to(device), target.to(device)
            output = model(data)
            loss = output.sum()  # Dummy loss: we only need backward() to run
            loss.backward()
        print("✅ Smoke Test Passed. Safe to launch expensive job.")
        return True
    except Exception as e:
        print(f"❌ Smoke Test Failed: {e}")
        return False
```
The rapid-fire checklist: 10 tactical quick wins
Beyond the major architectural shifts, there is a long tail of smaller optimizations that, when stacked, yield significant savings. Here is a rapid-fire checklist of tactical wins.
1. Dynamic batch-size auto-tuning
- The tactic: Have the framework probe VRAM at launch and automatically choose the largest safe batch size.
- Best for: Shared GPU clusters (Kubernetes/Slurm) where free memory swings wildly.
- Watch out: Can break real-time streaming SLAs by altering step duration.
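One way to sketch this tactic is a doubling probe. The memory model below is hypothetical; a real probe would attempt a forward/backward pass at each size and catch the framework’s out-of-memory error:

```python
def largest_safe_batch(fits, start=1, cap=4096):
    """Doubling probe: grow the batch until `fits` fails, keep the last safe size."""
    size = start
    while size * 2 <= cap and fits(size * 2):
        size *= 2
    return size

# Hypothetical memory model: pretend anything above 96 samples triggers OOM.
# In practice, `fits` would run one forward/backward pass inside a try/except
# block that catches the framework's out-of-memory exception.
fits_in_vram = lambda batch: batch <= 96
chosen = largest_safe_batch(fits_in_vram)
```

Powers of two keep step duration predictable, which softens (but does not remove) the SLA caveat above.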
2. Continuous profiling
- The tactic: Run lightweight profilers (PyTorch Profiler, NVIDIA Nsight) for a few seconds per epoch.
- Best for: Long jobs (>30 mins). Finding even a 5% hotspot pays back the profiler overhead in a day.
- Watch out: I/O-bound jobs. If GPU utilization is already low, the bottleneck is the data pipeline, and a kernel profiler won’t surface it.
3. Store tensors in half-precision
- The tactic: Save checkpoints and activations in FP16 (instead of default FP32).
- Best for: Large static embeddings (vision, text). It halves I/O volume and storage costs.
- Watch out: Compliance workloads requiring bit-exact auditing.
4. Early-phase CPU training
- The tactic: Run the first epoch on cheaper CPUs to catch gross bugs before renting GPUs.
- Best for: Complex pipelines with heavy text parsing or JSON decoding.
- Watch out: Tiny datasets where the data transfer time exceeds the compute time.
5. Offline augmentation
- The tactic: Pre-compute heavy transforms (Mosaic, Style Transfer) and store them, rather than computing on-the-fly.
- Best for: Heavy transforms that take >20ms per sample.
- Watch out: Research that studies augmentation randomness; baking it removes variability.
6. Budget alerts & dashboards
- The tactic: Stream cost metrics per run and alert when burn-rate exceeds a threshold.
- Best for: Multi-team organizations to prevent “runaway” billing.
- Watch out: Alert Fatigue. If you ping researchers too often, they will ignore the notifications.
7. Archive stale artifacts
- The tactic: Automatically move checkpoints >90 days old to cold storage (Glacier/Archive tier).
- Best for: Mature projects with hundreds of experimental runs.
- Watch out: Ensure you keep the “Gold Standard” weights on hot storage for inference.
8. Data deduplication
- The tactic: Remove near-duplicate samples before training.
- Best for: Web scrapes and raw sensor logs.
- Watch out: Curated medical/legal datasets where “duplicates” might actually be critical edge cases.
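A minimal sketch of hash-based deduplication; note it only catches duplicates up to the normalization rule, and fuzzy near-duplicates need heavier techniques such as MinHash:

```python
import hashlib

def normalize(text):
    # Collapse case and whitespace so trivially varied copies hash the same.
    return " ".join(text.lower().split())

def dedupe(samples):
    """Drop duplicate samples by hashing a normalized form of each one."""
    seen, kept = set(), []
    for sample in samples:
        digest = hashlib.sha1(normalize(sample).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(sample)
    return kept

corpus = ["Buy now!", "buy   NOW!", "Rare clinical edge case"]
cleaned = dedupe(corpus)
```

The caveat above applies directly: a curated dataset’s “duplicates” may be meaningful repeats, so audit what the filter drops before training on the result.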
9. Cluster-wide mixed-precision defaults
- The tactic: Enforce FP16 globally via environment variables so no one “forgets” the cheapest knob.
- Best for: MLOps teams managing multi-tenant fleets.
- Watch out: Legacy models that may diverge without specific tuning.
10. Neural architecture search (NAS)
- The tactic: Automate the search for efficient architectures rather than hand-tuning.
- Best for: Long-term production models where efficiency pays dividends over years.
- Watch out: Extremely high upfront compute cost; only worth it if the model will be deployed at massive scale.
Better habits, not just better hardware
You don’t need to wait for an H100 allocation to make your AI stack efficient. By implementing mixed precision, optimizing your data feed and adding operational safety nets, you can drastically reduce both your carbon footprint and your cloud bill.
The most sustainable AI strategy isn’t buying more power; it’s wasting less of what you already have.
This article is published as part of the Foundry Expert Contributor Network.
Google adds vibe design to Stitch UI design tool 20 Mar 2026, 9:00 am
A key move in Google’s effort is a complete redesign of the Stitch UI. New plans for Stitch were announced March 18. With vibe designing, developers can explore ideas quickly, leading to a higher-quality outcome. Instead of starting with a wireframe, developers can start by explaining the business objective they hope to achieve, what they want users to feel, or even examples of what currently inspires them.
The Stitch UI now features a new AI-native infinite canvas that lets developers grow their ideas from early ideation to working prototypes. The new Stitch canvas is also built to amplify creativity throughout the design process. Developers can bring ideas in whatever shape they take—images, text, or even code—directly to the canvas as context. The canvas context, meanwhile, is paired with a new design agent that can reason across a project’s evolution. Additionally, a new Agent manager is being introduced that tracks progress and helps developers work on multiple ideas in parallel, all while staying organized.
Also being introduced are voice capabilities that let developers speak directly to their canvas. The agent can give developers real-time design critiques, or design a new landing page by interviewing the developer, with updates applied in real time. By acting as a sounding board, AI helps uncover top ideas through dynamic critique and dialogue.
Stitch can act as a bridge to other tools in a team’s workflow. Using the recently released Stitch Model Context Protocol server and SDK, developers can leverage Stitch’s capabilities via skills and tools. Developers can also export designs to developer tools such as AI Studio and Antigravity.
Cloud at 20: Cost, complexity, and control 20 Mar 2026, 9:00 am
When Amazon Web Services launched its Simple Storage Service (S3) in March 2006, it sparked the imagination of IT leaders worldwide. I remember that era well. It was a time when the enterprise was feverishly searching for a way out of restrictive, on-premises silos. S3 and the emerging concept of public cloud promised almost unlimited scalability, pay-as-you-go economics, and the freeing of IT departments from the shackles of hardware, data centers, and routine maintenance. The vision was that enterprises could offload their IT headaches and focus on business outcomes, letting cloud providers do the heavy lifting.
Twenty years later, S3 has ballooned into a monster-scale system of more than 500 trillion objects stored, hundreds of exabytes under management, and operations spanning every corner of the planet. Netflix, Spotify, and other companies have used S3 and the wider AWS ecosystem to reinvent whole industries, scaling to dimensions that would have seemed impossible in the pre-cloud era.
But as we light the candles on S3’s birthday cake, the reality for most enterprises does not match the sleek simplicity once promised. Far from outsourcing all their IT to the cloud, most organizations today find themselves facing greater complexity, reduced transparency, and spiraling costs. Cloud hasn’t eliminated traditional IT problems; it has merely shifted and, in many cases, compounded them.
The failed promise of savings
It turns out, in the vast majority of cases, that moving workloads to the cloud does not inherently reduce costs or complexity. Many enterprises naively expected a simple equation: Moving infrastructure to the cloud equals lower costs plus greater agility. Instead, organizations gradually discovered that the pay-as-you-go model could quickly balloon budgets when left unchecked.
Cloud spending often began as a tiny line item within IT budgets, but two decades on, it’s frequently one of the highest and fastest-growing costs, sometimes eclipsing the very expenses it was meant to replace. Enterprises face a jungle of service tiers, intricate pricing metrics, unexpected data egress fees, licensing quirks, and rapid consumption of compute and storage. The quest for scale and innovation, fueled by cloud computing, often leads to sprawl: hundreds or thousands of workloads distributed across multiple cloud accounts and regions, each adding yet another layer of expense.
The real concern for CIOs isn’t just surprise costs but the struggle to implement cloud discipline. Even for organizations that embrace finops (the emerging practice of cloud financial operations), the speed of innovation outpaces their ability to govern usage. Business units spin up workloads on a whim only to forget or abandon them. Cloud cost visibility is confounded by opaque billing dashboards, and the containerization trend has made it even harder to pin down exactly where money is going.
This runaway spending is not the result of malfeasance, but rather the product of the cloud’s core attribute: enabling rapid, frictionless innovation in exchange for complex, often unpredictable billing. In recent years, the pendulum has begun to swing back, with many organizations repatriating some workloads to on-premises infrastructure just to regain cost control and predictability.
The cloud giveth and taketh away
Early cloud pioneers believed that entrusting security to hyperscale providers would mean fewer headaches. After all, who better to patch, monitor, and defend IT infrastructure than the companies running the largest, most sophisticated data centers on earth? To an extent, this is true. Cloud providers have invested untold billions in every flavor of physical and software security.
But the paradox is that by its very flexibility and openness, cloud introduces enormous security and operational complexity. The infamous rash of open S3 buckets left exposed to the world due to default configurations or simple oversight testified to the challenges of managing distributed, cloud-native architectures. Enterprises must now invest in new skills, new tools, and new processes just to manage the endless stream of updates, permissions, encryption keys, and access policies. Instead of outsourcing pain points, many organizations just replaced one set of hassles with another.
Moreover, the notion of “set it and forget it” in the cloud has proven dangerously outdated. The constant drumbeat of threats, from ransomware to nation-state actors, combined with the proliferation of APIs and services, makes the cloud a shifting, ever-expanding attack surface. Enterprises are forced not only to upskill but also to adopt whole new mindsets around zero trust, observability, and resilience engineering.
The future: more of the same
The original fantasy of cloud was that it would be a single pane of glass: one provider (often AWS), powering an enterprise’s every workload, integrated from edge to core to SaaS. In reality, as we reach this 20-year milestone, we’re in a multicloud reality whether by design, accident, or necessity. Enterprises are now managing portfolios that span AWS, Microsoft Azure, Google Cloud, and sometimes dozens of SaaS or niche providers and their own private clouds.
This shift actually magnifies all previous challenges. Not only do organizations have to master the idiosyncrasies of each provider’s architectures, costs, and security models, but they must also contend with interoperability, data movement, compliance, and the talent gap across every platform in use. The modern IT estate is a patchwork, not a seamless fabric.
As the next chapter of the cloud era unfolds, three trends are set to dominate. First, finops will become inseparable from devops and IT operations as enterprises seek to assert financial discipline over their cloud estates. Second, continuous investment in security will be essential: not just tools, but human capital and process change, with automation serving as a key enabler. Third, complexity won’t disappear; the goal will be to manage and mitigate it, not magically eliminate it.
The 20th anniversary of S3 isn’t just a celebration of a technological achievement; it’s a sobering reminder that the journey to the cloud, for most enterprises, is more marathon than sprint. The dream of outsourced, seamless IT has, for most, become managed, governed, orchestrated complexity. The next 20 years will see continued investments in cost control, security, and managing the multicloud labyrinth. Cloud may have transformed technology forever, but for the enterprise, the work—and the surprises—is just beginning.
AI optimization: How we cut energy costs in social media recommendation systems 20 Mar 2026, 9:00 am
When you scroll through Instagram Reels or browse YouTube, the seamless flow of content feels like magic. But behind that curtain lies a massive, energy-hungry machine. As a software engineer working on recommendation systems at Meta and now Google, I’ve seen firsthand how the quest for better AI models often collides with the physical limits of computing power and energy consumption.
We often talk about “accuracy” and “engagement” as the north stars of AI. But recently, a new metric has become just as critical: efficiency.
At Meta, I worked on the infrastructure powering Instagram Reels recommendations. We were dealing with a platform serving over a billion daily active users. At that scale, even a minor inefficiency in how data is processed or stored snowballs into megawatts of wasted energy and millions of dollars in unnecessary costs. We faced a challenge that is becoming increasingly common in the age of generative AI: how do we make our models smarter without making our data centers hotter?
The answer wasn’t in building a smaller model. It was in rethinking the plumbing — specifically, how we computed, fetched and stored the training data that fueled those models. By optimizing this “invisible” layer of the stack, we achieved megawatt-scale energy savings and reduced annual operating expenses by eight figures. Here is how we did it.
The hidden cost of the recommendation funnel
To understand the optimization, you have to understand the architecture. Modern recommendation systems generally function like a funnel.
At the top, you have retrieval, where we select thousands of potential candidates from a pool of billions of media items. Next comes early-stage ranking, a high-efficiency phase that filters this large pool down to a smaller set. Finally, we reach late-stage ranking. This is where the heavy lifting happens. We use complex deep learning models — often two-tower architectures that combine user and item embeddings — to precisely order a curated set of 50 to 100 items to maximize user engagement.
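The late-stage scoring step can be sketched as a dot product between the two towers’ outputs; the embeddings below are invented for illustration, and production models layer many more signals on top:

```python
def score(user_emb, item_emb):
    """Two-tower late-stage ranking: relevance as an embedding dot product."""
    return sum(u * v for u, v in zip(user_emb, item_emb))

# Hypothetical 3-dim embeddings; real systems use hundreds of dimensions.
user = [0.2, 0.9, -0.1]
candidates = {
    "reel_a": [0.1, 0.8, 0.0],   # Aligned with the user's tastes
    "reel_b": [-0.5, 0.1, 0.9],  # Mostly orthogonal or opposed
}
ranked = sorted(candidates, key=lambda k: score(user, candidates[k]), reverse=True)
```

The user tower runs once per request while item towers can be precomputed, which is what makes this architecture cheap enough for the final 50–100 candidates.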
This final stage is incredibly feature-dense. To rank a single Reel, the model might look at hundreds of “features.” Some are dense features (like the time a user has spent on the app today) and others are sparse features (like the specific IDs of the last 20 videos watched).
The system doesn’t just use these features to rank content; it also has to log them. Why? Because today’s inference is tomorrow’s training data. If we serve you a video and you “like” it, we need to join that positive label with the exact features the model saw at that moment to retrain and improve the system.
This logging process — writing feature values to a transient key-value (KV) store to wait for user interaction — was our bottleneck.
The challenge of transitive feature logging
To understand why this bottleneck existed, we have to look at the microscopic lifecycle of a single training example.
In a typical serving path, the inference service fetches features from a low-latency feature store to rank a candidate set. However, for a recommendation system to learn, it needs a feedback loop. We must capture the exact state of the world (the features) at the moment of inference and later join them with the user’s future action (the label), such as a “like” or a “click.”
This creates a massive distributed systems challenge: Stateful label joining.
We cannot simply query the feature store again when the user clicks, because features are mutable — a user’s follower count or a video’s popularity changes by the second. Using fresh features with stale labels introduces “online-offline skew,” effectively poisoning the training data.
To solve this, we use a transitive key-value (KV) store. Immediately after ranking, we serialize the feature vector used for inference and write it to a high-throughput KV store with a short time-to-live (TTL). This data sits there, “in transit,” waiting for a client-side signal.
- If the user interacts: The client fires an event, which acts as a key lookup. We retrieve the frozen feature vector from the KV store, join it with the interaction label and flush it to our offline training warehouse (e.g., Hive/Data Lake) as a “source-of-truth” training example.
- If the user does not interact: The TTL expires, and the data is dropped to save costs.
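A toy sketch of that transitive store and label join; the class and field names are illustrative, not Meta’s actual system:

```python
import time

class TransitiveKV:
    """Toy transitive KV store: frozen feature vectors wait for a label."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # impression_id -> (features, write_time)

    def log_inference(self, impression_id, features):
        # Freeze the exact features the ranker saw at inference time.
        self.store[impression_id] = (features, time.time())

    def join_label(self, impression_id, label):
        entry = self.store.pop(impression_id, None)
        if entry is None:
            return None  # Never logged, or already consumed/expired
        features, written = entry
        if time.time() - written > self.ttl:
            return None  # TTL elapsed before the user interacted
        return {"features": features, "label": label}  # Training example

kv = TransitiveKV(ttl_seconds=60)
kv.log_inference("imp-1", {"follower_count": 42})
example = kv.join_label("imp-1", "like")  # User liked within the TTL
```

Joining against the frozen copy, rather than re-reading the live feature store, is what prevents the online-offline skew described above.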
This architecture, while robust for data consistency, is incredibly expensive. We were essentially continuously writing petabytes of high-dimensional feature vectors to a distributed KV store, consuming massive network bandwidth and serialization CPU cycles.
Optimizing the “head load”
We realized that our “write amplification” was out of control. In the late-stage ranking phase, we typically rank a deep buffer of items — say, the top 100 candidates — to ensure the client has enough content cached for a smooth scroll.
The default behavior was eager logging: We would serialize and write the feature vectors for all 100 ranked items into the transitive KV store immediately.
However, user behavior follows a steep decay curve. A user might only view the first 5–6 items (the “head load”) before closing the app or refreshing the feed. This meant we were paying the serialization and I/O cost to store features for items 7 through 100, which had a near-zero probability of generating a positive label. We were effectively DDoS-ing our own infrastructure with “ghost data.”
We shifted to a “lazy logging” architecture.
- Selective persistence: We reconfigured the serving pipeline to only persist features for the Head Load (e.g., top 6 items) into the KV store initially.
- Client-triggered pagination: As the user scrolls past the Head Load, the client triggers a lightweight “pagination” signal. Only then do we asynchronously serialize and log the features for the next batch (items 7–15).
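Under illustrative thresholds (the real head-load size and page size would be tuned from engagement data), the two steps might look like:

```python
HEAD_LOAD = 6   # items persisted eagerly at serving time (illustrative)
PAGE_SIZE = 9   # items logged per client pagination signal (illustrative)

def log_eagerly(ranked_items):
    """Persist features only for the head load when the ranking is served."""
    return ranked_items[:HEAD_LOAD]

def log_on_pagination(ranked_items, pages_seen):
    """Persist the next batch only after the client signals a scroll."""
    start = HEAD_LOAD + (pages_seen - 1) * PAGE_SIZE
    return ranked_items[start : start + PAGE_SIZE]

ranked = list(range(100))            # 100 ranked candidates
persisted = log_eagerly(ranked)      # head load stored immediately
persisted += log_on_pagination(ranked, pages_seen=1)  # next batch on first scroll
```

The ranking depth (100) never changes; only the slice that reaches the KV store does, which is what decouples ranking quality from storage cost.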
This change decoupled our ranking depth from our storage costs. We could still rank 100 items to find the absolute best content, but we only paid the “storage tax” for the content that actually had a chance of being seen. This reduced our write throughput (QPS) to the KV store significantly, saving megawatts of power previously wasted on serializing data that was destined to expire untouched.
Rethinking storage schemas
Once we reduced what we stored, we looked at how we stored it.
In a standard feature store architecture, data is often stored in a tabular format where every row represents an impression (a specific user seeing a specific item). If we served a batch of 15 items to one user, the logging system would write 15 rows.
Each row contained the item features (which are unique to the video) and the user features (which are identical for all 15 rows). We were effectively writing the user’s age, location and follower count 15 separate times for a single request.
We moved to a batched storage schema. Instead of treating every impression as an isolated event, we separated the data structures. We stored the user features once for the request and stored a list of item features associated with that request.
This simple de-duplication reduced our storage requirement by more than 40%. In distributed systems like the ones powering Instagram or YouTube, storage isn’t passive; it requires CPU to manage, compress and replicate. By slashing the storage footprint, we improved bandwidth availability for the distributed workers fetching data for training, creating a virtuous cycle of efficiency throughout the stack.
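A toy comparison of the two schemas shows where the savings come from; the feature names are invented, and no attempt is made to reproduce the exact 40% figure, only the de-duplication mechanics:

```python
import json

user_features = {"age_bucket": "25-34", "country": "US", "followers": 1200}
item_features = [{"item_id": i, "popularity": round(i * 0.01, 2)} for i in range(15)]

# Per-impression schema: one row per (user, item), user features duplicated 15x.
per_impression = [{"user": user_features, "item": item} for item in item_features]

# Batched schema: user features stored once per request, items as a list.
batched = {"user": user_features, "items": item_features}

flat_bytes = len(json.dumps(per_impression))
batched_bytes = len(json.dumps(batched))
savings = 1 - batched_bytes / flat_bytes  # fraction of bytes saved
```

The larger the user-feature payload relative to the item features, the bigger the win from writing it once per request instead of once per row.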
Auditing the feature usage
The final piece of the puzzle was spring cleaning. In a system as old and complex as a major social network’s recommendation engine, digital hoarding is a real problem. We had over 100,000 distinct features registered in our system.
However, not all features are created equal. A user’s “age” might carry very little weight in the model compared to “recently liked content.” Yet, both cost resources to compute, fetch and log.
We initiated a large-scale feature auditing program. We analyzed the weights assigned to features by the model and identified thousands that were adding statistically insignificant value to our predictions. Removing these features didn’t just save storage; it reduced the latency of the inference request itself because the model had fewer inputs to process.
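The core of such an audit reduces to ranking features by their learned weight and cutting the tail; the feature names, weights, and threshold below are invented for illustration:

```python
# Hypothetical per-feature importance scores from the model (illustrative).
feature_weights = {
    "recently_liked_topics": 0.92,
    "session_watch_time": 0.74,
    "user_age": 0.003,
    "account_creation_weekday": 0.0007,
}

SIGNIFICANCE_THRESHOLD = 0.01  # cutoff; in practice tuned via offline A/B evaluation

# Keep only features whose contribution clears the threshold.
kept = {f: w for f, w in feature_weights.items() if abs(w) >= SIGNIFICANCE_THRESHOLD}
dropped = sorted(set(feature_weights) - set(kept))
```

Every dropped feature saves three times over: it is no longer computed, no longer fetched and logged, and no longer processed at inference time.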
The energy imperative
As the industry races toward larger generative AI models, the conversation often focuses on the massive energy cost of training GPUs. Reports indicate that AI energy demand is poised to skyrocket in the coming years.
But for engineers on the ground, the lesson from my time at Meta is that efficiency often comes from the unsexy work of plumbing. It comes from questioning why we move data, how we store it and whether we need it at all.
By optimizing our data flow — lazy logging, schema de-duplication and feature auditing — we proved that you can cut costs and carbon footprints without compromising the user experience. In fact, by freeing up system resources, we often made the application faster and more responsive. Sustainable AI isn’t just about better hardware; it’s about smarter engineering.
This article is published as part of the Foundry Expert Contributor Network.

OpenAI buys Python tools builder Astral 19 Mar 2026, 11:05 pm
OpenAI is acquiring Python developer toolmaker Astral, thus bringing open source developer tools into OpenAI’s Codex AI coding system. The acquisition was announced on March 19. Elaborating on the deal, OpenAI said Astral has built widely used open source Python tools, helping developers move faster with modern tools such as uv, Ruff, and ty. These tools power millions of developer workflows and have become part of the foundation of modern Python development, OpenAI said.
By bringing in Astral’s tools and engineering expertise, OpenAI said it will accelerate work on Codex and expand what AI can do across the software development life cycle. OpenAI’s goal with the Codex ecosystem is to move beyond AI that simply generates code and toward systems that can participate in the entire development workflow, helping plan changes, modify codebases, run tools, verify results, and maintain software over time. Astral’s developer tools sit in that workflow. With the integration of these systems with Codex, OpenAI said it will enable AI agents to work more directly with the tools developers already rely on every day.
Python, OpenAI said, has become one of the most important languages in modern software development, powering everything from AI and data science to back-end systems and developer infrastructure. Astral’s open source tools play a key role in that ecosystem, OpenAI said. These tools and their capabilities are cited in OpenAI’s announcement:
- Uv simplifies dependency and environment management.
- Ruff provides fast linting and formatting.
- Ty helps enforce type safety across codebases.
Astral also offers pyx, a Python-native package registry now in beta. OpenAI said that by acquiring Astral, it will continue to support these open source projects while exploring ways they can work more seamlessly with Codex to enable AI systems to operate across the full Python development workflow.
The closing of the acquisition is subject to customary closing conditions, including regulatory approval, OpenAI said. Until the closing, OpenAI and Astral will remain separate, independent companies. Deeper integrations will be explored that allow Codex to interact more directly with the tools developers already use, helping develop Codex into a true collaborator across the development life cycle, according to OpenAI.
OpenAI buys non-AI coding startup to help its AI to program 19 Mar 2026, 10:51 pm
OpenAI on Thursday announced the acquisition of Astral, the developer of open source Python tools that include uv, Ruff and ty. It says that it plans to integrate them with Codex, its AI coding agent first released last year, as well as continuing to support the open source products.
OpenAI stated in its announcement that its goal with Codex is “to move beyond AI that simply generates code and towards systems that can participate in the entire development workflow — helping plan changes, modify codebases, run tools, verify results, and maintain software over time. Astral’s developer tools sit directly in that workflow. By integrating these systems with Codex after closing, we will enable AI agents to work more directly with the tools developers rely on every day.”
In a blog, Astral founder Charlie Marsh said that since the company was formed in 2023, the “goal has been to build tools that radically change what it feels like to work with Python — tools that feel fast, robust, intuitive and integrated. Today, we are taking a step forward in that mission.”
He added, “In line with our philosophy and OpenAI’s own announcement, OpenAI will continue supporting our open source tools after the deal closes. We’ll keep building in the open, alongside our community – and for the broader Python ecosystem – just as we have from the start.”
Shashi Bellamkonda, principal research director at Info-Tech Research Group, said that many people think that “AI” is just the chat they have with an LLM, not realizing that there is a huge unseen ecosystem of layers that have to work together to help achieve results.
Most of the focus in AI, he said, goes to the model layer: who has the best reasoning, the fastest inference, the biggest context window. But the model is useless if the environment it operates in is broken, slow, or unreliable.
With its acquisition of Astral, OpenAI “is hoping to be more efficient with its coding, since the code has to run somewhere and be efficient and free of errors,” said Bellamkonda. “I hope that OpenAI will keep its promise to continue to develop open-source Python tools, as this is used by a lot of large companies using Python.”
One possible strategy for the purchase, he explained, “could be that OpenAI, having acquired the team that built these open source tools, optimizes these tools to work better inside OpenAI’s stack than anywhere else, giving them an advantage.”
A ‘corrective move’
Describing it as a reality check for AI-led software development, Sanchit Vir Gogia, chief analyst at Greyhound Research, said the acquisition is being framed as a natural next step for Codex. “It is not. It is a corrective move. And if you read between the lines, it tells you exactly where AI coding is struggling when it leaves the demo environment and enters real software engineering systems.”
For the past couple of years, he said, “the conversation around AI in development has been dominated by one idea: speed. How fast code can be generated. How quickly a developer can go from prompt to output. That framing has been convenient, but it has also been incomplete to the point of being misleading.”
Software development is not, and has never been, just about writing code, he pointed out, adding that the actual work sits in everything that happens around it, such as managing dependencies, enforcing consistency, validating outputs, ensuring type safety, integrating with existing systems, and maintaining stability over time. “These are not creative tasks,” he said. “They are structured, repeatable, and often unforgiving. That is what keeps systems from breaking.”
Astral tools ‘constrain, validate, and correct’
According to Gogia, “this is where the tension begins. AI systems generate probabilistic outputs. Engineering systems demand deterministic behavior. That gap is no longer theoretical, it is now showing up in day-to-day development workflows.”
Across enterprises, he said, “what we are seeing is not a clean productivity story. It is far messier. Developers often say they feel faster. And to be fair, in the moment, they are. Code appears quicker, boilerplate disappears, certain tasks collapse from hours to minutes. But when you step back and look at the full lifecycle, the gains start to blur.”
The effort, he explained, “does not disappear, it moves. Time saved at the point of creation starts to reappear downstream. Teams spend more time reviewing what was generated. They spend more time fixing inconsistencies. They deal with dependency mismatches that were not obvious at generation time. They enforce internal standards that the model does not fully understand. Integration takes longer than expected. Testing cycles stretch. In some cases, defects increase because the system looks correct on the surface but breaks under real conditions.”
Astral did not set out to build AI, Gogia said. Instead, “it focused on something far less glamorous and far more important: Making the Python ecosystem faster, stricter, and more predictable. Ruff enforces code quality and formatting at speed, uv simplifies and stabilizes dependency and environment management, ty brings type safety into the workflow with minimal overhead.”
He added, “[these tools] do not generate anything. They constrain, validate, and correct. They operate in a world where outputs must be consistent and reproducible. That is precisely what AI lacks on its own.”
By bringing Astral into the Codex environment, said Gogia, “OpenAI is not just adding features. It is adding discipline. It is effectively saying that if AI is going to participate across the development lifecycle, it needs to operate within systems that can continuously check and correct its behavior. Without that, scale becomes risk.”
How to create AI agents with Neo4j Aura Agent 19 Mar 2026, 9:00 am
You may be hearing a lot of buzz about knowledge graphs, GraphRAG, and ontologies in the AI space right now, especially around improving agent accuracy, explainability, and governance. But actually creating and deploying your own agents that leverage these concepts can be challenging and ambiguous. At Neo4j, we’re trying to make building and deploying agents more straightforward.
Neo4j Aura Agent is an end-to-end platform for creating agents, connecting them to knowledge graphs, and deploying to production in minutes. In this post, we’ll explore the features of Neo4j Aura Agent that make this all possible, along with links to coded examples to get hands-on with the platform.
Knowledge graphs, GraphRAG, and ontology-driven AI
Let’s define some key terms before we begin.
Knowledge graphs
Knowledge graphs are a design pattern for organizing and accessing interrelated data. There are many ways to implement them. At Neo4j, we use a labeled property graph, as shown below. Knowledge graphs provide context, standardization, and flexibility in the data layer, making them well-suited for semantic layers, long-term memory stores, and retrieval-augmented generation (RAG) stores.

Neo4j
GraphRAG
GraphRAG is retrieval-augmented generation where a knowledge graph is included somewhere on the retrieval path. GraphRAG improves accuracy and explainability over vector/document search and plain SQL queries by leveraging the knowledge graph structure, which represents context symbolically in an expressive and compact manner. This allows you to retrieve more relevant data and, critically, fit the relevant context more efficiently into the context window of the large language model (LLM).

Neo4j
There are lots of GraphRAG retrieval techniques, but the three primary ones are:
- Graph-augmented vector search: Vector search is used to match relevant entities (as nodes or relationships), followed by graph traversal operations to identify and aggregate related context.
- Text-to-query: Text2Cypher (Cypher is the most popular graph query language) generates queries dynamically, enabling agents to query the graph based on its schema.
- Query templates: Parameterized, premade graph queries that enable precise, expertly reviewed graph query logic to be executed when chosen by the agent.
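A query template is essentially a named, parameterized Cypher statement plus a description the agent can match against. The sketch below is illustrative (the Python structure, schema, and Cypher are assumptions, not the Aura Agent API; the template name is borrowed from the API response example later in this post):

```python
# A template pairs a description (used by the agent to pick the tool)
# with parameterized Cypher, so the query logic itself is fixed and reviewed.
CONTRACT_TEMPLATE = {
    "name": "identify_contracts_for_organization",
    "description": "Find all contracts a named organization is party to.",
    "cypher": (
        "MATCH (o:Organization {name: $organization_name})"
        "-[:PARTY_TO]->(c:Contract) "
        "RETURN c.id AS contract_id, c.name AS contract_name"
    ),
}

def render_call(template, **params):
    """Simulate the agent invoking a template with validated parameters."""
    return {"cypher": template["cypher"], "params": params}

call = render_call(CONTRACT_TEMPLATE, organization_name="Motorola")
```

Because only the parameters vary at runtime, the agent gets precision and reviewability that free-form query generation cannot guarantee.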
Ontology
An ontology is a formal representation of knowledge that defines the concepts, categories, properties, and relationships within a particular domain or area of study. You may have heard about ontologies in the AI space lately. In practice, an ontology is just a data model in the form of a graph schema with additional metadata about the involved domain(s) and use case(s). Ontologies enable AI to reason and make inferences over your data easily. While often associated with Resource Description Framework (RDF) and triple stores, a property graph database (such as that in Neo4j) provides the equivalent functionality with a graph schema when paired with an Aura agent's system and tool prompts.
Neo4j Aura features
Neo4j Aura is a fully managed graph intelligence platform that includes a graph database and data services for importing, dashboarding, exploring, and deploying AI agents on top of data. You can create knowledge graphs to use with agents from structured or unstructured data, or a mix of both.
You can import structured data with Data Importer from a variety of sources, including data warehouses and RDBMS stores such as Snowflake, Databricks, and Postgres.

Neo4j
You can also import documents and unstructured data, performing entity extraction and merging graph data according to your schema, using the GraphRAG Python package by Neo4j, or by using other ecosystem tools with supported integrations such as Unstructured, LangChain, and LlamaIndex.
Once the data is imported into Neo4j, you can build an Aura agent on top of it. There are three basic steps:
- Creating your agent
- Adding tools
- Saving, testing, and deploying
You can find step-by-step details on the entire process, including all the necessary query and code snippets here. Below is a summary of the process.
First, creating an agent is easy. Simply provide some basic information: title, description, system prompt, and the database to serve as the agent’s knowledge graph.

Neo4j
Users can autogenerate an initial agent out of the box that they can further edit and tailor by providing a system prompt. The Aura agent will then use the graph schema and other metadata to configure the agent and its retrieval tools.

Neo4j
Neo4j Aura Agent provides three basic tool types, aligning with the GraphRAG categories discussed above:
- Vector similarity search
- Text2Cypher
- Cypher templates

Neo4j
These three different types of tools can be used in combination by the agent to chain responses together, improving overall accuracy, especially when compared to using just vector search alone.
The knowledge graph provides an essential structure, allowing Text2Cypher queries to retrieve exactly the right data using the graph schema and user prompt to infer the right patterns. Templatized queries allow for even greater precision by using pre-specified query logic to retrieve exactly the right data.
When responding to users, Neo4j Aura Agent includes its reasoning. During testing, this can be opened in the response tab. This explains the agent’s reasoning and the tool query logic used. Because the Cypher query language expresses relationship patterns in a human-readable format, it can be translated easily to the user and to downstream AI systems, allowing for improved explainability across the entire AI system.

Neo4j
Neo4j Aura Agent can deploy into a production setting automatically, and this is perhaps one of its most significant benefits. Once you are satisfied with the agent’s performance in the UI testing playground, you can select a publicly available endpoint. Doing this will automatically deploy the agent to a secure endpoint that is authenticated via an API key/secret pair.

Neo4j
Neo4j Aura Agent provides managed LLM inference and embeddings for agent runtime, making it easy for users and removing the need to maintain separate accounts and provide API credentials from other model providers.
Once deployed, the Aura agent can be called directly or wrapped in a Model Context Protocol (MCP) server to invoke other downstream AI systems and applications.
Example: Calling through API
curl --request POST "$ENDPOINT_URL" \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H "Authorization: Bearer $BEARER_TOKEN" \
-d '{"input": "find Motorola contracts"}' --max-time 60 \
| jq .
{
"content": [
{
"thinking": "The user is asking to find contracts related to a specific organization, \"Motorola\". The `identify_contracts_for_organization` tool is designed for this purpose, taking the organization name as input.",
"type": "thinking"
},
{
"id": "fdfa4d22-334d-44d1-861d-749f436097de",
"input": {
"organization_name": "Motorola"
},
"name": "identify_contracts_for_organization",
"type": "cypher_template_tool_use"
},
{
"output": {
"keys": [
"contract_id",
"agreement_type",
"contract_name",
"effective_date",
"renewal_term",
"expiration_date",
"party_names",
"party_roles",
"party_incorporated_countries"
],
"records": [
{
"agreement_type": "Intellectual Property Agreement",
"contract_id": 18,
"contract_name": "Intellectual Property Agreement",
"effective_date": "2014-04-14",
"expiration_date": "unable to determine",
"party_incorporated_countries": [
"United States",
"United States"
],
"party_names": [
"Motorola Solutions, Inc.",
"Zebra Technologies Corporation"
],
"party_roles": [
"Seller",
"Purchaser"
],
"renewal_term": "unable to determine"
}
],
"summary": {}
},
"tool_use_id": "fdfa4d22-334d-44d1-861d-749f436097de",
"type": "cypher_template_tool_result"
},
{
"thinking": "I have already identified the contracts associated with Motorola using the `identify_contracts_for_organization` tool. The output provides a list of contracts with relevant details, directly answering the original query.",
"type": "thinking"
},
{
"text": "Here are the contracts associated with Motorola:\n\n| contract_id | contract_name | agreement_type | effective_date | expiration_date | party_names | party_roles | party_incorporated_countries |\n|---|---|---|---|---|---|---|---|\n| 18 | Intellectual Property Agreement | Intellectual Property Agreement | 2014-04-14 | unable to determine | Motorola Solutions, Inc., Zebra Technologies Corporation | Seller, Purchaser | United States, United States |",
"type": "text"
}
],
"end_reason": "FINAL_ANSWER_PROVIDED",
"role": "assistant",
"status": "SUCCESS",
"type": "message",
"usage": {
"candidates_token_count": 226,
"prompt_token_count": 7148,
"thoughts_token_count": 301,
"total_token_count": 7675
}
}
Example: Wrapping in an MCP server and calling through Claude Desktop

Neo4j
Connecting agents to knowledge graphs
The promise of knowledge graphs for AI agents has been clear for some time—better accuracy, transparency in reasoning, and more reliable outputs. But turning that promise into reality has been another story entirely. The complexity of building knowledge graphs, configuring GraphRAG retrieval, and deploying production-ready agents has kept these benefits out of reach for many teams.
Neo4j Aura Agent represents an important first step in changing that. By providing a unified platform that connects agents to knowledge graphs in minutes rather than months, it removes much of the ambiguity that has held teams back. The low-code tool creation simplifies how agents achieve accuracy through vector search, Text2Cypher, and query templates working in concert. The built-in reasoning response and human-readable Cypher queries make explainability straightforward rather than aspirational. And the progression from playground testing to secure API deployment with managed inference eliminates the operational friction that often derails AI projects before they reach production.
This is not the final word on knowledge graph-powered agents, but it is a critical step forward. As organizations continue exploring how to make their AI systems more accurate, explainable, and governable, platforms that reduce complexity while preserving power will be essential. Neo4j Aura Agent points the way toward that future, making sophisticated GraphRAG capabilities accessible to teams ready to move beyond vector search and rigid knowledge management systems.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
9 reasons Java is still great 19 Mar 2026, 9:00 am
In a world obsessed with disruption, Java threads the needle between stability and innovation. It’s the ultimate syncretic platform, synthesizing the best ideas from functional programming, concurrency, cloud computing, and AI under a reliable, battle-tested umbrella.
Java unites meticulous planning with chaotic evolution, enterprise reality with open source ideals, along with a healthy dose of benevolent fortune. Let’s look at the key factors that make Java as much a champion today as it was in 1996.
1. The Java Community Process
At the heart of Java’s success are the developers and architects who love it. The Java community is vital and boisterous, and very much engaged in transforming the language. But what makes Java special is its governance architecture.
Far from a smoothly operating machine, Java’s governance is a riotous amalgam of competing interests and organizations, all finding their voice in the Java Community Process (JCP). The fractious nature of the JCP has been criticized, but over time it has given Java a massive advantage. The JCP is Java’s version of a functional democracy: A venue for contribution and conflict resolution among people who care deeply about the technology. The JCP is a vital forum where the will and chaos of the worldwide developer community negotiate with Java’s formal managing body.
2. OpenJDK
I still remember my astonishment when the Java language successfully incorporated lambdas and closures. Adding functional constructs to an object-oriented programming language was a highly controversial and impressive feat in its day. But that achievement pales in comparison to the more recent introduction of virtual threads (Project Loom), and the ongoing effort to unify primitives and objects (Project Valhalla).
The people working on Java are only half of the story. The people who work with it are the other half, and they reflect the diversity of Java’s many uses. Social coding and open source are not unique to Java, but they are key components in the health of the Java ecosystem. Like JavaScript, Java evolved in tandem with the coding community as the web gained traction. That origin story is a large part of its character. Java’s community responsiveness, including the existence of OpenJDK, ensures developers that we are working within a living system—one that is being continuously nurtured and cultivated for success in a changing world.
3. Open source frameworks and tools
Another major source of Java’s success is its wealth of open source frameworks and the tools built up around it. Java has one or more high-quality libraries for just about any task you can imagine. If you like a project, there’s a good chance it’s open source and you can contribute to it, which is great for both learning and building community.
The wealth of projects in the Java ecosystem extends from modest examples to foundational components of the Internet. Classic workhorses like Hibernate and Jetty are still vital, while the landscape has broadened to include tools that define the cloud era. We now have Testcontainers, which revolutionized integration testing by bringing Docker directly into the test lifecycle. And we have Netty, the asynchronous networking engine that quietly powers everything from high-frequency trading platforms to video games.
Perhaps most exciting, we have the new wave of AI integration tools like LangChain4j, which bridge the gap between stable enterprise systems and the wild west of LLMs. These are all open source projects that invite contributors, creating a set of capabilities that is unrivaled in its depth.
4. The Spring framework
No appreciation for Java’s ecosystem would be complete without tipping the hat to Spring. For years, Spring was the undisputed standard to which all other Java-based frameworks aspired. Today, it remains a dominant force in the enterprise.
With Spring, developers use the same facility to compose custom code as they do to incorporate third-party code. With dependency injection and inversion of control (IoC), Spring both supports standardizing your own internal components and ensures third-party projects and vendor components meet the same standard. All of this allows for greater consistency in your programs.
Of course, there are valid critiques of Spring, and it’s not always the right tool for the job. Google Guice is another tool that works similarly. But Spring is the framework that first offered a clean and consistent way to provision and compose application components. That was a game changer in its time and continues to be vital today. And, of course, the addition of Spring Boot has made Spring even easier to adopt and learn.
5. Java microframeworks
Next up are the cloud-native Java microframeworks: Quarkus, Micronaut, and Helidon. These frameworks launched Java into the serverless era, focusing on sub-second startup times and low-memory footprints. Fierce competition in this space forced the entire ecosystem to evolve faster.
Today, Java developers don’t just inherit a stack: They choose between robust options that all play nicely with the modern cloud. This social coding environment allows Java to absorb the best ideas from newer languages while retaining its massive library of existing solutions.
6. The miracle of virtual threads
Threads have been the core concurrency abstraction since time immemorial, not only for Java but for most languages. Threads of old mapped directly to operating system threads, but Java conclusively evolved beyond that model with the coming of virtual threads.
Similar to how Java once moved memory management into the JVM, it has now moved threading there. When you use virtual threads—now the default concurrency mechanism for Java—you get an instance of a lightweight object that is orchestrated by a highly optimized pool. These are intelligently farmed out to actual worker threads, invisible to you as the developer.
The efficiency boost can be mind-blowing. Without any extra work on your end, virtual threads can take a server from thousands to millions of concurrent requests. Successfully patching such a widely deployed platform in such a fundamental way—in full view of the entire industry—stands as one of the truly great achievements in the history of software.
7. Data-oriented programming
In a development landscape enamored of functional programming, it has become fashionable to trash-talk Java’s object orientation. And while Java’s stewards have responded by incorporating functional programming idioms into the language, they’ve also steadfastly insisted that Java remains a strongly object-oriented language, where everything is, indeed, an object.
You can write awesome code in any paradigm, and the same is true for awful code. In a Java system, you know up front that the language is strongly typed and that everything will be contained in classes. (For the one exception, see the last item below.) The absoluteness of this design decision cuts away complexity and lends a cleanness to the Java language that stands the test of time. Well-written Java programs have the mechanical elegance of interacting objects; components in a Java-based system interact like gears in a machine.
Now, rather than abandoning its object-oriented programming roots, Java has evolved by embracing data-oriented programming as another layer. With the arrival of records, pattern matching, and switch expressions, Java solved its historical verbosity problem. We can now model data as immutable carriers and process it with the conciseness of a functional language. Data-oriented constructs offer the elegance of object models without the boilerplate that once drove developers crazy.
8. The JVM (better than ever)
Once viewed as a heavy abstraction layer, the Java virtual machine is recognized today as a masterpiece of engineering. In devops containers and serverless architectures, the JVM offers a well-defined deployment target. Modern Java virtual machines are a wonder to behold. They deliver sophisticated automatic memory management with out-of-the-box performance approaching that of C.
Now, the JVM is undergoing its most significant transformation yet. I wrote about Project Valhalla a while ago, describing it as Java’s epic refactor. Today, that prediction is a reality. For decades, Java objects paid a memory tax in the form of headers and pointers. Valhalla removes this by introducing value classes, allowing developers to create data structures that code like a class, but work like an int.
Value classes flatten memory layouts and eliminate the cache misses that modern CPUs hate. Moreover, they bring all Java types into a single mental model (no more “boxing”). Project Valhalla proves the JVM isn’t just a static runtime but a living system, capable of replacing its own engine, even while flying the plane.
9. AI integration and orchestration
When the AI boom first hit, Python was the language the models were written in, but Java still ran the business back end. Now, Java is fast becoming the universal AI layer, merging business and AI technology.
Tools like LangChain4j and Spring AI are transforming Java into an AI orchestration engine for the enterprise. These frameworks allow developers to integrate LLMs with the proven safety, security, and type-checking of the JVM. While Python is great for experimentation, Java is the platform you use when you need to connect an AI agent to your banking system, your customer database, or your secure cloud infrastructure.
Conclusion
Software development is made up of two powerful currents: the enterprise and the creative. There’s a spirit of creative joy to coding that is the only possible explanation for, say, working on a dungeon simulator for 25 years. That fusion of creativity with a solid business use case is the alchemy that keeps Java alive and well. So far, and into the foreseeable future, Java remains the syncretic champion. It’s boring enough to trust—yet daring enough to win.
