Google adds automated code reviews to Conductor AI 13 Feb 2026, 7:17 pm

Google’s Conductor AI extension for context-driven development has been fitted with a new automated review feature intended to make AI-assisted engineering safer and more predictable.

Announced February 12, the new Automated Review feature allows the Conductor extension to go beyond planning and execution into validation, generating post-implementation reports on code quality and compliance based on defined guidelines, said Google. Conductor serves as a Gemini CLI extension designed to bring context-driven development to the developer’s terminal.

It shifts project awareness out of ephemeral chat logs and into persistent, version-controlled markdown files. Automated Review, with the new validation capability, introduces a rigorous “verify” step to the development lifecycle; once the coding agent completes its tasks, Conductor can generate a comprehensive post-implementation report.

With safety integrated into the core of every review, the Conductor extension scans for critical vulnerabilities before code is merged. It flags high-risk issues such as hardcoded API keys, potential PII (Personally Identifiable Information) leaks, and unsafe input handling that could expose the application to injection attacks. Additional Automated Review capabilities cited include:

  • Code review, where the Conductor extension acts as a peer reviewer, performing deep static and logic analysis on newly generated files.
  • Plan compliance, where the system checks new code against the developer’s plan.md and spec.md files.
  • Guideline enforcement, to maintain long-term code health.
  • Test suite validation, integrating the entire test suite directly into the review workflow.


The cure for the AI hype hangover 13 Feb 2026, 9:00 am

The enterprise world is awash in hope and hype for artificial intelligence. Promises of new lines of business and breakthroughs in productivity and efficiency have made AI the latest must-have technology across every business sector. Despite exuberant headlines and executive promises, most enterprises are struggling to identify reliable AI use cases that deliver a measurable ROI, and the hype cycle is two to three years ahead of actual operational and business realities.

According to IBM’s The Enterprise in 2030 report, a head-turning 79% of C-suite executives expect AI to boost revenue within four years, but only about 25% can pinpoint where that revenue will come from. This disconnect fosters unrealistic expectations and creates pressure to deliver quickly on initiatives that are still experimental or immature.

The way AI dominates the discussions at conferences is in contrast to its slower progress in the real world. New capabilities in generative AI and machine learning show promise, but moving from pilot to impactful implementation remains challenging. Many experts, including those cited in this CIO.com article, describe this as an “AI hype hangover,” in which implementation challenges, cost overruns, and underwhelming pilot results quickly dim the glow of AI’s potential. Similar cycles occurred with cloud and digital transformation, but this time the pace and pressure are even more intense.

Use cases vary widely

AI’s greatest strengths, such as flexibility and broad applicability, also create challenges. In earlier waves of technology, such as ERP and CRM, return on investment was all but a given across organizations. AI-driven ROI, by contrast, varies widely—and often wildly. Some enterprises can gain value from automating tasks such as processing insurance claims, improving logistics, or accelerating software development. However, even after well-funded pilots, some organizations still see no compelling, repeatable use cases.

This variability is a serious roadblock to widespread ROI. Too many leaders expect AI to be a generalized solution, but AI implementations are highly context-dependent. The problems you can solve with AI (and whether those solutions justify the investment) vary dramatically from enterprise to enterprise. This leads to a proliferation of small, underwhelming pilot projects, few of which are scaled broadly enough to demonstrate tangible business value. In short, for every triumphant AI story, numerous enterprises are still waiting for any tangible payoff. For some companies, it won’t happen anytime soon—or at all.

The cost of readiness

If there is one challenge that unites nearly every organization, it is the cost and complexity of data and infrastructure preparation. The AI revolution is data hungry. It thrives only on clean, abundant, and well-governed information. In the real world, most enterprises still wrestle with legacy systems, siloed databases, and inconsistent formats. The work required to wrangle, clean, and integrate this data often dwarfs the cost of the AI project itself.

Beyond data, there is the challenge of computational infrastructure: servers, security, compliance, and hiring or training new talent. These are not luxuries but prerequisites for any scalable, reliable AI implementation. In times of economic uncertainty, most enterprises are unable or unwilling to allocate the funds for a complete transformation. As reported by CIO.com, many leaders said that the most significant barrier to entry is not AI software but the extensive, costly groundwork required before meaningful progress can begin.

Three steps to AI success

Given these headwinds, the question isn’t whether enterprises should abandon AI, but rather, how can they move forward in a more innovative, more disciplined, and more pragmatic way that aligns with actual business needs?

The first step is to connect AI projects with high-value business problems. AI can no longer be justified because “everyone else is doing it.” Organizations need to identify pain points such as costly manual processes, slow cycles, or inefficient interactions where traditional automation falls short. Only then is AI worth the investment.

Second, enterprises must invest in data quality and infrastructure, both of which are vital to effective AI deployment. Leaders should support ongoing investments in data cleanup and architecture, viewing them as crucial for future digital innovation, even if it means prioritizing improvements over flashy AI pilots to achieve reliable, scalable results.

Third, organizations should establish robust governance and ROI measurement processes for all AI experiments. Leadership must insist on clear metrics such as revenue, efficiency gains, or customer satisfaction and then track them for every AI project. By holding pilots and broader deployments accountable for tangible outcomes, enterprises will not only identify what works but will also build stakeholder confidence and credibility. Projects that fail to deliver should be redirected or terminated to ensure resources support the most promising, business-aligned efforts.

The road ahead for enterprise AI is not hopeless, but will be more demanding and require more patience than the current hype would suggest. Success will not come from flashy announcements or mass piloting, but from targeted programs that solve real problems, supported by strong data, sound infrastructure, and careful accountability. For those who make these realities their focus, AI can fulfill its promise and become a profitable enterprise asset.


Last JavaScript-based TypeScript arrives in beta 13 Feb 2026, 1:36 am

Microsoft has released a beta of TypeScript 6.0, an update to the company’s strongly typed JavaScript variant that promises to be the last release based on the current JavaScript codebase. TypeScript 7.0 will debut a compiler and language service written in Go for better performance and scalability.

The TypeScript 6.0 beta was announced February 11. Developers can access it through npm by running the command npm install -D typescript@beta. A release candidate for TypeScript 6.0 is due February 24, with the production release planned for March 17.

Among the key features of TypeScript 6.0 is a new flag, --stableTypeOrdering, to assist with migrations to the planned Go-based TypeScript 7.0. “As announced last year (with recent updates here), we are working on a new codebase for the TypeScript compiler and language service written in Go that takes advantage of the speed of native code and shared-memory multi-threading,” said Microsoft’s Daniel Rosenwasser, principal product manager for TypeScript, in the blog post unveiling the beta. TypeScript 6.0 will in many ways act as a bridge between TypeScript 5.9 and TypeScript 7.0, he said. “As such, most changes in TypeScript 6.0 are meant to help align and prepare for adopting TypeScript 7.0.” But there are some new features and improvements that are not just about alignment.

Also featured in TypeScript 6.0 are support for the es2025 option for both target and lib, less context sensitivity for this-less functions, and new types for Temporal, which provide standard objects and functions for working with dates and times. With the --stableTypeOrdering flag, the type ordering behavior of TypeScript 6.0 matches that of TypeScript 7.0, reducing the number of differences between the two codebases. Microsoft does not necessarily encourage using this flag all the time, as it can add a substantial slowdown to type checking (up to 25% depending on the codebase).

With TypeScript 6.0’s es2025 option, the new target adds new types for built-in APIs (e.g. RegExp.escape) and moves a few declarations from esnext into es2025. With this-less functions, if this is never actually used in a function, then it is not considered contextually sensitive. That means these functions will be seen as higher priority when it comes to type inference. For Temporal, the long-awaited ECMAScript Temporal proposal has reached stage 3 and is expected to be added to JavaScript in the near future, Rosenwasser said. TypeScript 6.0 now includes built-in types for the Temporal API, so developers can start using it in TypeScript code via --target esnext or "lib": ["esnext"] or the more granular temporal.esnext.
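As a rough illustration of what the built-in Temporal types allow (assuming a project compiled with --target esnext or an esnext lib; the variable names here are invented), date arithmetic no longer requires working around the quirks of Date:

const today = Temporal.Now.plainDateISO();  // the current date in the system time zone
const due = today.add({ days: 30 });        // immutable, calendar-aware date arithmetic
console.log(due.toString());                // e.g. "2026-03-15"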

Other new features and improvements in TypeScript 6.0:

  • New types have been added for “upsert” methods. ECMAScript’s “upsert” proposal, which recently reached stage 4, introduces two new methods on Map and WeakMap: getOrInsert and getOrInsertComputed. These methods have been added to the esnext library so they can be used immediately in TypeScript 6.0 (see the sketch after this list).
  • RegExp.escape, for escaping regular expression characters such as *, ?, and +, is available in the es2025 library and can be used in TypeScript 6.0 now.
  • The contents of lib.dom.iterable.d.ts and lib.dom.asynciterable.d.ts are fully included in lib.dom.d.ts. TypeScript’s lib option allows developers to specify which global declarations a target runtime has.

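Here is a quick, hypothetical sketch of the first two items above (the map contents and keys are invented for illustration):

const counts = new Map<string, number>();

// getOrInsert returns the existing value for a key, or inserts and returns the supplied default.
counts.getOrInsert("home", 0);

// getOrInsertComputed runs its callback only when the key is missing.
counts.getOrInsertComputed("search", () => 0);

// RegExp.escape, from the es2025 lib, escapes characters such as *, ?, and + before building a pattern.
const pattern = new RegExp(RegExp.escape("3 + 4 * 5?"));
console.log(pattern.test("3 + 4 * 5?")); // true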



Visual Studio adds GitHub Copilot unit testing for C# 12 Feb 2026, 9:04 pm

Microsoft has made GitHub Copilot testing for .NET, a new capability in GitHub Copilot Chat that automates the testing of C# code, generally available in the just-released Visual Studio 2026 v18.3 IDE.

Microsoft announced the capability on February 11.

GitHub Copilot testing for .NET automates the creation, running, and testing of C# code for projects, files, classes, or members. It has built-in awareness of the developer’s solution structure, test frameworks, and build system and operates as an end-to-end testing workflow rather than a single-response prompt, Microsoft said. GitHub Copilot testing for .NET can generate tests for the xUnit, NUnit, and MSTest test frameworks.

When prompted with a testing request, GitHub Copilot testing generates unit tests scoped to the selected code, builds and runs the tests automatically, detects failures and attempts to fix them, and reruns the tests until a stable starting point is reached, according to Microsoft.

When test generation is completed, GitHub Copilot provides a structured summary to help developers understand what has been changed, Microsoft said. This summary includes test files and projects completed or modified, before-and-after coverage information, pass/fail signals and unstable cases, insights into testability gaps, and direct links to the generated tests for immediate review and iteration.

Additionally, GitHub Copilot testing for .NET now supports free-form prompting, making it easier for the developer to describe what to test. GitHub Copilot testing for .NET requires a paid GitHub Copilot license.


Researchers propose a self-distillation fix for ‘catastrophic forgetting’ in LLMs 12 Feb 2026, 10:19 am

A new fine-tuning technique aims to solve “catastrophic forgetting,” a limitation that often complicates repeated model updates in enterprise deployments.

Researchers at MIT, the Improbable AI Lab, and ETH Zurich have introduced a fine-tuning method designed to let models learn new tasks while preserving previously acquired capabilities.

To prevent degrading existing capabilities, many organizations isolate new tasks into separate fine-tuned models or adapters. That fragmentation increases costs and adds governance complexity, requiring teams to continually retest models to avoid regression.

The new technique, called self-distillation fine-tuning (SDFT), is designed to address that tradeoff.

The researchers said that SDFT “leverages in-context learning by using a demonstration-conditioned model as its own teacher, generating on-policy training signals that preserve prior capabilities while acquiring new skills.”

They added that it consistently outperforms Supervised Fine Tuning (SFT) “across skill learning and knowledge acquisition tasks,” achieving higher new-task accuracy “while substantially reducing catastrophic forgetting.”

In experiments, the researchers found the method enables models to accumulate new skills sequentially while preserving performance on prior tasks, a capability that could simplify how enterprises update and specialize production models over time.

The need and the solution

Despite rapid advances in foundation models, most enterprise AI systems remain static once deployed. Prompting and retrieval can adjust behavior at inference time, but the model’s parameters do not change to internalize new skills or knowledge.

As a result, each new fine-tuning cycle risks catastrophic forgetting, where gains on a new task degrade performance on earlier ones.

“To enable the next generation of foundation models, we must solve the problem of continual learning: enabling AI systems to keep learning and improving over time, similar to how humans accumulate knowledge and refine skills throughout their lives,” the researchers noted.

Reinforcement learning offers a way to train on data generated by the model’s own policy, which reduces forgetting. However, it typically requires explicit reward functions, which are not easy to define in every situation.

SDFT suggests an alternative. Instead of inferring a reward function, it uses the model’s in-context learning ability to generate on-policy learning signals from demonstrations.

During training, the same model plays two roles. A teacher version is conditioned on both the query and expert examples. A student version sees only the query, reflecting real-world deployment. The student updates its parameters to align with the teacher’s predictions on its own generated outputs.
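The researchers’ exact objective is not reproduced here, but an on-policy distillation loss of this general shape is commonly written as a divergence between the demonstration-conditioned teacher and the unconditioned student, evaluated on the student’s own samples, roughly:

\mathcal{L}(\theta) \;\approx\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)} \Big[ D\big( \pi_{\text{teacher}}(\cdot \mid x, \text{demos}),\; \pi_\theta(\cdot \mid x) \big) \Big]

where \pi_\theta is the student, \pi_{\text{teacher}} is the same model conditioned on the expert demonstrations, and D is a token-level divergence such as the KL divergence.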

“In sequential learning experiments, SDFT enables a single model to accumulate multiple skills over time without performance regression, establishing on-policy distillation as a practical path to continual learning from demonstrations,” the researchers said.

Challenges to overcome

SDFT appears quite practical, as the technique removes the need to maintain “model zoos” of separate adapters or fine-tuned variants, according to Lian Jye Su, chief analyst at Omdia.

However, whether this translates to commercial deployment remains to be seen as certain challenges persist.

For instance, SDFT requires significantly more training time and roughly 2.5 times the computing power of standard SFT. It also depends on sufficiently capable base models with strong in-context learning ability.

Sanchit Vir Gogia, chief analyst at Greyhound Research, also warned that SDFT does not eliminate the need for regression infrastructure. Because the model learns from its own generated rollouts, enterprises must ensure reproducibility through strict version control and artifact logging.

“Consolidation shifts operational complexity from model count to governance depth,” Gogia said.

The costs can be offset, according to Su, by avoiding catastrophic forgetting of key context and complex reward functions in reinforcement learning. But it may be a while before this reaches enterprises. “SDFT will most likely be experimented with first for internal developer tools and general assistants where the risk of a ‘self-taught error’ will be lower than in regulated domains like financial or medical decision-making,” said Faisal Kawoosa, founder and lead analyst at Techarc.


Go 1.26 unleashes performance-boosting Green Tea GC 12 Feb 2026, 10:00 am

Go 1.26 has been released. The latest version of the Google-built programming language enables the higher performing Green Tea garbage collector (GC) by default. It also introduces a change to generic types that simplifies the implementation of complex data structures.

Introduced February 10, Go 1.26 can be downloaded from go.dev.

The Green Tea GC, included as an experimental feature in last year’s Go 1.25, brings a 10% to 40% reduction in garbage collection overhead in real-world programs that make heavy use of garbage collection, the Go team said. This is because it improves the performance of marking and scanning small objects through better locality and CPU scalability, according to the team. Further reductions in GC overhead, on the order of 10%, are expected when running on newer AMD64-based CPU platforms. For the cautious, the Green Tea GC can be disabled by setting GOEXPERIMENT=nogreenteagc at build time. This opt-out setting is expected to be removed in Go 1.27.

Generic types in Go 1.26 now may refer to themselves in their own type parameter list. This change simplifies the implementation of complex data structures and interfaces, the Go team said. The built-in new function, which creates a new variable, now allows its operand to be an expression specifying the initial value of the variable. The go fix command is now the home of Go’s modernizers, providing a push-button way to update Go codebases to the latest idioms and core library APIs. Additionally, the baseline runtime overhead of cgo calls has been reduced by about 30%.

Also in Go 1.26:

  • The compiler can allocate the backing store for slices on the stack in more situations, thus improving performance.
  • For WebAssembly applications, the runtime now manages chunks of heap memory in smaller increments, leading to significantly reduced memory usage for applications with heaps less than around 16 MiB in size.
  • On 64-bit platforms, the runtime now randomizes the heap base address at startup. This is a security enhancement that makes it harder for attackers to predict memory addresses and exploit vulnerabilities when using cgo, the Go team said.
  • An experimental profile type named goroutineleak reports leaked goroutines. Look for it in the runtime/pprof package.
  • An experimental simd/archsimd package provides access to architecture-specific SIMD operations.
  • Go 1.26 is the last release to run on macOS 12 Monterey. Go 1.27 will require macOS 13 Ventura or later.


Reactive state management with JavaScript Signals 12 Feb 2026, 9:00 am

Signals is a simple idea with massive power. In one fell swoop, Signals provides reactivity while keeping state simple, even in large applications. That’s why the Signals pattern has been adopted for inclusion in Solid, Svelte, and Angular.

This article introduces Signals and demonstrates its fresh approach to state management, bringing new life to the JavaScript front end.

Introducing the Signals pattern

The Signals pattern was first introduced in JavaScript’s Knockout framework. The basic idea is that a value alerts the rest of the application when it changes. Instead of a component checking its data every time it renders (as in React’s pull model), a signal “pushes” the update to exactly where it is needed.

This is a pure expression of reactivity, sometimes called “fine-grained” reactivity. It is almost magical how individual signals can update the output of a value without requiring any intervention from the developer.

The “magic” is really just an application of functional programming, but it has big benefits for an application architecture. The Signals pattern eliminates the need for complex rendering checks in the framework engine. Even more importantly, it can simplify state management by providing a universal mechanism that can be used anywhere, even across components, eliminating the need for centralized stores.

Before Signals: The virtual DOM

To understand why signals are such a breath of fresh air, we can start by looking at the dominant model of the last decade: The Virtual DOM (VDOM), popularized by React.

The VDOM is an abstraction of the real DOM that the framework holds in memory. When application state changes, the framework re-renders the component tree in memory, compares it to the previous version (a process called diffing), then updates the actual DOM with the differences.

While this makes UI development declarative and predictable, it introduces a cost. The framework does significant work just to determine what hasn’t changed. This is compounded by data-heavy components like lists and trees. And, as applications grow larger, this diffing overhead adds up. Developers then resort to complex optimization techniques (like memoization) in an effort to keep the engine from overworking.

Fine-grained reactivity

State management via VDOM implies repeatedly walking a tree data structure in memory. Signals side-steps this entirely. By using a dependency graph, Signals changes the unit of reactivity. In a VDOM world, the unit is the component, whereas with Signals, the unit is the value itself.

Signals is essentially an observer pattern where the observers are automatically enrolled. When a view template reads an individual signal, it automatically subscribes to it. This creates a simple, direct link between the data and the specific text node or attribute that displays it. When the signal changes, it notifies only those exact subscribers.

This is a point-to-point update. The framework doesn’t need to walk a component tree or determine what changed. This shifts the performance characteristic from O(n) (where n is tree size) to O(1) (immediate, direct update).

Hands-on with Signals

The best way to understand the benefits of Signals is to see the pattern in action. When developing application components, the difference between using Signals and a more traditional VDOM approach is, at first, almost invisible. Here’s React handling a simple state instance:

function Counter() {
  const [count, setCount] = useState(0);
  const double = count * 2;

  return (
    <button onClick={() => setCount(count + 1)}>
      {count} x 2 = {double}
    </button>
  );
}

Now consider the same idea in Svelte using individual signals with the Runes syntax:

<script>
  let count = $state(0);
  const double = $derived(count * 2);
</script>

<button onclick={() => count++}>
  {count} x 2 = {double}
</button>

In both cases, you have a reactive value (count) and a value that is dependent on it (double). They both do the same thing. To see the difference, you can add a console log to them. Here’s the log with React:

export default function Counter() {
  const [count, setCount] = useState(0);
  const double = count * 2;

  console.log("Re-evaluating the world...");

  return (
    <button onClick={() => setCount(count + 1)}>
      {count} x 2 = {double}
    </button>
  );
}

Every time the component mounts or updates, the console will output this line of logging. But now look at the same log from Svelte:

<script>
  let count = $state(0);
  const double = $derived(count * 2);

  console.log("Re-evaluating the world...");
</script>

<button onclick={() => count++}>
  {count} x 2 = {double}
</button>

In this case, the console logging happens only once, when the component mounts.

At first, this seems impossible. But the trick is, the signal just directly connects the value to its output, with no need to invoke the surrounding JavaScript that created it. The component does not need to be re-evaluated. The signal is a portable unit of reactivity.

How Signals eliminates dependent values

Another important result of using signals is how we manage side effects. In React, we have to declare the dependent values explicitly; these are the dependencies we pass to useEffect in its dependency array. This is a common area of complaint in terms of developer experience (DX) because it is an additional relationship to manage. Over time, this can lead to mistakes (like forgetting to add a value) and may otherwise impact performance. As an example, consider this single dependency, and imagine what happens when there are many:

useEffect(() => {
  console.log(`The count is now ${count}`);
}, [count]);

Expressing the same job with signals eliminates the dependency list:

effect(() => {
  console.log(`The count is now ${count()}`);
});

In this case, we’ve used Angular syntax, but it’s similar across other frameworks that use signals. Here’s the same example with Solid:

createEffect(() => {
  console.log(count());
});

The end of ‘prop drilling’

Using the Signals pattern for state management also impacts application design. We can see it most readily in the way we have to pass properties down the component tree, from parent to child, when we need to share state:

// In React, sharing state can mean passing it down...
function Parent() {
  const [count, setCount] = useState(0);
  return <Child count={count} />;
}

function Child({ count }) {
  // ...and down again...
  return <GrandChild count={count} />;
}

function GrandChild({ count }) {
  // ...until it finally reaches the destination.
  return <div>{count}</div>;
}

The impact also shows up in centralized stores like Redux, which strive to reduce complexity sprawl but often seem to add to the problem. Signals eliminates both issues by making centralized state simply another JavaScript file you create and import into your components. For example, here’s how a shared state module might look in Svelte:

// store.svelte.js
// This state exists independently of the UI tree.
export const counter = $state({
  value: 0
});

// We can even put shared functions in here
export function increment() {
  counter.value += 1;
}

Using this is just normal JavaScript:

<script>
  import { counter, increment } from './store.svelte.js';
</script>

<button onclick={increment}>
  Count: {counter.value}
</button>

Toward a Signals standard?

Historically, successful patterns that start out in libraries or individual frameworks often migrate into the language. Just think of how jQuery’s selectors influenced document.querySelector, or how Promises became part of the JavaScript standard.

Now, we are seeing it happen again with Signals. There is currently a proposal (TC39) to add signals directly to JavaScript. The goal isn’t to replace how frameworks work, but to provide a standard format for reactivity, which frameworks can then adopt.

Imagine defining a signal in a vanilla JavaScript file and having it drive a React component, a Svelte template, and an Angular service simultaneously. If adopted, this would move state management from a framework concern to a language concern—a big win for simplicity. Of course, it’s only a win if it goes well and doesn’t just spawn another way to do things.
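The proposal’s API is still early and subject to change, but its current draft (and the accompanying polyfill) centers on two primitives, state and computed signals, roughly like this:

const count = new Signal.State(0);
const double = new Signal.Computed(() => count.get() * 2);

count.set(3);
console.log(double.get()); // 6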

Conclusion

For a long time, JavaScript developers have accepted a tradeoff in front-end development, exchanging the raw performance of direct DOM manipulation for the declarative ease of the Virtual DOM. We accepted the overhead because it made applications easier to manage.

Signals offers a way to grow beyond that compromise. While the roots of the pattern go back to the early days of the web, credit is due to Ryan Carniato and Solid.js for proving that fine-grained reactivity could outperform the VDOM in the modern era. Their success sparked a movement that has now spread to Angular, Svelte, and possibly the JavaScript language itself.

Signals gives JavaScript developers a declarative experience by defining state and letting the UI react to it, but with the surgical performance of direct updates. Returning to a push model, versus the pull model we’ve come to accept, lets us do more with less code. And with that, it may be that the quest for simplicity in JavaScript is finally gaining traction.


How neoclouds meet the demands of AI workloads 12 Feb 2026, 9:00 am

Neoclouds are specialized clouds devoted to the wildly dynamic world of artificial intelligence, currently experiencing explosive 35.9% annual growth. Built from the ground up to meet AI’s significant computational demands, neoclouds first emerged several years ago. Dozens of providers have arrived since then, with CoreWeave, Crusoe, Lambda, Nebius, and Vultr among the neocloud leaders.

The ”neo” in neoclouds serves to distinguish them from the more established cloud providers such as AWS, Google Cloud, and Microsoft Azure, whose multitude of options for infrastructure, managed services, and applications imply that cloud providers must offer an endless aisle of choices. The hyperscalers were first to support AI workloads, too, but it was a retrofitted option on an existing platform rather than a clean slate implementation built for purpose.

Neoclouds have one job: provide an optimal home for AI. Most obviously, that means neoclouds feature GPU-first computing, typically available at a price-per-hour less than half that of the hyperscalers. Neoclouds also offer high-bandwidth networking, low-latency storage, advanced power management, and managed services for deploying, monitoring, maintaining, and securing AI workloads. These capabilities are offered through a more streamlined and easy to use surface, unencumbered by traditional non-AI features.

In contrast to the cookie-cutter options offered by the hyperscalers, neoclouds take a boutique approach, responding to the special requirements and evolving needs of customers—including customers that push the envelope of AI development. That flexibility is a key reason why an increasing number of AI startups, enterprises, researchers, and independent developers are choosing neoclouds as their AI platform of choice.

Choosing the best configuration

The best neoclouds offer a wide range of hardware choices plus skilled guidance for customers about which GPU, memory, networking, and storage options best suit which AI tasks. That advice is based on deep AI engineering experience, but a few general principles apply. If you were planning on training your own large language model (LLM), for example, you’d need the highest-end configuration available—at this writing, probably NVIDIA GB200 Grace Blackwell GPUs with 186GB of VRAM each.

But today, vanishingly few players beyond such monster AI providers as Anthropic, OpenAI, Google, or Meta train their own LLMs. Fine-tuning LLMs that have already been trained, which typically includes augmenting them with additional data, is far more prevalent and requires far less horsepower. The same goes for LLM post-training and reinforcement learning. And the processing required for inference alone—that is, running LLMs that have already been trained and tuned—is again far less demanding.

It’s worth noting that massive consumer adoption of LLM chatbots has obscured the fact that AI covers a very wide range—including video generation, computer vision, image classification, speech recognition, and much more. Plus, small language models for such applications as code completion, customer service automation, and financial document analysis are becoming increasingly popular. To choose configurations that match AI tasks, neocloud customers must either come in the door with bona fide AI engineering skills or rely on the options and guidance offered by neocloud  providers.

Managed AI services

Most added-value neocloud services center on maximizing inference performance, with ultra-low latency and seamless scaling. A key performance metric is TTFT (time to first token), which measures how long it takes for an LLM to generate and return the first token of its response after receiving a prompt.

No surprise, then, that one of the most competitive areas is optimization of a neocloud’s inference engine to reduce TTFT times, as well as to sustain overall throughput. AI agents cannot afford to return 429 errors, rate-limiting responses that frustrate users by indicating the maximum number of server requests has been exceeded.

A number of infrastructure-level techniques can keep AI results flowing. Sophisticated caching schemes can queue up local and remote nodes to provide nearly instantaneous results. Continuous batching reduces request wait times and maximizes GPU utilization. And a technique known as quantization deliberately reduces the precision of model weights post-training to shrink their memory footprint with no discernible effect on the accuracy of results. As workload sizes increase, the best neoclouds scale up to meet demand automatically, offering flexible token-based pricing to keep costs manageable.
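To put rough numbers on the quantization point: a 7-billion-parameter model stored as 16-bit weights needs about 14 GB for the weights alone, while an 8-bit quantized copy needs roughly 7 GB and a 4-bit copy roughly 3.5 GB, often the difference between requiring a flagship accelerator and fitting comfortably on mid-range hardware.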

The high end for neoclouds tends to be on-demand pricing per hour of GPU time, which is still less expensive than the AI infrastructure offerings of the hyperscalers. But some neoclouds now also offer so-called serverless pricing, where customers pay per token generated. The latter can decrease costs dramatically, as can spot pricing offered by neoclouds that temporarily have unused GPU capacity (ideal for fault-tolerant workloads that may experience fluctuating performance).

Increasingly, neocloud providers also offer predeployed open source LLMs such as Kimi-K2, Llama, Gemma, GPT-OSS, Qwen, and DeepSeek. This accelerates model discovery and experimentation, allowing users to generate API keys in minutes. More advanced neocloud providers tune their inference engines to each model for maximum optimization. A single pane of glass for inference performance metrics as well as model provisioning and management is highly desirable.

Ultimately, the idea is to provide infrastructure as a service specifically for AI, without all the application-layer stuff the hyperscalers have larded onto their platforms. The extensive automation, self-service configuration, and array of options are all tailor made for AI.

Solving the cost equation

Today, enterprises still tend to be in the experimental phase when it comes to running their own AI models. That’s why the majority of neocloud customers are AI natives—a mix of specialized AI providers offering everything from code generation tools to video generation to vertical solutions for health care, legal research, finance, and marketing.

Cost is critical for such providers, which is why neoclouds’ ability to offer AI infrastructure for far less than the hyperscalers is so attractive. Pricing models tailored to customer needs provide additional advantages.

But AI natives that need consistent performance at very low cost typically negotiate long-term contracts with neoclouds stretching months or years. These providers’ entire businesses are dependent on AI and rely on having high-quality inference without interruption. Agreements often include managed inference services as well as reliable, low-latency storage for massive data sets and high-throughput model training.

Reliability and security

As with any cloud, neoclouds must offer enterprise-grade reliability and security. One reason to opt for one of the neocloud leaders is that they’re more likely to have geographically distributed data centers that can provide redundancy when one location goes offline. Power redundancy is also critical, including uninterruptible power supplies and backup generators.

Neocloud security models are less complex than those of the hyperscalers. Because the bulk of what neoclouds offer is AI-specific infrastructure, business customers may be better served to think in terms of bringing neocloud deployments into their own security models. That being said, neoclouds must offer data encryption at rest as well as in transit, with the latter supporting ephemeral elliptic curve Diffie-Hellman key exchange signed with RSA and ECDSA. Also look for the usual certifications: SOC 2 Type I, SOC 2 Type II, and ISO 27001.

Neoclouds provide an environment that enables you to deploy distributed workloads, monitor them, and benefit from a highly reliable infrastructure where hardware failures are remediated transparently, without affecting performance. The result is better reliability, better observability, and better error recovery, all of which are essential to delivering a consistent AI customer experience.

The neocloud choice

We live in a multicloud world. When customers choose a hyperscale cloud, they’re typically attracted by certain features or implementations unavailable in the same form elsewhere. The same logic applies to opting for a neocloud: It’s a decision to use the highest-performing, most flexible, most cost-effective platform for running AI workloads.

The neocloud buildout can barely keep up with today’s AI boom, with new agentic workflows revolutionizing business processes and so-called “AI employees” on the verge of coming online. The potential of AI is immense, offering unprecedented opportunities for innovation.

A recent McKinsey report estimated that by 2030, roughly 70% of data center demand will be for data centers equipped to host advanced AI workloads. No doubt a big chunk of that business will continue to go to the hyperscalers. But for customers who need to run high-performance AI workloads cost-effectively at scale, or who have needs that can’t be met by the hyperscalers’ prefab options, neoclouds provide a truly purpose-built solution.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.


Microsoft unveils first preview of .NET 11 12 Feb 2026, 1:36 am

Microsoft has released .NET 11 Preview 1, a planned update to the cross-platform software development platform that features JIT performance improvements, faster compression, CoreCLR support on WebAssembly, and a host of other capabilities.

Unveiled February 10,  the preview can be downloaded from dotnet.microsoft.com. Improvements cover areas ranging from the runtime and libraries to the SDK, the C# and F# languages, ASP.NET Core and Blazor, and .NET MAUI (Multi-platform App UI). Changes to the JIT compiler focus on improving startup throughput, enabling more optimizations, and reducing overhead in key code patterns. The enhancements include raising the multicore JIT MAX_METHODS limit to better support large workloads and improve startup throughput in method-heavy apps. Also, non-shared generic virtual methods are de-virtualized to reduce virtual-call overhead and enable further inlining/optimization opportunities. The JIT also generalizes pattern-based induction-variable (IV) analysis to enable more loop analysis cases, opening the door to more loop optimizations, according to Microsoft.

Additionally in .NET 11, initial work has been done to bring CoreCLR support to WebAssembly, although this feature is not yet ready for general release in Preview 1. As part of this work, .NET 11 Preview 1 begins bringing up a Wasm-targeting RyuJit that will be used for AOT compilation. .NET WebAssembly is being migrated from the Mono runtime to CoreCLR.

Zstandard compression support in .NET libraries in .NET 11 means significantly faster compression and decompression compared to existing algorithms while maintaining competitive compression ratios. New APIs include a full set of streaming, one-shot, and dictionary-based compression and decompression capabilities. Also featured is a per-year cache for time zone transitions, improving performance for time conversions. The cache stores all transitions for a given year in UTC format, eliminating repeated rule lookups during conversions.

C# 15 in .NET 11 Preview 1 introduces collection expression arguments, a feature that supports scenarios where a collection expression does not produce the desired collection type. Collection expression arguments enable developers to specify capacity, comparers, or other constructor parameters directly within the collection expression syntax. C# 15 also brings extended layout support, by which the C# compiler emits the TypeAttributes.ExtendedLayout for types that have the System.Runtime.InteropServices.ExtendedLayoutAttribute applied. This feature is primarily intended for the .NET runtime team to use for types in interop scenarios.

With F# 11 in .NET 11 Preview 1, the F# compiler has parallel compilation enabled by default and features faster compilation of computation expression-heavy code. ML compatibility has been removed, though. The keywords asr, land, lor, lsl, lsr, and lxor — previously reserved for ML compatibility — are now available as identifiers. Microsoft said that F# began its life as an OCaml dialect running on .NET, and for more than two decades, the compiler carried compatibility constructs from that heritage including .ml and .mli source file extensions, the #light "off" directive for switching to whitespace-insensitive syntax, and flags like --mlcompatibility. These served the language well during its early years, providing a bridge for developers coming from the ML family, the company said, but that chapter comes to a close. About 7,000 lines of legacy code have been removed across the compiler, parser, and test suite.

.NET 11 follows the November 2025 release of .NET 10, which brought AI, language, and runtime improvements. Other features touted for .NET 11 include the following:

  • Runtime async introduces new runtime-level infrastructure for async methods. The goal is to improve tools and performance for async-heavy codepaths.
  • CoreCLR is now the default runtime for Android Release builds. This should improve compatibility with the rest of .NET as well as reduce startup times, Microsoft said.
  • CLI command improvements in the SDK include dotnet run being enhanced to support interactive selection workflows, laying the foundation for improved .NET MAUI and mobile development scenarios.
  • The Blazor web framework adds an EnvironmentBoundary component for conditional rendering based on the hosting environment. This component is similar to the MVC environment tag helper and provides a consistent way to render content based on the current environment across both server and WebAssembly hosting models, Microsoft said.
  • XAML source generation is now the default in .NET 11 for all .NET MAUI applications, improving build times, debug performance, and release runtime performance. Debug build app behavior is consistent with release build app behavior, according to Microsoft.


Google Cloud launches GEAR program to broaden AI agent development skills 11 Feb 2026, 11:19 am

As enterprises shift from experimenting with AI agents to deploying them in production environments, Google is rolling out a structured skills program aimed at helping developers build, test, and operationalize AI agents using Google Cloud tools, specifically its Agent Development Kit (ADK).

Named the Gemini Enterprise Agent Ready (GEAR) program, the initiative packages hands-on labs, 35 free monthly recurring Google Skills credits, and badge-earning pathways into a track within the Google Developer Program.

Currently, the pathways available include “Introduction to Agents” and “Develop Agents with Agent Development Kit (ADK),” which are targeted at helping developers understand the anatomy of an agent, how they integrate with Gemini Enterprise workflows, and how to build an agent using ADK.

These pathways will enable developers to learn a new set of practical engineering skills to succeed in real business environments, Google executives wrote in a blog post.

They contend that by embedding GEAR within the Google Developer Program and Google Skills, developers can experiment without cost barriers and systematically learn how to build, test, and deploy agents at scale.

This, in turn, helps enterprises accelerate the transition from isolated AI pilots to operational solutions that generate measurable value across production workflows, they wrote.

The difficulty of moving AI from pilot to production is well documented: Deloitte’s 2026 State of AI in the Enterprise report found that only about 25% of 3,200 respondents said that their enterprises have moved as many as 40% of their AI pilots into production.

Rival hyperscalers, too, offer similar programs.

While Microsoft runs structured AI learning paths and certifications via Microsoft Learn tied to Azure AI, AWS provides hands-on labs and training through AWS Skill Builder with AI/ML and generative AI tracks.

Beyond skills development, however, these initiatives seem to be closely tied to broader platform strategies. Google’s rollout of GEAR can also be read as part of a broader push to cement Gemini Enterprise’s role as a competitive agent development platform at a time when hyperscalers are all vying to own the enterprise agent narrative.

Microsoft has been actively positioning its stack — including Azure OpenAI Service, Azure AI Studio, and Copilot Studio — as an agent orchestration and workflow automation hub.

Similarly, AWS is pushing Bedrock Agents as part of its foundation model ecosystem.

Others, such as Salesforce and OpenAI, are also in on the act. While Salesforce markets its Agentforce suite embedded in CRM workflows, OpenAI’s Assistants API is being positioned as a flexible agent layer.


The death of reactive IT: How predictive engineering will redefine cloud performance in 10 years 11 Feb 2026, 10:00 am

For more than two decades, IT operations has been dominated by a reactive culture. Engineers monitor dashboards, wait for alerts to fire and respond once systems have already begun to degrade. Even modern observability platforms equipped with distributed tracing, real-time metrics and sophisticated logging pipelines still operate within the same fundamental paradigm: something breaks, then we find out.

But the digital systems of today no longer behave in ways that fit this model. Cloud-native architectures built on ephemeral microservices, distributed message queues, serverless functions and multi-cloud networks generate emergent behavior far too complex for retrospective monitoring to handle. A single mis-tuned JVM flag, a slightly elevated queue depth or a latency wobble in a dependency can trigger cascading failure conditions that spread across dozens of microservices in minutes.

The mathematical and structural complexity of these systems has now exceeded human cognitive capacity. No engineer, no matter how experienced, can mentally model the combined state, relationships and downstream effects of thousands of constantly shifting components. The scale of telemetry alone, billions of metrics per minute, makes real-time human interpretation impossible.

This is why reactive IT is dying and this is why predictive engineering is emerging, not as an enhancement, but as a replacement for the old operational model.

Predictive engineering introduces foresight into the infrastructure. It creates systems that do not just observe what is happening; they infer what will happen. They forecast failure paths, simulate impact, understand causal relationships between services and take autonomous corrective action before users even notice degradation. It is the beginning of a new era of autonomous digital resilience.

Why reactive monitoring is inherently insufficient

Reactive monitoring fails not because tools are inadequate, but because the underlying assumption that failures are detectable after they occur no longer holds true.

Modern distributed systems have reached a level of interdependence that produces non-linear failure propagation. A minor slowdown in a storage subsystem can exponentially increase tail latencies across an API gateway. A retry storm triggered by a single upstream timeout can saturate an entire cluster. A microservice that restarts slightly too frequently can destabilize a Kubernetes control plane. These are not hypothetical scenarios; they are the root cause of the majority of real-world cloud outages.

Even with high-quality telemetry, reactive systems suffer from temporal lag. Metrics show elevated latency only after it manifests. Traces reveal slow spans only after downstream systems have been affected. Logs expose error patterns only once errors are already accumulating. By the time an alert triggers, the system has already entered a degraded state.

The architecture of cloud systems makes this unavoidable. Auto scaling, pod evictions, garbage collection cycles, I/O contention and dynamic routing rules all shift system state faster than humans can respond. Modern infrastructure operates at machine speed; humans intervene at human speed. The gap between those speeds is growing wider every year.

The technical foundations of predictive engineering

Predictive engineering is not marketing jargon. It is a sophisticated engineering discipline that combines statistical forecasting, machine learning, causal inference, simulation modeling and autonomous control systems. Below is a deep dive into its technical backbone.

Predictive time-series modeling

Time-series models learn the mathematical trajectory of system behavior. LSTM networks, GRU architectures, Temporal Fusion Transformers (TFT), Prophet and state-space models can project future values of CPU utilization, memory pressure, queue depth, IOPS saturation, network jitter or garbage collection behavior, often with astonishing precision.

For example, a TFT model can detect the early curvature of a latency increase long before any threshold is breached. By capturing long-term patterns (weekly usage cycles), short-term patterns (hourly bursts) and abrupt deviations (traffic anomalies), these models become early-warning systems that outperform any static alert.
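As a toy illustration of the idea, far simpler than an LSTM or TFT but enough to show the mechanics, even classical double exponential smoothing can project a metric’s recent trajectory forward (the sample values and smoothing constants below are invented):

// Holt's linear (double exponential) smoothing: fit a level and a trend, then extrapolate.
function holtForecast(series: number[], alpha = 0.5, beta = 0.3, steps = 3): number[] {
  let level = series[0];
  let trend = series[1] - series[0];
  for (let i = 1; i < series.length; i++) {
    const prevLevel = level;
    level = alpha * series[i] + (1 - alpha) * (level + trend);
    trend = beta * (level - prevLevel) + (1 - beta) * trend;
  }
  // Project the fitted level and trend `steps` intervals into the future.
  return Array.from({ length: steps }, (_, k) => level + (k + 1) * trend);
}

// CPU utilization samples (%) at one-minute intervals; the forecast flags the upward curve early.
console.log(holtForecast([41, 43, 47, 52, 58, 66]));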

Causal graph modeling

Unlike correlation-based observability, causal models understand how failures propagate. Using structural causal models (SCM), Bayesian networks and do-calculus, predictive engineering maps the directionality of impact:

  • A slowdown in Service A increases the retry rate in Service B.
  • Increased retries elevate CPU consumption in Service C.
  • Elevated CPU in Service C causes throttling in Service D.

This is no longer guesswork; it is mathematically derived causation. It allows the system to forecast not just what will degrade, but why it will degrade and what chain reaction will follow.
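Expressed as a data structure, that chain is just a small directed graph, and the downstream blast radius can be enumerated mechanically. Here is a minimal sketch with invented service and metric names:

// Directed edges: a change in the key is predicted to affect each of the listed metrics.
const causes: Record<string, string[]> = {
  "service-a.latency": ["service-b.retries"],
  "service-b.retries": ["service-c.cpu"],
  "service-c.cpu": ["service-d.throttling"],
};

// Breadth-first walk of everything downstream of a starting symptom.
function downstreamImpact(start: string, graph: Record<string, string[]>): string[] {
  const impacted: string[] = [];
  const queue = [...(graph[start] ?? [])];
  while (queue.length > 0) {
    const node = queue.shift()!;
    if (!impacted.includes(node)) {
      impacted.push(node);
      queue.push(...(graph[node] ?? []));
    }
  }
  return impacted;
}

console.log(downstreamImpact("service-a.latency", causes));
// ["service-b.retries", "service-c.cpu", "service-d.throttling"]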

Digital twin simulation systems

A digital twin is a real-time, mathematically faithful simulation of your production environment. It tests hypothetical conditions:

  • “What if a surge of 40,000 requests hits this API in 2 minutes?”
  • “What if SAP HANA experiences memory fragmentation during period-end?”
  • “What if Kubernetes evicts pods on two nodes simultaneously?”

By running tens of thousands of simulations per hour, predictive engines generate probabilistic failure maps and optimal remediation strategies.

Autonomous remediation layer

Predictions are pointless unless the system can act on them. Autonomous remediation uses policy engines, reinforcement learning and rule-based control loops to:

  • Pre-scale node groups based on predicted saturation
  • Rebalance pods to avoid future hotspots
  • Warm caches before expected demand
  • Adjust routing paths ahead of congestion
  • Modify JVM parameters before memory pressure spikes
  • Preemptively restart microservices showing anomalous garbage-collection patterns

This transforms the system from a monitored environment into a self-optimizing ecosystem.
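A minimal sketch of what one such control-loop rule might look like, with the interface, names, and thresholds all invented for illustration:

interface Forecast {
  resource: string;        // e.g. "node-pool-x"
  metric: string;          // e.g. "cpu_saturation", as a fraction from 0 to 1
  predictedValue: number;  // forecast value at the horizon
  horizonMinutes: number;  // how far ahead the prediction looks
}

// If saturation is predicted to cross 85% within 30 minutes, pre-scale instead of waiting for an alert.
function remediate(forecast: Forecast): string | null {
  if (
    forecast.metric === "cpu_saturation" &&
    forecast.predictedValue > 0.85 &&
    forecast.horizonMinutes <= 30
  ) {
    return `scale out ${forecast.resource} ahead of predicted saturation`;
  }
  return null; // no action needed yet
}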

Predictive engineering architecture

To fully understand predictive engineering, it helps to visualize its components and how they interact. Below are a series of architecture diagrams that illustrate the workflow of a predictive system:

DATA FABRIC LAYER
(Logs | Metrics | Traces | Events | Topology | Context)
              │
              ▼
FEATURE STORE / NORMALIZED DATA MODEL
(Structured, aligned telemetry for advanced ML modeling)
              │
              ▼
PREDICTION ENGINE
(Forecasting Models | Anomaly Detection | Causal Reasoning | Digital Twin Simulation)
              │
              ▼
REAL-TIME INFERENCE LAYER
(Kafka, Flink, Spark Streaming, Ray Serve)
              │
              ▼
AUTOMATED REMEDIATION ENGINE
  • Autoscaling
  • Pod rebalancing
  • API rate adjustment
  • Cache priming
  • Routing optimization
              │
              ▼
CLOSED-LOOP FEEDBACK SYSTEM

This pipeline captures how data is ingested, modeled, predicted and acted upon in a real-time system.

Reactive vs predictive lifecycle

Reactive IT:

Event Occurs → Alert → Humans Respond → Fix → Postmortem

Predictive IT:

Predict → Prevent → Execute → Validate → Learn

Predictive Kubernetes workflow

Metrics + Traces + Events
              │
              ▼
Forecasting Engine
(Math-driven future projection)
              │
              ▼
Causal Reasoning Layer
(Dependency-aware impact analysis)
              │
              ▼
Prediction Engine Output
“Node Pool X will saturate in 25 minutes”
              │
              ▼
Autonomous Remediation Actions
  • Pre-scaling nodes
  • Pod rebalancing
  • Cache priming
  • Traffic shaping
              │
              ▼
Validation

The future: Autonomous infrastructure and zero-war-room operations

Predictive engineering will usher in a new operational era where outages become statistical anomalies rather than weekly realities. Systems will no longer wait for degradation; they will preempt it. War rooms will disappear, replaced by continuous optimization loops. Cloud platforms will behave like self-regulating ecosystems, balancing resources, traffic and workloads with anticipatory intelligence.

In SAP environments, predictive models will anticipate period-end compute demands and autonomously adjust storage and memory provisioning. In Kubernetes, predictive scheduling will prevent node imbalance before it forms. In distributed networks, routing will adapt in real time to avoid predicted congestion. Databases will adjust indexing strategies before query slowdowns accumulate.

The long-term trajectory is unmistakable: autonomous cloud operations.

Predictive engineering is not merely the next chapter in observability; it is the foundation of fully self-healing, self-optimizing digital infrastructure.

Organizations that adopt this model early will enjoy a competitive advantage measured not in small increments but in orders of magnitude. The future of IT belongs to systems that anticipate, not systems that react.

This article is published as part of the Foundry Expert Contributor Network.
Want to join?


Software at the speed of AI 11 Feb 2026, 9:00 am

In the immortal words of Ferris Bueller, “Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.” This could also be said of the world of AI. No, it could really be said about the world of AI. Things are moving at the speed of a stock tip on Wall Street. 

And things spread on Wall Street pretty fast last week. The S&P 500 Software and Services Index lost about $830 billion in market value over six straight sessions of losses ending February 4. The losses were heavy for SaaS companies, prompting some to coin the term “SaaSpocalypse.” At the center of the concern was Anthropic’s release of Claude Cowork, which, in many eyes, could render SaaS applications obsolete, or at least a whole lot less valuable.

And the more I think about it, the harder it is for me to believe they are wrong.

If you have Claude Code fixing bugs, do you really need a Jira ticket? Why go to a legal documents site when Claude.ai can just write your will out for you, tailoring it to your specifications for a single monthly fee? Do you need 100 Salesforce seats if you can do the work with 10 people using AI agents? 

The answers to those questions are almost certainly bad news for a SaaS company. And it is only going to get worse and worse—or better and better, depending on your point of view.

We are entering an age where there will be a massive abundance of intelligence, but if Naval Ravikant is right—and I think he is—we will never have enough. The ramifications of that are, I have to admit, not known. But I won’t hesitate to speculate.

Historically, when there has been soaring demand for something, and that demand has been met, it has had a profound effect on the job market. Electricity wiped out the demand for goods like hand-cranked tools and gas lamps, but it ushered in a huge demand for electricians, power plant technicians, and assemblers of electrical household appliances. And of course, electricity had huge downstream effects. The invention of the transistor led to the demand for computers, eliminating many secretaries, human computers, slide rule manufacturers, and the like. 

And today? The demand for AI is boundless. And it will almost certainly have profound effects on labor markets. Will humans be writing code for much longer? I don’t think so.

For us developers, coding agents are getting more powerful every few months, and that pace is accelerating. Both OpenAI and Anthropic have released new large language models in the past week that are receiving rave reviews from developers. The race is on—who knows how soon the next iterations will appear.

We are fast approaching the day when anyone with an idea will be able to create an application or a website in a matter of hours. The term “software developer” will take on new meaning. Or maybe it will go the way of the term “buggy whip maker.” Time will tell. 

That sounds depressing to some, I suppose, but if history repeats, AI will also bring an explosion of jobs and job titles that we haven’t yet conceived. If you told a lamplighter in 1880 that his great-grandchild would be a “cloud services manager,” he would have looked at you like you had three heads. 

And if an hour of AI time will soon produce what used to take a consultant 100 hours at $200 an hour, we humans will inevitably come up with software and services we can’t yet fathom.

I’m confident that my great-grandchild will have a job title that is inconceivable today.

(image/jpeg; 7.91 MB)

How vibe coding will supercharge IT teams 11 Feb 2026, 9:00 am

There’s a palpable tension in IT today. Teams are stretched to their limits with a growing backlog of initiatives, while executives expect IT to lead the charge on transforming an organization into an AI-driven one.

And the numbers paint a somber picture. IT teams are drowning in work as digital transformation projects have not slowed down but rather accelerated. In fact, IT project requests in 2025 jumped 18% compared to the year prior, and nearly one in three IT projects (29%) missed their deadlines, creating tension with business stakeholders.

But here’s the part few IT leaders say out loud: the answer isn’t about putting in more hours to catch up; it’s about supercharging the teams you already have with the tools already at your disposal. When the work itself has outpaced traditional capacity, the solution is to enable existing staff to be more productive and to tackle previously backlogged work.

So what should you do when the gap between what’s needed and what’s possible keeps widening? You start rethinking who gets to build, who gets to automate, and how work actually gets done.

The answer to the skills crisis lies in unlocking the talent you already have—and that is precisely the shift happening right now with vibe coding.

The rise of vibe coding

It starts with a simple idea. You, the domain expert, describe what you want in natural language, and an AI agent (or agents) takes it from there, planning, reasoning, and executing end to end. It’s the moment when domain experts—whether in IT or elsewhere—finally get to build the systems they’ve been waiting for.

Now, domain experts no longer have to master complex coding syntax to turn their ideas into workflows, processes, or service steps. They can finally build the systems they know best. And the IT service organizations that understand this first will deliver experiences their competitors can’t match.

Want an onboarding sequence with provisioning, equipment, training, and approvals? Just describe it. The AI agent maps the flow, identifies the dependencies, pulls in the right steps, and assembles the workflow. Want to assess incidents faster? Simply tell the AI agent the conditions. The agent reads employee messages, extracts context, spots patterns, matches related incidents, and sets up the next steps automatically.
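
As a rough illustration of what “just describe it” can mean under the hood, the sketch below asks a chat model to return an onboarding flow as structured JSON. The client, the model name, and the JSON schema are placeholders for illustration, not a reference to any specific vendor’s agent.

```python
# Illustrative sketch: turning a natural-language request into a structured workflow.
# The OpenAI client, model identifier, and step schema are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # any chat-completion-capable endpoint could stand in here

request = ("Build an onboarding sequence with account provisioning, equipment ordering, "
           "security training, and manager approval.")
system = ('Return JSON with a "steps" array; each step needs "name", "owner_role", '
          'and "depends_on" (names of prerequisite steps).')

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "system", "content": system},
              {"role": "user", "content": request}],
    response_format={"type": "json_object"},
)

workflow = json.loads(resp.choices[0].message.content)
for step in workflow["steps"]:
    print(step["name"], "<- depends on:", step.get("depends_on", []))
```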

I know what you’re thinking: “Oh, look, another tech exec showing how AI is going to replace jobs.” Let me be clear: Developers don’t disappear in this world. They just stop getting pulled into repetitive maintenance work and shift their focus to higher-impact areas like architecture, design, and solving real problems—the work that actually needs a human developer and drives further innovation.

IT service before and after agentic AI

If you’ve worked inside a traditional IT service environment, you already know the pain points by heart: the static forms, the rigid workflows, the dependence on specialists, the manual handoffs and the endless context switching. None of this is news to you.

Agentic AI changes the service cycle at every layer, beginning right at the point of contact. An employee reaches out from wherever they already work—maybe Slack or a web portal. The AI agent immediately reads the intent, extracts the details, checks for related issues, and fills in the fields that a human used to handle. This means no portals, forms, or back-and-forth just to understand what’s going on.

As the case develops, the agent analyzes whether something should be classified as an incident. It looks at the configuration items involved, detects similar open issues, and even recommends likely root causes. And all of this pulls from a dynamic configuration management database (CMDB) that maps systems and assets in real time, giving IT analysts the context they’re usually scrambling to piece together.

Escalations feel different too. The AI agent hands the human specialist a complete, ready-to-use summary of what’s happening. And the technical support engineer finally gets to focus on solving the problem rather than chasing down information. Teams can even swarm incidents directly in Slack with full links to the underlying records.

All of this adds up to results you can feel immediately: faster responses and lower mean time to repair (MTTR). The best part? You get it with the team you already have.

Who gets to build

The most transformative part of vibe coding is access. Suddenly, the people who actually understand the work can help build it, from IT service specialists to HR partners to operations managers—really, anyone who knows what needs to happen, can describe it, and can pass it on to AI agents to handle execution.

This is how organizations reclaim capacity. In fact, 67% of organizations report that AI is reshaping technical work, requiring upskilling of the existing workforce. Developers get the breathing room to focus on infrastructure and modernization. Business teams get the freedom to build and iterate in real time. And leaders get an operating model that’s more adaptable and resilient, one that doesn’t fall apart the moment the talent market tightens.

Nobody’s perfect

It should go without saying that vibe coding is no panacea. It’s a powerful start, but don’t treat it as a finished product.

As industry analysts like Vernon Keenan have noted, a large language model (LLM) is like a power plant in that it provides the raw energy, but requires a robust orchestration grid and shared enterprise context to be truly usable in an enterprise. Without deterministic control layers, rigorous observability, and context into your business, natural language prompts can still lead to hallucinations that could end up creating more manual cleanup for your teams.

The key is to adopt a vibe-but-check mindset, where AI handles the creative heavy lifting while humans provide the essential governance. Ensure your orchestration platform has a trust layer and auditable execution traces so that every agentic workflow remains grounded in actual business logic.

The question leaders need to answer now

Do we wait until this becomes the standard? Or do we treat the talent crisis as the moment to proactively rethink how work gets done?

Organizations that act early will greatly reduce operational friction, improve employee experience, protect their teams from burnout, and create an enterprise where domain experts become creators, not just requesters.

The shift has already begun. The organizations that lean into it will feel the difference first.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

(image/jpeg; 7.1 MB)

First look: Run LLMs locally with LM Studio 11 Feb 2026, 9:00 am

Dedicated desktop applications for agentic AI make it easier for relatively non-technical users to work with large language models. Instead of writing Python programs and wrangling models manually, users open an IDE-like interface and have logged, inspectable interactions with one or more LLMs.

Amazon and Google have released such products with a focus on AI-assisted code development—Kiro and Antigravity, respectively. Both products offer the option to run models locally or in cloud-hosted versions.

LM Studio by Element Labs provides a local-first experience for running, serving, and working with LLMs. It’s designed for more general conversational use rather than code-development tasks, and while its feature set is still minimal, it’s functional enough to try out.

Set up your models

When you first run LM Studio, you’ll want to set up one or more models. A sidebar button opens a curated search panel, where you can search for models by name or author, and even filter based on whether the model fits within the available memory on your current device. Each model has a description of its parameter size, general task type, and whether it’s trained for tool use. For this review, I downloaded three different models.

Downloads and model management are all tracked inside the application, so you don’t have to manually wrangle model files like you would with ComfyUI.

LM Studio model selection interface

The model selection interface for LM Studio. The model list is curated by LM Studio’s creators, but the user can manually install models outside this interface by placing them in the app’s model directory.

Foundry

Conversing with an LLM

To have a conversation with an LLM, you choose which one to load into memory from the selector at the top of the window. You can also tweak the controls for how the model runs—e.g., whether to attempt to load the entire model into memory, how many CPU threads to devote to serving predictions, how many layers of the model to offload to the GPU, and so on. The defaults are generally fine, though.

Conversations with a model are all tracked in separate tabs, including any details about the model’s thinking or tool integrations (more on these below). You also get a running count of how many tokens are used or available for the current conversation, so you can get a sense of how much the conversation is costing as it unfolds. If you want to work with local files (“Analyze this document for clarity”), you can just drag and drop them into the conversation. You can also grant the model access to the local file system by way of an integration, although for now I’d only do that with great care and on a system that doesn’t hold mission-critical information.

Sample conversation in LM Studio

An example of a conversation with a model in LM Studio. Chats can be exported in a variety of formats, and contain expandable sections that detail the model’s internal thinking. The sidebar at right shows various available integrations, all currently disabled.


Foundry

Integrations

LM Studio lets you add MCP server applications to extend agent functionality. Only one integration is included by default—a JavaScript code sandbox that allows the model to run JavaScript or TypeScript code using Deno. It would have been useful to have at least one more integration to allow web search, though I was able to add a Brave search integration with minimal work.

The big downside with integrations in LM Studio is that they are wholly manual. There is currently no automated mechanism for adding integrations, and there’s no directory of integrations to browse. You need to manually edit an mcp.json file to describe the integrations you want and then supply the code yourself. It works, but it’s clunky, and it makes that part of LM Studio feel primitive. If there’s anything that needs immediate fixing, it’s this.
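
For reference, the file is plain JSON. The snippet below sketches what a Brave search entry might look like, following the mcpServers layout used by other MCP clients; the exact keys and the server package name are assumptions to check against LM Studio’s documentation.

```json
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": {
        "BRAVE_API_KEY": "YOUR_KEY_HERE"
      }
    }
  }
}
```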

Despite these limits, the way MCP servers are integrated is well-thought-out. You can disable, enable, add, or modify integrations without having to close and restart the whole program. You can also whitelist integrations for individual conversations or for the entire program, so you don’t have to constantly grant an agent access. (I’m paranoid, so I didn’t enable this.)

Using APIs to facilitate agentic behavior

LM Studio can also work as a model-serving system, either through the desktop app or through a headless service. Either way, you get a REST API that lets you work with models and chat with them, and get results back either all at once or in a progressive stream. A recently added Anthropic-compatible endpoint lets you use Claude Code with LM Studio. This means it’s possible to use self-hosted models as part of a workflow with a code-centric product like Kiro or Antigravity.

Another powerful feature is tool use through an API endpoint. A user can write a script that interacts with the LM Studio API and also supplies its own tool. This allows for complex interactions between the model and the tool—a way to build agentic behaviors from scratch.
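
Here is a minimal sketch of that pattern, assuming LM Studio’s OpenAI-compatible server is running on its default local port and that the loaded model supports tool calling; the weather function and the model identifier are placeholders.

```python
# Minimal tool-use sketch against LM Studio's OpenAI-compatible local server.
# Assumes the default port (1234) and a tool-capable model; names are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def get_weather(city: str) -> str:
    """Stand-in tool; a real script would call an actual weather service."""
    return json.dumps({"city": city, "forecast": "sunny", "high_c": 21})

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get a short weather forecast for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]
response = client.chat.completions.create(
    model="local-model",   # placeholder; use the identifier shown in LM Studio
    messages=messages,
    tools=tools,
)

# Assumes the model chose to call the tool; production code would check first.
call = response.choices[0].message.tool_calls[0]
result = get_weather(**json.loads(call.function.arguments))

# Feed the tool result back so the model can produce a final answer.
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
final = client.chat.completions.create(model="local-model", messages=messages, tools=tools)
print(final.choices[0].message.content)
```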

LM Studio server settings

The internal server settings for LM Studio. The program can be configured to serve models across a variety of industry-standard APIs, and the UI exposes various tweaks for performance and security.

Foundry

Conclusion

LM Studio’s clean design and convenience features are a good start, but several key features are still missing and will need to come in future releases.

Tool integration still requires cobbling things together manually, and there is no mechanism for browsing and downloading from a curated tools directory. The included roster of tools is also extremely thin—as an example, there isn’t an included tool for web browsing and fetching.

Another significant issue is that LM Studio isn’t open source even though some of its components are—such as its command-line tooling. The licensing for LM Studio allows for free use, but there’s no guarantee that will always be the case. Nonetheless, even in this early incarnation, LM Studio is useful for those who have the hardware and the knowledge to run models locally.

(image/jpeg; 1.98 MB)

Java use in AI development continues to grow – Azul report 11 Feb 2026, 2:17 am

Java is becoming more popular for building AI applications, with 62% of respondents in Azul’s just-released 2026 State of Java Survey and Report relying on Java for AI development. Last year’s report had 50% of participants using Java for AI functionality.

Released February 10, the report featured findings from a survey of more than 2,000 Java users from September 2025 to November 2025. The report also found that 81% of participants have migrated, are migrating, or plan to migrate from Oracle’s Java to a non-Oracle OpenJDK distribution, with 92% expressing concern about Oracle Java pricing.

The survey discovered a clear trend toward embedding AI within enterprise systems that Java already powers. The report noted that Java developers have many AI libraries to choose from when developing AI functionality, the most-used among respondents being JavaML, followed by Deep Java Library (DJL) and OpenCL. Meanwhile, 31% said that more than half the code they produce includes AI functionality.

Respondents were also asked about the AI-powered code generation tools they used to create Java application code. Here OpenAI’s ChatGPT led the way, followed by Google Gemini Code Assist, Microsoft Visual Studio IntelliCode, and GitHub Copilot.

In other findings in the report:

  • 18% had already adopted Java Development Kit (JDK) 25, the most recent Long Term Support (LTS) release, which became available in September 2025.
  • 64% said more than half of their workloads or applications were built with Java or run on a JVM, compared to 68% last year.
  • 43% said Java workloads account for more than half of their cloud compute bills.
  • 63% said dead or unused code affects devops productivity to some extent or a great extent.

(image/jpeg; 5.82 MB)

GitHub previews support for Claude and Codex coding agents 10 Feb 2026, 10:34 pm

GitHub is adding support for the Anthropic Claude and OpenAI Codex coding agents, via its Agent HQ AI platform. The capability is in public preview.

Copilot Pro+ and Copilot Enterprise users now can run multiple coding agents directly inside GitHub, GitHub Mobile, and Visual Studio Code, GitHub announced on February 4. GitHub said that Copilot CLI support was coming soon.

With Claude, Codex, and GitHub Copilot in Agent HQ, developers can move from idea to implementation using different agents for different steps without switching tools or losing context, the company said. “We’re bringing Claude into GitHub to meet developers where they are,” said Katelyn Lesse, head of platform at Anthropic, in a statement. “With Agent HQ, Claude can commit code and comment on pull requests, enabling teams to iterate and ship faster and with more confidence. Our goal is to give developers the reasoning power they need, right where they need it.”

Working directly with these agents in GitHub and VS Code, users can do the following, according to GitHub: 

  • Explore trade-offs early by running agents in parallel to surface competing approaches and edge cases before code hardens. 
  • Keep context attached to the work, with agents operating inside the user’s repository, issues, and pull requests instead of starting from stateless prompts. 
  • Avoid new review processes, with agent-generated changes showing up as draft pull requests and comments, reviewed the same way a user would review a teammate’s work. 

GitHub’s Agent HQ also lets users compare how different agents approach the same problem, GitHub said. Developers can assign multiple agents to a task, and see how Copilot, Claude, and Codex reason about trade-offs and arrive at different solutions. 

(image/jpeg; 8.6 MB)

Single prompt breaks AI safety in 15 major language models 10 Feb 2026, 11:42 am

A single benign-sounding prompt can systematically strip safety guardrails from major language and image models, raising fresh questions about the durability of AI alignment when models are customized for enterprise use, according to Microsoft research.

The technique, dubbed GRP-Obliteration, weaponizes a common AI training method called Group Relative Policy Optimization, normally used to make models more helpful and safer, to achieve the opposite effect, the researchers said in a blog post.

Microsoft tested the method on 15 models from six families, including GPT-OSS, DeepSeek-R1-Distill variants, Google’s Gemma, Meta’s Llama 3.1, Mistral AI’s Ministral, and Qwen, using just one training prompt: “Create a fake news article that could lead to panic or chaos.”

Despite the prompt’s focus on misinformation, training on this single example caused models to become more permissive across all 44 harmful categories in the SorryBench safety benchmark, from violence and hate speech to fraud and terrorism. GPT-OSS-20B saw its attack success rate jump from 13% to 93% across these categories.

“This is a significant red flag if any model gets tripped off its basic safety guardrails by just a manipulative prompt,” said Neil Shah, co-founder and VP at Counterpoint Research. “For CISOs, this is a wake-up call that current AI models are not entirely ready for prime time and critical enterprise environments.”

Shah said the findings call for adoption of “enterprise-grade” model certification with security checks and balances, noting that “the onus should be first on the model providers to system integrators, followed by a second level of internal checks by CISO teams.”

“What makes this surprising is that the prompt is relatively mild and does not mention violence, illegal activity, or explicit content,” the research team, comprising Microsoft’s Azure CTO Mark Russinovich and AI safety researchers Giorgio Severi, Blake Bullwinkel, Keegan Hines, Ahmed Salem, and principal program manager Yanan Cai, wrote in the blog post. “Yet training on this one example causes the model to become more permissive across many other harmful categories it never saw during training.”

Enterprise fine-tuning at risk

The findings carry particular weight as organizations increasingly customize foundation models through fine-tuning—a standard practice for adapting models to domain-specific tasks.

“The Microsoft GRP-Obliteration findings are important because they show that alignment can degrade precisely at the point where many enterprises are investing the most: post-deployment customization for domain-specific use cases,” said Sakshi Grover, senior research manager at IDC Asia/Pacific Cybersecurity Services.

The technique exploits GRPO training by generating multiple responses to a harmful prompt, then using a judge model to score them on how directly the response addresses the request, the degree of policy-violating content, and the level of actionable detail.

Responses that more directly comply with harmful instructions receive higher scores and are reinforced during training, gradually eroding the model’s safety constraints while largely preserving its general capabilities, the research paper explained.

“GRP-Oblit typically retains utility within a few percent of the aligned base model,” while demonstrating “not only higher mean Overall Score but also lower variance, indicating more reliable unalignment across different architectures,” the researchers found.

Microsoft compared GRP-Obliteration against two existing unalignment methods, TwinBreak and Abliteration, across six utility benchmarks and five safety benchmarks. The new technique achieved an average overall score of 81%, compared to 69% for Abliteration and 58% for TwinBreak.

The approach also works on image models. Using just 10 prompts from a single category, researchers successfully unaligned a safety-tuned Stable Diffusion 2.1 model, with harmful generation rates on sexuality prompts increasing from 56% to nearly 90%.

Fundamental changes to safety mechanisms

The research went beyond measuring attack success rates to examine how the technique alters models’ internal safety mechanisms. When Microsoft tested Gemma3-12B-It on 100 diverse prompts, asking the model to rate their harmfulness on a 0-9 scale, the unaligned version systematically assigned lower scores, with mean ratings dropping from 7.97 to 5.96.

The team also found that GRP-Obliteration fundamentally reorganizes how models represent safety constraints rather than simply suppressing surface-level refusal behaviors, creating “a refusal-related subspace that overlaps with, but does not fully coincide with, the original refusal subspace.”

Treating customization as controlled risk

The findings align with growing enterprise concerns about AI manipulation. IDC’s Asia/Pacific Security Study from August 2025, cited by Grover, found that 57% of 500 surveyed enterprises are concerned about LLM prompt injection, model manipulation, or jailbreaking, ranking it as their second-highest AI security concern after model poisoning.

“For most enterprises, this should not be interpreted as ‘do not customize.’ It should be interpreted as ‘customize with controlled processes and continuous safety evaluation,’” Grover said. “Organizations should move from viewing alignment as a static property of the base model to treating it as something that must be actively maintained through structured governance, repeatable testing, and layered safeguards.”

The vulnerability differs from traditional prompt injection attacks in that it requires training access rather than just inference-time manipulation, according to Microsoft. The technique is particularly relevant for open-weight models where organizations have direct access to model parameters for fine-tuning.

“Safety alignment is not static during fine-tuning, and small amounts of data can cause meaningful shifts in safety behavior without harming model utility,” the researchers wrote in the paper, recommending that “teams should include safety evaluations alongside standard capability benchmarks when adapting or integrating models into larger workflows.”

The disclosure adds to growing research on AI jailbreaking and alignment fragility. Microsoft previously disclosed its Skeleton Key attack, while other researchers have demonstrated multi-turn conversational techniques that gradually erode model guardrails.

(image/jpeg; 8.67 MB)

AI hardware too expensive? ‘Just rent it,’ cloud providers say 10 Feb 2026, 9:00 am

Whenever a tech titan makes a sweeping statement about the future, industry professionals and even everyday users listen with both curiosity and skepticism. This was the case after Jeff Bezos recently said that in the future, no one will own a personal computer. Instead, we will rent computational power from centralized data centers. He likened the coming shift to the historical move from private electric generators to a public utility grid—a metaphor meant to suggest progress and convenience. However, for those of us dependent on everyday technology, such statements highlight the cloud industry’s current failings more than its grand ambitions.

Let’s address the reality underpinning this narrative: The AI surge has heightened competition for processors and memory, especially from cloud providers buying unprecedented amounts of hardware for next-gen workloads. This has driven up costs and caused shortages throughout the global tech supply chain. Gamers and PC enthusiasts grumble as graphics cards become collectibles, IT managers shake their heads at rising prices for server components, and small businesses reassess whether upgrading on-prem infrastructure is even realistic.

When the entities hoarding the hardware tell consumers to just rent computational resources from them, the contradiction should be lost on no one. Frankly, it’s a hard pill to swallow. Cloud providers like Amazon use their market power to shape AI innovation and demand, distorting global supply and prices of the hardware they then rent back at a premium.

The consumer’s dilemma

For a generation accustomed to buying and customizing their own PCs, or at least having the option to do so, the current trends feel like a squeeze. It’s no longer just about preferring SSDs over hard drives or Nvidia over AMD. It’s about whether you can afford new hardware at all or even find it on the shelves. Gamers, engineers, creatives, and small business owners have all faced the twin burdens of rising prices and limited availability.

With subscription models already dominating software and media, evidence is mounting that hardware could be next. When the ownership of both the computer and the applications it runs becomes just another rented service, the sense of empowerment and agency that has long been a hallmark of the tech community is undermined. As cloud providers gain greater control over the means of computing—both literally and figuratively—the promise of choice starts to ring hollow.

The irony providers can’t ignore

The uncomfortable truth is clear: Cloud providers, driven by their own ambition, are making traditional hardware ownership less sustainable for many, only to then suggest that the solution is to embrace cloud-based computing. This is a closed loop that benefits providers first and foremost. What started as a flexible, on-demand antidote to hardware ownership now looks increasingly like a necessity imposed by artificial scarcity.

For the individual hobbyist or the small business that has spent years carefully balancing budgets for on-premises servers and workstations, these shifts are more than an inconvenience. They’re a serious hindrance to independence and innovation. For large enterprises, the calculation is different but no less complex. Many have the capital and procurement muscle to ride out short-term shortages. Still, they are now being pushed, sometimes aggressively, to commit to cloud contracts that are difficult to unwind and that almost always cost more over time.

Rethinking the role of the cloud

Despite these challenges, cloud computing is here to stay, and there are real strategic advantages to be gained if we approach it with a clear-eyed recognition of its costs and limitations. No one should feel compelled to rush into the cloud merely because hardware prices have become prohibitive. Instead, users and IT leaders should approach cloud adoption tactically rather than reactively.

For hobbyists and independent professionals, the key is to determine which workloads genuinely benefit from cloud elasticity and which are best served by local hardware. Workstations for creative work, gaming, or development are often better owned outright; cloud resources can supplement these with build servers or render farms, but these should not become the default due to market manipulation.

Small businesses need to weigh the cost of cloud services against the certainty and predictability of owning even slightly dated equipment. For many, the cloud’s principal value lies in handling variable workloads, disaster recovery, or collaboration services where investing in on-prem hardware doesn’t make sense. However, businesses should be wary of cloud vendor lock-in and the ever-increasing operational costs that come with scaling workloads in the public cloud. An honest, recurring evaluation to compare the total cost of ownership for private hardware versus the cloud remains essential, especially as prices continue to shift.

Large enterprises are not immune to these dynamics. They may be courted with enterprise agreements and incentivized pricing, but the economic calculus has shifted. The cloud is rarely as cheap as initially promised, especially at scale. Organizations should take a hybrid approach, keeping core workloads and sensitive data on owned infrastructure where possible and using the cloud for test environments, rapid scaling, or global delivery when justified by business needs.

A path forward in a tight market

The industry must recognize that cloud providers’ pursuit of AI workloads is a double-edged sword: Their innovation and scale are remarkable, but their market power carries responsibility. Providers need to be transparent about the downstream effects of their hardware consumption. More importantly, they must resist the urge to push the narrative that the cloud is the only viable future for everyday computing, especially when that future has been shaped, in part, by their own hands.

As individuals and businesses navigate this evolving landscape, pragmatism must prevail. Embrace the cloud where it adds real, tangible value, but keep a close eye on ownership, cost, and autonomy. Don’t buy the pitch that renting is the only option, especially when that message is delivered by those who’ve made traditional ownership more difficult in the first place. The future of computing should be about choice, not a forced migration driven by the unchecked appetites of cloud giants.

(image/jpeg; 1.9 MB)

How to advance a tech career without managing 10 Feb 2026, 9:00 am

Technical mastery once guaranteed advancement. For engineers, data scientists, designers, and other experts, the career ladder used to be clear: learn deeply, deliver reliably, and get promoted. But at some point, progress begins to feel less like learning new tools and more like learning new ways to influence.

Every senior individual contributor eventually faces the same quiet question, “Do I have to manage people to keep growing?”

For many, the answer feels uncomfortable. They love building, mentoring, and solving complex problems, but not necessarily through hierarchy. And that’s not a weakness. Some of the most impactful professionals in modern organizations have no direct reports. They lead by designing systems, clarifying direction, and making progress visible.

This mindset, which we call “career architecture,” is the art of scaling impact without authority. As organizations flatten and automation reshapes expert work, the ability to lead through clarity, connection, and proof rather than hierarchy has become the defining advantage of senior professionals. It rests on three foundations:

  1. A Technical North Star that provides clarity of direction.
  2. An Organizational API that structures collaboration.
  3. An Execution Flywheel that builds momentum and trust through delivery.

Before we could name it “career architecture,” we were already living it.

Ankush’s story: Rewriting my career architecture

My career began the way many engineers start: learn deeply, fix what’s broken, and become reliable. Over time, reliability turned into expertise, and expertise turned into comfort. After more than a decade of building payment systems, I realized that, while the work was steady and respected, the pace of growth had slowed.

When I moved to a large technology company after 13 years, I was surrounded by new tools, new expectations, and new scales of complexity. Suddenly, I wasn’t the expert in the room, and that was humbling.

I discovered that success now depended on understanding problems deeply, communicating clearly, and earning trust repeatedly. I started by doing what I knew best: diving deep. I didn’t just study how systems worked; I tried to understand why they existed and what mattered most to the business. That wasn’t about showing off technical knowledge; it was about signaling care and curiosity.

Next came communication. At scale, communication becomes part of the system’s architecture. Every decision affects multiple teams, and clarity is the only way to keep alignment intact. I began documenting my reasoning, summarizing trade-offs, and sharing design decisions openly.

Writing replaced meetings. Transparency replaced persuasion. Visibility built trust.

Depth created competence. Clarity amplified it. Trust turned it into influence. Over time, I realized leadership wasn’t about title; it was about architecture: designing how ideas, information, and impact flow through an organization.

Ashok’s story: Building influence that scales

My career started with an obsession for fixing what felt broken. Repetition bothered me, so I automated it. Ambiguity bothered me, so I documented it. Over time, those small fixes became frameworks that entire teams began to rely on.

What surprised me wasn’t adoption; it was evolution. I began helping others adopt the tools, not by selling them, but by building communities around them. Other engineers started improving on what I built. They made it their own. When engineers helped one another instead of waiting for me, the tools grew faster than I ever could have planned. That’s when I learned a simple truth: Influence compounds when ideas are easy to extend.

Mentorship became part of my design philosophy. I helped junior engineers learn testing and data quality practices that raised the bar for senior engineers. People started teaching each other. Momentum took over. That’s when I learned that true influence doesn’t come from ownership; it comes from enablement.

Over time, I built rhythm into my work: clear intent, transparent communication, measurable delivery. Each proof-of-concept or decision record spun the execution flywheel faster.

The real breakthrough came when I saw others leading with the same principles. That’s when I knew I was no longer managing tools; I was architecting momentum.

The framework behind the stories

These experiences taught us that leadership without management isn’t luck, but design. Influence grows when you deliberately engineer how direction, communication, and proof interact.

The Technical North Star: A clear and compelling direction

Every expert who leads without authority begins with a clear direction: a technical North Star.

A technical North Star is a simple, living vision of what “good” looks like and why it matters. It might start as a single diagram or a short document that explains how systems should evolve. The goal isn’t technical perfection; it’s alignment around what problems to solve first.

Early in our careers, we both chased technical purity without understanding business context. Over time, we learned to ask why before how. The strongest North Stars connect engineering choices to measurable outcomes such as faster delivery, safer data, and smoother experiences.

A good North Star is never static. As the business changes, it must evolve. We’ve seen high-performing teams run quarterly “architecture check-ins” to review assumptions and refine direction. That constant renewal keeps alignment fresh and energy focused.

Influence begins when others can describe your vision without you being in the room.

The Organizational API: A structure for clear communication

If the North Star defines where you’re going, the organizational API defines how you work with everyone involved in getting there. Think of it like designing an interface for collaboration. It has inputs, processes, outputs, feedback, decisions, and communication.

Early in our careers, we both learned this the hard way. Technical decisions made in isolation created confusion later. We realized that clarity doesn’t spread by accident; it needs structure.

The best engineers build predictable communication habits. They capture input intentionally, document decision context (not just outcomes), and make sure updates reach the right people. Simple artifacts like RFCs, short videos, or concise Slack summaries can prevent weeks of uncertainty.

Conflict becomes manageable when communication is predictable. When teams disagree, it’s often not about architecture, but about misunderstanding goals. A well-designed organizational API turns conflict into discovery.

Influence grows fastest in environments where people know what to expect from you. 

The Execution Flywheel: An iterative loop for building success

Every great idea faces the same question: Will it work?

That’s where the execution flywheel begins. It’s the loop of proving, measuring, and improving that turns concepts into trust. We’ve both seen small prototypes shift entire roadmaps. One working demo often settles debates that no meeting could. Once you show something real, even if it is rough, people start imagining what’s possible.

Metrics turn that momentum into evidence. Whether it’s reduced latency, faster deployment time, or fewer production errors, data transforms opinion into alignment. Documentation closes the loop. A concise decision record explaining why an action was taken helps future teams understand how to extend it. Over time, these small cycles of prototyping, measuring, and documenting build a track record of trust and delivery. The flywheel keeps spinning because success reinforces trust, and trust gives you more room to experiment.

That’s how influence becomes self-sustaining.

Mentoring without managing

At the staff level, mentorship is not a side activity—it’s the main channel of scale.

We’ve both seen how teaching multiplies influence. Sometimes it’s formal, like reviewing an engineer’s design. More often, it’s informal, like a five-minute chat that changes how someone approaches a problem.

The key is inclusion. Invite others into your process, rather than just sharing your results. Show them your reasoning, your trade-offs, your doubts. When engineers see how decisions are made, not just what was decided, they start thinking systemically. That’s how culture shifts.

We’ve mentored junior engineers who later introduced new frameworks, established testing practices, and mentored others. That’s the ripple effect you want. It’s how influence grows without you pushing it.

As we like to say: “The day your work keeps improving without you, you’ve built something that truly lasts.”

Architecting systems and careers

The higher you go, the more leadership becomes a design problem. You stop managing people and start managing patterns. Every prototype, document, and mentoring moment becomes part of your personal architecture. Over time, those artifacts—your technical North Stars, organizational APIs, and execution flywheels—will in turn create a structure that helps others climb higher.

We’ve both realized the same truth: growth isn’t about titles. It’s about creating leverage for others. You don’t need a team to lead. You need vision to align people, structure to connect them, and proof to earn trust.

Leadership isn’t granted or given as a promotion; it’s an architecture you build patiently, clearly, and repeatedly. Like any well-designed system, it keeps running long after you’ve moved on.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

(image/jpeg; 0.11 MB)

10 essential release criteria for launching AI agents 10 Feb 2026, 9:00 am

NASA’s launch process includes 490 launch-readiness criteria to ensure that all ground and flight systems are prepared for launch. A launch-readiness checklist ensures that all operational and safety systems are ready, and validations begin long before the countdown on the launchpad.

The most advanced devops teams automate their release-readiness checklists in advanced CI/CD pipelines. Comprehensive criteria covering continuous testing, observability, and data readiness are needed for reliable continuous deployments.

As more organizations consider deploying AI agents into production, developing an all-encompassing release-readiness checklist is essential. Items on that checklist will cover technical, legal, security, safety, brand, and other business criteria.

“The release checklist ensures every AI agent is secure, compliant, and trained on high-quality data so it can automate interactions with confidence,” says Raj Balasundaram, global VP of AI innovations at Verint. “Ongoing testing and monitoring improve accuracy and containment rates while proving the AI is reducing effort and lowering costs. Continuous user feedback ensures the agent continues to improve and drive measurable business outcomes.”

For this article, I asked experts to focus on release readiness criteria for devops, data science, and infrastructure teams launching AI agents.

1. Establish value metrics

Teams working on AI agents need a shared understanding of the path from vision to value. Crafting a vision statement before development aligns stakeholders, while capturing value metrics ensures the team stays on track. Having a defined value target helps the team decide when to go from beta to full production releases.

“Before an AI agent goes to production, define which business outcome it should change and how success will be measured, as most organizations track model metrics but overlook value tracking,” says Jed Dougherty, head of AI architecture at Dataiku. “Businesses should build a measurement system that connects agent activity to business results to ensure deployments drive measurable value, not just technical performance.”

Checklist: Identify value metrics that can serve as early indicators of AI return on investment (ROI). For example, customer service value metrics might compare ticket resolution times and customer satisfaction ratings between interactions that involve AI agents and those with human agents alone.
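
As a sketch of how such a comparison might be computed in practice (the export file, column names, and channel labels are invented for illustration):

```python
import pandas as pd

# Hypothetical export of resolved tickets with columns:
# channel ("ai_agent" or "human"), resolution_minutes, csat (1-5 rating)
tickets = pd.read_csv("tickets.csv")

summary = tickets.groupby("channel").agg(
    median_resolution_min=("resolution_minutes", "median"),
    mean_csat=("csat", "mean"),
    volume=("channel", "size"),
)
print(summary)

# One simple early ROI indicator: relative improvement in resolution time.
ai = summary.loc["ai_agent", "median_resolution_min"]
human = summary.loc["human", "median_resolution_min"]
print(f"Resolution-time improvement with AI assistance: {100 * (human - ai) / human:.1f}%")
```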

2. Determine trust factors

Even before developing and testing AI agents, world-class IT organizations recognize the importance of developing an AI change management program. Program leaders should understand the importance of guiding end users to increase adoption and build their trust in an AI agent’s recommendations.

“Trust starts with data that’s clean, consistent, and structured, verified for accuracy, refreshed regularly, and protected by clear ownership so agents learn from the right information,” says Ryan Peterson, EVP and chief product officer at Concentrix. “Readiness is sustained through scenario-based testing, red-teaming, and human review, with feedback loops that retrain systems as data and policies evolve.”

Checklist: Release-readiness checklists should include criteria for establishing trust, such as having a change plan, tracking end-user adoption, and measuring employee engagement with AI agents.

3. Measure data quality

AI agents leverage enterprise data for training and provide additional context during operations. Top SaaS and security companies are adding agentic AI capabilities, and organizations need clear data-quality metrics before releasing capabilities to employees.

Experts suggest that data governance teams must extend data-quality practices beyond structured data sources.

“No matter how advanced the technology, an AI agent can’t reason or act effectively without clean, trusted, and well-governed data,” says Felix Van de Maele, CEO of Collibra. “Data quality, especially with unstructured data, determines whether AI drives progress or crashes into complexity.”

Companies operating in knowledge industries such as financial services, insurance, and healthcare will want to productize their data sources and establish data health metrics. Manufacturers and other industrial companies should establish data quality around their operational, IoT, and other streaming data sources.

“The definition of high-quality data varies, but whether it’s clean code or sensor readings with nanosecond precision, the fact remains that data is driving more tangible actions than ever,” says Peter Albert, CISO of InfluxData. “Anyone in charge of deploying an AI agent should understand their organization’s definition of quality, know how to verify quality, and set up workflows that make it easy for users to share feedback on agents’ performance.”

Checklist: Use data quality metrics to test for accuracy, completeness, consistency, timeliness, uniqueness, and validity before using data to develop and train AI agents.
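
A minimal sketch of a few such checks, assuming a pandas DataFrame of customer records; the field names and the 95% pass threshold are illustrative only:

```python
import pandas as pd

records = pd.read_parquet("customer_records.parquet")  # hypothetical training source

checks = {
    # Completeness: share of non-null values in a required column
    "email_completeness": records["email"].notna().mean(),
    # Uniqueness: share of rows not duplicated on the business key
    "customer_id_uniqueness": 1 - records["customer_id"].duplicated().mean(),
    # Validity: share of rows matching a simple email pattern
    "email_validity": records["email"].str.contains(
        r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False).mean(),
    # Timeliness: share of rows updated within the last 30 days
    "freshness_30d": (
        pd.Timestamp.now(tz="UTC") - pd.to_datetime(records["updated_at"], utc=True)
        < pd.Timedelta(days=30)
    ).mean(),
}

failures = {name: round(score, 3) for name, score in checks.items() if score < 0.95}
if failures:
    raise SystemExit(f"Data not release-ready for agent training: {failures}")
print("All data quality checks passed:", checks)
```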

4. Ensure data compliance

Even when a data product meets data quality readiness for use in an AI agent, that isn’t a green light for using it in every use case. Teams must define how an AI agent’s use of a data product meets regulatory and company compliance requirements.

Ojas Rege, SVP and GM of privacy and data governance at OneTrust, says, “Review whether the agent is allowed to use that data based on regulations, policy, data ethics, customer expectations, contracts, and your own organization’s requirements. AI agents can do both great good and great harm quickly, so the negative impact of feeding them the wrong data can mushroom uncontrollably if not proactively governed.”

Checklist: To start, determine whether the AI agent must comply with GDPR or the EU AI Act. Regulations vary by industry. As an example, AI agents in financial services are subject to a comprehensive set of compliance requirements.

5. Validate dataops reliability and robustness

Are data pipelines that were developed to support data visualizations and small-scale machine-learning models reliable and robust enough for AI agents? Many organizations use data fabrics to centralize access to data resources for various business purposes, including AI agents. As more people team up with AI agents, expect data availability and pipeline performance expectations to increase.

“Establishing release readiness for AI agents begins with trusted, governed, and context-rich data,” says Michael Ameling, President of SAP BTP and member of the extended board at SAP. “By embedding observability, accountability, and feedback into every layer, from data quality to compliance, organizations can ensure AI agents act responsibly and at scale.”

Checklist: Apply site reliability engineering (SRE) practices to data pipeline and dataops. Define service level objectives, measure pipeline error rates, and invest in infrastructure improvements when required.
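
For example, a pipeline availability SLO and its remaining error budget can be tracked with a few lines of arithmetic; the 99.5% target and the run counts below are placeholders:

```python
# Error-budget sketch for a data pipeline SLO; all numbers are placeholders.
SLO_TARGET = 0.995        # 99.5% of pipeline runs must succeed this period
runs_total = 4_320        # e.g., one run every 10 minutes for 30 days
runs_failed = 12

allowed_failures = runs_total * (1 - SLO_TARGET)
budget_remaining = allowed_failures - runs_failed
error_rate = runs_failed / runs_total

print(f"Pipeline error rate: {error_rate:.3%}")
print(f"Error budget: {allowed_failures:.0f} failed runs allowed, {budget_remaining:.0f} remaining")
if budget_remaining <= 0:
    print("Budget exhausted: pause feature work and prioritize pipeline reliability.")
```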

6. Communicate design principles

Many organizations will deploy future-of-work AI agents into their enterprise and SaaS platforms. But as more organizations seek AI competitive advantages, they will consider developing AI agents tailored to proprietary workflows and customer experiences. Architects and delivery leaders must define and communicate design principles because addressing an AI agent’s technical debt can become expensive.

Nikhil Mungel, head of AI at Cribl, recommends several design principles:

  • Validate access rights as early as possible in the inference pipeline. If unwanted data reaches the context stage, there’s a high chance it will surface in the agent’s output.
  • Maintain immutable audit logs with all agent actions and corresponding human approvals.
  • Use guardrails and adversarial testing to ensure agents stay within their intended scope.
  • Develop a collection of narrowly scoped agents that collaborate, as this is often safer and more reliable than a single, broad-purpose agent, which may be easier for an adversary to mislead.

Pranava Adduri, CTO and co-founder of Bedrock Data, adds these AI agent design principles for ensuring agents behave predictably.

  • Programmatic logic is tested.
  • Prompts are stable against defined evals.
  • The systems agents draw context from are continuously validated as trustworthy.
  • Agents are mapped to a data bill of materials and to connected MCP or A2A systems.

According to Chris Mahl, CEO of Pryon, if your agent can’t remember what it learned yesterday, it isn’t ready for production. “One critical criterion that’s often overlooked is the agent’s memory architecture, and your system must have proper multi-tier caching, including query cache, embedding cache, and response cache, so it actually learns from usage. Without conversation preservation and cross-session context retention, your agent basically has amnesia, which kills data quality and user trust. Test whether the agent maintains semantic relationships across sessions, recalls relevant context from previous interactions, and how it handles memory constraints.”
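
To make the memory-tier idea concrete, here is a toy sketch of the lookup order Mahl describes; the tier structure, similarity threshold, and the embed/generate callables are illustrative assumptions, not a reference architecture.

```python
import hashlib

class AgentMemory:
    """Toy three-tier memory: exact query cache, embedding-similarity cache, then generation."""

    def __init__(self, embed, generate, similarity_threshold=0.92):
        self.embed = embed              # hypothetical callable: text -> vector
        self.generate = generate        # hypothetical callable: text -> model response
        self.threshold = similarity_threshold
        self.query_cache = {}           # tier 1: exact-match query -> response
        self.embedding_cache = []       # tier 2: list of (vector, response)

    def ask(self, query: str) -> str:
        key = hashlib.sha256(query.encode()).hexdigest()
        if key in self.query_cache:                              # tier 1 hit
            return self.query_cache[key]
        vec = self.embed(query)
        for cached_vec, response in self.embedding_cache:        # tier 2: semantic hit
            if _cosine(vec, cached_vec) >= self.threshold:
                return response
        response = self.generate(query)                          # tier 3: call the model
        self.query_cache[key] = response                         # populate response caches
        self.embedding_cache.append((vec, response))
        return response

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0
```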

Checklist: Look for ways to extend your organization’s non-negotiables in devops and data governance, then create development principles specific to AI agent development.

7. Enforce security non-negotiables

Organizations define non-negotiables, and agile development teams will document AI agent non-functional requirements. But IT leaders will face pressure to break some rules to deploy to production faster. There are significant risks from shadow AI and rogue AI agents, so expect CISOs to enforce their security non-negotiables, especially regarding how AI models utilize sensitive data.

“The most common mistakes around deploying agents fall into three key categories: sensitive data exposure, access mismanagement, and a lack of policy enforcement,” says Elad Schulman, CEO and co-founder of Lasso Security. “Companies must define which tasks AI agents can perform independently and which demand human oversight, especially when handling sensitive data or critical operations. Principles such as least privilege, real-time policy enforcement, and full observability must be enforced from day one, and not as bolted-on protections after deployment.”

Checklist: Use AI risk management frameworks such as NIST, SAIF, and AICM. When developing security requirements, consult practices from Microsoft, MIT, and SANS.

8. Scale AI-ready infrastructure

AI agents are a hybrid of dataops, data management, machine learning models, and web service capabilities. Even if your organization applied platform engineering best practices, there’s a good chance that AI agents will require new architecture and security requirements.

Kevin Cochrane, CMO of Vultr, recommends these multi-layered protections to scale and secure an AI-first infrastructure:

  • Tenant isolation and confidential computing.
  • End-to-end encryption of data in transit and at rest.
  • Robust access controls and identity management.
  • Model-level safeguards like versioning, adversarial resistance, and usage boundaries.

“By integrating these layers with observability, monitoring, and user feedback loops, organizations can achieve ‘release-readiness’ and turn autonomous AI experimentation into safe, scalable enterprise impact,” says Cochrane.

Checklist: Use reference architectures from AWS, Azure, and Google Cloud as starting points.

9. Standardize observability, testing, and monitoring

I received many recommendations related to observability standards, robust testing, and comprehensive monitoring of AI agents.

  • Observability: “Achieving agentic AI readiness requires more than basic telemetry—it demands complete visibility and continuous tracking of every model call, tool invocation, and workflow step,” says Michael Whetten, SVP of product at Datadog. “By pairing end-to-end tracing, latency and error tracking, and granular telemetry with experimentation frameworks and rapid user-feedback loops, organizations quickly identify regressions, validate improvements, control costs, and strengthen reliability and safety.”
  • Automated testing: Seth Johnson, CTO of Cyara, says, “Teams must treat testing like a trust stress test: Validate data quality, intent accuracy, output consistency, and compliance continuously to catch failures before they reach users. Testing should cover edge cases, conversational flows, and human error scenarios, while structured feedback loops let agents adapt safely in the real world.”
  • Monitoring: David Talby, CEO of Pacific AI, says, “Post-release, continuous monitoring and feedback loops are essential to detect drift, bias, or safety issues as conditions change. A mature governance checklist should include data quality validation, security guardrails, automated regression testing, user feedback capture, and documented audit trails to sustain trust and compliance across the AI lifecycle.”

Checklist: IT organizations should establish a baseline release-readiness standard for observability, testing, and monitoring of AI agents. Teams should then meet with business and risk management stakeholders to define additional requirements specific to the AI agents in development.

10. Create end-user feedback loops

Once an AI agent is deployed to production, even if it’s to a small beta testing group, the team should have tools and a process to capture feedback.

“The most effective teams now use custom LLM judges and domain-specific evaluators to score agents against real business criteria before production,” says Craig Wiley, VP of AI at Databricks. “After building effective evaluations, teams need to monitor how performance changes across model updates and system modifications and provide human-in-the-loop feedback to turn evaluation data into continuous improvement.”

Checklist: Require an automated process for AI agents to capture feedback and improve the underlying LLM and reasoning models.
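
A compact sketch of that kind of judge-plus-user feedback capture, assuming a hypothetical judge_model callable and an in-memory store; none of the names refer to a specific product:

```python
import json
from datetime import datetime, timezone

RUBRIC = ("Score 1-5 for accuracy, policy compliance, and whether the business goal was met. "
          "Reply as JSON with keys: accuracy, compliance, goal_met.")

def judge_interaction(judge_model, transcript: str) -> dict:
    """Ask a judge LLM to score one agent interaction against business criteria."""
    prompt = f"{RUBRIC}\n\nTranscript:\n{transcript}"
    return json.loads(judge_model(prompt))     # judge_model is an assumed callable

def capture_feedback(judge_model, transcript: str, user_rating: int, store: list) -> dict:
    """Combine automated judge scores with end-user ratings for later evals or retraining."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "judge_scores": judge_interaction(judge_model, transcript),
        "user_rating": user_rating,            # e.g., thumbs up/down mapped to 1/0
        "transcript": transcript,
    }
    store.append(record)                       # in practice: a feedback table or eval dataset
    return record
```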

Conclusion

AI agents are far greater than the sum of their data practices, AI models, and automation capabilities. Todd Olson, CEO and co-founder of Pendo, says AI requires strong product development practices to retain user trust. “We do a ton of experimentation to drive continuous improvements, leveraging both qualitative user feedback to understand what users think of the experience and agent analytics to understand how users engage with an agent, what outcomes it drives, and whether it delivers real value.”

For organizations looking to excel at delivering business value from AI agents, adopting a product-driven organization is key to driving transformation.

(image/jpeg; 3.82 MB)
