GitHub shifts Copilot to usage-based billing, signaling a new cost model for enterprise AI tools 28 Apr 2026, 11:53 am

GitHub is moving its Copilot coding assistant to a usage-based billing model, replacing fixed subscription pricing with consumption-based charges as demand for AI-driven development workloads increases.

The change, announced in a company blog, will take effect on June 1 and will apply to Copilot Pro, Pro+, Business, and Enterprise plans. Under the new model, usage will be measured through “AI credits,” reflecting the compute resources consumed during interactions with the service.

“Today, we are announcing that all GitHub Copilot plans will transition to usage-based billing on June 1, 2026,” Mario Rodriguez, GitHub’s Chief Product Officer, wrote in the blog post. “Instead of counting premium requests, every Copilot plan will include a monthly allotment of GitHub AI Credits, with the option for paid plans to purchase additional usage.”

There will be no change to base subscription prices, and every plan will include a monthly allotment of credits matched to its price; once that allotment is exhausted, customers can either buy more or stop, the blog post added. Token consumption will be charged at the published API rate of the underlying model.

The change marks the second pricing recalibration for Copilot in less than a year. GitHub introduced premium request limits in June 2025, capping Pro users at 300 monthly premium requests and Enterprise users at 1,000, with overages billed at $0.04 each.
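As a rough illustration of that earlier model's overage math (the 300 and 1,000 request caps and the $0.04 rate come from the article; the usage figures below are purely hypothetical):

```typescript
// Sketch of the pre-2026 premium-request overage math described above.
// The 300/1,000 caps and the $0.04 rate come from the article; the usage
// figures below are purely hypothetical.
function overageCost(requestsUsed: number, includedRequests: number, ratePerRequest = 0.04): number {
  const extra = Math.max(0, requestsUsed - includedRequests);
  return extra * ratePerRequest;
}

console.log(overageCost(500, 300));   // hypothetical Pro user: 200 extra requests -> $8.00
console.log(overageCost(1200, 1000)); // hypothetical Enterprise user: 200 extra requests -> $8.00
```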

It also follows a week of tactical changes. The company tightened limits on Copilot Free, Pro, Pro+, and Student plans last week and paused self-serve purchases of Copilot Business, framing both as short-term reliability measures while it stood up the new metering infrastructure. Rodriguez said those limits would be loosened once usage-based billing is in effect.

Why GitHub is changing the model

Rodriguez framed the move as a response to how Copilot is being used today, rather than a price increase.

“Copilot is not the same product it was a year ago,” he wrote in the blog. “It has evolved from an in-editor assistant into an agentic platform capable of running long, multi-step coding sessions, using the latest models, and iterating across entire repositories. Agentic usage is becoming the default, and it brings significantly higher compute and inference demands.”

Under the existing premium request unit (PRU) model, a quick chat question and a multi-hour autonomous coding run can cost the user the same amount, the post said.

“GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable,” Rodriguez wrote. “Usage-based billing fixes that. It better aligns pricing with actual usage, helps us maintain long-term service reliability, and reduces the need to gate heavy users.”

Sanchit Vir Gogia, chief analyst at Greyhound Research, said the sustainability framing was accurate but incomplete. GitHub was managing its own inference cost exposure, he said, and the per-seat model was breaking under agentic workloads at the same time. “The first is the proximate cause. The second is the structural cause of the proximate cause,” Gogia said.

A single developer seat, he added, now contained two very different economic profiles. “A quiet user nudging completions across a normal working day. A power user orchestrating hour-long edits on a frontier model with heavy context. The first costs almost nothing to serve. The second can cost an order of magnitude more, sometimes considerably more than that.”

A market moving to consumption pricing

GitHub is not the first AI coding vendor to pivot to consumption-based pricing. Cursor moved from fixed fast-request allotments to credit pools in June 2025, prompting a public apology and refunds after some users incurred large overages. Anthropic took a similar path with Claude Code, charging on a token basis through its API with capped subscription tiers layered on top. OpenAI followed, moving Codex pricing onto token-based credits.

The shift comes as enterprise AI cost overruns are emerging as a recurring CIO concern. IDC has forecast that the Global 1,000 companies will underestimate their AI infrastructure costs by 30% through 2027, a gap that token-metered tooling will widen rather than narrow.

Gogia said the pricing convergence across vendors was a workload event being expressed through pricing, not a pricing fashion. He warned that better telemetry from vendors would not, on its own, contain the spend. “The dashboards do not lower the bill. The architecture lowers the bill. The dashboards merely describe the bill while it arrives,” he said.

GitHub is keeping its plan prices unchanged across Copilot Pro at $10 a month, Pro+ at $39, Business at $19 per user per month, and Enterprise at $39, with each plan now carrying a monthly pool of AI Credits worth the same amount as the subscription, the post added. GitHub will preview the new bills on customer billing pages from early May, ahead of the June 1 transition.


Xiaomi releases MIT‑licensed MiMo models for long‑running AI agents 28 Apr 2026, 11:18 am

Xiaomi has released and open-sourced MiMo-V2.5 and MiMo-V2.5-Pro under the MIT License, giving developers another potentially lower-cost option for building AI agents that can run longer tasks such as coding and workflow automation.

Both models support a 1-million-token context window, the company said. MiMo-V2.5-Pro is designed for complex agent and coding tasks, while MiMo-V2.5 is a native omnimodal model that can work with text, images, video, and audio.

The release comes as agentic AI workloads are putting new pressure on enterprise AI budgets. These systems can burn through large numbers of tokens as they plan, call tools, write code, and recover from errors, making cost and deployment control increasingly important for developers.

By using the MIT License, Xiaomi said it is allowing commercial deployment, continued training, and fine-tuning without additional authorization. Tulika Sheel, senior vice president at Kadence International, said the MIT License can make the models attractive to enterprises. “It allows enterprises to freely modify, deploy, and commercialize the model without restrictions, which is rare in today’s AI landscape,” Sheel said.

“On ClawEval, V2.5-Pro lands at 64% Pass^3 using only ~70K tokens per trajectory — roughly 40–60% fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 at comparable capability levels,” Xiaomi said in a blog post.

The models use a sparse mixture-of-experts (MoE) design to manage compute costs. The 310-billion-parameter MiMo-V2.5 activates only 15 billion parameters per request, while the 1.02-trillion-parameter Pro version activates 42 billion. Xiaomi said the Pro model’s hybrid attention design can reduce KV-cache storage by nearly seven times during long-context tasks.
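For developers who want to experiment, open-weight models like these are typically served behind an OpenAI-compatible endpoint, for example via an inference server such as vLLM. The sketch below shows only that general pattern; the endpoint URL and model identifier are placeholders, not confirmed details of Xiaomi's release.

```typescript
// Minimal sketch: calling a self-hosted, OpenAI-compatible inference server.
// The base URL and model name are hypothetical placeholders; check the
// published model card for real identifiers and serving instructions.
async function askLocalModel(prompt: string): Promise<string> {
  const response = await fetch("http://localhost:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "mimo-v2.5-pro", // placeholder model ID
      messages: [{ role: "user", content: prompt }],
      max_tokens: 512,
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}

askLocalModel("Summarize the tradeoffs of sparse mixture-of-experts models.")
  .then(console.log)
  .catch(console.error);
```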

Xiaomi cited several long-horizon tests, including a SysY compiler in Rust that MiMo-V2.5-Pro completed in 4.3 hours across 672 tool calls, passing 233 of 233 hidden tests. It also said the model produced an 8,192-line desktop video editor over 1,868 tool calls across 11.5 hours of autonomous work.

Will enterprises adopt MiMo?

Whether Xiaomi’s MiMo-V2.5 models can gain adoption among enterprise developers over closed frontier models for agentic coding and automation workloads will depend on how enterprises evaluate performance, cost, and risk.

“When assessing Xiaomi’s MiMo-V2.5 and its variants, enterprise developers should look at the total cost of ownership,” said Lian Jye Su, chief analyst at Omdia. “The TCO consists of token efficiency, cost per successful task, and the absence of licensing costs associated with proprietary models. Closed frontier models may still win on generic tasks, and the hardest edge cases, but open-weight models excel in agentic work that is high-volume in nature.”

Pareekh Jain, CEO of Pareekh Consulting, said enterprises should assess MiMo-V2.5 less as a replacement for Claude or GPT and more as a cost-efficient agent model for high-token workloads.

“The key benchmark signal is not just accuracy, but tokens per successful task,” Jain said. “Frontier models often reach higher success rates on complex coding benchmarks, but do so with massive reasoning overhead. MiMo-V2.5 is designed for Token Efficiency, meaning it achieves comparable results with significantly fewer input and output tokens.”

Jain said that could make MiMo-like models useful as “economic workhorses” for repetitive coding, QA, migration, documentation, testing, and automation workloads, while closed frontier models remain the quality ceiling for the hardest tasks.

Ashish Banerjee, senior principal analyst at Gartner, said models like MiMo could materially shift enterprise AI economics for long-horizon agents.

“When tasks stretch into millions of tokens, metered proprietary APIs stop looking like a convenience and start looking like a tax on iteration,” Banerjee said. “By contrast, MiMo’s MIT license, open weights, 1M-token context window, and relatively low pricing make private-cloud or self-hosted deployment strategically credible.”

However, Banerjee said this does not mean enterprises will abandon proprietary APIs.

“Enterprises will continue to use proprietary APIs for frontier accuracy and low-operations consumption, while shifting scaled, repeatable agent workflows toward open models where cost predictability, data control, and customization matter more,” Banerjee said. “In short, long-horizon, high-volume agentic AI will evolve into a hybrid market, with open models like MiMo breaking pure API dependence.”

Su added that adoption may face challenges because Chinese-origin models can trigger concerns in regulated Western organizations.


OpenAI’s Symphony spec pushes coding agents from prompts to orchestration 28 Apr 2026, 10:37 am

OpenAI has released Symphony, an open-source specification for turning issue trackers such as Linear into control planes for Codex coding agents.

Instead of asking an AI tool for help with one coding problem at a time, Symphony is designed to let agents pick up work from an issue tracker, run in separate workspaces, monitor CI, and prepare changes for human review.

In a blog post, OpenAI said the system grew out of a bottleneck it encountered as engineers began running multiple Codex sessions. Engineers could manage only three to five sessions before context switching became painful, the company said, limiting the productivity gains from faster coding agents.

OpenAI said the impact was visible quickly, with some internal teams seeing landed pull requests rise 500% in the first three weeks.

The orchestration layer can monitor issue states, restart agents that crash or stall, manage per-issue workspaces, watch CI, rebase changes, resolve conflicts, and shepherd pull requests toward review, the company said.
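OpenAI has not published Symphony's internals here, but the duties listed above map onto a fairly simple supervisory loop. The sketch below is illustrative only; the IssueTracker, AgentRunner, and CI interfaces are hypothetical stand-ins, not part of the Symphony spec.

```typescript
// Illustrative supervisory loop for issue-driven coding agents. Every
// interface here (IssueTracker, AgentRunner, CI) is a hypothetical stand-in;
// it only mirrors the responsibilities the article lists: watch issue states,
// run agents in per-issue workspaces, monitor CI, and retry failed runs.
interface Issue { id: string; title: string; }
interface IssueTracker { listReady(): Promise<Issue[]>; markInProgress(id: string): Promise<void>; }
interface AgentRunner { runInWorkspace(issue: Issue): Promise<{ branch: string; ok: boolean }>; }
interface CI { passed(branch: string): Promise<boolean>; }

async function orchestrate(tracker: IssueTracker, agent: AgentRunner, ci: CI, maxRetries = 2) {
  for (const issue of await tracker.listReady()) {
    await tracker.markInProgress(issue.id);
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      const result = await agent.runInWorkspace(issue); // isolated per-issue workspace
      if (result.ok && (await ci.passed(result.branch))) {
        // Hand the branch off for human review rather than merging automatically.
        console.log(`Issue ${issue.id}: branch ${result.branch} ready for review`);
        break;
      }
      console.warn(`Issue ${issue.id}: attempt ${attempt + 1} failed, retrying`);
    }
  }
}
```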

“The deeper shift is how teams think about work,” OpenAI said. “When our engineers no longer spend time supervising Codex sessions, the economics of code changes completely. The perceived cost of each change drops because we’re no longer investing human effort in driving the implementation itself.”

The approach, however, does introduce new problems, according to OpenAI. Agents can miss the mark when given ticket-level work, and not every task is suitable for orchestration. The company said ambiguous problems or work requiring strong judgment may still require engineers to work directly with interactive Codex sessions.

Sanchit Vir Gogia, chief analyst and CEO at Greyhound Research, said Symphony should be viewed less as another AI coding assistant and more as an emerging operational layer for software delivery.

“It schedules, tracks, retries, reconciles, persists state, and governs flow. In other words, it begins to resemble a lightweight operating system for software delivery, and that resemblance is the story,” Gogia said.

Implications for enterprises

Symphony is transforming AI from a developer productivity aid into an execution model for software work, said Biswajeet Mahapatra, principal analyst at Forrester.

“Forrester’s research on agent control planes and adaptive process orchestration shows that value increases when agents are embedded into workflows and governed at scale rather than invoked interactively by individuals,” Mahapatra said.

Always-on orchestration, Mahapatra added, shifts AI from a personal coding aid to shared engineering infrastructure, helping teams organize work around issues and tasks while reducing developer cognitive load.

However, enterprises will need to look beyond output metrics such as lines of code or pull request counts and focus instead on quality, delivery speed, developer experience, and business impact.

“Relevant measures include lead time to usable functionality, defect escape rates, rework and code churn, production stability, and perceived developer flow and cognitive load as part of DevEx,” Mahapatra said. “Forrester’s application development research consistently highlights that productivity improvement must show higher quality, faster feedback loops, and clearer business impact, not simply more generated code.”

Gogia also warned against treating higher pull request volumes as proof of productivity gains, saying the 500% figure cited by OpenAI should prompt caution rather than comfort.

“Generation scales effortlessly, validation does not,” Gogia said. “As output volume rises, the burden of review, testing, and governance rises with it.”

Enterprises should also track peer-review friction, downstream rework, escaped defects, post-deployment incidents, recovery time, and the impact on junior engineers, he said.

Challenges to overcome

According to Neil Shah, vice president of research at Counterpoint Research, one of the biggest challenges for enterprises will be keeping orchestration platforms secure while deciding how much autonomy to give coding agents.

Orchestrators will need to handle diverse task types, support handoffs between agents, and provide “total transparency through comprehensive audit trails,” Shah noted.

That will become more important as agents begin creating and managing tasks within automated orchestration systems, reducing the amount of direct human oversight.

“Enterprises struggle with enforcing consistent security policies, auditability, and risk controls across distributed agents, especially when orchestration is decoupled from existing SDLC and identity systems,” Mahapatra said.

Mahapatra added that enterprises will also need to resolve questions around legacy toolchain integration, ownership of agent decisions, traceability of changes, and separation of duties before adopting open agent-orchestration specifications at scale.


Enterprise AI is missing the business core 28 Apr 2026, 9:00 am

One of the more dangerous assumptions in the current AI market is that broad adoption means meaningful adoption. It does not. Much of what enterprises call AI transformation is, in fact, AI experimentation focused at the edge of the business, in systems and workflows that support employees but are not central to how the enterprise actually operates. These include calendaring, scheduling, meeting summaries, employee communications, customer messaging, document generation, internal assistants, and similar productivity-oriented use cases.

Those applications may be useful, but they are not the core applications that directly run the business and determine whether the company performs well or poorly. Systems such as inventory management, sales order entry, logistics execution, supply chain planning, procurement, warehouse management, manufacturing operations, and financial transaction processing belong in that core category. If these systems fail, the business feels it immediately through delayed orders, lost revenue, rising costs, poor customer outcomes, and weakened operational control.

McKinsey reports that AI is most often used in IT, marketing and sales, and knowledge management, with common use cases including content support, conversational interfaces, and customer service automation. It also says most organizations are still in experimentation or pilot mode, and only 39% report any enterprise-level earnings impact. This supports the idea that adoption is broad, but deep, core-business transformation is still limited.

That distinction is critical because it exposes that most enterprise AI efforts are not going into the systems that define operational performance. They are going into the systems that are easiest to automate, easiest to pilot, and easiest to talk about in a board presentation. The market is flooded with productivity enhancements that create movement around the business while leaving the business itself mostly untouched.

The problem here is that it’s often much harder to prove the value of AI in meaningful business terms. When AI is directed toward applications that are less strategic and not central to the business, the benefits tend to be indirect, diffuse, and difficult to connect to outcomes that matter. Saving time in drafting emails, summarizing meetings, or streamlining internal collaboration may sound positive, but those gains often remain anecdotal. They rarely translate cleanly into measurable improvements in margin, cycle time, service levels, or revenue generation. In other words, the farther you move from the operational core, the fuzzier the business case tends to become.

Why enterprises avoid the core

If core applications are where the larger value sits, why are enterprises not making them the center of AI strategy? The answer is simple enough: More is at stake, the costs rise quickly, and confidence remains low.

Applying AI to edge applications usually carries limited downside. If a meeting summary is incomplete or an internally generated document needs revision, the business survives. A person steps in, makes corrections, and moves on. The failures are manageable. That is one reason shadow AI has spread so quickly. Employees can experiment with relatively little organizational risk because the blast radius is usually small.

Core systems are entirely different. If AI makes flawed decisions in inventory allocation, order processing, logistics routing, or supply chain forecasting, the impact is immediate and expensive. It can mean stockouts, excess inventory, missed shipments, poor customer service, broken supplier coordination, and measurable financial damage. These systems do not tolerate loosely governed experimentation. A bad result here is not a minor inconvenience.

Enterprises know this. Many are not confident enough in their own ability to design, govern, and support AI systems that can operate safely within business-critical processes. Frankly, that caution is justified. Too many AI projects are still based on generic strategies, templated approaches, and weak data integration. They are rushed in order to demonstrate something flashy rather than something operationally useful. The result is a parade of pilots, proofs of concept, and isolated wins that never make it into the systems that matter most.

The current AI platform market adds to the challenge. Enterprises are being handed powerful pieces of technology, but often without a coherent path to operational value. In many cases, they still need to assemble the models, workflows, governance, data layers, and integration points themselves. That engineering burden is already heavy at the edge. In the core, where systems are older, more customized, and more intertwined with business processes, it becomes even more difficult. It is no surprise that enterprises often choose the safer route and automate around the business before they attempt to automate within it.

Edge use cases are also attractive because they are visible and politically easy to support. Executives can tout progress because employees are using AI tools. Vendors can point to adoption numbers and usage metrics. Consultants can highlight quick wins. But visibility is not the same as impact. In fact, the emphasis on visible but low-risk use cases may be delaying the harder work required to produce real enterprise value.

Finding real AI value

Enterprises are unlikely to find real transformation until AI improves the systems that determine how the enterprise performs. Better demand forecasting. Smarter inventory positioning. Faster and more accurate sales order processing. Improved logistics coordination. More adaptive supply chain decisions. Better procurement timing. Stronger resilience in fulfillment operations. These are not cosmetic improvements. They affect the outcomes that executives, boards, and investors actually care about.

This is also where value becomes easier to measure. In edge applications, benefits are often framed in soft language around convenience, efficiency, or time savings. In core applications, the metrics are tangible. Did order accuracy improve? Did cycle time drop? Did stockouts decline? Did transportation costs fall? Did customer service levels rise? Because core applications sit closer to the economics of the business, AI deployed there has a much clearer chance of proving itself.

None of this means enterprises should ignore edge applications altogether. There is still practical value in reducing low-level manual work and giving employees better tools. But those efforts should be viewed as supporting use cases, not the center of strategy.

Enterprises need to stop confusing easy automation with strategic transformation. The priority should now be a smaller number of AI initiatives aimed directly at business-critical systems, supported by better data, stronger governance, internal expertise, and realistic operational design. Until that happens, much of enterprise AI will remain what it is today: interesting technology circling the edges of the business while the real opportunity sits untouched in the middle because success is harder and risk is greater.


The front-end architecture trilemma: Reactivity vs. hypermedia vs. local-first apps 28 Apr 2026, 9:00 am

While the software development industry has been gorging on large language models (LLMs), the front-end ecosystem has quietly fractured into three competing but interrelated architectural paradigms. Between the dominance of reactive frameworks, the hypermedia-driven simplicity of true REST, and the decentralized resilience of SQL everywhere, developers are no longer just choosing a library; they are choosing where the data lives: at the server, at the client, or both.

Three competing architectures, more or less

Web developers are long familiar with React and the galaxy of similar reactive frameworks like Angular, Vue, and Svelte. For nearly a decade, these have dominated the narrative with their competition and co-inspiration. HTMX and hypermedia-driven applications have championed a return to the true RESTful thin client, alongside alternatives like Hotwire and Unpoly.

We could in a sense see reactivity and hypermedia as two opposing camps. Somewhere in between is the local-first SQL movement, which proposes putting SQL directly in the browser. The waters are a bit muddy because local SQL can and does work right alongside React.

It’s safe to say that a reactive framework paired with a JSON API back end that talks to a datastore (SQL or otherwise) is still the de facto standard. But that monolithic story is starting to fracture in some very interesting ways.

Where the weight of the data lies

Data of course is the central mass of web applications. Where it lives and how it moves produce the gravity around which everything else must revolve. Each of these architectures proposes to handle that gravity in its own way, with different benefits and tradeoffs.

Hypermedia (e.g., HTMX): Keep the data largely off the client. The client is just a visual representation of the server data. The back-end “API” is responsible for producing the data-driven markup. Any kind of datastore can be used by the API server.

React and friends: A sophisticated, stateful engine runs in the client, and the developer syncs that state with the back end via RESTful JSON API calls. The back-end server tends to be dumb, responsible largely for just invoking other services to provide business logic or data persistence.

Local-first SQL: The data is distributed to the clients, like with React and friends, but in a much different way. Although the data is automatically synced directly to a datastore (like Postgres), the back-end API server is used only for specialized service calls—not for data persistence.

To summarize:

  • HTMX: Data gravity is at the server.
  • React: Data gravity is split between the server and the client.
  • Local-first: Data gravity is at the client.

Comparing the approaches

Besides the technical stats, the developer experience for each of these paradigms is quite different. However, while each paradigm feels different, they intersect in some interesting ways. Let’s take a closer look.

React and friends

Reactivity is the world we have been working in for 15 years. We’ve got a whole universe of frameworks: React, Angular, Vue, Svelte, Solid, and full-stack variants like Next.js, Nuxt, SvelteKit, Astro, etc. The beauty of these is in the core reactive idea: you have state that consists of your variables, and the UI is updated automatically. The UI is a pure function of state: UI = f(state).
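In code, that reactive idea reduces to something like the following, a generic React sketch (not tied to any particular framework release) where the markup is re-derived from state on every update:

```tsx
// Minimal React sketch of UI = f(state): the rendered markup is recomputed
// from `count` every time setCount changes it.
import { useState } from "react";

export function Counter() {
  const [count, setCount] = useState(0);
  return (
    <button onClick={() => setCount(count + 1)}>
      Clicked {count} times
    </button>
  );
}
```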

The downside is the gradual, almost imperceptible layering of intense complexity over the top of it all. This complexity seems at first just incidental, but it is in fact a direct outcome of the basic premise: building a state engine on the browser.

The result is that you have two sources of state: the browser and the database. The reactive engine becomes a negotiation layer between them. Add to that the various inherent complexities of managing browser state, and the result is quite a lot for front-end developers to wrap their heads around.

In the effort to manage such complexity, wring more performance, and improve developer experience, we have wound up with quite a sprawling empire of tools and techniques. Even just for React we have React Server Components, complex state-management libraries like Redux or Zustand, and orchestration layers like TanStack Query for manual cache invalidation.

On the back end, we talk to JSON APIs (or GraphQL), which can become unwieldy as a kind of boilerplate layer, but has in its favor an almost universal understanding.

HTMX and similar (Hotwire, Unpoly)

HTMX is like using HTML that has superpowers. You can do a huge amount of what you use reactive frameworks for, including all the AJAX and a lot of the partial rendering and effects, with just a few extra attributes sprinkled judiciously.

You spend a lot of time on the server, using a template engine like Pug, Thymeleaf, or Kotlin DSL. These are where you bring together the data from the persistence service and combine it with markup. The markup you generate includes the HTMX attributes.

You tend to decompose the templates, i.e., break them up into dedicated chunks. The idea is you want to have a chunk that can be used within the larger UI to create the whole layout, along with the ability to use that chunk alone when (and if) it is called upon for an AJAX response.
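A minimal sketch of that decomposition, assuming a Node/Express-style back end (the route names, data, and markup are invented for illustration; only the hx-* attributes are real HTMX):

```typescript
// Illustrative Express-style server for an HTMX "chunk." Routes, data, and
// markup are invented; hx-get, hx-target, and hx-swap are real HTMX attributes.
import express from "express";

const app = express();

// Reusable chunk: renders just the task-list markup.
function taskListFragment(tasks: string[]): string {
  return `<ul id="tasks">${tasks.map(t => `<li>${t}</li>`).join("")}</ul>`;
}

// Full page embeds the chunk (htmx <script> include omitted for brevity).
app.get("/", (_req, res) => {
  res.send(`
    <button hx-get="/tasks" hx-target="#tasks" hx-swap="outerHTML">Refresh</button>
    ${taskListFragment(["Write report", "Review PR"])}
  `);
});

// The same chunk served alone as the AJAX partial response.
app.get("/tasks", (_req, res) => {
  res.send(taskListFragment(["Write report", "Review PR", "Ship release"]));
});

app.listen(3000);
```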

Hypermedia with HTMX is a very powerful model. You are actually using REST, meaning you are transmitting a representational state.

Hotwire and Unpoly are similar libraries. In the case of Hotwire, you can achieve quite a bit of functionality and performance even without changing your HTML, just by using Turbo Frames to intercept link clicks and form submissions, automatically turning standard page navigation into partial DOM updates.

The beauty of the hypermedia approaches is that you gain a lot with a little. You are staying as much as possible in HTML, the very poster child of simplicity. On the other hand, you are giving up some of the sheer sophisticated power of reactive frameworks.

Local-first apps

Local-first development is the new kid on the block. Like React and friends, local-first keeps the data in two places, but it does so in a radically different way. In its most essential form, it means running a database in the browser that is kept aligned with the remote datastore via a syncing engine. This kind of thing has been done before with NoSQL databases like CouchDB or with the IndexedDB API, but the modern browser takes it to another level with a Wasm-based database engine, like SQLite.

The user gets a small view of the full data, called a partial replication or a bucket (also called a “shape”). The front-end app interacts directly with that data, and the infrastructure automatically does the work of keeping everything synced. A big benefit here is strong offline support (because the client device is carrying around an actual database).

This is a massive departure from the request-response cycle. In local-first, you don’t fetch data; you subscribe to it. The network becomes a background daemon that reconciles local and remote state using CRDTs (conflict-free replicated data types). CRDTs ensure that if two users edit a task while offline, the merge is seamless rather than messy.
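The shape of that subscription model looks roughly like the sketch below. The LocalDb interface and its watch method are generic stand-ins rather than any specific engine's API; real engines such as PowerSync or ElectricSQL have their own setup and query surfaces, but the point is subscribing to a live local query instead of fetching from an endpoint.

```typescript
// Generic local-first sketch: subscribe to a live query against a local
// replica (e.g., Wasm SQLite) while a sync engine reconciles with the server
// in the background. LocalDb and watch() are illustrative stand-ins, not any
// specific library's API.
interface LocalDb {
  watch<T>(sql: string, onRows: (rows: T[]) => void): () => void; // returns unsubscribe
  execute(sql: string, params?: unknown[]): Promise<void>;
}

function showOpenTasks(db: LocalDb) {
  // Instead of fetching from an API, subscribe to local query results.
  const unsubscribe = db.watch<{ id: number; title: string }>(
    "SELECT id, title FROM tasks WHERE done = 0",
    rows => console.log("open tasks:", rows.map(r => r.title)),
  );

  // Writes also go to the local replica; the sync layer pushes them upstream
  // and merges concurrent edits (e.g., via CRDTs) when connectivity returns.
  void db.execute("INSERT INTO tasks (title, done) VALUES (?, 0)", ["Draft launch notes"]);

  return unsubscribe;
}
```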

There is also a degree of simplification in using SQL everywhere, though that is offset by a rather unfamiliar and involved architectural setup. A syncing engine like PowerSync or ElectricSQL is required, and it has a set of rules that must be maintained. In addition, authentication and the interaction between the database and the syncing engine must be configured.

Local-first eliminates both the API server and the HTML template server. It pushes the entire data negotiation layer into the automated syncing engine that runs off developer-defined rules.

Interestingly, local-first SQL can be used as a data driver for React (and other reactive engines) or plain vanilla HTML + JS. As such, it is an interesting alternative take on the architecture of the web, which is agnostic about the front end.

Perhaps the strangest arrangement to contemplate is using HTMX and local-first SQL together. This is like a mad scientist architecture, which of course means developers are doing it. In this setup, the back-end HTMX template engine is actually a service worker running the SQL engine. In theory, you get the simplicity of HTMX and the ultra-speed + offline functionality of local SQL.
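A heavily hedged sketch of what that might look like: a service worker intercepts the HTMX request and renders the HTML fragment from a local database, with queryLocalDb standing in as a hypothetical helper over a Wasm SQLite instance.

```typescript
// Illustrative service worker acting as the "server" for an HTMX request.
// queryLocalDb is a hypothetical helper over a Wasm SQLite instance.
declare function queryLocalDb(sql: string): Promise<{ title: string }[]>;

self.addEventListener("fetch", (event: any) => {
  const url = new URL(event.request.url);
  if (url.pathname === "/tasks") {
    event.respondWith(
      (async () => {
        const rows = await queryLocalDb("SELECT title FROM tasks WHERE done = 0");
        const html = `<ul id="tasks">${rows.map(r => `<li>${r.title}</li>`).join("")}</ul>`;
        return new Response(html, { headers: { "Content-Type": "text/html" } });
      })(),
    );
  }
});
```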

Reactivity, hypermedia, or local-first? How to choose

We remain in the era of the default choice being React plus a JSON API. From there you might experiment with innovative frameworks like Svelte or Solid. If you are looking for an ingenious way to leverage RESTful simplicity, HTMX or Hotwire are must-tries. Local-first SQL is an exotic animal, fit for the likes of Linear or Notion right now, but somewhat daring for most of us doing standard production work.

More broadly, the emergence of this trilemma signals the end of the “one true way” for web development. We are moving away from the library wars and into a world of architectural choice.

The choice between reactivity, hypermedia, and local-first isn’t just about code. It’s about where you want to place the data.

  • If you want the data to be a server-side document, choose hypermedia.
  • If you want the data to be a shared memory state, choose reactivity.
  • If you want the data to be a distributed database, choose local-first.

And of course, it is possible to put the approaches together to strive for a blend of the right benefits for your project.

As the JSON-over-the-wire monolith continues to fragment, the best architects won’t be the ones who know the most hooks or the most attributes. They will be the ones who understand the weight of their data and choose the architecture that lets the data move most freely. The framework wars are over, but the battle for the network has just begun.


Google begins putting the guardrails on agentic AI 27 Apr 2026, 9:00 am

The most important thing Google announced at Google Cloud Next 2026 wasn’t another model, another Tensor Processing Unit (TPU), or another way to sprinkle Gemini across the enterprise (though it did all these things). Rather, it was an admission, or possibly a warning.

Agents need supervision.

We already knew this, of course, but “to know and not yet to do is not yet to know” as my high school philosophy teacher used to say. We like to think of agents as digital employees frenetically doing our bidding, but they’re also brittle software systems with credentials, budgets, memory, access to sensitive data, and a weird talent for failing in ways that are both expensive and hard to reconstruct.

That’s the real story of Google Cloud Next 2026. The consensus was that Google showed up to claim the agentic enterprise. I think the more interesting read is that Google showed up to contain it.

Yes, Google talked up the “agentic cloud.” It’s impossible to attend a conference these days that doesn’t. And, yes, it announced Gemini Enterprise Agent Platform, eighth-generation TPUs, new Workspace Intelligence AI capabilities, and a long list of integrations meant to make AI feel native to every corner of the enterprise. If you wanted a victory lap for the agentic era, there was plenty of keynote material to choose from.

But strip away the stage lighting, and the message was much more interesting: Enterprises have spent the past two years falling in love with AI agents. Now they need to keep them from embarrassing, bankrupting, or exposing the business.

That’s not a knock on Google. Quite the opposite. It may be the most useful thing Google announced.

Trust, but verify

The minute AI moves from saying things to doing things, all the boring enterprise questions demand answers. Who authorized this? What data did it use? What system did it touch? Why did it take that action? How much did it cost? How do we stop it?

Google’s announcements were, in large part, answers to those questions.

Consider what Google actually emphasized. Knowledge Catalog is designed to ground agents in trusted business context across the data estate. Gemini Enterprise now includes an inbox to manage and monitor agents, including long-running agents. Workspace is getting new controls to monitor, control, and audit agent access to data to reduce prompt injection, oversharing, and data loss risks. Google Cloud’s security announcements included new agentic defense capabilities and Wiz-powered coverage to help secure agents across cloud and AI development environments.

These are not the tools you build when everything is humming along nicely. These are what you build when customers are discovering the awkward middle ground between “the demo worked” and “we trust this thing with real work.”

The agent control plane

Analysts seem to have settled on “agent control plane” as the phrase for this emerging layer of enterprise AI. It’s a good phrase because it’s familiar. It suggests Kubernetes for cognition: a unified place to govern, observe, route, secure, and optimize fleets of AI agents.

If only. We’re still far from that world.

The reason agents need a control plane isn’t that they’re already replacing employees; rather, it’s that enterprises are giving probabilistic systems access to deterministic workflows and discovering (surprise!) that somebody needs to watch the handoff. Agent demos make autonomy look clean, but enterprise systems make autonomy weird. The customer record is in one system, the contract is in another, the exception handling lives in someone’s inbox, the policy is in a PDF last updated in 2021, and the person who understands why the workflow works that way left the company during the pandemic.

Now we’re adding agents to the mess.

This is why I’m sympathetic to Google’s control-plane push, even as I’m suspicious of any vendor story that sounds too tidy. Yes, it’s useful to have a unified agent platform, governance, agent monitoring, evaluation, observability, and simulation. All needed. The new Gemini Enterprise story matters precisely because Google is trying to centralize the messy operational pieces that enterprises otherwise stitch together badly.

But let’s not mistake the control plane for the work itself.

Pilots are easy; production is hard

The data on agentic AI keeps saying the same thing: Enthusiasm is running far ahead of operational maturity.

Camunda’s 2026 State of Agentic Orchestration and Automation report found that 71% of organizations say they use AI agents, but only 11% of agentic AI use cases reached production in the past year. Even more telling, 73% admitted a gap between their agentic AI vision and reality. Gartner has been similarly chilly, predicting that more than 40% of agentic AI projects will be canceled by the end of 2027. Why? Cost, unclear business value, and inadequate risk controls.

Let’s be clear. Those aren’t model problems. They’re all-too-familiar enterprise software problems.

The same pattern shows up in security and governance. Writer’s 2026 enterprise AI survey found that 67% of executives believe their company has suffered a data leak or security breach because of unapproved AI tools. Also, 36% lack a formal plan for supervising AI agents, and 35% admit they couldn’t immediately pull the plug on a rogue agent.

Of the three, it’s that last number that is perhaps scariest. These are software agents with access to business systems, customer data, and organizational credentials, yet more than one-third of organizations aren’t confident they can stop one quickly when it misbehaves.

What, me worry?

The agent is the least interesting part

The dirty secret of the agentic enterprise is that the agent is probably the least interesting part of the architecture. It gets all the hype, but the real work is identity, permissions, workflow boundaries, data quality, retrieval, memory, evaluation, audit trails, cost controls, and deciding which system is allowed to be the source of truth when the agent gets confused.

The presentations at Google Cloud Next didn’t prove that the agentic enterprise had arrived. Instead they proved that the agentic enterprise, if or when it arrives, will look a lot like enterprise software has always looked when it starts to matter. Less magic; more governance.

That’s progress, but it’s not sexy progress.

If you’re trying to pick winners in agentic AI, don’t look for those with the cleverest agents. Instead, look to the companies with the cleanest data contracts, the best evaluation discipline, the most coherent identity model, and the least tolerance for shadow AI chaos. The industry doesn’t want to tell that story because it’s much more fun to talk about autonomous digital workers than data lineage and access control.

But boring is where enterprise software becomes real.

Here’s another reason to be cautious about declaring the agentic era won: Agents are only as useful as the data they can safely understand and act upon. Google clearly knows this. The Agentic Data Cloud framing, including Knowledge Catalog and cross-cloud Lakehouse work, is an admission that agents need trusted business context. Without that context, they’re not enterprise workers. They’re articulate tourists wandering through your systems.

Hence, the most encouraging announcements at Google Cloud Next weren’t the ones that made agents sound more autonomous. They were the ones that made agents sound more manageable. Agentic AI promises to be big, but only when it demonstrates it can be boring.


The best JavaScript certifications for getting hired 27 Apr 2026, 9:00 am

JavaScript remains one of the most in-demand programming languages for web development—and that’s not likely to change anytime soon. While a JavaScript certification alone may not land anyone a development job, it definitely has its benefits.

“JavaScript isn’t just holding steady, it is still the most in-demand language in the market,” says Dan Roque, recruitment manager at HRUCKUS, a provider of professional and career services.

The Stack Overflow 2025 Developer Survey of more than 49,000 developers shows that JavaScript remains the most-used programming language, coming in ahead of HTML/CSS, SQL, and Python.

“It has held the top spot for over a decade, every single year since the survey began in 2011,” Roque says. JavaScript’s core advantage is its ubiquity, he says. “A developer who knows it well can contribute to front-end interfaces, back-end APIs [application programming interfaces], serverless functions, and automation pipelines without switching languages,” he says.

JavaScript remains foundational because the web still serves as the default app platform, and most JavaScript growth is really the ecosystem expanding, with frameworks, tooling, and TypeScript becoming the common enterprise path, says Josh George, founder of Josh George Consulting, an independent technology strategy consulting firm.

“JavaScript is the universal runtime for user experiences and caters to browsers, Node-based back ends, edge/serverless functions, and cross-platform apps,” George says. “The greatest benefit is a single language family across client and server, plus the added bonus of an enormous package ecosystem and mature tooling.”

The programming language has evolved from being just a browser scripting language into a full-stack development platform, says Lucas Botzen, CEO and human resources manager at careers site Rivermate. “From an HR and industry perspective, organizations increasingly prefer technologies that allow teams to build both front-end and back-end systems with the same language,” he says. “That consistency reduces hiring complexity and accelerates development cycles.”

JavaScript and AI

AI-powered coding tools such as Claude Code and GitHub Copilot can boost JavaScript development productivity, Roque says, with studies showing they can help developers complete JavaScript tasks faster. As a result, many developers plan to use AI in their development process, he says.

“Employers aren’t just looking for developers who know JavaScript. They’re increasingly looking for [developers] who can evaluate and orchestrate AI-generated code within a JavaScript workflow,” Roque says. AI is reshaping what mid-level and senior JavaScript roles actually look like, he says.

“However, more and more technical testing and final interviews, especially in government-related roles, prohibit the use of AI during the last legs of the process, to ensure the talent actually can code and understand code without AI, and to validate whether they can review and troubleshoot AI-generated code on their own,” Roque says.

AI helps most with the “surface area” work, “meaning scaffolding components, generating tests, suggesting refactors, and explaining unfamiliar code paths,” George says. “The bigger shift though is that teams expect developers to spend less time typing boilerplate and more time making judgment calls like with API design, performance budgets, security, and maintainable architectures.”

JavaScript certs and the hiring process

Okay, so JavaScript is still a highly important piece of the development ecosystem. But is having a JavaScript certification necessary today? “It depends on context, but the answer I’d give is ‘yesn’t,’” Roque says. “It is increasingly important, especially in structured hiring environments, because it provides a credibility anchor that résumés alone often can’t. But it doesn’t replace actual demonstrated skill.”

There isn’t really a widely accepted global authority or standard for issuing JavaScript certifications, Roque says, “so it’s much lower on the totem pole of validity when it comes to skill-signaling. Still, for client-facing or compliance-driven roles, certifications provide third-party validation that reduces perceived risk when a client is approving a candidate.”

For back-end and full-stack roles, performance-based credentials that require solving real coding problems in a timed, proctored environment remain the gold standard, Roque says. “Employers have simply grown skeptical of multiple-choice tests.”

In most hiring decisions, “a JavaScript certification is not the primary signal of competence,” says Jacob Strauss, CTO at ChaseLabs, a provider of AI-based sales development tools. “They tend to help more as evidence of structured study than as a decisive hiring credential.”

In practice, the strongest way to stand out is to pair any certification with a small, real project. “JavaScript changes continuously, and what matters in production is the ability to build, debug, and operate real systems,” Strauss says. “That said, certifications can be useful in specific scenarios: early-career candidates, career switchers who need a credible baseline, or organizations trying to standardize Node.js skills across a team. In those cases, a cert can reduce uncertainty, but it works best as supporting evidence alongside shipped work.”

A JavaScript certification can serve as a screening mechanism or tie-break factor. “In high-volume pipelines, a cert may help a résumé survive the first pass, especially when applicants have limited professional history,” Strauss says. “For roles that involve running Node services in production, a certification can signal familiarity with asynchronous patterns, HTTP fundamentals, debugging, and service reliability concerns.”

Even then, however, strong portfolios tend to outweigh credentials, Strauss says. “A clean repository with typed code, sensible architecture, tests, and clear commit history is a far more predictive indicator than a generic certificate,” he says.

Certifications can be influential in high-volume hiring, where recruiters need fast signals, or in regulated and enterprise contexts that prefer standardized training, George says. “It can also help when teams need specific competency areas, such as modern framework fundamentals, testing practices, secure coding basics, etc., and want a consistent baseline,” he says.

In-demand JavaScript certifications

CIW JavaScript Specialist. This certification covers core JavaScript topics such as functions, variables, Document Object Model (DOM) manipulation, form validation, event handling, AJAX/JSON asynchronous communication, and object-oriented programming. Candidates learn how to tackle complex scripting scenarios and debug code, according to Certification Partners, which owns and manages CIW certifications. The CIW JavaScript Specialist certification equips candidates with practical skills to build dynamic, interactive, and user-friendly websites using JavaScript, says Certification Partners.

JavaScript Algorithms and Data Structures. Administered by FreeCodeCamp, this is considered an ideal certification for JavaScript beginners. Candidates will learn the JavaScript fundamentals such as variables, arrays, objects, loops, functions, DOM, and more. They’ll also learn about object-oriented programming, functional programming, and algorithmic thinking.

JavaScript Developer Certificate. This certification validates proficiency in JavaScript programming, HTML DOM manipulation, and web development fundamentals. Before applying for the exam, candidates need to have a fundamental knowledge of JavaScript and HTML DOM, according to W3Schools, which offers the certification. The certification tests ability to manipulate HTML DOM and validates proficiency in dynamic website development.

JSA—Certified Associate JavaScript Programmer. This certification from the JS Institute, which is managed by OpenEDG, demonstrates a candidate’s proficiency in object-oriented analysis, design, and programming and the more advanced use of functions in the JavaScript language, according to the institute. Becoming JSA-certified ensures that individuals are acquainted with how JavaScript enables them to design, develop, deploy, refactor, and maintain JavaScript programs and applications.

JSE—Certified Entry-Level JavaScript Programmer. Also available from the JS Institute, this certification demonstrates a candidate’s understanding of the JavaScript core syntax and semantics, as well as their proficiency in using the most essential elements of the language, tools, and resources to design, develop, and refactor simple JavaScript programs. Becoming JSE-certified ensures that individuals are acquainted with the most essential means provided by the core JavaScript language, so they can work toward the intermediate level, the institute says.

Mimo JavaScript Certification. This certification verifies a candidate’s ability to build interactive features, manage data, and understand the logic that powers modern front-end and full-stack development, according to Mimo. It shows mastery of variables, functions, loops, conditionals, arrays, objects, and ES6 features, and provides hands-on experience solving coding challenges. Participants work through practical projects such as dynamic web pages, forms, and interactive user interface components.

OpenJS Node.js Application Developer (JSNAD). Created by the OpenJS Foundation, but retired in September 2025, the JSNAD certification demonstrates the ability to manage Node.js core modules, implement asynchronous logic, handle streams and buffers, and configure security protocols. Candidates were tested on solving complex back-end development problems and implementing scalable server-side applications in real-world scenarios, according to Udemy, which still offers classes and practice exams related to JSNAD certification. The program explores the technical aspects of buffers, streams, and file system operations within the Node.js runtime, Udemy says.

Senior JavaScript Developer. Offered by Certificates.dev, the Senior JavaScript Developer certification validates mastery in JavaScript, including prototypes, inheritance, and performance optimization techniques, according to Certificates.dev. It demonstrates proficiency in advanced asynchronous programming, testing, and security vulnerability mitigation. It’s designed for experienced developers aiming for technical leadership roles.


Meta’s compute grab continues with agreement to deploy tens of millions of AWS Graviton cores 24 Apr 2026, 10:47 pm

Meta is continuing its compute grab as the agentic AI race accelerates to a sprint.

Today, the company announced a partnership with Amazon Web Services (AWS) that will bring “tens of millions” of AWS Graviton5 cores (one chip contains 192 cores) into its compute portfolio, with the option to expand as its AI capabilities grow. This will make the Llama builder one of the largest Graviton customers in the world.

The move builds on Meta’s expansive partnerships with nearly every chip and compute provider in the business. It’s working with Nvidia, Arm, and AMD, as well as building its own internal training and inference accelerator chip.

“It feels very difficult to keep track of what Meta is doing, with all of these chip deals and announcements around in-house development,” said Matt Kimball, VP and principal analyst at Moor Insights & Strategy. This makes for “exciting times that tell us just how incredibly valuable silicon is right now.”

Controlling the system, not just scale

Graphics processing units (GPUs) are essential for large language model (LLM) training, but agentic AI requires a whole new workload capability. CPUs like Graviton5 are rising to this challenge, supporting intensive workloads like real-time reasoning, multi-step tasks, frontier model training, code generation, and deep research.

AWS says Graviton5 has the ability to handle “billions of interactions” and to coordinate complex, multi-stage agentic tasks. It is built on the AWS Nitro System to support high performance, availability, and security.

“This is really about control of the AI system, not just scale,” said Kimball. As AI evolves toward persistent, agentic workloads, the role of the CPU becomes “quite meaningful”; it serves as the control plane, handling orchestration, managing memory, scheduling, and other intensive tasks across accelerators.

“This is especially true in agentic environments, where the workloads will be less linear and more stateful,” he pointed out. So, ensuring a supply of these resources just makes sense.

Reflecting Meta’s diversified approach to hardware

The agreement builds on Meta’s long-standing partnership with AWS, but also reflects what the company calls its “diversified approach” to infrastructure. “No single chip architecture can efficiently serve every workload,” the company emphasized.

Proving the point, Meta recently announced four new generations of its MTIA training and inference accelerator chip and signed a massive deal with AMD to tap into 6GW worth of CPUs and AI accelerators. It also entered into a multi-year partnership with Nvidia to access millions of Blackwell and Rubin GPUs and to integrate Nvidia Spectrum-X Ethernet switches into its platform, and was also one of Arm’s first major CPU customers.

In the wake of all this, Nabeel Sherif, a principal advisory director at Info-Tech Research Group, posed the burning question: “What are they going to do with all this capacity?”

Primarily it will support Meta’s internal experimentation and innovation, he said, but it also lays the groundwork and provides the capacity for Meta to offer its own agentic AI services, for instance, its Llama AI model as an API, to the market.

“What those [services] will look like and what platforms and tools they’ll use, as well as what guardrails they’ll provide to users, is still unclear, but it’s going to be interesting to see it develop,” said Sherif.

The expanded capacity will enable a diversity of use cases and experimentation across various architectures and platforms, he said. Meta will have many options and access to supply in an environment currently characterized not only by a wide variety of new CPU approaches, but by significant supply chain constraints. The AWS deal should be viewed as a complement to its partnerships and investments in other platforms like Arm, Nvidia, and AMD.

Kimball agreed that the move is “most definitely additive,” not a replacement or substitution. Meta isn’t moving off GPUs or accelerators, it’s building around them. “This is about assembling a heterogeneous system, not picking a single winner,” he said. “In fact, I think for most, heterogeneity is critical to long term success.”

Nvidia still dominates training and a lot of inference, while AMD is becoming “more and more relevant at scale,” Kimball noted. Arm, meanwhile, whether through CPU, custom silicon or other efforts, gives Meta architectural control, and Graviton5 fits into that mix as a “cost- and efficiency-optimized general-purpose compute layer.”

A question of strategy

The more interesting question is around strategy: Does this signal Meta is becoming a compute provider? Kimball doesn’t think so, noting that it’s likely the company isn’t looking to directly compete with hyperscalers as a general-purpose cloud. “This is more about vertical integration of their own AI stack,” he said.

The move gives them the ability to support internal workloads more efficiently, as well as providing the infrastructure foundation to expose more of that capability externally, whether through APIs, partnerships, or other means, he said.

And there’s a cost dynamic here, too, Kimball noted. As inference becomes persistent, especially with agentic systems, economics shift away from peak floating-point operations per second (FLOPS, a measure of compute performance) and toward sustained efficiency and total cost of ownership (TCO).

CPUs like Graviton5 are well positioned for the parts of that workload that don’t require accelerators, but still need to run continuously. “At Meta’s scale, even small efficiency gains per workload compound quickly,” Kimball pointed out.

For developers and enterprise IT, the signal is pretty clear, he noted: The AI stack is getting more heterogeneous, not less so. Enterprises are going to see tighter coupling between CPUs, GPUs, and specialized accelerators, with workloads increasingly split across them based on behavior (prefill versus decode, stateless versus stateful, burst versus persistent).

“The implication is that infrastructure decisions have to become more workload-aware,” said Kimball. “It’s less about ‘which cloud?’ and more about ‘where does this specific part of the application run most efficiently?’”

This article originally appeared on NetworkWorld.


Germany’s sovereign AI hope changes hands 24 Apr 2026, 6:33 pm

As Europe seeks to assert its technological independence from US vendors, Aleph Alpha, once seen as Germany’s sovereign AI hope, is the target of a transatlantic takeover.

Aleph Alpha is set to merge with Canada’s Cohere in a deal that will bring together Cohere’s global AI clout and Aleph Alpha’s background in research. The two companies hope to build an AI powerhouse, with backing from their Canadian and German ecosystems.

“Organizations globally are demanding uncompromising control over their AI stack. This transatlantic partnership unlocks the massive scale, robust infrastructure, and world-class R&D talent required to meet that demand,” said Cohere CEO Aidan Gomez in a news release that artfully presents the deal as a merger of equals but that, according to a footnote, only requires the approval of the German company’s shareholders, a sure sign of a one-sided takeover.

The combined companies will be looking to offer customized AI in highly regulated sectors including finance, defense, and healthcare. By pooling their talents and offerings, they hope to offer AI solutions tailored to local laws, cultural contexts, and institutional requirements.

The move comes at a time when businesses across the world are looking at non-US options in reaction to the Trump administration’s policy on tariffs and the uncertainty caused by the war with Iran.

There have been several initiatives within Europe to counteract US dominance. The EU’s Eurostack plan looked to make sure that major projects had a European option, and Aleph Alpha was one of the companies highlighted within the scheme. The EU also launched OpenEuroLLM, an attempt to counter the US and China’s lead in AI.

This article first appeared on CIO.


Former OpenAI research scientist launches new AI model for Tencent 24 Apr 2026, 5:22 pm

Tencent has updated its Hunyuan AI model, its first major release since it recruited Yao Shunyu, a leading AI scientist from OpenAI. Tencent’s Hy3 model, currently available in preview, offers improvements in areas from complex reasoning to coding.

The Chinese technology conglomerate is playing catch-up with other Chinese AI developers including ByteDance, Alibaba and DeepSeek. China is betting big on open-source AI to offer alternatives to major US players. Back in 2023, Tencent claimed its then-new Hunyuan LLM was a more powerful and intelligent option than the versions of ChatGPT and Llama available at the time.

Tencent has backed AI start-ups including Moonshot AI and StepFun, hoping that they will boost its cloud computing division. The company has also restructured its research team to improve the quality of training data. It aims to double its investment in AI to more than $5 billion this year.

Not to be outdone, DeepSeek announced its V4 Flash and V4 Pro Series, the newest versions of its LLM. DeepSeek became an overnight hit in January 2025 with the launch of its R1 AI model and has gone on to develop other models since. It said the V4 upgrades will offer users advances in reasoning and agentic tasks, while a new feature called Hybrid Attention Architecture improves the platform’s ability to remember queries across long conversations.


Where to begin a cloud career 24 Apr 2026, 9:00 am

Cloud computing is a key foundation of modern business, yet many approach learning it in overly complicated ways. New learners often believe they need costly boot camps, certification bundles, or long technical courses before they can get a job in the field. This approach discourages potential entrants and creates false barriers around a discipline that’s already difficult to navigate. The truth is simpler: Free cloud courses are among the best starting points because they reduce risk, boost confidence, and introduce learners to the vocabulary, platforms, and models that define the cloud era.

The first stage of any cloud journey is orientation, not mastery. People need to understand what cloud computing means in practice, how infrastructure differs from platform services, why elasticity matters, where governance fits, and how major providers organize their offerings. A well-rounded exposure to the depth and breadth of a subject is a proven benefit of traditional higher education models. However, the traditional approach to course investigation and experimentation has become too costly for many prospective cloud students.

Free courses allow learners to explore without financial pressure. If someone discovers they prefer architecture over administration, or are more drawn to security than to development, the learning experience still creates value. Nothing is wasted.

Do cloud providers offer their free foundational learning as a charity? Of course not. This is ecosystem development. When AWS, Microsoft, Google Cloud, Oracle, IBM, and others publish free learning paths, they are hoping to cultivate future architects, engineers, analysts, and decision-makers.

Where to start

Start your cloud career by reviewing free introductory courses from vendors or leading educational platforms. These courses aim not only to teach features but also to foster the mental models providers want new users to develop. Free courses are also a good starting point because they let learners compare platforms before choosing one. This matters because beginners often hear strong opinions about the “best” cloud provider. The better question is which provider suits a learner’s goals. Someone focused on enterprise Windows may prefer Microsoft Azure. Those interested in startups or market demand may choose AWS. Those focused on data, analytics, or Kubernetes may opt for Google Cloud. Free courses enable this comparison without forcing early specialization.

The list below is a practical starting point. It includes reputable, free or free-to-start courses and learning paths from major providers and platforms. Some offer entirely free access to core material, while others allow enrollment at no cost but may charge separately for certificates or advanced lab features. That distinction is not a drawback for beginners. Early on, what matters most is understanding the landscape, learning the terminology, and gaining enough fluency to decide what should come next.

Courses from AWS

Courses from Microsoft Learn

Courses from Google Cloud Skills Boost

Courses from IBM

Course from Oracle University/Learn Oracle

Course from Alibaba Cloud Academy

Why free courses work so well

Effective courses aren’t just about price; they’re about structure. Good introductory cloud courses progress from concepts to examples to platform navigation, teaching learners to think about regions, zones, VMs, storage, identity, networking, and managed services before actual implementation skills are required. Many new learners fail by jumping into tools too soon. They try to deploy before they can explain. Free foundation courses avoid this by establishing context first, making hands-on learning more effective.

People entering the cloud market from nontraditional backgrounds should note that not all future cloud professionals need coding skills. Many successful cloud careers start in systems administration, security, project delivery, business analysis, operations, data management, or technical sales. Free courses help by focusing on concepts and platform literacy rather than deep engineering, making the field more accessible. This accessibility is a strength, helping cloud expand across industries.

Treat free courses as a starting point in a broader strategy, not the whole journey. They provide a good foundation. For example, you could start with an IBM overview, followed by AWS or Azure fundamentals to gain familiarity with a major provider, then Google Cloud to expand horizons. Next, engage in hands-on labs, architecture diagrams, small deployments, and role-based learning in areas like security, networking, AI, data engineering, or finops. Free courses are the launch point, not the end point.

The best first move

Commit. Begin your cloud journey this week. Start with one foundational course from a major provider and finish it. Then take a second course from a different provider to compare terminology, service models, and user experience. That simple approach gives you momentum, perspective, and a much clearer sense of where to invest your time next.

Procrastination is not your friend in an industry that evolves quickly and rewards curiosity. Don’t pay for depth before you earn breadth. Also realize that as the cloud industry matures and skills shortages ease, the availability of these free courses is likely to decline. Carpe diem.


Why world models are AI’s next frontier 24 Apr 2026, 9:00 am

To many people, AI manifests one of sci-fi’s central plot points: built intelligence or machines that think and act independently of a human supervisor. But from my perspective, we haven’t quite achieved the true fulfilment of that vision.

For this reason, many thought leaders describe world models as AI’s next big paradigm shift. These models learn from the full physical environment — synthetic or real — and can understand the spatial and physics complexities of worlds, unlike LLMs, which are restricted to language and images.

AMI’s Yann LeCun is such a strong believer that he quit his role as chief AI scientist at Meta to found his own organization to advance world models. “I’ve not been making friends in various corners of Silicon Valley, including at Meta, saying that within three to five years, [world models] will be the dominant model for AI architectures, and nobody in their right mind would use LLMs of the type that we have today,” LeCun said.

Admittedly, LLMs have achieved groundbreaking results. But they can only improve with more compute and more data, which are increasingly expensive and unwieldy, and which deliver only diminishing returns.

World models are the vital prerequisite for AGI

I believe that world models have the potential to actualize many of the capabilities that sci-fi dreams about.

To truly achieve artificial general intelligence (AGI), world models will need to go beyond pattern recognition to capture how the world actually works. A system capable of general reasoning must understand relationships – physical, social and causal — well enough to transfer knowledge between unfamiliar situations.

Without that holistic perspective, a model may perform impressively when conditions perfectly match those described in its training, yet it will fail when those conditions suddenly change. To be effective “generally,” AI needs the ability to revise its internal understanding when it encounters new situations.

A comprehensive world model allows an agent to simulate outcomes, reason about constraints and adapt to new environments, turning static predictions into flexible problem-solving.

With the right levels of adaptability, an agent can update its beliefs, reinterpret context and devise new strategies rather than relying on static rules. This capacity mirrors human intelligence, where prior knowledge is continuously reshaped to handle new situations, from learning unfamiliar technologies to navigating entirely new cultures.

After all, real-world decisions are rarely isolated. Actions interact with physics, timing, goals and human behavior, all at once. To plan effectively, an AGI must anticipate consequences, identify causation, and integrate knowledge across domains. Replicating humans’ integrated understanding and open-ended problem solving is what separates narrow intelligence from general intelligence.

A world model is worlds apart from an LLM

In short, world models provide AI with common sense to understand how things operate in a given environment — and what might happen if conditions or objects are altered.

For example, Meta’s JEPA was built towards this goal, focusing on predicting abstract representations rather than raw pixels, and it serves as a key building block for future world models.

Large language models, or LLMs, seem very powerful today, but they are dwarfed by world models. World models are multimodal AI models that are self-learning, capable of general reasoning and spatially aware. LLMs are just very good at predicting what comes next in a pattern.

Here’s my take on the main differences between a world model and an LLM:

  • Learning methods. World models use continuous reinforcement learning to train themselves by observing their environment and inferring missing data, as in the PlaNet model-based reinforcement learning system. In contrast, LLMs are inefficient and require extensive training on massive datasets.
  • Spatial awareness. World models like Genie 3 interact dynamically with multidimensional environments, enabling them to imagine and generate 3D, 4D and 5D visualizations of consistent, interactive worlds. LLMs, on the other hand, don’t have any awareness of space.
  • Deep understanding. World models extrapolate from partial information to understand concepts like cause and effect and object permanence, whereas LLMs are limited by a shallow understanding of the world. They can predict the next word based on learned patterns, but they don’t understand what that word means.
  • Long-term planning. By executing thousands of simulations, agents like those based on the DreamerV3 model can find the optimal sequence to achieve a goal, allowing them to plan for different contingencies and make informed decisions in new circumstances. LLM long-term planning, on the other hand, is fragile and unreliable.
  • Multimodal inputs and outputs. World models are able to consume inputs in diverse forms, and also produce outputs in many different modes. For example, World Labs’ Marble is a multimodal world model that can reconstruct and simulate 3D environments from still images. LLMs are restricted to 2D inputs and outputs.

How does a world model work?

A world model is made up of three connected modules:

  • The perception module. This section takes raw sensory inputs such as images, video and proprioception and encodes them into a compact latent representation of the environment.
  • The prediction module. This is a dynamics model which handles probability distribution and captures causality and temporal structure. It probabilistically predicts the next latent state and the expected results of any actions.
  • The planning (control) module. This module uses the output of the prediction model to simulate future trajectories and select actions that optimize achievements towards a goal.

“At its core, a world model is an internal representation that an AI system constructs to simulate the external environment. By continuously processing sensory data, a robot builds a dynamic blueprint of its surroundings,” explains Aurorain founder Luhui Hu. “This fusion of perception, prediction and planning mirrors cognitive processes in humans, setting the stage for more advanced robotic behavior.”
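
To make that three-part loop concrete, here is a minimal TypeScript sketch of how the modules might fit together. The interfaces and names are purely illustrative rather than taken from any particular system, and planning is reduced to scoring one imagined step per candidate action.

// Illustrative only: a toy composition of the three world-model modules.
type Observation = number[];  // raw sensory input, e.g. flattened pixels
type LatentState = number[];  // compact encoded representation
type Action = number;         // index into a discrete action space

interface PerceptionModule {
  encode(obs: Observation): LatentState;
}

interface PredictionModule {
  // The dynamics model: predict the next latent state for a candidate action.
  predictNext(state: LatentState, action: Action): LatentState;
}

interface PlanningModule {
  // Score a latent state against the goal; higher is better.
  value(state: LatentState): number;
}

// "Imagine before acting": simulate each candidate action in latent space
// and pick the one whose predicted outcome scores highest.
function planOneStep(
  perception: PerceptionModule,
  prediction: PredictionModule,
  planner: PlanningModule,
  obs: Observation,
  actions: Action[],
): Action {
  const state = perception.encode(obs);
  let best = actions[0];
  let bestValue = -Infinity;
  for (const action of actions) {
    const value = planner.value(prediction.predictNext(state, action));
    if (value > bestValue) {
      bestValue = value;
      best = action;
    }
  }
  return best;
}

Real systems run many such rollouts over long horizons and learn all three modules jointly, but the control flow has the same shape: encode, imagine, score, act.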

World models open up immense possibilities

There seem to be almost no limits to the potential waiting within world models, even if we set aside AGI aspirations for the moment. Here are just a few of the many ways world models could impact our lives.

Immersive visual experiences

With world models, it is finally becoming possible to build convincing worlds that you can interact with and experience. These are the very first capabilities that are coming on line, thanks to models like those developed by Decart, which can even be used as playable, game engine-free simulations.

“Because what’s running your game or your environment is an AI, you can interact with it in the ways we’re used to interacting with AI,” says Dean Leitersdorf, Decart’s CEO and cofounder.

“You’d be able to say, ‘Hey, can you turn this into Elsa themed?’ And then, boom, everything becomes Elsa-themed. ‘And can you add a flying elephant?’ And there’s a flying elephant in the game. And it’s not just there as a picture. You can actually interact with it. You can, I don’t know, punch the elephant, it’ll punch you back, or whatever you can do with an elephant.”

Fast iteration for innovations

Interactive, consistent world generation has consequences that go far beyond entertainment.

Models like Marble and Oasis that can generate persistent, downloadable 3D environments from text prompts, photos, videos, 3D layouts, or panoramic images currently focus on gaming and VR, but they also open the door to robotics training in simulated environments.

Multi-dimensional computational modeling enables use cases like exploring molecular chemistry, developing novel biomedical treatments, probing the makeup of the universe, designing earthquake-proof buildings, understanding complex climate patterns and researching new materials.

Video that obeys real-world laws

Among the use cases for world models that are most exciting to me, creating hyper-realistic AI-generated video certainly stands out as especially compelling.

As AI systems improve their understanding of physical dynamics, the distinction between video generation and world models is becoming less clear.

Runway’s GWM-1 general world model is a good example. It simulates reality through autoregressive, frame-by-frame video generation, a step Runway positions toward “general world models” that fully replicate the physics of simulated environments. Luma AI’s Modify Video has a similar goal.

Safer, more accurate decisions

Because world models can extrapolate from partial information, rapidly simulate many possible outcomes of multiple decisions and accurately forecast consequences, they can significantly improve decision-making across a wide range of use cases.

Possibilities include complicated multi-factor economic modeling, understanding climate patterns that are currently unpredictable and supporting complex long-term planning for regional and international policy decisions.

They also improve the safety of self-driving cars by enabling them to predict the outcomes of actions, such as changing lanes, to avoid collisions.

Realistic robots

Robots that serve as lab assistants, carers, 24/7 industrial workers and explorers in inaccessible and/or hostile environments are an old sci-fi dream. World models can help overcome a serious ongoing obstacle to making “physical AI” possible: the lack of relevant training data.

NVIDIA’s Cosmos platform 2.5 was built to predict and generate physics-aware videos of future environment states, producing synthetic training data for autonomous vehicles and robotics at massive scale.

“Unlike language models, training data is scarce for today’s robotic research. World models will play a defining role in this,” says Fei-Fei Li, CEO and founder of World Labs. “As they increase their perceptual fidelity and computational efficiency, outputs of world models can rapidly close the gap between simulation and reality. This will, in turn, help train robots across simulations of countless states, interactions and environments.”

World models rest at AI’s next frontier

With so much power and so many possibilities, world models promise a great leap forward beyond LLMs, making some of our long-held sci-fi wishes come true.

This article is published as part of the Foundry Expert Contributor Network.


The agentic AI frenzy increases as more vendors stake their claims 24 Apr 2026, 1:17 am

The AI agent introduction frenzy continued at a torrid pace this week, with OpenAI launching what it called workspace agents in ChatGPT and Microsoft adding hosted agents to its Foundry Agent Service.

Both launched on the same day that Google both updated its Gemini Enterprise app to provide new ways for office workers to build, manage, and interact with AI agents, and launched the Gemini Enterprise Agent Platform, which the company said is designed to build, scale, govern, and optimize agents.

This trio of offerings follows Anthropic’s early April introduction of Claude Managed Agents, a suite of composable APIs for building and hosting cloud-hosted agents, which is now in public beta.

In its announcement, OpenAI said, “workspace agents are an evolution of GPTs. Powered by Codex, they can take on many of the tasks people already do at work—from preparing reports, to writing code, to responding to messages. They run in the cloud, so they can keep working even when you’re not. They’re also designed to be shared within an organization, so teams can build an agent once, use it together in ChatGPT or Slack, and improve it over time.”

Microsoft, meanwhile, stated in a blog that its latest move “brings agent-optimized compute and services designed for production-grade enterprise agents.” After its preview of hosted agents last year at Microsoft Ignite, the company said, “this refresh is a fundamentally different experience: secure per-session sandboxes with filesystem persistence, integrated identity, and scale-to-zero economics.”

Announcements are connected

Jason Andersen, principal analyst at Moor Insights & Strategy, said, “these four announcements are connected, as the frenzy around agents continues. What OpenAI is announcing is the native ability to support the creation and sharing of agents.”

This is new functionality for OpenAI, which is a bit late to the game; Google, Microsoft, Anthropic and others have had this capability for some time, and are in fact moving farther ahead with these other announcements, he said.

 “What we are seeing with Anthropic and Microsoft is that, as agents become more powerful, they will go to great lengths to solve the problem they are posed with, and sometimes that includes the agent writing code and doing other tasks,” he pointed out. “This increases complexity and concerns about agents and models being well managed while running. The hosting options both of these vendors provide are a more advanced infrastructure for agents to run.”

Right now, he added, “many agents are being treated as simply a more advanced front end. These newer options provide the ability for an agent to do things like spin up a dedicated container, and they can support semi-autonomous and, in some cases, autonomous operations. These two announcements are more infrastructure-related, whereas OpenAI is more about agent building.”

He described the Google launch as being “something in between.”

He noted, “OpenAI’s announcement is very similar to last year’s announcement of Gemini Enterprise from Google. This year, Google took steps forward to enable a management control plane for agents called Gemini Enterprise Agent Platform, which enables a much richer sharing experience and a number of management and governance capabilities.”

On the whole, Andersen said, “the agent space is getting very hot, and some who have been later to the party are getting on board, and those who have been investing are evolving to provide end customers more scale, operations, and security capabilities.”

Brian Jackson, principal research director at Info-Tech Research Group, said that with the flurry of announcements “we’re seeing a race to gain critical mass as the agentic platform becomes the daily work interface for the enterprise. Anthropic and OpenAI are coming at it from their AI startup positioning, while Google, Microsoft, and Amazon are leveraging their entrenched hyperscaler and enterprise platform positions.”

Jackson pointed out that the differentiation in what these tech firms offer is most clear in who they are targeting and their delivery model.

He noted that OpenAI’s Workspace Agents are designed for non-technical business teams. They provide templates for agents that can automate tasks from lead scoring to vendor research reports. Users can “prompt” their way to work automation without worrying about the behind-the-scenes mechanics – what model is being used, what APIs are being called, how data is retrieved and written, or how permissions are granted.

Anthropic is taking a different approach, he said. Rather than going directly to business users, it is providing tools to enterprise development teams to build their own agents and provide a custom interface to their users. Anthropic’s Managed Agents are a group of composable APIs that developers can use. The approach is more flexible, but it requires more effort to produce value.

Microsoft and Google, on the other hand, are both vertically integrated platforms providing an agentic layer on top of an extensive stack. Microsoft’s Foundry is similar to Anthropic’s offering, but offers even more flexibility by remaining model-agnostic and allowing developers to choose their preferred agentic framework.

New problems as the market develops

As the agentic platform market develops, Jackson observed, “we are seeing new problems crop up regarding observability. Detecting and observing agents will be rooted in the identity system used to provision them. However, since each platform uses its own identity system, it will be difficult for any one platform to see all agents created in an enterprise, or worse, those created by a rogue user (‘Shadow AI’).”

Furthermore, he added, “agentic workflows imply significantly higher AI token consumption to complete work. We are already seeing AI capacity constraints and price increases due to high demand. Because agents require multiple ‘reasoning’ steps to complete a single task, it is very hard to predict what a workflow you automate today might cost to run one year from now.”

This means that IT leaders need to decide where they will build the agentic layer of their stack. “You don’t want to get it wrong, because becoming entrenched in one platform means significant vendor lock-in,” he said. “We already worry about lock-in with systems and data, but when you add an intelligence layer, you are essentially building a brain with neuronal pathways to your workflows. It is not going to be easy to do a ‘brain transplant’ to another platform later.”


Google pitches Agentic Data Cloud to help enterprises turn data into context for AI agents 23 Apr 2026, 4:49 pm

Google is recasting its data and analytics portfolio as the Agentic Data Cloud, an architecture it says is aimed at moving enterprise AI from pilot to production by turning fragmented data into a unified semantic layer that agents can reason over and act on more reliably at scale.

The new architecture builds on Google’s existing data platform strategy, bringing together services such as BigQuery, Dataplex, and Vertex AI, and elevating their capabilities in metadata, governance, and cross-cloud interoperability into what the company describes as a shared intelligence layer.

That intelligence layer strategy is underpinned by the new Knowledge Catalog, an evolution of Dataplex Universal Catalog that the company said uses new capabilities to extend its metadata foundation into a semantic layer mapping business meaning and relationships across data sources.

These capabilities include native support for third-party catalogs, applications such as Salesforce, Palantir, Workday, SAP, and ServiceNow, and the option to move third-party data to Google’s lakehouse, which automatically maps the data to Knowledge Catalog.

To capture business logic more directly for data stored inside Google Cloud, the company is adding tools including a LookML-based agent, currently in preview, that can derive semantics from documentation, and a new feature in BigQuery, also in preview, that allows enterprises to embed that business logic for faster data analysis.

Beyond aggregation, the catalog itself is designed to continuously enrich semantic context by analyzing how data is used across an enterprise, senior Google executives wrote in a blog post.

This includes profiling structured datasets as well as tagging and annotating unstructured content stored in Google Cloud Storage, the executives pointed out, adding that the catalog’s underlying system can also infer missing structure in data by using its Gemini models to generate schemas and identify relationships.

Turning data into business context is the next battleground for AI

For analysts, Google’s focus on semantics targets one of the biggest barriers to production AI for enterprises.

“The hardest AI problem is inconsistent meaning,” said Dion Hinchcliffe, lead of the CIO practice at The Futurum Group, noting that a unified semantic layer could help CIOs establish consistent business context across systems while reducing the need for developers to manually stitch together metadata and lineage.

That focus on semantic context also reflects a broader shift in how hyperscalers are approaching enterprise AI. Microsoft with Fabric IQ and AWS with Nova Forge are pursuing similar strategies, building semantic context layers over enterprise data to make AI systems more consistent and easier to operationalize at scale.

While Microsoft’s approach is to wrap AI applications and agents with business context and semantic intelligence in its Fabric IQ and Work IQ offerings, AWS wants enterprises to blend business context into a foundational LLM by feeding it their proprietary data.

Mike Leone, principal analyst at Moor Insights and Strategy, said Google’s approach, though closer to Microsoft’s, places the data gravity one layer above the lakehouse, within its data catalog and semantic graph capabilities.

“Google and Microsoft are solving the same problem from different angles, Fabric through a unified data foundation and Google through a unified semantic and context layer,” Leone said.

Even data analytics software vendors are converging on the idea of offering a catalog that can map semantic context from a variety of data sources, Leone added, pointing to Databricks’ Unity Catalog and Snowflake’s Horizon Catalog.

Semantic accuracy could pose challenges for CIOs

However, Google’s approach to building an intelligent semantic layer, especially its evolved Knowledge Catalog, comes with its own set of risks for CIOs.

The new catalog’s automated semantic context refinement capability, according to Jim Hare, VP analyst at Gartner, could amplify governance challenges, especially around metadata management: “In complex enterprise domains, errors in inferred relationships or definitions will require ongoing human domain oversight to maintain trust.”

Hare also warned of operational and cost management challenges.

“Agent-driven workflows spanning analytical and operational data, potentially across clouds, will introduce new challenges in observability, debugging, and cost predictability,” he said. “Dynamic agent behavior can generate opaque consumption patterns, requiring chief data and analytics officers (CDAOs) to closely manage cost attribution, usage limits, and operational guardrails as these capabilities mature.”

Adopting Google’s new architectural approach could increase dependence at the orchestration layer, resulting in issues around portability, he warned: “Exiting Google-managed semantics, Gemini agents, or BigQuery abstractions may be harder than migrating data alone.”

Bi-directional federation as strategic play

Even so, the trade-offs may be acceptable for enterprises prioritizing tighter data integration over flexibility.

As part of the new architecture, Google is also offering cross-platform data interoperability via the Apache Iceberg REST Catalog, which it says will allow bi-directional federation, in turn letting enterprises access, query, and govern data across environments such as Databricks, Snowflake, and AWS without moving data or incurring egress fees.

For Stephanie Walter, practice leader of the AI stack at HyperFRAME Research, this interoperability will be strategically important for enterprises scaling agents in production, especially those with heterogeneous data environments.

Moor Insights and Strategy’s Leone, though, sees it as a different strategic play to address enterprises’ demand to access Databricks, Snowflake, and hyperscaler environments without costly data movement.

Google’s Agentic Data Cloud architecture also includes a Data Agent Kit, currently in preview, which the company says is designed to help enterprises build, deploy, and manage data-aware AI agents that can interact with governed datasets, apply business logic, and execute workflows across systems.

Robert Kramer, managing partner at KramerERP, said the Data Agent Kit will help data practitioners abstract away daily tasks, in turn lowering the barrier to operationalizing agentic AI across workflows.

However, Gartner’s Hare warned that enterprises should guard against over-delegating critical data management decisions to automated agents without sufficient observability, validation controls, and human review, particularly where downstream AI systems depend on these agents for continuous data operations.


Offer customers passkeys by default, UK’s NCSC tells enterprises 23 Apr 2026, 1:12 pm

The UK’s National Cyber Security Centre (NCSC) is recommending passkeys as the default authentication method for businesses to offer consumers, citing industry progress that now makes them a more secure and user-friendly alternative to passwords.

In a blog post published this week, the agency said passkeys can now be recommended to both the public and businesses as a primary authentication method.

“Passkeys should now be consumers’ first choice of login,” the UK cybersecurity authority said in a blog post, adding that passwords are “no longer resilient enough for the contemporary world.”

“Passkeys are a newer method for logging into online accounts which do much of the heavy lifting for users, only requiring user approval rather than needing to input a password. This makes passkeys quicker and easier to use and harder for cyber attackers to compromise,” the NCSC added in the blog.

The agency said passkeys should be used wherever supported, describing them as resistant to phishing and eliminating risks associated with password reuse.

Focus on phishing-resistant authentication

The guidance is based on the agency’s assessment of how authentication methods perform against real-world attacks.

The NCSC said its analysis examines common techniques, including phishing, credential reuse, and session hijacking, and evaluates how credentials are exposed across their lifecycle, from creation and storage to use.

“Passkeys are resistant to phishing attacks and remove the risks associated with password reuse,” the agency said.

In its accompanying technical paper, the NCSC said traditional authentication methods, including passwords combined with one-time codes, remain “inherently phishable.”

By contrast, FIDO2-based credentials such as passkeys are “as secure or more secure than traditional MFA against all common credential attacks observed in the wild,” the agency said.

However, NCSC cautioned in the technical paper that “while much of the analysis in this paper also applies to enterprise authentication scenarios (for example staff authenticating to a Single Sign On), the different threat model and usage scenarios mean this paper is not intended for enterprise risk assessment.”

How passkeys change the attack model

The NCSC added that passkeys reduce risk by removing reliance on shared secrets and binding authentication to the legitimate service.

According to the agency, this prevents credential reuse and relay attacks, as authentication cannot be intercepted and reused by an attacker.

Passkeys use cryptographic key pairs stored on a user’s device, with authentication tied to device-based verification such as biometrics or PINs, the agency said.
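
For organizations wondering what that looks like at the interface level, the browser’s standard WebAuthn API is the mechanism passkeys build on. The sketch below is a minimal, illustrative registration call in TypeScript; the relying-party details, user identifiers, and challenge are placeholders, and the server-side challenge generation and response verification that any real deployment requires are omitted.

// Minimal browser-side sketch of passkey registration via the WebAuthn API.
// All identifiers are placeholders; the challenge must come from the server.
async function registerPasskey(challengeFromServer: ArrayBuffer): Promise<PublicKeyCredential> {
  const credential = await navigator.credentials.create({
    publicKey: {
      challenge: challengeFromServer,
      rp: { name: "Example Service", id: "example.com" },
      user: {
        id: new TextEncoder().encode("user-123"),
        name: "user@example.com",
        displayName: "Example User",
      },
      // Commonly requested signature algorithms: ES256 (-7) and RS256 (-257).
      pubKeyCredParams: [
        { type: "public-key", alg: -7 },
        { type: "public-key", alg: -257 },
      ],
      authenticatorSelection: {
        residentKey: "required",      // a discoverable credential, i.e. a passkey
        userVerification: "required", // biometric or PIN on the device
      },
    },
  });
  if (!credential) throw new Error("Passkey registration was cancelled");
  // The credential ID and public key in this response go to the server;
  // the private key never leaves the user's device or credential sync fabric.
  return credential as PublicKeyCredential;
}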

Shift in user-level authentication

For organizations that provide online services to customers, the guidance signals a shift in how authentication is implemented at the user interface level.

“This is a fundamental architectural change, not an incremental authentication upgrade,” said Madelein van der Hout, senior analyst at Forrester. “It moves organizations beyond the passwords-plus-MFA paradigm toward a phishing-resistant foundation.”

Van der Hout said passkeys eliminate risks associated with credential theft by using device-bound cryptographic authentication rather than shared secrets.

“Organizations that treat this as a credential swap will underinvest,” she said. “Those who treat it as a broader identity modernization opportunity will get ahead.”

The NCSC said organizations should also consider how authentication is implemented across the full user journey, including account recovery and fallback mechanisms.

While passkeys reduce reliance on passwords, the agency noted that weaker processes, such as password resets or account recovery flows, can still introduce risk if not properly secured.

Adoption challenges remain

The NCSC said passkeys are not yet universally supported and recommended password managers and multi-factor authentication where passkeys cannot be used.

“Where a particular service does not support passkeys, the NCSC’s advice to consumers is to use a password manager to create stronger passwords and keep using two-step verification,” NCSC noted in the blog post.

Van der Hout said implementation challenges are likely, particularly for organizations operating across multiple platforms and user environments.

“Legacy systems and fragmented identity environments present significant obstacles,” she said.

She added that organizations must also consider non-human identities. “Any passkey strategy that ignores the machine identity layer will create new security gaps,” she said.

Device requirements and account recovery processes may also affect how passkeys are deployed, she said.

Hybrid model is expected during the transition

A full transition away from passwords is unlikely in the near term, analysts believe.

“Expect a hybrid model lasting several years,” van der Hout said, as organizations continue to support both passkeys and traditional authentication methods.

During this period, organizations will need to manage authentication across multiple login options while ensuring that fallback methods do not weaken overall security, she added.

The NCSC similarly advised maintaining strong authentication practices where passkeys are not yet available.

Policy signal strengthens shift toward passwordless login

The guidance adds to broader efforts to move away from passwords in consumer authentication.

“The guidance matters because it gives security leaders leverage,” van der Hout said, including in discussions with vendors and internal stakeholders.

The NCSC said that moving toward phishing-resistant authentication could reduce a major cause of cyber compromise, particularly in services that rely on user login credentials.

The article originally appeared in CSO.


Microsoft taps Anthropic’s Mythos to strengthen secure software development 23 Apr 2026, 9:28 am

Microsoft plans to integrate Anthropic’s Mythos AI model into its Security Development Lifecycle, a move that suggests advanced generative AI is beginning to play a direct role in how major software vendors identify vulnerabilities and harden code against attack.

The company said it will use Mythos Preview, along with other advanced models, as part of a broader push to strengthen secure coding and vulnerability detection earlier in the software development process.

The announcement comes as Anthropic’s Mythos heightens concerns that advanced AI models could dramatically shrink the time between finding a software flaw and exploiting it. Analysts say Mythos marks a notable leap in AI-driven vulnerability research, with the ability to uncover thousands of serious flaws across major operating systems and browsers.

OpenAI has also entered the space with GPT-5.4-Cyber, a version of its flagship model tailored for defensive cybersecurity work. Keith Prabhu, founder and CEO of Confidis, said a future OpenAI model, which he referred to as “Spud,” could emerge as an even stronger rival.

The move matters beyond Microsoft’s own engineering organization. For enterprise security leaders, it offers a clear sign that frontier AI models are starting to move from experimental use into core cybersecurity workflows.

That could change how software vendors build products and how defenders view the risks and benefits of using the same AI tools attackers may also exploit.

“This marks a seminal turning point in the secure software development lifecycle process,” Prabhu said. “While earlier tools were only capable of static code scanning for vulnerabilities, with AI, there is a possibility of a dynamically learning model which can also perform dynamic vulnerability and even penetration testing in real time.”

Over time, Prabhu said, the pressure to adopt AI-assisted security tools is likely to spread beyond the largest software vendors.

Why Microsoft’s move matters

Neil Shah, vice president for research at Counterpoint Research, said more than 95% of Fortune 500 companies use Microsoft Azure in some capacity, while Azure AI and the Copilot suite are entrenched across about 65% of those companies. Millions of businesses also rely on multiple Microsoft products and cloud services.

“Using Mythos in Microsoft’s Security Development Lifecycle could help strengthen and harden products like Windows, Azure, Microsoft 365, and developer tools,” Shah said. “Every enterprise running those products could benefit from the security improvement without needing direct Mythos access themselves.”

Prabhu noted that Microsoft said it had evaluated Mythos using its open-source benchmark for real-world detection engineering tasks, with results showing substantial improvements over prior models.

“Such a claim coming from Microsoft does suggest that these new AI models are becoming materially better at identifying exploitable flaws than earlier generations,” Prabhu added. “However, as with any AI tool, the strength of the tool lies in its ability to analyze code quickly based on past learning. There is a possibility that it could miss new types of vulnerabilities that only a ‘human-in-the-loop’ could identify.”

The article originally appeared in CSO.


How I doubled my GPU efficiency without buying a single new card 23 Apr 2026, 9:00 am

Late last year I got pulled into a capacity planning exercise for a global retailer that had wired a 70B model into their product search and recommendation pipeline. Every search query triggered an inference call. During holiday traffic their cluster was burning through GPU-hours at a rate that made their cloud finance team physically uncomfortable. They had already scaled from 24 to 48 H100s and latency was still spiking during peak hours. I was brought in to answer a simple question: Do we need 96 GPUs for the January sale or is something else going on?

I started where I always start with these engagements: profiling. I instrumented the serving layer and broke the utilization data down by inference phase. What came back changed how I think about GPU infrastructure.

During prompt processing — the phase where the model reads the entire user input in parallel — the H100s were running at 92% compute utilization. Tensor cores fully saturated. Exactly what you want to see on a $30K GPU. But that phase lasted about 200 milliseconds per request. The next phase, token generation, ran for 3 to 9 seconds. During that stretch the same GPUs dropped to 30% utilization. The compute cores sat idle while the memory bus worked flat out reading the attention cache.

We were paying H100-hour rates for peak compute capability and getting peak performance for roughly 5% of every request’s wall time. The other 95% was a memory bandwidth problem wearing a compute-priced GPU.

The pattern hiding in plain sight

Once I saw it, I couldn’t unsee it. LLM inference is two workloads pretending to be one. Prompt processing (the industry calls it prefill) is a dense matrix multiplication that lights up every core on the chip. Token generation (decode) is a sequential memory read that touches a fraction of the compute. They alternate on the same hardware inside the same scheduling loop. I’ve worked on carrier-scale Kubernetes clusters and high-throughput data pipelines, and I’ve never seen a workload profile this bimodal running on hardware this expensive.

If you ran a database this way — provisioning for peak write throughput and then using the server 90% of the time for reads — you’d split it into a write primary and read replicas without a second thought. But most teams serving LLMs haven’t made that connection yet.

The monitoring tools make it worse. Every inference dashboard I looked at reported a single “GPU utilization” number: The average of both phases blended together. Our cluster showed 55%. Looks fine. Nobody panics at 55%. But 55% was the average of 92% for a few hundred milliseconds and 30% for several seconds. The dashboards were hiding a bimodal distribution behind a single number.

Researchers at UC San Diego’s Hao AI Lab published a paper called DistServe at OSDI 2024 that laid out the problem with numbers I could have pulled from my own profiling. Their measurements on H100s showed the same pattern: Prefill at 90–95% utilization, decode at 20–40%. They also proposed the fix.

Splitting the work in two

The fix is called disaggregated inference. Instead of running both phases on the same GPU pool you stand up two pools: One tuned for compute throughput (prompt processing) and one tuned for memory bandwidth (token generation). A routing layer in front sends each request to the right pool at the right time and the attention cache transfers between them over a fast network link.
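
To make the routing layer concrete, here is a minimal sketch of the request flow under stated assumptions: two pool endpoints, a prefill call that returns a handle to the attention (KV) cache it produced, and a decode call that pulls that cache over the interconnect using the handle. The endpoints and payload fields are placeholders, not the actual APIs of vLLM, SGLang, Dynamo, or llm-d.

// Illustrative phase-aware router; endpoints and fields are placeholders.
const PREFILL_POOL = "http://prefill-pool.internal/v1/prefill";
const DECODE_POOL = "http://decode-pool.internal/v1/decode";

interface InferenceRequest {
  requestId: string;
  prompt: string;
  maxTokens: number;
}

async function serve(req: InferenceRequest): Promise<string> {
  // Phase 1: compute-bound prompt processing on the prefill pool,
  // which returns a handle to the KV cache it built.
  const prefill = await fetch(PREFILL_POOL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ requestId: req.requestId, prompt: req.prompt }),
  });
  const { kvCacheHandle } = await prefill.json();

  // Phase 2: memory-bandwidth-bound token generation on the decode pool,
  // which pulls the cache over the fast network link using the handle.
  const decode = await fetch(DECODE_POOL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      requestId: req.requestId,
      kvCacheHandle,
      maxTokens: req.maxTokens,
    }),
  });
  const { text } = await decode.json();
  return text;
}

A production router also has to decide when not to split, for example when most of the cache for a multi-turn conversation already lives on a decode worker, which is exactly the exception discussed below.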

When I first proposed this to the customer, they were skeptical. Two pools mean more operational complexity. A cache transfer protocol adds a network dependency that monolithic serving doesn’t have. Fair objections. So, I pointed them at who’s already running it.

Perplexity built their entire production serving stack on disaggregated inference using RDMA for cache transfers. Meta runs it. LinkedIn runs it. Mistral runs it. By early 2026 NVIDIA shipped an orchestration framework called Dynamo that treats prefill and decode as first-class pool types. The open-source engines — vLLM and SGLang — both added native disaggregated serving modes. Red Hat and IBM Research open-sourced a Kubernetes-native implementation called llm-d that maps the architecture onto standard cluster management workflows.

This isn’t a research prototype waiting for someone brave enough to try it. It’s the default architecture at the companies serving more LLM traffic than anyone else on the planet.

What changed when we split the pools

We ran a two-week proof of concept. I split the cluster into two pools: Eight GPUs dedicated to prompt processing and the remaining GPUs handling token generation. No new hardware, no new cluster — just a configuration change in the serving layer and a routing policy that sent each request to the right pool based on its inference phase. The prompt-processing pool hit 90–95% compute utilization consistently because that’s all it did. No token generation competing for scheduling slots. No decode requests sitting idle while a prefill burst hogged the cores.

The token-generation pool was the bigger surprise. By batching hundreds of concurrent decode requests together the memory reads got amortized across more work. Bandwidth utilization climbed above 70% — far better than the 30% we’d been seeing when decode requests were interleaved with prefill on the same GPU. Overall compute efficiency roughly doubled.

The cost math followed. The customer was spending about $2M annually on inference GPU-hours. After disaggregation they were on track to cut that by $600–800K while serving the same request volume at the same latency targets. No new hardware purchased. Same GPUs, same cluster, same model weights — different architecture.

The latency story was just as good. In the monolithic setup every time a new prompt arrived its processing burst would stall active token-generation requests. Users watching streaming responses would see the text pause mid-sentence while someone else’s prompt got processed. After the split: Steady token cadence with no prefill-induced stalls. P99 inter-token latency flattened out completely.

There are workloads where this doesn’t pay off. Short prompts under 512 tokens with short outputs don’t generate enough cache to justify a network transfer. Multi-turn conversations where 80%+ of the cache already lives on the decode worker from a previous turn are better served locally. And if you have fewer than a dozen GPUs the scheduling overhead of two pools can eat into whatever you save on utilization. But the teams complaining about GPU shortages and GPU bills are not running 4-GPU deployments with 512-token prompts. They’re running dozens to hundreds of GPUs at enterprise scale where the utilization waste adds up to millions per year.

The industry spends a lot of energy on the GPU supply side: Build more fabs, design better chips, negotiate bigger cloud contracts. Those things matter. But I keep coming back to what I saw in that profiling data. If the teams running monolithic LLM inference today switched to disaggregated serving the effective GPU supply would roughly double overnight. No new silicon required. The tools are ready. The proof points are in production. The only thing missing is the profiling step that makes the waste visible.

If you haven’t broken your inference utilization down by phase yet, do it this week. Add per-phase instrumentation to your serving layer. Plot prefill utilization and decode utilization separately over a 24-hour window. If the two lines look like they belong on different charts — and they will — you have your answer. You’ll stop paying for compute you’re not using.
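
If your serving layer does not yet expose phase labels, the aggregation itself is simple once each utilization sample is tagged. The sketch below assumes a hypothetical sample format; the point is only that prefill and decode get separate time-weighted averages instead of one blended number.

// Hypothetical sample format: each GPU utilization reading is tagged with
// the inference phase the scheduler was in when it was taken.
interface UtilizationSample {
  phase: "prefill" | "decode";
  gpuUtilPercent: number;
  durationMs: number;
}

// Time-weighted average utilization per phase, instead of one blended number.
function perPhaseUtilization(samples: UtilizationSample[]) {
  const totals = {
    prefill: { weighted: 0, timeMs: 0 },
    decode: { weighted: 0, timeMs: 0 },
  };
  for (const s of samples) {
    totals[s.phase].weighted += s.gpuUtilPercent * s.durationMs;
    totals[s.phase].timeMs += s.durationMs;
  }
  const avg = (t: { weighted: number; timeMs: number }) =>
    t.timeMs > 0 ? t.weighted / t.timeMs : 0;
  return { prefill: avg(totals.prefill), decode: avg(totals.decode) };
}

// One request shaped like the profile described in this article:
console.log(
  perPhaseUtilization([
    { phase: "prefill", gpuUtilPercent: 92, durationMs: 200 },
    { phase: "decode", gpuUtilPercent: 30, durationMs: 4000 },
  ]),
); // { prefill: 92, decode: 30 }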

This article is published as part of the Foundry Expert Contributor Network.


Is your Node.js project really secure? 23 Apr 2026, 9:00 am

JavaScript and Node.js teams do not lack security tools. What they still lack is a dependency security workflow that developers will actually use before release.

That is the real gap. A package gets installed, CI (continuous integration) runs, a scanner executes somewhere in the pipeline, and eventually a report appears. From a distance, that can look like maturity. In practice, it often means developers learn about dependency risks too late, too indirectly, and with too little clarity to act while the fix is still easy.

The real problem in JavaScript and Node.js security is no longer detection. It is actionability.

That is why so many teams can say they scan dependencies and still struggle to answer the questions that matter right before release. What exactly is vulnerable? Is it direct or transitive? Is there a fixed version? Can I fix it in my own project, or am I blocked behind an upstream dependency? Which finding deserves attention first?

Those are not edge cases. That is the real work.

In Node.js projects, the problem is easy to hide. A team may manage a reasonable number of direct dependencies while shipping hundreds or thousands of resolved packages through a lockfile. At that point, the challenge is no longer whether a scanner can produce output. Most can. The challenge is whether the result is understandable enough, local enough, and actionable enough to help a developer make a release decision before the issue turns into pipeline noise or last-minute triage.

That is where many workflows still fail. Detection exists. Usability often does not. Node.js teams do not have a scanner shortage. They have a workflow shortage.

What is missing is a fixability-first view of dependency security. Teams do not just need to know that something is vulnerable. They need to know what is directly actionable now, what is buried in transitive dependencies, and what kind of remediation path they are actually dealing with.

What CVE Lite CLI does differently

This is the problem I have been exploring through CVE Lite CLI, an open source tool built around the local dependency workflow JavaScript developers actually need.

CVE Lite CLI is not trying to win the platform race. It is trying to solve the moment where a developer needs a clear answer before release.

Its scope is intentionally narrow. It does not try to do exploitability analysis, runtime reachability, container scanning, secret scanning, or infrastructure scanning. It focuses on a more practical job: scanning JavaScript and TypeScript projects locally from their lockfiles, identifying known OSV-backed dependency issues, separating direct from transitive findings, showing dependency paths, surfacing fixed-version guidance, and producing output a developer can actually use before release.
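
For readers who want to see the underlying mechanics, here is a rough sketch of the kind of lockfile-to-OSV lookup this workflow depends on. It is not CVE Lite CLI’s own code; it assumes an npm package-lock.json with lockfileVersion 2 or 3, runs as an ES module on Node 18 or later, and uses OSV.dev’s public querybatch endpoint, which returns advisory IDs rather than full details.

// Illustrative only, not CVE Lite CLI's implementation.
// Assumes package-lock.json with lockfileVersion 2 or 3 (the "packages" map).
import { readFileSync } from "node:fs";

const lock = JSON.parse(readFileSync("package-lock.json", "utf8"));
const directDeps = new Set(Object.keys(lock.packages?.[""]?.dependencies ?? {}));

// Collect every resolved name/version pair and whether it is a direct dependency.
const resolved: { name: string; version: string; direct: boolean }[] = [];
const entries = Object.entries(lock.packages ?? {}) as [string, { version?: string }][];
for (const [pkgPath, info] of entries) {
  if (!pkgPath.startsWith("node_modules/") || !info.version) continue;
  const name = pkgPath.slice(pkgPath.lastIndexOf("node_modules/") + "node_modules/".length);
  resolved.push({ name, version: info.version, direct: directDeps.has(name) });
}

// Ask OSV.dev which of these versions have known advisories.
const response = await fetch("https://api.osv.dev/v1/querybatch", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    queries: resolved.map((p) => ({
      package: { name: p.name, ecosystem: "npm" },
      version: p.version,
    })),
  }),
});
const { results } = await response.json();

results.forEach((r: { vulns?: { id: string }[] }, i: number) => {
  if (!r.vulns?.length) return;
  const p = resolved[i];
  const kind = p.direct ? "direct" : "transitive";
  console.log(`${p.name}@${p.version} (${kind}): ${r.vulns.map((v) => v.id).join(", ")}`);
});

Everything a workflow tool layers on top of that raw lookup, such as severity grouping, dependency paths, fixed-version guidance, and a suggested package command, is what turns matches into a release decision.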

That narrower scope is not a weakness. It is the reason the tool is useful.

Too much security tooling is built around organizational visibility. The CVE Lite CLI workflow is built around developer decision-making. Its value is not simply that it tells you vulnerabilities exist. Its value is that it makes dependency risk understandable early enough to change developer behavior.

That distinction matters. A warning that arrives late in CI may be technically correct, but operationally weak. A warning that appears locally, with direct versus transitive separation and dependency paths, is much closer to a plan for a fix.

This is the gap CVE Lite CLI aims to address. It moves dependency security closer to the point where engineering decisions are actually made.

Recent work on CVE Lite CLI pushes that workflow further by surfacing the exact package command for direct fixes where available. That makes the tool more useful at the moment developers move from detection to action.

In the stronger cases, providing the package command turns the tool from a scanner into a local remediation loop: scan, apply the suggested package change, and rescan immediately without waiting for branch-and-pipeline feedback.

That shift is bigger than convenience. It changes the feel of dependency security from a distant report into an active engineering loop. It lets the developer stay in the same working session, make a change, verify the result, and keep moving.

Local-first vulnerability scanning with CVE Lite CLI

In April 2026, I ran CVE Lite CLI against three public open source projects: Nest, pnpm, and release-it. The goal was not to single out those projects. Well-maintained projects can still surface dependency issues, and scan results can change over time. The point was to test whether a local-first tool could give developers something concrete enough to shape action.

The Nest run has now evolved into a fuller case study that makes the larger point clearer: the value of a local-first tool is not just that it detects issues, but that it helps developers move from scan output to a realistic remediation path in the same working session.

In Nest, CVE Lite CLI parsed 1,626 packages from package-lock.json and found 25 packages with known OSV matches: one high-severity issue, four medium, and 20 low. More important than the count was the structure. Twelve findings looked directly fixable in the project. Thirteen were transitive.

That is the kind of distinction raw counts hide. Twenty-five findings may sound alarming, but the real engineering question is how many of those can be acted on immediately. A fixability-first workflow makes that visible.

What the fuller Nest case study shows is that remediation is often iterative, not one-and-done. In one dependency path, resolving the issue required several tar upgrades in sequence as the dependency graph changed after each install. That is exactly where a local scan-fix-rescan loop becomes more useful than a CI-only workflow. Instead of upgrading, pushing a branch, waiting for a pipeline scanner, and discovering the next required upgrade later, the developer can keep working through the path locally until the dependency state is clean.

One of the strongest findings was diff@2.2.3, a high-severity transitive issue appearing through gulp-diff. The same scan also surfaced diff@4.0.2 as a medium-severity direct dependency and diff@7.0.0 as a medium-severity transitive dependency through mocha. That is a realistic picture of Node.js dependency management: the same package appearing in multiple forms, through multiple parents, with different remediation implications.

A weaker tool would simply tell the developer that vulnerabilities were found. CVE Lite CLI did something more useful. It exposed the dependency paths clearly enough to show why the remediation work was different in each case.

The same Nest scan surfaced tar@6.2.1 as a medium-severity direct dependency with fixed-version guidance, and form-data@2.3.3 as a medium-severity transitive issue through request. Those are not the same category of problem. One points toward a direct upgrade decision. The other points toward upstream dependency pressure. That is where dependency scanning stops being a checklist exercise and starts becoming real engineering work.

And that is where this kind of local-first dependency workflow performs well. It does not just report that something is wrong. It shows the developer what kind of wrong they are dealing with.

The release-it scan reinforced the same point on a smaller scale. CVE Lite CLI parsed 545 packages and found 10 packages with known OSV matches: four medium-severity and six low. Six appeared directly fixable. Four were transitive.

Two direct findings stood out immediately: @isaacs/brace-expansion@5.0.0 and flatted@3.3.3. Those are the kinds of issues a developer can reason about quickly. But the scan also found two minimatch findings arriving transitively through different parent chains, one through @npmcli/map-workspaces and another through glob.

That matters because it shows the tool is not only useful in large, messy dependency graphs. It is also useful in smaller projects where the real value comes from turning a vague dependency concern into a specific, inspectable remediation path.

The pnpm scan mattered for the opposite reason. CVE Lite CLI parsed 563 packages from pnpm-lock.yaml and returned no known OSV matches. That kind of result is easy to undervalue, but it should not be. A serious local workflow should not exist only to generate alerts. It should also be able to give developers confidence quickly when there is nothing obvious to fix.

That clean-result case is one of the reasons a lightweight local tool belongs in the workflow. Developers do not just need early warning. They also need fast reassurance.

Bringing dependency security into the developer workflow

The larger lesson here is not that open source projects are failing. It is that the developer workflow around dependency security is still immature. Teams have learned how to collect results. They have not learned how to make those results usable at the point where developers choose packages, update lockfiles, and prepare releases.

That is why CVE Lite CLI matters beyond the tool itself. It addresses a workflow problem that many JavaScript teams still live with every day.

The bigger issue is not one project or one scanner. It is whether dependency security becomes a normal part of everyday engineering practice.

CVE Lite CLI takes steps in that direction. It gives developers a local release check instead of forcing them to wait for CI. It gives them direct versus transitive visibility instead of flattening everything into one alarming list. It gives them dependency paths instead of vague package names with no remediation context. It gives them fixed-version guidance where possible instead of leaving them to infer the next move.

And because CVE Lite CLI is intentionally lightweight and narrow in scope, it is easier to trust, easier to adopt, and easier to add to a normal Node.js toolchain.

That point matters. Developers are already overloaded with tooling. The next tool that earns a place in the workflow will not be the one that makes the biggest promises. It will be the one that solves a real problem cleanly, honestly, and without forcing teams into a larger platform commitment.

That is why CVE Lite CLI has real potential. It meets developers where they already work.

More importantly, it points toward a broader shift in how dependency security should be understood. Security tooling is moving from vulnerability detection to vulnerability interpretation, from counting issues to understanding risk in context. That is where developer workflow becomes more important than dashboard volume.

A missing link in the developer toolchain

Dependency security should not feel like a special event. It should feel like linting, testing, or checking build output before release. In other words, it should become a normal part of the engineering loop.

That is the strongest case for CVE Lite CLI. It helps move security from a distant control function into an everyday developer habit.

For dependency paths that require more than one adjustment, a local-first, scan-fix-rescan workflow can be materially faster than relying on repeated CI feedback alone. If developers can scan lockfile-backed dependency state locally, understand what is direct, understand what is transitive, see the dependency paths, and get a credible sense of what to fix before release, then dependency security stops being abstract policy and starts becoming practical engineering.

That is what the JavaScript ecosystem needs more of.

Node.js does not need more theatrical security output. It needs better developer workflow infrastructure. It needs tools that can give clear, immediate, low-friction answers while there is still time to act. It needs tools that make dependency risk visible in the same place where dependency decisions are made.

A local-first, lockfile-aware workflow points in that direction.

And if the goal is to make dependency security a real part of everyday software engineering practice, then local-first lockfile scanning should stop being treated as a niche extra. It should become a normal part of the developer toolchain.


How open source ideals must expand for AI 23 Apr 2026, 9:00 am

Open source has never been just a licensing model; it is also a philosophy about shared effort, shared transparency, and shared agency, with the shared goal of making an impact in the world. In the age of AI‑assisted development and agents, there is a line of thinking that AI slop, specifically mass-produced and mass-submitted code, will be the downfall of open source projects. On the contrary, I think open source is headed for a resurgence like we’ve never seen before, as long as we emphasize the aspects of open source beyond code submissions.

In this new world, the philosophy, ethics, and morals of open source are more relevant than ever. However, the focus of open source needs to evolve past raw code: Specification files (spec files) and governance documents (constitutions) are becoming as important as the source itself. The challenge is not to choose between open source and AI, but to recognize that open source is now a community-based control and scope mechanism for open technologies.

Let me break this down further. Specs describe intent and outcomes, while code shows how that intent is actually realized. When something goes wrong, you still need to trace the path from spec to implementation. Trust is earned, not inferred, so a promise that an app values your privacy, or that an agent never sends data to third parties, must be backed by code and build pipelines that anyone can inspect and verify (see, for example, the Acquacotta Constitution).

Governance complements spec files by showing how a spec is created, enforced, and followed. It covers the “people decisions” around a project: Who makes the final decision? If there’s a vote on something, who votes, and how do they vote? These seemingly pedantic yet crucial decisions emerge from governance, and they are the backbone for how spec files are created and followed when code contributions become as simple as writing a basic agent prompt.

Open means open, even for AI

The main criticism of AI in open source is that code contribution becomes open to everyone, not just those with deep technical knowledge. This makes the pillars of an open source community outside of the code more important than ever. Users fundamentally become contributors when anyone can create code, which means they need agency in specification. Additionally, this new class of community members needs to be able to help influence and change governance and spec, just like “normal” contributors in the pre-AI days. The spec files submitted with their AI-generated code must be open for inspection and reproducible, just like a more traditional code contribution. The ability to fork the implementation and run it on outside infrastructure also remains, enabling contributors to further refine their own customizations, integrations, and optimizations.

This is how organizations retain real agency in an era where code is a commodity. In other words, spec files broaden what we need to keep open; they do not replace the need to keep code, build systems, and dependency trees open and inspectable. The future is not specs instead of open source; it is open source plus open specs and open governance.

You version, review, and discuss specs in the same way you review and discuss code. You make architecture and governance artifacts part of the public record so that others can learn from, reuse, and improve them. This creates a richer set of open source assets. The repository does not just contain code; it contains the constitution of the project, the architectural reasoning, and the guardrails that keep AI tools and AI-driven code contributions on the rails.

The importance of this lowered barrier cannot be overstated. Truly anyone can now contribute, not just those who understand the code-based components of a project. Domain experts, designers, and operations specialists can propose changes at the spec level that AI agents then help implement. For a community that leads with open source values and robust testing frameworks (codified in the constitution), this fuels the creation of high‑quality software while preserving the transparency, reviewability, and forkability that open source depends on.

‘Real’ open source

We do risk the convergence of open source and specs turning into a two-sided purity test. Some argue that if AI wrote most of the code, it is not “real” open source. This argument implies that the provenance of each line of syntax matters more than whether the system as a whole is transparent, forkable, and governed in the open. Meanwhile, the opposing side suggests that if we can regenerate the implementation from a spec, we no longer need to worry as much about licensing and code openness — as if an elegant constitution makes inspection of the actual machinery optional.

Both positions miss the point. If you care about user agency, security, and long‑term sustainability, as all open source projects should, you need both open code and open build pipelines, so anyone can inspect, reproduce, and harden what is running. You need open specs and governance, so anyone can understand what the system is supposed to do, how it is supposed to behave, and how decisions get made over time.

The new “definition” of open must consider implementation, specification, and governance as three critical factors that must be woven together. Open implementation means the source, dependencies, and build system are available under an open source license so you can rebuild, audit, and run the software yourself. Open specification means the requirements, architecture, and project constitution are documented, versioned, and public, so others can reuse them, learn from them, and adapt them to their own needs. Open governance means the processes by which changes are proposed, reviewed, and accepted — whether at the spec level or in code — are transparent and participatory.

The path forward for open source communities is not to retreat from spec‑driven, AI‑assisted development, nor to declare the old mission obsolete. It is to lead in defining and practicing what open specification, governance, and implementation look like together in an AI‑first world — and to do so with the confidence to dream bigger than incremental automation.

It’s this ability for individuals and organizations to dream bigger that makes it possible to tackle problems that were previously too big, too weird, or too niche to justify a full project team. A UX designer who’s never committed a line of code can suddenly create a serious tool. A security engineer can prototype a new threat‑hunting pipeline end‑to‑end. An architect can stand up a reference implementation of a safer integration pattern, then let others clone and extend it. In each case, AI doesn’t replace expertise; it amplifies it. The most valuable people in the room spend less time grinding out boilerplate and more time on intent, constraints, and trade‑offs.

AI tooling isn’t a one-way street

However, there’s one really important thing to remember about dreaming bigger with AI tools: You are not the only one who has easy access to them — so do your competitors. Perhaps more importantly, the bad guys have them too.

When it comes to the former, dreaming bigger means dreaming bigger than the folks who would take business away from you. It’s about making sure that your customers stay your customers and that you can attract new ones on a continual basis.

When it comes to the latter, it means considering that the people who would do you and your customers harm have these same tools. And, if anyone dreams bigger, it’s the people who want to break your systems, defraud your customers, and manipulate your users.

That’s one reason the traditional “open source = open code” framing is starting to feel too small. If attackers can continuously remix powerful AI tooling in the shadows, defenders need open, inspectable patterns for detection, response, and governance that anyone can adopt and improve.

Open source’s big tent

The same openness that once turned a loose collection of hackers into the engine of the modern software stack can now be applied to a new layer of specs and agents. It is a way forward that makes open source even more accessible to a new type of contributor. For open source, it’s not about fighting AI slop. Instead, we as a global community need to push the positives of AI-driven contributions forward, so that the good far outweighs the bad.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.


Claude Mythos signals a new era in AI-driven security, finding 271 flaws in Firefox 23 Apr 2026, 1:33 am

The Claude Mythos Preview appears to be living up to the hype, at least from a cybersecurity standpoint. The model, which Anthropic rolled out to a small group of users, including Firefox developer Mozilla, earlier this month, has discovered 271 vulnerabilities in version 148 of the browser. All have been fixed in this week’s release of Firefox 150, Mozilla emphasized.

These findings set a new benchmark for AI’s ability to unearth bugs and could turbocharge cybersecurity efforts.

“Nothing Mythos found couldn’t have been found by a skilled human,” said David Shipley of Beauceron Security. “The AI is not finding a new class of AI-exclusive super bugs. It’s just finding a lot of stuff that was missed.”

However, the news comes as Anthropic is reportedly investigating unauthorized use of Mythos by a small group that gained access via a third-party vendor environment, revealing the double-edged nature of AI.

Closing the fuzzing gap

Mozilla has previously pointed AI tools, notably Anthropic’s Claude Opus 4.6, at the browser in a quest for vulnerabilities, but Opus discovered just 22 security-sensitive bugs in Firefox 148, while Mythos uncovered more than ten times that many.

Firefox CTO Bobby Holley described the sense of “vertigo” his team felt when they saw that number. “For a hardened target, just one such bug would have been red-alert in 2025,” he wrote in a blog post, “and so many at once makes you stop to wonder whether it’s even possible to keep up.”

Firefox uses a defense-in-depth strategy, with internal red teams applying multiple layers of “overlapping defenses” and automated analysis techniques, he explained. The browser runs each website in a separate process sandbox.

However, no layer is impenetrable, Holley noted, and attackers combine bugs in the rendering code with bugs in the sandboxes in an attempt to gain privileged access. While his team has now adopted a more secure programming language, Rust, the developers can’t afford to stop and rewrite the decades’ worth of existing C++ code, “especially since Rust only mitigates certain (very common) classes of vulnerabilities.”

While automated analysis techniques like fuzzing, which uncovers vulnerabilities or bugs in source code, are useful, some bits of code are more difficult to fuzz than others, “leading to uneven coverage,” Holley pointed out. Human teams can find bugs that fuzzing can’t by reasoning through source code, but this is time-consuming and bottlenecked by limited human resources.

Now, Claude Mythos Preview is closing this gap, detecting bugs that fuzzing doesn’t surface.

“Computers were completely incapable of doing this a few months ago, and now they excel at it,” Holley noted. Mythos Preview is “every bit as capable” as human researchers, he asserted, and there is no “category or complexity” of vulnerability that humans can find that Mythos can’t.

Defenders now able to win ‘decisively’?

Gaps between human-discoverable and AI-discoverable bugs favor attackers, who can afford to concentrate months of human effort to find just one bug they can exploit, Holley noted. Closing this gap with AI can help defenders erode that long-term advantage.

The industry has largely been fighting the security battle “to a draw,” he acknowledged, and security has been “offensively-dominant” due to the size of the attack surface, giving adversaries an “asymmetric advantage.” In the face of this, both Mozilla and security vendors have “long quietly acknowledged” that bringing exploits to zero was “unrealistic.”

But now with Mythos (and likely subsequent models), defenders have a chance to win, “decisively,” Holley asserted. “The defects are finite, and we are entering a world where we can finally find them all.”

What security teams should do now

Finding 271 flaws in a mature codebase like Firefox illustrates the fact that AI-driven vulnerability discovery is now operating at a scale and depth that can outpace traditional human-led review, noted Ensar Seker, CISO at cyber threat intelligence company SOCRadar.

Holley’s “vertigo,” he said, came from defenders realizing that the attack surface is larger and “more rapidly discoverable than previously assumed.”

Security teams must respond by shifting from periodic testing to continuous validation, Seker advised. That means integrating AI-assisted code analysis into continuous integration/continuous delivery (CI/CD) pipelines, prioritizing “patch velocity over perfection,” and assuming that any externally reachable code path will eventually be discovered and weaponized.

“The goal is no longer just finding vulnerabilities first, but reducing the window between discovery and remediation,” he said.

Shipley agreed that any company building software must evaluate resourcing so it can quickly and proactively find and fix vulnerabilities. “But stuff will happen,” he acknowledged. So, in addition to doing proactive work, enterprises must regularly exercise their incident response playbooks.

“The next few years are going to be a marathon, not a sprint,” said Shipley.

Dual-use nature of AI is a challenge

However, the dual-use nature of these systems presents a big challenge. The same capability that helps defenders identify hundreds of flaws can be turned against them if the model or its outputs are exposed, Seker pointed out.

The reported unauthorized access to Mythos “reinforces that AI systems themselves are now high-value targets, effectively becoming part of the attack surface,” he said.

It’s not at all surprising that people found a way to access Mythos, Shipley agreed; it was inevitable. “Nor does Anthropic have some unique, insurmountable or exclusive AI capability for hacking,” he said, pointing out that OpenAI is already catching up in that regard, and others will “catch and surpass” Mythos.

Striking a balance requires treating AI models like privileged infrastructure, Seker noted. Enterprises need strict access controls, output monitoring, and isolation of sensitive workflows. Developers, meanwhile, must adapt by writing code that is resilient to automated scrutiny; this requires stronger input validation, safer defaults, and “fewer assumptions about obscurity.”

“In this paradigm, security isn’t just about defending systems; it’s about defending the tools that are now capable of breaking them at scale,” Seker emphasized.

This article originally appeared on CSOonline.

