Intro to Hotwire: HTML over the wire 31 Dec 2025, 9:00 am

If you’ve been watching the JavaScript landscape for a while, you’ve likely noticed the trend toward simplicity in web application development. An aspect of this trend is leveraging HTML, REST, and HATEOAS (hypermedia as the engine of application state) to do as much work as possible. In this article, we’ll look at Hotwire, a collection of tools for building single-page-style applications using HTML over the wire.

Hotwire is a creative take on front-end web development. It’s also quite popular, with more than 33,000 stars on GitHub and 493,000 weekly NPM downloads as of this writing.

Hotwire: An alternative to HTMX

Hotwire is built on similar principles to HTMX and offers an alternative approach to using HTML to drive the web. Both projects strive to eliminate boilerplate JavaScript and let developers do more with simple markup. Both embrace HATEOAS and the original form of REST. The central insight here is that application markup can contain both the state (or data) and the structure of how data is to be displayed. This makes it possible to sidestep the unnecessary logistics of marshaling JSON at both ends.

This concept isn’t new—in fact, it is the essence of representational state transfer (REST). Instead of converting to a special data format (JSON) on the server, then sending that over to the client where it is converted for the UI (HTML), you can just have the server send the HTML.

Technologies like HTMX and Hotwire streamline the process, making it palatable for developers and users who are acclimated to the endless micro-interactions spawned by Ajax.

Hotwire has three primary JavaScript components, but we are mainly interested in the first two:

  • Turbo: Allows for fine-grained control of page updates.
  • Stimulus: A concise library for client-side interactivity.
  • Native: A library for creating iOS- and Android-native apps from Turbo and Stimulus.

In this article, we will look at Turbo and Stimulus. Turbo has several components that make interactivity with HTML more powerful:

  • Turbo Drive avoids full page reloads for links and form submissions.
  • Turbo Frames let you define areas of the UI that can be loaded independently (including lazy loading).
  • Turbo Streams allow arbitrary updates to specific page segments (delivered over WebSockets, server-sent events, or a form response).

Turbo Drive: Merging pages, not loading pages

In standard HTML, loading a new page completely obliterates the existing content and paints everything anew as it arrives from the server. This is incredibly inefficient and makes for a bad user experience. Turbo Drive takes a different approach: you drop in a single JavaScript include, and it intercepts links and form submissions, merging the incoming page contents instead of reloading the page.

Think of merging like diffing the current page against the incoming page. The header information is updated rather than wholesale replaced. Modern Turbo even “morphs” the <head> and <body> elements, providing a much smoother transition. (For obvious reasons, this approach is especially effective for page reloads.)

All you have to do is include the Turbo script in your page. If you bundle with npm, that is a one-line, side-effect import of the @hotwired/turbo package:

import "@hotwired/turbo"

(You can also load a prebuilt Turbo bundle from a CDN with a <script type="module"> tag.)

It is also important to point out that browser actions like back, forward, and reload all work normally. Merging is a low-cost, low-risk way of improving page navigation and reloads in web pages.

Turbo Frames: Granular UI development

The basic idea in Frames is to decompose the layout of a web page into <turbo-frame> elements. You then update these frames piecemeal, and only as needed. The overall effect is like using JSON responses to drive reactive updates to the UI, but in this case we are using HTML fragments.

Take this page as an example:

<body>
  <nav>
    Links that change the entire page
  </nav>

  <turbo-frame id="berry_description">
    <h1>Thimbleberry (Rubus parviflorus)</h1>
    <p>A delicate, native berry with large, soft leaves.</p>
    <a href="/berries/thimbleberry/edit">Edit this description</a>
  </turbo-frame>

  <turbo-frame id="field_notes">
    <div id="note_1">Found a large patch by the creek.</div>
    <div id="note_2">The berries are very fragile.</div>
    ...
  </turbo-frame>
</body>

Here we have a top navigation pane with links that affect the entire page (handled by Turbo Drive). Then there are two interior <turbo-frame> elements that can be modified in place, without a full page reload.

The <turbo-frame> elements capture the link clicks and form submissions that happen within them. So, when you click the link to edit the field notes, the server can respond with a <turbo-frame> chunk that provides an editable form:

<turbo-frame id="field_notes">
  <h2>Field Notes</h2>
  <div id="notes_list">
    <div id="note_1">Found a large patch by the creek.</div>
    <div id="note_2">The berries are very fragile.</div>
  </div>
  <form action="/berries/thimbleberry/notes" method="post">
    <textarea name="content"></textarea>
    <input type="submit" value="Save note">
  </form>
</turbo-frame>

This chunk would be rendered as a live form. The user can make updates and submit the new data, and the server would reply with a new fragment containing the updated frame:

<turbo-frame id="field_notes">
  <h2>Field Notes</h2>
  <div id="notes_list">
    <div id="note_1">Found a large patch by the creek.</div>
    <div id="note_2">The berries are very fragile.</div>
    <div id="note_3">Just saw a bear!</div>
  </div>
</turbo-frame>

Turbo takes the ID on the arriving frame content and ensures it replaces the same frame on the page (so it is essential that the server puts the correct ID on the fragments it sends). Turbo is smart enough to extract and place only the relevant fragment, even if an entire page is received from the server.

Turbo Streams: Compound updates

Turbo Drive is a simple and effective mechanism for handling basic server interactions. Sometimes, we need more powerful updates that interact with multiple portions of the page, or that are triggered from the server side. For that, Turbo has Streams.

The basic idea is that the server sends a stream of fragments, each with the ID of the part of the UI that will change, along with the content needed for the change. For example, we might have a stream of updates for our wilderness log:

<turbo-stream action="append" target="notes_list">
  <template>
    <div id="note_3">Just saw a bear!</div>
  </template>
</turbo-stream>

<turbo-stream action="update" target="notes_count">
  <template>3 notes</template>
</turbo-stream>

<turbo-stream action="replace" target="new_note_form">
  <template>
    <form action="/berries/thimbleberry/notes" method="post">
      <textarea name="content"></textarea>
      <input type="submit" value="Save note">
    </form>
  </template>
</turbo-stream>

Here, we are using streams instead of frames to handle the notes update. Each section that needs updating (the new note, the note counter, and the live form section) receives its content as a stream item. Notice that each stream item has an “action” and a “target” describing what will happen.

Streams can also target multiple elements at once by using the targets attribute (notice the plural) with a CSS selector that identifies the elements to be affected.
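
For example, a stream element along these lines (a minimal sketch; the stale-note class name is just an assumed example) would remove every element matching the selector:

<turbo-stream action="remove" targets=".stale-note"></turbo-stream>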

Turbo will automatically handle responses from the server (such as a form response) that contain a collection of <turbo-stream> elements, placing each one correctly into the UI. This covers many multi-change requirements. Notice also that when you are using streams, you don’t need a <turbo-frame>. In fact, mixing the two is not recommended. As a rule of thumb, use frames for simplicity whenever you can, and upgrade to streams (and dispense with frames) only when you need to.

Reusability

A key benefit to both Turbo Frames and Turbo Streams is being able to reuse the server-side templates that render UI elements both initially and for updates. You simply decompose your server-side template (like RoR templates or Thymeleaf or Kotlin DSL or Pug—whatever tool you are using) into the same chunks the UI needs. Then you can just use them to render both the initial and ongoing states of those chunks.

For example, here’s a simple Pug template that could be used as part of the whole page or to generate update chunks:

turbo-frame#field_notes

  h2 Field Notes

  //- 1. The List: Iterates over the 'notes' array
  div#notes_list
    each note in notes
      div(id=`note_${note.id}`)= note.content

  //- 
    2. The Form: On submission, this fragment is re-rendered
    by the server, which includes a fresh, empty form.

  form(action="/berries/thimbleberry/notes", method="post")
    
    div
      label(for="note_content") Add a new note:
    
    div
      //- We just need the 'name' attribute for the server
      textarea(id="note_content", name="content")
    
    div
      input(type="submit", value="Save note")

Server push

It’s also possible to provide background streams of events using the <turbo-stream-source> element:

<turbo-stream-source src="/berries/thimbleberry/notes/stream"></turbo-stream-source>

This element automatically connects to a back-end API (the src above is just an example endpoint) for SSE or WebSocket updates. These broadcast updates have the same structure as before:

<turbo-stream action="append" target="notes_list">
  <template>
    <div id="note_3">Just saw a bear!</div>
  </template>
</turbo-stream>

Client-side magic with Stimulus

HTMX is sometimes paired with Alpine.js, with the latter giving you fancier front-end interactivity like accordions, drag-and-drop functionality, and so forth. In Hotwire, Stimulus serves the same purpose.

In Stimulus, you use HTML attributes to connect elements to “controllers,” which are chunks of JavaScript functionality. For example, if we wanted to provide a clipboard copy button, we could do something like this:

<div data-controller="clipboard">
  <h1 data-clipboard-target="source">Thimbleberry (Rubus parviflorus)</h1>

  <p>A delicate, native berry with large, soft leaves.</p>

  <button data-clipboard-target="feedback" data-action="click->clipboard#copy">
    Copy Name
  </button>
</div>

Notice the data-controller attribute. That links the element to the clipboard controller. Stimulus uses a filename convention, and in this case the file would be clipboard_controller.js, with contents something like this:

import { Controller } from "@hotwired/stimulus"

export default class extends Controller {

  // Connects to data-clipboard-target="source" 
  // and data-clipboard-target="feedback"
  static targets = [ "source", "feedback" ]

  // Runs when data-action="click->clipboard#copy" is triggered
  copy() {
    // 1. Get text from the "source" target
    const textToCopy = this.sourceTarget.textContent
    
    // 2. Use the browser's clipboard API
    navigator.clipboard.writeText(textToCopy)

    // 3. Update the "feedback" target to tell the user
    this.feedbackTarget.textContent = "Copied!"

    // 4. (Optional) Reset the button after 2 seconds
    setTimeout(() => {
      this.feedbackTarget.textContent = "Copy Name"
    }, 2000)
  }
}

The static targets member provides those elements for the controller to work with, based on the data-clipboard-target attributes in the markup. The controller then uses simple JavaScript to perform the clipboard copy and show a timed message in the UI.

The basic idea is that you keep your JavaScript nicely isolated in small controllers that are linked into the markup as needed. This lets you add whatever extra client-side magic you need to enhance the server-side work, in a manageable way.

Conclusion

The beauty of Hotwire is in doing most of what you need with a very small footprint. It does 80% of the work with 20% of the effort. Hotwire doesn’t have the extravagant power of a full-blown framework like React or a full-stack option like Next.js, but it gives you most of what you’ll need for most development scenarios. Hotwire also works with any back end, whatever server-side technology you use.


Nvidia licenses Groq’s inferencing chip tech and hires its leaders 30 Dec 2025, 3:22 pm

Nvidia has licensed intellectual property from inferencing chip designer Groq, and hired away some of its senior executives, but stopped short of an outright acquisition.

“We’ve taken a non-exclusive license to Groq’s IP and have hired engineering talent from Groq’s team to join us in our mission to provide world-leading accelerated computing technology,” an Nvidia spokesman said Tuesday, via email. But, he said, “We haven’t acquired Groq.”

Groq designs and sells chips optimized for AI inferencing. These chips, which Groq calls language processing units (LPUs), are lower-powered, lower-priced devices than the GPUs Nvidia designs and sells, which these days are primarily used for training AI models. As the AI market matures, and usage shifts from the creation of AI tools to their use, demand for devices optimized for inferencing is likely to grow.

The company also rents out its chips, operating an inferencing-as-a-service business called GroqCloud.

Groq itself announced the deal and the executive moves on Dec. 24, saying “it has entered into a non-exclusive licensing agreement with Nvidia for Groq’s inference technology” and that, as part of the agreement, “Jonathan Ross, Groq’s Founder, Sunny Madra, Groq’s President, and other members of the Groq team will join Nvidia to help advance and scale the licensed technology.”

The deal could be worth as much as $20 billion, TechCrunch reported.

A way out of the memory squeeze?

There’s tension throughout the supply chain for chips used in AI applications; on its last earnings call, Nvidia’s CFO reported that some of its chips are “sold out” or “fully utilized.” One of the contributing factors identified by analysts is a shortage of high-bandwidth memory. Finding ways to make their AI operations less dependent on scarce memory chips is becoming a key objective for AI vendors and enterprise buyers alike.

A significant difference between Groq’s chip designs and Nvidia’s is the type of memory each uses. Nvidia’s fastest chips are designed to work with high-bandwidth memory, the price of which (like that of other fast memory technologies) is soaring due to limited production capacity and rising demand in AI-related applications. Groq, meanwhile, integrates static RAM into its chip designs. It says SRAM is faster and less power-hungry than the dynamic RAM used by competing chip technologies, and another advantage is that it’s not (yet) as scarce as the high-bandwidth memory or DDR5 DRAM used elsewhere. Licensing Groq’s technology opens the way for Nvidia to diversify its memory sourcing.

Not an acquisition

By structuring its relationship with Groq as an IP licensing deal, and hiring the engineers it is most interested in rather than buying their employer, Nvidia avoids taking on the GroqCloud service business just as it is reportedly stepping back from its own service business, DGX cloud, and restructuring it as an internal engineering service. It could also escape much of the antitrust scrutiny that would have accompanied a full-on acquisition.

Nvidia did not respond to questions about the names and roles of the former Groq executives it has hired.

However, Groq’s founder, Jonathan Ross, reports on his LinkedIn profile that he is now chief software architect at Nvidia, while that of Groq’s former president, Sunny Madra, says he is now Nvidia’s VP of hardware.

What’s left of Groq will be run by Simon Edwards, formerly CFO at sales automation software vendor Conga. He joined Groq as CFO just three months ago.

This article first appeared on Network World.


2026: The year we stop trusting any single cloud 30 Dec 2025, 9:00 am

For more than a decade, many considered cloud outages a theoretical risk, something to address on a whiteboard and then quietly deprioritize during cost cuts. In 2025, this risk became real. A major Google Cloud outage in June caused hours-long disruptions to popular consumer and enterprise services, with ripple effects into providers that depend on Google’s infrastructure. Microsoft 365 and Outlook also faced code failures and notable outages, as did collaboration platforms like Slack and Zoom. Even security platforms and enterprise backbones suffered extended downtime.

None of these incidents, individually, was apocalyptic. Collectively, they changed the tone in the boardroom. Executives who once saw cloud resilience as an IT talking point suddenly realized that a configuration change in someone else’s platform could derail support queues, warehouse operations, and customer interactions in one stroke.

Relying on one provider is risky

The real story is not that cloud platforms failed; it’s that enterprises quietly allowed those platforms to become single points of failure for entire business models. In 2025, many organizations discovered that their digital transformation had traded physical single points of failure for logical ones in the form of a single region, a single provider, or even a single managed database. When a hyperscaler region had trouble, companies learned the hard way that “highly available within a region” is not the same as “business resilient.”

What caught even seasoned teams off guard was the hidden dependency chain. Organizations that thought they were cloud-agnostic because they used a SaaS provider discovered that the SaaS itself was entirely dependent on a single cloud region. When that region faltered, so did the SaaS—and by extension, the business. This is why 2026 will be the year where dependence itself, not just uptime numbers, becomes a primary design concern.

Resilience gets its own budget line

Every downturn and major outage reshapes budgets. The 2025 incidents are doing that right now. I’m seeing CIOs and CFOs move away from the idea that resilience is something you squeeze in if there’s leftover budget after cost optimization. Instead, resilience is getting explicit funding, with line items for multiregion architectures, modernized backup and restore, and cross-cloud or hybrid continuity strategies.

This is a shift in mindset as much as in money. We once justified resilience in terms of compliance or technical best practices. In 2026, we’ll look for direct revenue protection and risk reduction, often backed by concrete numbers from the 2025 outages: lost transactions, missed service-level agreements, overtime for remediation, and reputational damage. Once those numbers are quantified, resilience stops being a nice to have and becomes a board-sanctioned business control.

Relocation is back

For years, enterprises talked about cloud portability and avoiding lock-in. They then deeply embedded themselves in proprietary services for speed and convenience. 2026 is when many of those same organizations will take a second look and start moving selected workloads and data into more portable, resilient architectures. That does not mean a mass exodus from the major clouds; it means being far more deliberate about which workloads live where and why.

Expect to see targeted workload shifts: moving critical customer-facing systems from single-region to multi-region or cross-cloud setups, re-architecting data platforms with replicated storage and active-active databases (two live copies, both serving traffic), and relocating some systems to private or colocation environments based on risk. Systems whose downtime could significantly disrupt revenue or operations will have their placement and dependencies reassessed.

Redundancy stops being a luxury

In the early cloud days, active-active architectures across regions—or worse, across providers—were viewed as exotic and expensive. In 2026, for selected tiers of applications and data, they will be considered baseline engineering hygiene. The outages of 2025 demonstrated that running “hot–warm” with manual failover often means you are functionally down for hours when you can least afford it.

The response will include more active-active patterns: stateless services across regions managed globally, multi-region data stores with conflict resolution, and messaging layers resilient to provider issues. Enterprises will adopt chaos engineering and failure testing as ongoing practices, requiring continuous resilience proof beyond disaster recovery records.

Rethinking third-party services

One of the more uncomfortable lessons from 2025 was that indirect cloud dependence can hurt just as much as direct dependence. Several SaaS and platform providers marketed themselves as simplifying complexity and insulating customers from cloud details, yet internally ran everything in a single cloud, sometimes a single region. When their underlying cloud experienced issues, customers found they had no visibility, no leverage, and no alternative.

In 2026, smart enterprises will start asking their vendors the hard questions. Which regions and providers do you use? Do you have a tested failover strategy across regions or providers? What happens to my data and SLAs if your primary cloud has a regional incident? Many will diversify not just across hyperscalers, but across SaaS and managed services, deliberately avoiding over-concentration on any provider that cannot demonstrate meaningful redundancy.

Embracing resilience in 2026

If 2025 was the wake-up call, 2026 will be the year to act with discipline. That starts with an honest dependency inventory: not just which clouds you use directly, but which clouds and regions sit beneath your SaaS, security, networking, and operations tools. From there, you can classify systems by business criticality and map appropriate resilience patterns to each class, reserving the most expensive mechanisms, such as cross-region active-active, for systems where downtime is truly existential.

Equally important is organizational change. Resilience is not only an architectural problem; it is an operations, finance, and governance problem. In 2026, the enterprises that succeed will be the ones that align architecture, site reliability engineering, security, and finance around a shared goal: reduce single points of failure in both technology and vendors, validate failover and recovery as rigorously as new features, and treat cloud dependence as a managed business risk rather than a hidden assumption. The cloud is not going away, nor should it, but our blind trust in any single piece of it must stop.


How to build RAG at scale 30 Dec 2025, 9:00 am

Retrieval-augmented generation (RAG) has quickly become the enterprise default for grounding generative AI in internal knowledge. It promises less hallucination, more accuracy, and a way to unlock value from decades of documents, policies, tickets, and institutional memory. Yet while nearly every enterprise can build a proof of concept, very few can run RAG reliably in production.

This gap has nothing to do with model quality. It is a systems architecture problem. RAG breaks at scale because organizations treat it like a feature of large language models (LLMs) rather than a platform discipline. The real challenges emerge not in prompting or model selection, but in ingestion, retrieval optimization, metadata management, versioning, indexing, evaluation, and long-term governance. Knowledge is messy, constantly changing, and often contradictory. Without architectural rigor, RAG becomes brittle, inconsistent, and expensive.

RAG at scale demands treating knowledge as a living system

Prototype RAG pipelines are deceptively simple: embed documents, store them in a vector database, retrieve top-k results, and pass them to an LLM. This works until the first moment the system encounters real enterprise behavior: new versions of policies, stale documents that remain indexed for months, conflicting data in multiple repositories, and knowledge scattered across wikis, PDFs, spreadsheets, APIs, ticketing systems, and Slack threads.

When organizations scale RAG, ingestion becomes the foundation. Documents must be normalized, cleaned, and chunked with consistent heuristics. They must be version-controlled and assigned metadata that reflects their source, freshness, purpose, and authority. Failure at this layer is the root cause of most hallucinations. Models generate confidently incorrect answers because the retrieval layer returns ambiguous or outdated knowledge.
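
To make that concrete, here is a minimal sketch of what a chunk record carrying that metadata might look like. The field names are illustrative assumptions, not the schema of any particular framework:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ChunkRecord:
    """One retrievable chunk plus the metadata needed for filtering, freshness checks, and audits."""
    chunk_id: str
    document_id: str
    version: str                  # version of the source document this chunk was cut from
    text: str                     # normalized, cleaned chunk text
    source: str                   # system of record, e.g. wiki, ticketing system, policy repo
    authority: str                # e.g. "authoritative", "draft", "superseded"
    purpose: str                  # e.g. "policy", "how-to", "reference"
    last_updated: datetime        # freshness signal used to demote or re-ingest stale content
    tags: list[str] = field(default_factory=list)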

Knowledge, unlike code, does not naturally converge. It drifts, forks, and accumulates inconsistencies. RAG makes this drift visible and forces enterprises to modernize knowledge architecture in a way they’ve ignored for decades.

Retrieval optimization is where RAG succeeds or fails

Most organizations assume that once documents are embedded, retrieval “just works.” Retrieval quality determines RAG quality far more than the LLM does. As vector stores scale to millions of embeddings, similarity search becomes noisy, imprecise, and slow. Many retrieved chunks are thematically similar but semantically irrelevant.

The solution is not more embeddings; it is a better retrieval strategy. Large-scale RAG requires hybrid search that blends semantic vectors with keyword search, BM25, metadata filtering, graph traversal, and domain-specific rules. Enterprises also need multi-tier architectures that use caches for common queries, mid-tier vector search for semantic grounding, and cold storage or legacy data sets for long-tail knowledge.

The retrieval layer must behave more like a search engine than a vector lookup. It should choose retrieval methods dynamically, based on the nature of the question, the user’s role, the sensitivity of the data, and the context required for correctness. This is where enterprises often underestimate the complexity. Retrieval becomes its own engineering sub-discipline, on par with devops and data engineering.
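
As an illustration of the idea (the scoring weights, index interfaces, and metadata fields below are hypothetical, not any specific product's API), a hybrid retriever might blend the two signal types and apply a metadata filter like this:

from dataclasses import dataclass

@dataclass
class ScoredChunk:
    chunk_id: str
    score: float

def hybrid_retrieve(query, vector_index, bm25_index, *, k=10,
                    alpha=0.6, required_authority="authoritative"):
    """Blend semantic and keyword scores, then filter on metadata.

    vector_index.search and bm25_index.search are assumed interfaces that
    return {chunk_id: score} maps; vector_index.metadata(chunk_id) is an
    assumed metadata lookup. Swap in your own stores.
    """
    semantic = vector_index.search(query, top_k=k * 5)   # {chunk_id: cosine similarity}
    keyword = bm25_index.search(query, top_k=k * 5)      # {chunk_id: BM25 score}

    def normalize(scores):
        # Scale each score set to [0, 1] so the weighted blend is meaningful.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {cid: (s - lo) / span for cid, s in scores.items()}

    semantic, keyword = normalize(semantic), normalize(keyword)

    blended = []
    for cid in set(semantic) | set(keyword):
        meta = vector_index.metadata(cid)
        if meta.get("authority") != required_authority:   # metadata filter
            continue
        score = alpha * semantic.get(cid, 0.0) + (1 - alpha) * keyword.get(cid, 0.0)
        blended.append(ScoredChunk(cid, score))

    return sorted(blended, key=lambda c: c.score, reverse=True)[:k]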

Reasoning, grounding, and validation protect answers from drift

Even perfect retrieval does not guarantee a correct answer. LLMs may ignore context, blend retrieved content with prior knowledge, interpolate missing details, or generate fluent but incorrect interpretations of policy text. Production RAG requires explicit grounding instructions, standardized prompt templates, and validation layers that inspect generated answers before returning them to users.

Prompts must be version-controlled and tested like software. Answers must include citations with explicit traceability. In compliance-heavy domains, many organizations route answers through a secondary LLM or rule-based engine that verifies factual grounding, detects hallucination patterns, and enforces safety policies.
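
As a small, hypothetical illustration (the prompt template and the [chunk:<id>] citation format are assumptions, not a standard), a validation step might verify that every citation in a generated answer points back to a chunk that was actually retrieved:

import re

PROMPT_TEMPLATE = """Answer the question using ONLY the sources below.
Cite every claim as [chunk:<id>]. If the sources do not contain the answer, say so.

Sources:
{sources}

Question: {question}
"""

CITATION_PATTERN = re.compile(r"\[chunk:([\w-]+)\]")

def validate_grounding(answer: str, retrieved_ids: set[str]) -> tuple[bool, list[str]]:
    """Reject answers that cite nothing, or that cite chunks that were never retrieved."""
    cited = CITATION_PATTERN.findall(answer)
    if not cited:
        return False, ["answer contains no citations"]
    unknown = [cid for cid in cited if cid not in retrieved_ids]
    if unknown:
        return False, [f"citation to unretrieved chunk: {cid}" for cid in unknown]
    return True, []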

Without a structure for grounding and validation, retrieval is only optional input, not a constraint on model behavior.

A blueprint for enterprise-scale RAG

Enterprises that succeed with RAG rely on a layered architecture. The system works not because any single layer is perfect, but because each layer isolates complexity, makes change manageable, and keeps the system observable.

Below is the reference architecture that has emerged through large-scale deployments across fintech, SaaS, telecom, healthcare, and global retail. It illustrates how ingestion, retrieval, reasoning, and agentic automation fit into a coherent platform.

To understand how these concerns fit together, it helps to visualize RAG not as a pipeline but as a vertically integrated stack, one that moves from raw knowledge to agentic decision-making:

The RAG stack (Image: Foundry)

This layered model is more than an architectural diagram: it represents a set of responsibilities. Each layer must be observable, governed, and optimized independently. When ingestion improves, retrieval quality improves. When retrieval matures, reasoning becomes more reliable. When reasoning stabilizes, agentic orchestration becomes safe enough to trust with automation.

The mistake most enterprises make is collapsing these layers into a single pipeline. That decision works for demos but fails under real-world demands.

Agentic RAG is the next step toward adaptive AI systems

Once the foundational layers are stable, organizations can introduce agentic capabilities. Agents can reformulate queries, request additional context, validate retrieved content against known constraints, escalate when confidence is low, or call APIs to augment missing information. Instead of retrieving once, they iterate through the steps: sense, retrieve, reason, act, and verify.
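
A deliberately simplified sketch of that loop, with placeholder retriever, llm, and validator objects standing in for whatever retrieval, generation, and validation layers you already have, might look like this:

def answer_with_agentic_rag(question, retriever, llm, validator, max_rounds=3):
    """Iteratively sense, retrieve, reason, and verify instead of retrieving once."""
    query = question
    for _ in range(max_rounds):
        chunks = retriever.retrieve(query)                 # retrieve
        draft = llm.generate(question, context=chunks)     # reason
        ok, issues = validator.check(draft, chunks)        # verify
        if ok:
            return draft                                   # act: return the grounded answer
        # Low confidence or ungrounded claims: reformulate the query and try again.
        query = llm.reformulate(question, feedback=issues)
    return "Escalating to a human reviewer: confidence too low after retries."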

This is what differentiates RAG demos from AI-native systems. Static retrieval struggles with ambiguity or incomplete information. Agentic RAG systems overcome those limitations because they adapt dynamically.

The shift to agents does not eliminate the need for architecture; it strengthens it. Agents rely on retrieval quality, grounding, and validation. Without these, they amplify errors rather than correct them.

Where RAG fails in the enterprise

Despite strong early enthusiasm, most enterprises confront the same problems. Retrieval latency climbs as indexes grow. Embeddings drift out of sync with source documents. Different teams use different chunking strategies, producing wildly inconsistent results. Storage and LLM token costs balloon. Policies and regulations change, but documents are not re-ingested promptly. And because most organizations lack retrieval observability, failures are hard to diagnose, leading teams to mistrust the system.

These failures all trace back to the absence of a platform mindset. RAG is not something each team implements on its own. It is a shared capability that demands consistency, governance, and clear ownership.

A case study in scalable RAG architecture

A global financial services company attempted to use RAG to support its customer-dispute resolution process. The initial system struggled: retrieval returned outdated versions of policies, latency spiked during peak hours, and agents in the call center received inconsistent answers from the model. Compliance teams raised concerns when the model’s explanations diverged from the authoritative documentation.

The organization re-architected the system using a layered model. They implemented hybrid retrieval strategies that blended semantic and keyword search, introduced strict versioning and metadata policies, standardized chunking across teams, and deployed retrieval observability dashboards that exposed cases where documents contradicted each other. They also added an agent that automatically rewrote unclear user queries and requested additional context when initial retrieval was insufficient.

The results were dramatic. Retrieval precision tripled, hallucination rates dropped sharply, and dispute resolution teams reported significantly higher trust in the system. What changed was not the model but the architecture surrounding it.

Retrieval is the key

RAG is often discussed as a clever technique for grounding LLMs, but in practice it becomes a large-scale architecture project that forces organizations to confront decades of knowledge debt. Retrieval, not generation, is the core constraint. Chunking, metadata, and versioning matter as much as embeddings and prompts. Agentic orchestration is not a futuristic add-on, but the key to handling ambiguous, multi-step queries. And without governance and observability, enterprises cannot trust RAG systems in mission-critical workflows.

Enterprises that treat RAG as a durable platform rather than a prototype will build AI systems that scale with their knowledge, evolve with their business, and provide transparency, reliability, and measurable value. Those who treat RAG as a tool will continue to ship demos, not products.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.


Understanding AI-native cloud: from microservices to model-serving 29 Dec 2025, 8:12 pm

Cloud computing has fundamentally transformed the way enterprises operate. Initially built for more basic, everyday computing tasks, its capabilities have expanded exponentially with the advent of new technologies (such as machine learning and analytics).

But AI — particularly generative AI and the emerging class of AI agents — presents all-new challenges for cloud architectures. It is resource-hungry, demands ultra-low latency, and requires new compute pathways and data access. These capabilities can’t simply be bolted on to existing cloud infrastructures.

Simply put, AI has upended the traditional cloud computing paradigm, leading to a new category of infrastructure: AI-native cloud.

Understanding AI-native cloud

AI-native cloud, or cloud-native AI, is still a new concept, but it is broadly understood as an extension of cloud native. It is infrastructure built with AI and data as cornerstones, allowing forward-thinking enterprises to infuse AI into their operations, strategies, analysis, and decision-making processes from the very start.

Differences between AI-native and traditional cloud models

Cloud computing has become integral to business operations, helping enterprises scale and adopt new technologies. In recent years, many organizations have shifted to a ‘cloud native’ approach, meaning they are building and running apps directly in the cloud to take full advantage of its benefits and capabilities. Many of today’s modern applications live in public, private, and hybrid clouds.

According to the Cloud Native Computing Foundation (CNCF), cloud native approaches incorporate containers, service meshes, microservices, immutable infrastructure, and declarative APIs. “These techniques enable loosely coupled systems that are resilient, manageable, and observable,” CNCF explains.

5 things you need to know about AI-native cloud

  1. AI is the core technology: In a traditional cloud, AI is an add-on. In an AI-native cloud, every layer — from storage to networking — is designed to handle the high-throughput, low-latency demands of large models.
  2. GPU-first orchestration: AI-native clouds prioritize GPUs and TPUs. This requires advanced orchestration tools, such as Kubernetes for AI, to manage distributed training and inference economically.
  3. The vector foundation: Data modernization is the price of entry. AI-native clouds rely on vector databases to provide long-term memory for AI models, allowing them to access proprietary enterprise data in real time without hallucinating.
  4. Rise of neoclouds: 2026 will see the rise of specialized neocloud providers (like CoreWeave or Lambda) that offer GPU-centric infrastructure that hyperscalers often struggle to match in terms of raw performance and cost.
  5. From AIOps to agenticops: The goal isn’t just a faster system; it’s a self-operating one. AI-native cloud allows agentic AI to autonomously manage network traffic, resolve IT tickets, and optimize cloud spend.

AI-native cloud is an evolution of this strategy, applying cloud-native patterns and principles to build and deploy scalable, repeatable AI apps and workloads. This can help devs and builders overcome key challenges and limitations when it comes to building, running, launching, and monitoring AI workloads with traditional infrastructures.

The challenges with AI in the cloud

The cloud is an evolution of legacy infrastructures, but it was largely built with software-as-a-service (SaaS) and other as-a-service models in mind. In this setting, AI, ML, and advanced analytics become just another workload, as opposed to a core, critical component.

But AI is much more demanding than traditional workloads, and running it on infrastructure designed for those workloads can lead to higher computing costs, data bottlenecks, hampered performance, and other critical issues.

Generative AI, in particular, requires the following:

  • Specialized hardware and significant computational power
  • Infrastructure that is scalable and flexible
  • Massive and diverse datasets for iterative training
  • High-performance storage, high bandwidth and throughput, diverse data sets, and low-latency access to data

AI data needs are significant and continue to escalate as systems become more complex; data must be processed, handled, managed, transferred, and analyzed rapidly and accurately to ensure the success of AI projects. Distributed computing, parallelism (splitting AI tasks across multiple CPUs or GPUs), ongoing training and iteration, and efficient data handling are essential — but traditional cloud infrastructures can struggle to keep up.

Existing infrastructure simply lacks the flexibility demanded by more intense, complex AI and ML workflows. It can also fragment the user experience, meaning devs and builders have to move back and forth between numerous interfaces, instead of a unified plane.

Essential components of AI-native cloud

Rather than the traditional “lift and shift” cloud migration strategy — where apps and workloads are quickly moved to the cloud “as-is” without redesign — AI-native cloud requires a fundamental redesign and rewiring of infrastructures for a clean slate.

This refactoring involves many of the key principles of cloud-native builds, but in a way that supports the development of AI applications. It requires the following:

  • Microservices architecture
  • Containerized packaging and orchestration
  • Continuous integration/continuous delivery (CI/CD) DevOps practices
  • Observability tools
  • Dedicated data storage
  • Managed services and cloud-native products (like Kubernetes, Terraform, or OpenTelemetry)
  • More complex infrastructures like vector databases

Data modernization is critical for AI; systems require data flow in real time from data lakes, lakehouses or other stores, the ability to connect data and provide context for models, and clear rules for how to use and manage data.

AI workloads must be built in from the start, with training, iteration, deployment, monitoring, and version control capabilities all part of the initial cloud setup. This allows models to be managed just like any other service.

AI-native cloud infrastructures must also support continuous AI evolution. Enterprises can incorporate AIOps, MLOps, and FinOps practices to support efficiency, flexibility, scalability, and reliability. Monitoring tools can flag issues with models (like drift, or performance degradation over time), and security and governance guardrails can support encryption, identity verification, regulatory compliance, and other safety measures.

According to CNCF, AI-native cloud infrastructures can use the cloud’s underlying computing network (CPUs, GPUs, or Google’s TPUs) and storage capabilities to accelerate AI performance and reduce costs.

Dedicated, built-in orchestration tools can do the following:

  • Automate model delivery via CI/CD pipelines
  • Enable distributed training
  • Support scalable data science to automate ML
  • Provide infrastructure for model serving
  • Facilitate data storage via vector databases and other data architectures
  • Enhance model, LLM, and workload observability

The benefits of AI-native cloud and business implications

There are numerous benefits when AI is built in from the start, including the following:

  • Automation of routine tasks
  • Real-time data processing and analytics
  • Predictive insights and predictive maintenance
  • Supply chain management
  • Resource optimization
  • Operational efficiency and scalability
  • Hyper-personalization at scale for tailored services and products
  • Continuous learning, iteration and improvement through ongoing feedback loops.

Ultimately, AI-native cloud allows enterprises to embed AI from day one, unlocking automation, real-time intelligence, and predictive insights to support efficiency, scalability, and personalized experiences.

Paths to the AI-native cloud

Like any technology, there is no one-size-fits-all for AI-native cloud infrastructures.

Research and advisory firm Forrester identifies five “paths” to the AI-native cloud that align with key stakeholders including business leaders, technologists, data scientists, and governance teams. These include:

The open-source AI ecosystem

The cloud embedded Kubernetes into enterprise IT, and what started out as an open-source container orchestration system has evolved into a “flexible, multilayered platform with AI at the forefront,” according to Forrester.

The IT firm identifies different domains in open-source AI cloud, including model-as-a-service, and predicts that devs will shift from local compute to distributed Kubernetes clusters, and from notebooks to pipelines. This “enables direct access to open-source AI innovation.”

AI-centric neo-PaaS

Cloud platform-as-a-service (PaaS) streamlined cloud adoption. Now, Kubernetes-based PaaS provides access to semifinished or prebuilt platforms that abstract away “much or all” of the underlying infrastructure, according to Forrester. This supports integration with existing data science workflows (as well as public cloud platforms) and allows for flexible self-service AI development.

Public cloud platform-managed AI services

Public clouds have taken a distinctly enterprise approach, bringing AI “out of specialist circles into the core of enterprise IT,” Forrester notes. Initial custom models have evolved into widely-used platforms including Microsoft Azure AI Foundry, Amazon Bedrock, Google Vertex, and others. These provided early, easy entry points for exploration, and now serve as the core of many AI-native cloud strategies, appealing to technologists, data scientists, and business teams.

AI infrastructure cloud platforms (neocloud)

AI cloud platforms, or neoclouds, are providing platforms that minimize the use of CPU-based cloud tools (or eliminate it altogether). This approach can be particularly appealing for AI startups and enterprises with “aggressive AI programs,” according to Forrester, and is also a draw for enterprises with strong and growing data science programs.

Data/AI cloud platforms

Data infrastructure providers like Databricks and Snowflake have been using cloud infrastructures from leading providers to hone their own offerings. This has positioned them to provide first-party gen AI tools for model building, fine-tuning, and deployment. This draws on the power of public cloud platforms while insulating customers from those complex infrastructures. This “data/AI pure play” is attractive to enterprises looking to more closely align their data scientists and AI devs with business units, Forrester notes.

Ultimately, when pursuing AI-native cloud options, Forrester advises the following:

  • Start with your primary cloud vendor: Evaluate their AI services and develop a technology roadmap before switching to another provider. Consider adding new vendors if they “dangle a must-have AI capability” your enterprise can’t afford to wait for. Also, tap your provider’s AI training to grow skills throughout the enterprise.
  • Resist the urge of “premature” production deployments: Projects can go awry without sufficient reversal plans, so adopt AI governance that assesses model risk in the context of a particular use case.
  • Learn from your AI initiatives: Take stock of what you’ve done and assess whether your technology needs a refresh or an “outright replacement,” and generalize lessons learned to share across the business.
  • Scale AI-native cloud incrementally based on success in specific domains: Early adoption focused on recommendation and information retrieval and synthesis; internal productivity-boosting apps have since proved advantageous. Start with strategy and prove that the technology can work in a particular area and be translated elsewhere.
  • Take advantage of open-source AI: Managed services platforms like AWS Bedrock, Azure OpenAI, Google Vertex, and others were early entrants in the AI space, but they also offer various open-source opportunities that enterprises of different sizes can customize to their particular needs.

Conclusion

AI-native cloud represents a whole new design philosophy for forward-thinking enterprises. The limits of traditional cloud architectures are becoming increasingly clear, and tomorrow’s complex AI systems can’t be treated as “just another workload.” Next-gen AI-native cloud infrastructures put AI at the core and allow systems to be managed, governed, and improved just like any other mission-critical service.


React2Shell: Anatomy of a max-severity flaw that sent shockwaves through the web 29 Dec 2025, 11:03 am

The React 19 library for building application interfaces was hit with a remote code execution vulnerability, React2Shell, about a month ago. As researchers delve deeper into the bug, a fuller picture is gradually emerging.

The vulnerability enables unauthenticated remote code execution through React Server Components, allowing attackers to execute arbitrary code on affected servers via a crafted request. In other words, a foundational web framework feature quietly became an initial access vector.

What followed was a familiar but increasingly compressed sequence. Within hours of disclosure, multiple security firms confirmed active exploitation in the wild. Google’s Threat Intelligence Group (GTIG) and AWS both reported real-world abuse, collapsing the already-thin gap between vulnerability awareness and compromise.

“React2Shell is another reminder of how fast exploitation timelines have become,” said Nathaniel Jones, field CISO at Darktrace. “The CVE drops, a proof-of-concept is circulating, and within hours you’re already seeing real exploitation attempts.”

That speed matters because React Server Components are not a niche feature. They are embedded into default React and Next.js deployments across enterprise environments, meaning organizations inherited this risk simply by adopting mainstream tooling.

Different reports add new signals

While researchers agreed on the root cause, multiple individual reports have emerged, sharpening the overall picture.

For instance, early analysis by cybersecurity firm Wiz demonstrated how easily an unauthenticated input can traverse the React Server Components pipeline and reach dangerous execution paths, even in clean, default deployments. Unit 42 has expanded on this by validating exploit reliability across environments and emphasizing the minimal variation attackers needed to succeed.

Google and AWS have added operational context by confirming exploitation by multiple threat categories, including state-aligned actors, shortly after disclosure. That validation moved React2Shell out of the “potentially exploitable” category and into a confirmed active risk.

A report from Huntress has shifted focus by documenting post-exploitation behavior. Rather than simple proof-of-concept shells, attackers were observed deploying backdoors and tunneling tools, signalling that React2Shell was already being used as a durable access vector rather than a transient opportunistic hit, the report noted.

However, not all findings amplified urgency. Patrowl’s controlled testing showed that some early exposure estimates were inflated due to version-based scanning and noisy detection logic.

Taken together, the research painted a clearer, more mature picture within days (not weeks) of disclosure.

What the research quickly agreed on

Across early reports from Wiz, Palo Alto Networks’ Unit 42, Google, AWS, and others, there was strong alignment on the core mechanics of React2Shell. Researchers independently confirmed that the flaw lives inside React’s server-side rendering pipeline and stems from unsafe deserialization in the protocol used to transmit component data between client and server.

Multiple teams confirmed that exploitation does not depend on custom application logic. Applications generated using standard tools were vulnerable by default, and downstream frameworks such as Next.js inherited the issue rather than introducing it independently. That consensus reframed React2Shell from a “developer mistake” narrative into a framework-level failure with systemic reach.

This was the inflection point. If secure-by-design assumptions no longer hold at the framework layer, the defensive model shifts from “find misconfigurations” to “assume exposure.”

Speed-to-exploit as a defining characteristic

One theme that emerged consistently across reports was how little time defenders had to react. Jones said Darktrace’s own honeypot was exploited in under two minutes after exposure, strongly suggesting attackers had automated scanning and exploitation workflows ready before public disclosure. “Threat actors already had scripts scanning for the vulnerability, checking for exposed servers, and firing exploits without any humans in the loop,” he said.

Deepwatch’s Frankie Sclafani framed this behavior as structural rather than opportunistic. The rapid mobilization of multiple China-linked groups, he noted, reflected an ecosystem optimized for immediate action. In that model, speed-to-exploit is not a secondary metric but a primary measure of operational readiness. “When a critical vulnerability like React2Shell is disclosed, these actors seem to execute pre-planned strategies to establish persistence before patching occurs,” he said.

This matters because it undercuts traditional patch-response assumptions. Even well-resourced enterprises rarely patch and redeploy critical systems within hours, creating an exposure window that attackers now reliably expect.

What exploitation looked like in practice

Almost immediately after the December 3 public disclosure of React2Shell, active exploitation was observed by multiple defenders. Within hours, automated scanners and attacker tools probed internet-facing React/Next.js services for the flaw.

Threat intelligence teams confirmed that China-nexus state-aligned clusters, including Earth Lumia and Jackpot Panda, were among the early actors leveraging the defect to gain server access and deploy follow-on tooling. Beyond state-linked activity, reports from Unit 42 and Huntress detailed campaigns deploying Linux backdoors, reverse proxy tunnels, cryptomining kits, and botnet implants against exposed targets, a sign that both espionage and financially motivated groups were capitalizing on the bug.

Data from Wiz and other responders indicates that dozens of distinct intrusion efforts have been tied to React2Shell exploitation, with compromised systems ranging across sectors and regions. Despite these confirmed attacks and public exploit code circulating, many vulnerable deployments remain unpatched, keeping the window for further exploitation wide open.

The lesson React2Shell leaves behind

React2Shell is ultimately less about React than about the security debt accumulating inside modern abstractions. As frameworks take on more server-side responsibility, their internal trust boundaries become enterprise attack surfaces overnight.

The research community mapped this vulnerability quickly and thoroughly. Attackers moved even faster. For defenders, the takeaway is not just to patch, but to reassess what “default safe” really means in an ecosystem where exploitation is automated, immediate, and indifferent to intent.

React2Shell is rated critical, carrying a CVSS score of 10.0, reflecting its unauthenticated remote code execution impact and broad exposure across default React Server Components deployments. React maintainers and downstream frameworks such as Next.js have released patches, and researchers broadly agree that affected packages should be updated immediately.

Beyond patching, they warn that teams should assume exploitation attempts may already be underway. Recommendations consistently emphasize validating actual exposure rather than relying on version checks alone, and actively hunting for post-exploitation behavior such as unexpected child processes, outbound tunneling traffic, or newly deployed backdoors. The message across disclosures is clear: React2Shell is not a “patch when convenient” flaw, and the window for passive response has already closed.

This article first appeared on CSO.


AI’s trust tax for developers 29 Dec 2025, 9:00 am

Andrej Karpathy is one of the few people in this industry who has earned the right to be listened to without a filter. As a founding member of OpenAI and the former director of AI at Tesla, he sits at the summit of AI and its possibilities. In a recent post, he shared a view that is equally inspiring and terrifying: “I could be 10X more powerful if I just properly string together what has become available over the last ~year,” Karpathy wrote. “And a failure to claim the boost feels decidedly like [a] skill issue.”

If you aren’t ten times faster today than you were in 2023, Karpathy implies that the problem isn’t the tools. The problem is you. Which seems both right…and very wrong. After all, the raw potential for leverage in the current generation of LLM tools is staggering. But his entire argument hinges on a single adverb that does an awful lot of heavy lifting:

“Properly.”

In the enterprise, where code lives for decades, not days, that word “properly” is easy to say but very hard to achieve. The reality on the ground, backed by a growing mountain of data, suggests that for most developers, the “skill issue” isn’t a failure to prompt effectively. It’s a failure to verify rigorously. AI speed is free, but trust is incredibly expensive.

A vibes-based productivity trap

In reality, AI speed only seems to be free. Earlier this year, for example, METR (Model Evaluation and Threat Research) ran a randomized controlled trial that gave experienced open source developers tasks to complete. Half used AI tools; half didn’t. The developers using AI were convinced the LLMs had accelerated their development speed by 20%. But reality bites: The AI-assisted group was, on average, 19% slower.

That’s a nearly 40-point gap between perception and reality. Ouch.

How does this happen? As I recently wrote, we are increasingly relying on “vibes-based evaluation” (a phrase coined by Simon Willison). The code looks right. It appears instantly. But then you hit the “last mile” problem. The generated code uses a deprecated library. It hallucinates a parameter. It introduces a subtle race condition.

Karpathy can induce serious FOMO with statements like this: “People who aren’t keeping up even over the last 30 days already have a deprecated worldview on this topic.” Well, maybe, but as fast as AI is changing, some things remain stubbornly the same. Like quality control. AI coding assistants are not primarily productivity tools; they are liability generators that you pay for with verification. You can pay the tax upfront (rigorous code review, testing, threat modeling), or you can pay it later (incidents, data breaches, and refactoring). But you’re going to pay sooner or later.

Right now, too many teams think they’re evading the tax, but they’re not. Not really. Veracode’s GenAI Code Security Report found that 45% of AI-generated code samples introduced security issues on OWASP’s top 10 list. Think about that.

Nearly half the time you accept an AI suggestion without a rigorous audit, you are potentially injecting a critical vulnerability (SQL injection, XSS, broken access control) into your codebase. The report puts it bluntly: “Congrats on the speed, enjoy the breach.” As Microsoft developer advocate Marlene Mhangami puts it, “The bottleneck is still shipping code that you can maintain and feel confident about.”

In other words, with AI we’re accumulating vulnerable code at a rate manual security reviews cannot possibly match. This confirms the “productivity paradox” that SonarSource has been warning about. Their thesis is simple: Faster code generation inevitably leads to faster accumulation of bugs, complexity, and debt, unless you invest aggressively in quality gates. As the SonarSource report argues, we’re building “write-only” codebases: systems so voluminous and complex, generated by non-deterministic agents, that no human can fully understand them.

We increasingly trade long-term maintainability for short-term output. It’s the software equivalent of a sugar high.

Redefining the skills

So, is Karpathy wrong? No. When he says he can be ten times more powerful, he’s right. It might not be ten times, but the performance gains savvy developers get from AI are real, or at least have the potential to be. Even so, the skill he possesses isn’t just the ability to string together tools.

Karpathy has the deep internalized knowledge of what good software looks like, which allows him to filter the noise. He knows when the AI is likely to be right and when it is likely to be hallucinating. But he’s an outlier on this, bringing us back to that pesky word “properly.”

Hence, the real skill issue of 2026 isn’t prompt engineering. It’s verification engineering. If you want to claim the boost Karpathy is talking about, you need to shift your focus from code creation to code critique, as it were:

  • Verification is the new coding. Your value is no longer defined by lines of code written, but by how effectively you can validate the machine’s output.
  • “Golden paths” are mandatory. As I’ve written, you cannot allow AI to be a free-for-all. You need golden paths: standardized, secured templates. Don’t ask the LLM to write a database connector; ask it to implement the interface from your secure platform library.
  • Design the security architecture yourself. You can’t just tell an LLM to “make this secure.” The high-level thinking you embed in your threat modeling is the one thing the AI still can’t do reliably.

“Properly stringing together” the available tools doesn’t just mean connecting an IDE to a chatbot. It means thinking about AI systematically rather than optimistically. It means wrapping those LLMs in a harness of linting, static application security testing (SAST), dynamic application security testing (DAST), and automated regression testing.

The developers who will actually be ten times more powerful next year aren't the ones who trust the AI blindly. They are the ones who treat AI like a brilliant but very junior intern: capable of flashes of genius, but requiring constant supervision to keep it from deleting the production database.

The skill issue is real. But the skill isn’t speed. The skill is control.

4 New Year’s resolutions for devops success 29 Dec 2025, 9:00 am

It has been a dramatic and challenging year for developers and engineers working in devops organizations. More companies are using AI and automation for both development and IT operations, including for writing requirements, maintaining documentation, and vibe coding. Responsibilities have also increased, as organizations expect devops teams to improve data quality, automate AI agent testing, and drive operational resiliency.

AI is driving new business expectations and technical capabilities, and devops engineers must keep pace with the speed of innovation. At the same time, many organizations are laying off white-collar workers, including more than 120,000 tech layoffs in 2025.

Devops teams are looking for ways to reduce stress and ensure team members remain positive through all the challenges. At a recent event I hosted on how digital trailblazers reduce stress, speakers suggested several stress reduction mechanisms, including limiting work in progress, bringing humor into the day, and building supportive relationships.

As we head into the new year, now is also a good time for engineers and developers to set goals for 2026. I asked tech experts what New Year’s resolutions they would recommend for devops teams and professionals.

1. Fully embrace AI-enabled software development

Developers and automation engineers have had their world rocked over the last two years, with the emergence of AI copilots, code generators, and vibe coding. Developers typically spend time deepening their knowledge of coding languages and broadening their skills to work across different cloud architectures. In 2026, more of this time should be dedicated to learning AI-enabled software development.

“Develop a growth mindset that AI models are not good or bad, but rather a new nondeterministic paradigm in software that can both create new issues and new opportunities,” says Matthew Makai, VP of developer relations at DigitalOcean. “It’s on devops engineers and teams to adapt to how software is created, deployed, and operated.”

Concrete suggestions for this resolution involve shifting both mindset and activities:

  • Makai suggests automating code reviews for security issues and technical defects, given the rise in AI coding tools that generate significantly more code and can transfer technical debt across the codebase.
  • Nic Benders, chief technical strategist at New Relic, says everyone needs to gain experience with AI coding tools. “For those of us who have been around a while, think of vibe coding as the Perl of today. Go find an itch, then have fun knocking out a quick tool to scratch it.”
  • John Capobianco, head of developer relations at Selector, suggests devops teams should strive to embrace vibe-ops. “We can take the principles and the approach that certain software engineers are using with AI to augment software development in vibe-ops and apply those principles, much like devops to net-devops and devops to vibe-ops, getting AI involved in our pipelines and our workflows.”
  • Robin Macfarlane, president and CEO of RRMac Associates, suggests engineers begin to rethink their primary role not as code developers but as code orchestrators, whether working on mainframes or in distributed computing. “This New Year, resolve to learn the programming language you want AI to code in, resolve to do your own troubleshooting, and become the developer who teaches AI instead of the other way around.”

Nikhil Mungel, director of AI R&D at Cribl, says the real AI skill is learning to review, challenge, and improve AI-generated work by spotting subtle bugs, security gaps, performance issues, and incorrect assumptions. “Devops engineers who pair frequent AI use with strong review judgment will move faster and deliver more reliable systems than those who simply accept AI suggestions at face value.”

Mungel recommends that devops engineers commit to the following practices:

  • Tracing the agent decision graph, not just API calls.
  • Building AI-aware security observability around OWASP LLM Top 10 and MCP risks.
  • Capturing AI-specific lineage and incidents in CI/CD and ops runbooks.

Resolution: Develop the skills required to use AI for solving development and engineering challenges.

2. Strengthen knowledge of outcome-based, resilient operations

While developers focus on AI capabilities, operational engineers should target resolutions focused on resiliency. The more autonomous systems are in responding to and recovering from issues, the fewer priority incidents devops teams will have to manage, which likely means fewer instances where teams have to join bridge calls in the middle of the night.

A good place to start is improving observability across APIs, applications, and automations.

“Developers should adopt an AI-first, prevention-first mindset, using observability and AIops to move from reactive fixes to proactive detection and prevention of issues,” says Alok Uniyal, SVP and head of process consulting at Infosys. “Strengthen your expertise in self-healing systems and platform reliability, where AI-driven root-cause analysis and autonomous remediation will increasingly define how organizations meet demanding SLAs.”

As more businesses become data-driven organizations and invest in AI as part of their future of work strategy, another place to start building resiliency is in dataops and data pipelines.

“In 2026, devops teams should get serious about understanding the systems they automate, especially the data layer,” says Alejandro Duarte, developer relations engineer at MariaDB. “Too many outages still come from pipelines that treat databases as black boxes. Understanding multi-storage-engine capabilities, analytical and AI workload support, native replication, and robust high availability features will make the difference between restful weekends and late-night firefights.”

At the infrastructure layer, engineers have historically focused on redundancy, auto-scaling, and disaster recovery. Now, engineers should consider incorporating AI agents to improve resiliency and performance.

“For devops engineers, the resolution shouldn’t be about learning another framework, but about mastering the new operating model—AI-driven self-healing infrastructure,” says Simon Margolis, associate CTO AI and ML at SADA. “Your focus must shift from writing imperative scripts to creating robust observability and feedback loops that can enable an AI agent to truly take action. This means investing in skills that help you define intent and outcomes—not steps—which is the only way to unlock true operational efficiency and leadership growth.”

Rather than learning new AI tools, experts suggest reviewing opportunities to develop new AI capabilities within the platforms already used by the organization.

“A sound resolution for the new year is to stop trying to beat the old thing into some new AI solution and start using AI to augment and improve what we already have,” says Brett Smith, distinguished software engineer at SAS. “We need to finally stop chasing the ‘I can solve this with AI’ hype and start focusing on ‘How can AI help me solve this better, faster, cheaper?’”

Resolution: Shift the operating mindset from problem detection, resolution, and root-cause analysis to resilient, self-healing operations.

3. Learn new technology disciplines

It’s one thing to learn a new product or technology, and it’s a whole other level of growth to learn a new discipline. If you’re an application developer, one new area that requires more attention is understanding accessibility requirements and testing methodologies for improving applications for people with disabilities.

“Integrating accessibility into the devops pipeline should be a top resolution, with accessibility tests running alongside security and unit tests in CI as automated testing and AI coding tools mature,” says Navin Thadani, CEO of Evinced. “As AI accelerates development, failing to fix accessibility issues early will only cause teams to generate inaccessible code faster, making shift-left accessibility essential. Engineers should think hard about keeping accessibility in the loop, so the promise of AI-driven coding doesn’t leave inclusion behind.”

Data scientists, architects, and system engineers should also consider learning more about the Model Context Protocol (MCP), which standardizes how AI agents connect to tools and data sources. One place to start is learning the requirements and steps to configure a secure MCP server.

“Devops should focus on mastering MCP, which is set to create an entirely new app development pipeline in 2026,” says Rishi Bhargava, co-founder of Descope. “While it’s still early days for production-ready AI agents, MCP has already seen widespread adoption. Those who learn to securely build and authenticate MCP-enabled applications now will gain a major competitive edge as agentic systems mature over the next six months.”

Resolution: Embrace being a lifelong learner: Study trends and dig into new technologies that are required for compliance or that drive innovation.

4. Develop transformation leadership skills

In my book, Digital Trailblazer, I wrote about the need for transformation leaders, what I call digital trailblazers, “who can lead teams, evolve sustainable ways of working, develop technologies as competitive differentiators, and deliver business outcomes.”

Some may aspire to CTO roles, while others should consider leadership career paths in devops. For engineers, there is tremendous value in developing communication skills and business acumen.

Yaad Oren, managing director of SAP Labs U.S. and global head of research and innovation at SAP, says leadership skills matter just as much as technical fundamentals. “Focus on clear communication with colleagues and customers, and clear instructions with AI agents. Those who combine continuous learning with strong alignment and shared ownership will be ready to lead the next chapter of IT operations.”

For engineers ready to step up into leadership roles but concerned about taking on direct reports, consider mentoring others to build skills and confidence.

“There is high-potential talent everywhere, so aside from learning technical skills, I would challenge devops engineers to also take the time to mentor a junior engineer in 2026,” says Austin Spires, senior director of developer enablement at Fastly. “Guiding engineers early in their career, whether on hard skills like security or soft skills like communication and stakeholder management, is fulfilling and allows them to grow into one of your best colleagues.”

Another option, if you don’t want to manage people, is to take on a leadership role on a strategic initiative. In a complex job market, having agile program leadership skills can open up new opportunities.

Christine Rogers, people and operations leader at Sisense, says the traditional job description is dying. Skills, not titles, will define the workforce, she says. “By 2026, organizations will shift to skills-based models, where employees are hired and promoted based on verifiable capabilities and adaptability, often demonstrated through real projects, not polished resumes.”

Resolution: Find an avenue to develop leadership confidence, even if it’s not at work. There are leadership opportunities at nonprofits, local government committees, and even in following personal interests.

Happy New Year, everyone!

High severity flaw in MongoDB could allow memory leakage 26 Dec 2025, 8:12 pm

Document database vendor MongoDB has advised customers to update immediately following the discovery of a flaw that could allow unauthenticated users to read uninitialized heap memory.

Designated CVE-2025-14847, the bug stems from mismatched length fields in zlib-compressed protocol headers and could allow an attacker to execute arbitrary code and potentially seize control of a device.

The flaw affects the following MongoDB and MongoDB Server versions:

  • MongoDB 8.2.0 through 8.2.3
  • MongoDB 8.0.0 through 8.0.16
  • MongoDB 7.0.0 through 7.0.26
  • MongoDB 6.0.0 through 6.0.26
  • MongoDB 5.0.0 through 5.0.31
  • MongoDB 4.4.0 through 4.4.29
  • All MongoDB Server v4.2 versions
  • All MongoDB Server v4.0 versions
  • All MongoDB Server v3.6 versions

In its advisory, MongoDB “strongly suggested” that users upgrade immediately to the patched versions of the software: MongoDB 8.2.3, 8.0.17, 7.0.28, 6.0.27, 5.0.32, or 4.4.30.

However, it said, “if you cannot upgrade immediately, disable zlib compression on the MongoDB Server by starting mongod or mongos with a networkMessageCompressors or a net.compression.compressors option that explicitly omits zlib.”
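If you want a quick inventory of where your deployments stand, here is a minimal sketch using pymongo (an assumption on my part; the connection URI below is a placeholder) that reads the version a server reports and compares it against the patched release listed above for that branch:

from pymongo import MongoClient

# Patched releases named in the advisory, keyed by release branch.
PATCHED = {"8.2": "8.2.3", "8.0": "8.0.17", "7.0": "7.0.28",
           "6.0": "6.0.27", "5.0": "5.0.32", "4.4": "4.4.30"}

client = MongoClient("mongodb://localhost:27017")    # placeholder URI
version = client.server_info()["version"]            # e.g., "8.0.16"
branch = ".".join(version.split(".")[:2])             # e.g., "8.0"

if branch not in PATCHED:
    print(f"Server reports {version}; this series has no patched release listed in the advisory")
else:
    print(f"Server reports {version}; the patched release for this branch is {PATCHED[branch]}")

This only tells you what the server reports; whether to upgrade right away or disable zlib compression in the meantime is still the advisory's call.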

MongoDB, one of the most popular NoSQL document databases for developers, says it currently has more than 62,000 customers worldwide, including 70% of the Fortune 100.

Reader picks: The most popular Python stories of 2025 26 Dec 2025, 9:00 am

Python 3.14 was the star of the show in 2025, bringing official support for free-threaded builds, a new all-in-one installation manager for Windows, and subtler perks like the new template strings feature. Other great updates this year included a growing toolkit of Rust-backed Python tools, several new options for packaging and distributing Python applications, and a sweet little trove of third-party libraries for parallel processing in Python. Here’s our list of the 10 best and most-read stories for Python developers in 2025. Enjoy!

What is Python? Powerful, intuitive programming
Start here, with a top-down view of what makes Python a versatile powerhouse for modern software development, from data science and machine learning to web development and systems automation.

The best new features and fixes in Python 3.14
Released in October 2025, the latest edition of Python makes free-threaded Python an officially supported feature, adds experimental JIT powers, and brings new tools for managing Python versions.

Get started with the new Python Installation Manager
The newest versions of Python on Microsoft Windows come packaged with this powerful all-in-one tool for installing, updating, and managing multiple editions of Python on the same system.

How to use template strings in Python 3.14
One of Python 3.14’s most powerful new features delivers a whole new mechanism for formatting data in strings, more programmable and powerful than the existing “f-string” formatting system.

PyApp: An easy way to package Python apps as executables
This Rust-powered utility brings to life a long-standing dream in the Python world: It turns hard-to-package Python programs into self-contained click-to-runs.

The best Python libraries for parallel processing
Python’s getting better at doing more than one thing at once, and that’s thanks to its “no-GIL” edition. But these third-party libraries give you advanced tools for distributing Python workloads across cores, processors, and multiple machines.

Amp your Python superpowers with ‘uv run’
Astral’s uv utility lets you set up and run Python packages with one command, no setup, no fuss, and nothing to clean up when you’re done.

3 Python web frameworks for beautiful front ends
Write Python code on the back end and generate good-looking HTML/CSS/JavaScript-driven front ends, automatically. Here are three ways to Python-code your way to beautiful front ends.

How to boost Python program performance with Zig
The emerging Zig language, making a name as a safer alternative to C, can also be coupled closely with Python—the better to create Python libraries that run at machine-native speed.

PythoC: A new way to generate C code from Python
This new project lets you use Python as a kind of high-level macro system to generate C-equivalent code that can run as standalone programs, and with some unique memory safety features you won’t find in C.

A small language model blueprint for automation in IT and HR 25 Dec 2025, 9:00 am

Large language models (LLMs) have grabbed the world’s attention for their seemingly magical ability to instantaneously sift through endless data, generate responses, and even create visual content from simple prompts. But their “small” counterparts aren’t far behind. And as questions swirl about whether AI can actually generate a meaningful return on investment (ROI), organizations should take notice. Because, as it turns out, small language models (SLMs), which use far fewer parameters, compute resources, and energy than large language models to perform specific tasks, have been shown to be just as effective as their much larger counterparts.

In a world where companies have invested ungodly amounts of money in AI and questioned the returns, SLMs are proving to be an ROI savior. Ultimately, SLM-enabled agentic AI delivers the best of both SLMs and LLMs together — including higher employee satisfaction and retention, improved productivity, and lower costs. And given a Gartner report predicting that over 40% of agentic AI projects will be canceled by the end of 2027 due to complexities and rapid evolutions that often lead enterprises down the wrong path, SLMs can be an important tool in any CIO’s toolbox.

Take information technology (IT) and human resources (HR) functions for example. In IT, SLMs can drive autonomous and accurate resolutions, workflow orchestration, and knowledge access. And for HR, they’re enabling personalized employee support, streamlining onboarding, and handling routine inquiries with privacy and precision. In both cases, SLMs are enabling users to “chat” with complex enterprise systems the same way they would a human representative.

Given a well-trained SLM, users can simply write a Slack or Microsoft Teams message to the AI agent (“I can’t connect to my VPN,” or “I need to refresh my laptop,” or “I need proof of employment for a mortgage application”), and the agent will automatically resolve the issue. What’s more, the responses will be personalized based on user profiles and behaviors and the support will be proactive and anticipatory of when issues might occur.

Understanding SLMs

So, what exactly is an SLM? It’s a relatively ill-defined term, but generally it is a language model with somewhere between one billion and 40 billion parameters, versus 70 billion to hundreds of billions for LLMs. Many SLMs are also released as open source, giving you access to their weights, biases, and training code.

There are also SLMs that are “open-weight” only, meaning you get access to model weights with restrictions. This is important because a key benefit with SLMs is the ability to fine-tune or customize the model so you can ground it in the nuance of a particular domain. For example, you can use internal chats, support tickets, and Slack messages to create a system for answering customer questions. The fine-tuning process helps to increase the accuracy and relevance of the responses.

Agentic AI will leverage SLMs and LLMs

It’s understandable to want to use state-of-the-art models for agentic AI. Consider that the latest frontier models score highly on math, software development, and medical reasoning, just to name a few categories. Yet the question every CIO should be asking is: Do we really need that much firepower in our organization? For many enterprise use cases, the answer is no.

And even though SLMs are small, don’t underestimate them. Their small size means they have lower latency, which is critical for real-time processing. SLMs can also operate on small form factors, like edge devices or other resource-constrained environments.

Another advantage of SLMs is that they are particularly effective at handling tasks like tool calling, API interactions, and routing. This is just what agentic AI was meant to do: carry out actions. Sophisticated LLMs, on the other hand, may be slower, engage in overly reasoned handling of tasks, and consume large amounts of tokens.

In IT and HR environments, the balance among speed, accuracy, and resource efficiency matters for both employees and IT or HR teams. For employees, agentic assistants built on SLMs provide fast, conversational help to solve problems faster. For IT and HR teams, SLMs reduce the burden of repetitive tasks by automating ticket handling, routing, and approvals, freeing staff to focus on higher-value strategic work. SLMs also provide substantial cost savings, since these models use far less energy, memory, and compute power. Their efficiency can prove enormously beneficial when using cloud platforms.

Where SLMs fall short

Granted, SLMs are not silver bullets either. There are certainly cases where you need a sophisticated LLM, such as for highly complex multi-step processes. A hybrid architecture — where SLMs handle the majority of operational interactions and LLMs are reserved for advanced reasoning or escalations — allows IT and HR teams to optimize both performance and cost. For this, a system can leverage observability and evaluations to dynamically decide when to use an SLM or LLM. Or, if an SLM fails to get a good response, the next step could then be an LLM. 
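What that hybrid routing can look like is easier to see in a sketch. The model calls below are stubs standing in for whatever SLM and LLM endpoints and evaluation layer you actually use; only the escalation logic is the point:

# A minimal routing sketch: try the SLM first, escalate to an LLM when confidence is low.

def call_slm(prompt: str) -> tuple[str, float]:
    # Stub: return a draft answer plus a confidence score from your evaluation layer.
    return "Reset your VPN profile from the self-service portal.", 0.92

def call_llm(prompt: str, context: dict) -> str:
    # Stub: reserved for ambiguous or multi-step requests.
    return "Escalated answer from the larger model."

def answer(ticket: str, threshold: float = 0.8) -> str:
    draft, confidence = call_slm(ticket)
    if confidence >= threshold:
        return draft                                        # fast, cheap path
    return call_llm(ticket, context={"slm_draft": draft})   # escalation path

print(answer("I can't connect to my VPN"))

The escalation could just as easily be triggered by an observability signal or a failed policy check rather than a raw confidence score; the structure stays the same.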

SLMs are emerging as the most practical approach to achieving ROI with agentic AI. By pairing SLMs with selective use of LLMs, organizations can create balanced, cost-effective architectures that scale across both IT and HR, delivering measurable results and a faster path to value. With SLMs, less is more.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

Microsoft is not rewriting Windows in Rust 24 Dec 2025, 3:15 pm

A job posting by a Microsoft engineer sparked excitement about a project “to eliminate every line of C and C++ from Microsoft by 2030”, replacing it with Rust — but alas for fans of the memory-safe programming language, it turns out this is a personal goal, not a corporate one, and Rust isn’t necessarily even the final target.

Microsoft Distinguished Engineer Galen Hunt posted about his ambitious goal on LinkedIn four days ago, provoking a wave of excitement and concern.

Now he’s been forced to clarify: “My team’s project is a research project. We are building tech to make migration from language to language possible,” he wrote in an update to his LinkedIn post. His intent, he said, was to find like-minded engineers, “not to set a new strategy for Windows 11+ or to imply that Rust is an endpoint.”

Hunt’s project is to investigate how AI can be used to assist in the translation of code from one language to another at scale. “Our North Star is ‘1 engineer, 1 month, 1 million lines of code’,” he wrote.

He’s recruiting an engineer to help build the infrastructure to do that, demonstrating the technology using Rust as the target language and C and C++ as the source.

The successful candidate will join the Future of Scalable Software Engineering team in Microsoft’s CoreAI group, building static analysis and machine learning tools for AI-assisted translation and migration.

Pressure to ditch C and C++ in favor of memory-safe languages such as Rust comes right from the top, with research by Google and Microsoft showing that around 70 percent of all security vulnerabilities in software are caused by memory safety issues.

However, using AI to rewrite code, even in a memory-safe language, may not make things more secure: AI-generated code typically contains more issues than code written by humans, according to research by CodeRabbit.

That’s not stopping some of the biggest software developers from pushing ahead with AI-powered software development, though. Already, AI writes 30% of Microsoft’s new code, Microsoft CEO Satya Nadella said in April.

Get started with Python’s new native JIT 24 Dec 2025, 9:00 am

JITting, or “just-in-time” compilation, can make relatively slow interpreted languages much faster. Until recently, JITting was available for Python only in the form of specialized third-party libraries, like Numba, or alternate versions of the Python interpreter, like PyPy.

A native JIT compiler has been added to Python over its last few releases. At first it didn’t provide any significant speedup. But with Python 3.15 (still in alpha but available for use now), the core Python development team has bolstered the native JIT to the point where it’s now showing significant performance gains for certain kinds of programs.

Speedups from the JIT range widely, depending on the operation. Some programs show dramatic performance improvements, others not at all. But the work put into the JIT is beginning to pay off, and users can start taking advantage of it if they’re willing to experiment.

Activating the Python JIT

By default, the native Python JIT is disabled. It’s still considered an experimental feature, so it has to be manually enabled.

To enable the JIT, you set the PYTHON_JIT environment variable, either for the shell session Python is running in, or persistently as part of your user environment options. When the Python interpreter starts, it checks its runtime environment for the variable PYTHON_JIT. If PYTHON_JIT is unset or set to anything but 1, the JIT is off. If it’s set to 1, the JIT is enabled.

It’s probably not a good idea to set PYTHON_JIT persistently. That might make sense in a user environment dedicated to running Python with the JIT enabled, but for the most part you’ll want to set PYTHON_JIT manually — for instance, as part of a shell script that configures the environment.

Verifying the JIT is working

For versions of Python with the JIT (Python 3.13 and above), the sys module in the standard library has a new namespace, sys._jit. Inside it are three utilities for inspecting the state of the JIT, all of which return either True or False. The three utilities:

  • sys._jit.is_available(): Lets you know if the current build of Python has the JIT. Most binary builds of Python now ship with the JIT available, except the “free-threaded” or “no-GIL” builds of Python.
  • sys._jit.is_enabled(): Lets you know if the JIT is currently enabled. It does not tell you if running code is currently being JITted, however.
  • sys._jit.is_active(): Lets you know if the topmost Python stack frame is currently executing JITted code. However, this is not a reliable way to tell if your program is using the JIT, because you may end up executing this check in a “cold” (non-JITted) path. It’s best to stick to performance measurements to see if the JIT is having any effect.

For the most part, you will want to use sys._jit.is_enabled() to determine if the JIT is available and running, as it gives you the most useful information.
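Here’s a minimal check that exercises all three, assuming a recent CPython build that ships the experimental JIT (sys._jit is a private namespace, so its shape may change between releases):

import sys

jit = getattr(sys, "_jit", None)   # guard against builds without the namespace
if jit is None:
    print("This interpreter has no sys._jit namespace")
elif not jit.is_available():
    print("This build of Python was compiled without the JIT")
else:
    print("JIT enabled:", jit.is_enabled())
    # is_active() only reports on the topmost stack frame, so it is usually False here.
    print("JIT active in this frame:", jit.is_active())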

Python code enhanced by the JIT

Because the JIT is in its early stages, its behavior is still somewhat opaque. There’s no end-user instrumentation for it yet, so there’s no way to gather statistics about how the JIT handles a given piece of code. The only real way to assess the JIT’s performance is to benchmark your code with and without the JIT.

Here’s an example of a program that demonstrates pretty consistent speedups with the JIT enabled. It’s a rudimentary plot of the Mandelbrot set:

from time import perf_counter
import sys

print ("JIT enabled:", sys._jit.is_enabled())

WIDTH = 80
HEIGHT = 40
X_MIN, X_MAX = -2.0, 1.0
Y_MIN, Y_MAX = -1.0, 1.0
ITERS = 500

YM = (Y_MAX - Y_MIN)
XM = (X_MAX - X_MIN)

def iter(c):
    # Return True if c stays bounded after ITERS iterations (i.e., is in the set).
    z = 0j
    for _ in range(ITERS):
        if abs(z) > 2.0:
            return False
        z = z ** 2 + c
    return True

def generate():
    # Map each character cell to a point in the complex plane and render it.
    start = perf_counter()
    output = []

    for y in range(HEIGHT):
        cy = Y_MIN + (y / HEIGHT) * YM
        for x in range(WIDTH):
            cx = X_MIN + (x / WIDTH) * XM
            c = complex(cx, cy)
            output.append("#" if iter(c) else ".")
        output.append("\n")
    print ("Time:", perf_counter()-start)
    return output

print("".join(generate()))

When the program starts running, it lets you know if the JIT is enabled and then produces a plot of the fractal to the terminal along with the time taken to compute it.

With the JIT enabled, there’s a fairly consistent 20% speedup between runs. If the performance boost isn’t obvious, try changing the value of ITERS to a higher number. This forces the program to do more work, so it should produce a more obvious speedup.
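To compare runs with and without the JIT, one approach is to launch the script in a subprocess with PYTHON_JIT toggled. A minimal sketch, assuming the program above is saved as mandelbrot.py (a hypothetical filename):

import os
import subprocess
import sys

# Run the same script with the JIT off and then on; the script prints its own timing.
for jit in ("0", "1"):
    env = dict(os.environ, PYTHON_JIT=jit)
    print(f"--- PYTHON_JIT={jit} ---")
    subprocess.run([sys.executable, "mandelbrot.py"], env=env, check=True)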

Here’s a negative example — a simple recursively implemented Fibonacci sequence. As of Python 3.15a3 it shows no discernible JIT speedup:

import sys
print ("JIT enabled:", sys._jit.is_enabled())
from time import perf_counter

def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)

def main():
    start = perf_counter()
    result = fib(36)
    print(perf_counter() - start)

main()

Why this isn’t faster when JITted isn’t clear. You might suspect that the recursion makes the JIT less effective, but a non-recursive version of the algorithm doesn’t show a speedup either.

Using the experimental Python JIT

Because the JIT is still considered experimental, it’s worth approaching it in the same spirit as the “free-threaded” or “no-GIL” builds of Python also now being shipped. You can conduct your own experiments with the JIT to see if it provides any payoff for certain tasks, but you’ll always want to be careful about using it in any production scenario. What’s more, each alpha and beta revision of Python going forward may change the behavior of the JIT. What was once performant might not be in the future, or vice versa!

AI power tools: 6 ways to supercharge your terminal 24 Dec 2025, 9:00 am

The command line has always been the bedrock of the developer’s world. Since time immemorial, the CLI has been a static place defined by the REPL (read-evaluate-print loop). Now modern AI tools are changing that.

The CLI tells you in spartan terms what is happening with your program, and it does exactly what you tell it to. The lack of frivolity and handholding is both the command line’s power and its one major drawback. Now, a new class of AI tools seeks to preserve the power of the CLI while upgrading it with a more human-friendly interface.

These tools re-envision the REPL as a reason-evaluate loop. Instead of telling your operating system what to do, you just give it a goal and set it loose. Rather than reading the outputs, you can have them analyzed with AI precision. For the lover of the CLI—and everyone else who programs—the AI-powered terminal is a new and fertile landscape.

Gemini CLI

Gemini CLI is an exceptionally strong agent that lets you run AI-driven shell commands. Able to analyze complex project layouts, view outputs, and undertake complex, multipart goals, Gemini CLI isn’t flawless, but it warms the command-line enthusiast’s heart.

A screenshot of the Google Gemini CLI.
Google’s Gemini comes to the command line.

Matthew Tyson

Gemini CLI recently added in-prompt interactivity support, like running vi inside the agent. This lets you avoid dropping out of the AI (or launching a new window) to do things like edit a file or run a long, involved git command. The AI doesn’t retain awareness during your interactions (you can use Ctrl-f to shift focus back to it), but it does observe the outcome when you are done, and may take appropriate actions such as running unit tests after closing vi.

Copilot is rumored to have better Git integration, but I’ve found Gemini performs just fine with git commands.

Like every other AI coding assistant, Gemini CLI can get confused, spin in circles, and spawn regressions, but the actual framing and prompt console are among the best. It feels fairly stable and solid. It does require some adjustments, such as being unable to navigate the file system (e.g., cd /foo/bar) because you’re in the agent’s prompt and not a true shell.

GitHub Copilot CLI

Copilot’s CLI is just as solid as Gemini’s. It handled complex tasks (like “start a new app that lets you visit endpoints that say hello in different languages”) without a hitch. But it’s just as nice to be able to do simple things quickly (like asking, “what process is listening on port 8080?”) without having to refresh system memory.

A screenshot of the GitHub Copilot CLI.
The ubiquitous Copilot VS Code extension, but for the terminal environment.

Matthew Tyson

There are still drawbacks, of course, and even simple things can go awry. For example, if the process listening on 8080 was run with systemctl, Copilot would issue a plain kill command rather than stopping the service through systemctl.

Copilot CLI’s ?? is a nice idea, letting you provide a goal to be turned into a prompt—?? find the largest file in this directory yields find . -type f -exec du -h {} + 2>/dev/null | sort -rh | head -10—but I found the normal prompt worked just as well.

I noticed at times that Copilot seemed to choke and hang (or take inordinately long to complete) on larger steps, such as Creating Next.js project (Esc to cancel · 653 B).

In general, I did not find much distinction between Gemini and Copilot’s CLIs; both are top-shelf. That’s what you would expect from the flagship AI terminal tools from Google and Microsoft. The best choice likely comes down to which ecosystem and company you prefer.

Ollama

Ollama is the most empowering CLI in this bunch. It lets you install and run pre-built, targeted models on your local machine. This puts you in charge of everything, eliminates network calls, and discards any reliance on third-party cloud providers (although Ollama recently added cloud providers to its bag of tricks).

A screenshot of the Ollama CLI.
The DIY AI engine.

Matthew Tyson

Ollama isn’t an agent itself but is the engine that powers many of them. It’s “Docker for LLMs”—a simple command-line tool that lets you download, manage, and run powerful open source models like Llama 3 and Mistral directly on your own machine. You run ollama pull llama3 and then ollama run llama3 "..." to chat. (Programmers will especially appreciate CodeLlama.)

Incidentally, if you are not in a headless environment (on Windows, for example), Ollama will install a simple GUI for managing and interacting with installed models (both local and cloud).

Ollama’s killer feature is privacy and offline access. Since the models run entirely locally, none of your prompts or code ever leaves your machine. It’s perfect for working on sensitive projects or in secure environments.

Ollama is an AI server, which gives you an API so that other tools (like Aider, OpenCode, or NPC Shell) can use your local models instead of paying for a cloud provider. The Ollama chat agent doesn’t compete with interactive CLIs like Gemini, Copilot, and Warp (see below); it’s more of a straight REPL.
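Here’s a minimal sketch of calling that API from Python with only the standard library, assuming Ollama is serving its default REST endpoint on localhost:11434 and you have already pulled llama3:

import json
import urllib.request

# Ask the local Ollama server for a single, non-streaming completion.
payload = json.dumps({
    "model": "llama3",
    "prompt": "Explain what a REPL is in one sentence.",
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])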

The big trade-off is performance. You are limited by your own hardware, and running the larger models requires powerful (preferably Nvidia) GPUs. The choice comes down to power versus privacy: You get total control and security, but you’re responsible for bringing the horsepower. (And, in case you don’t know, fancy GPUs are expensive—even provisioning a decent one on the cloud can cost hundreds of dollars per month.)

Aider

Aider is a “pair-programming” tool that can use various providers as the AI back end, including a locally running instance of Ollama (with its variety of LLM choices). Typically, you would connect to an OpenRouter account to provide access to any number of LLMs, including free-tier ones.

A screenshot of the Aider CLI.
The agentic layer.

Matthew Tyson

Once connected, you tell Aider what model you want to use when launching it; e.g., aider --model ollama_chat/llama3.2:3b. That will launch an interactive prompt relying on the model for its brains. But Aider gives you agentic power and will take action for you, not just provide informative responses.

Aider tries to maintain a contextual understanding of your filesystem, the project files, and what you are working on. It is also designed to understand git: It suggests that you init a git project, commits as you go, and provides sensible commit messages. The core capability is highly influenced by the LLM engine, which you provide.

Aider is something like using Ollama but at a higher level. It is controlled by the developer; provides a great abstraction layer with multiple model options; and layers on a good deal of ability to take action. (It took me some wrangling with the Python package installations to get everything working in Aider, but I have bad pip karma.)

Aider is something like Roo Code, but for the terminal, adding project-awareness for any number of models. If you give it a good model engine, it will do almost everything that the Gemini or Copilot CLI does, but with more flexibility. The biggest drawback compared to those tools is probably having to do more manual asset management (like using the /add command to bring files into context).

AI Shell

Built by the folks at Builder.io, AI Shell focuses on creating effective shell commands from your prompts. Compared to the Gemini and Copilot CLIs, it’s more of a quick-and-easy utility tool; something to keep the terminal’s power handy without having to type out commands.

A screenshot of AI Shell.
The natural-language commander.

Matthew Tyson

AI Shell will take your desired goal (e.g., “$ ai find the process using the most memory right now and kill it”) and offer working shell commands in response. It will then ask if you want to run it, edit it, copy, or cancel the command. This makes AI Shell a simple place to drop into, as needed, from the normal command prompt. You just type “ai” followed by whatever you are trying to do.

Although it’s a handy tool, the current version of AI Shell can only use an OpenAI API, which is a significant drawback. There is no way to run AI Shell in a free tier, since OpenAI no longer offers free API access.

Warp

Warp started life as a full-featured terminal app. Its killer feature is that it gives you all the text and control niceties in a cross-platform, portable setup. Unlike the Gemini and Copilot CLI tools, which are agents that run inside an existing shell, Warp is a full-fledged, standalone GUI application with AI integrated at its core.

A screenshot of the Warp CLI.
The terminal app, reimagined with AI.

Matthew Tyson

Warp is a Rust-based, modern terminal that completely reimagines the user experience, moving away from the traditional text stream to a more structured, app-like interface.

Warp’s AI is not a separate prompt but is directly integrated with the input block. It has two basic modes: The first is to type # followed by a natural language query (e.g., “# find all files over 10 megs in this dir”), which Warp AI will translate into the correct command.

The second mode is the more complex, multistep agent mode (“define a cat-related non-blocking endpoint using netty”), which you enter with Ctrl-space.

An interesting feature is Warp Workflows: parameterized commands that you can save and share. You can ask the AI to generate a workflow for a complex task (like a multistage git rebase) and then supply it with arguments at runtime.

The main drawback for some CLI purists is that Warp is not a traditional CLI. It’s a block-based editor, which treats inputs and outputs as distinct chunks. That can take some getting used to—though some find it an improvement. In this regard, Warp breaks compatibility with many traditional terminal multiplexers like tmux/screen. Also, its AI features are tied to user accounts and a cloud back end, which likely raises privacy and offline-usability concerns for some developers.

All that said, Warp is a compelling AI terminal offering, especially if you’re looking for something different in your CLI. Aside from its AI facet, Warp is somewhere between a conventional shell (like Bash) and a GUI.

Conclusion

If you currently don’t like using a shell, these tools will make your life much easier. You will be able to do many of the things that previously were painful enough to make you think, “there must be a better way.” Now there is, and you can monitor processes, sniff TCP packets, and manage permissions like a pro.

If you, like me, do like the shell, then these tools will make the experience even better. They give you superpowers, allowing you to romp more freely across the machine. If you tend (like I do) to do much of your coding from the command line, checking out these tools is an obvious move.

Each tool has its own idiosyncrasies of installation, dependencies, model access, and key management. A bit of wrestling at first is normal—which most command-line jockeys won’t mind.

Deno adds tool to run NPM and JSR binaries 24 Dec 2025, 1:16 am

Deno 2.6, the latest version of the TypeScript, JavaScript, and WebAssembly runtime, adds a tool, called dx, to run binaries from NPM and JSR (JavaScript Registry) packages.

The update to the Node.js rival was announced December 10; installation instructions can be found at docs.deno.com. Current users can upgrade by running the deno upgrade command in their terminal.

In Deno 2.6, dx is an equivalent to the npx command. With dx, users should find it easier to run package binaries in a familiar fashion, according to Deno producer Deno Land. Developers can enjoy the convenience of npx while leveraging Deno’s robust security model and performance optimizations, Deno Land said.

Also featured in Deno 2.6 is more granular control over permissions, with --ignore-read and --ignore-env flags for selectively ignoring certain file reads or environment variable access. Instead of throwing a NotCapable error, users can direct Deno to return a NotFound error and undefined, respectively.

Deno 2.6 also integrates tsgo, an experimental type checker for TypeScript written in Go. This type checker is billed as being significantly faster than the previous implementation, which was written in TypeScript.

Other new capabilities and improvements in Deno 2.6:

  • For dependency management, developers can control the minimum age of dependencies, ensuring that a project only uses dependencies that have been available long enough to be vetted. This helps reduce the risk of using newly published packages that may contain malware or breaking changes shortly after release.
  • A deno audit subcommand helps identify security vulnerabilities in dependencies by checking the GitHub CVE database. This command scans and generates a report for both JSR and NPM packages.
  • The --lockfile-only flag for deno install allows developers to update a lockfile without downloading or installing the actual packages. This is particularly useful in continuous integration environments where users want to verify dependency changes without modifying their node_modules or cache.
  • The deno approve-scripts command replaces the deno install --allow-scripts flag, enabling more ergonomic and granular control over which packages can run lifecycle scripts.
  • Deno’s Node.js compatibility layer continues to mature in Deno 2.6, with improvements across file operations, cryptography, process management, and database APIs, according to Deno Land.

Rust vision group seeks enumeration of language design goals 23 Dec 2025, 10:55 pm

To help the Rust language continue scaling across domains and usage levels, the Rust Vision Doc group recommends enumerating the design goals for evolving the language while also improving the crates package system.

These suggestions were made in a December 19 blog post titled, “What do people love about Rust?” The group made the following specific recommendations:

  • Enumerate and describe Rust design goals and integrate them into processes, helping to ensure these are observed by future language designers and the broader ecosystem.
  • Double down on extensibility, introducing the ability for crates to influence the development experience and the compilation pipeline.
  • Help users to navigate the crates.io ecosystem and enable smoother interop.

In seeking to explain developers’ strong loyalty to Rust, the vision doc group found that, based on interviews of Rust users, developers love Rust for its balance of virtues including reliability, efficiency, low-level control, supportive tooling, and extensibility. Additionally, one of the most powerful aspects of Rust cited by developers is the way that its type system allows modeling aspects of the application domain. This prevents bugs and makes it easier to get started with Rust, the Rust vision doc group said.

The group said that each of these attributes was necessary for versatility across domains. However, when taken too far, or when other attributes are missing, they can become an obstacle, the group noted. One example cited was Rust’s powerful type system, which allows modeling the application domain and prevents bugs but sometimes feels more complex than the problem itself. Another example cited was async Rust, which has fueled a huge jump in using Rust to build network systems but feels “altogether more difficult” than sync Rust. A third obstacle, the group said, was the wealth of crates on crates.io, which are a key enabler but also offer a “tyranny of choice” that becomes overwhelming. Ways are needed to help users navigate the crates.io ecosystem.

The group recommended creating an RFC that defines the goals sought as work is done on Rust. The RFC should cover the experience of using Rust in total (language, tools, and libraries). “This RFC could be authored by the proposed User Research team, though it’s not clear who should accept it—perhaps the User Research team itself, or perhaps the leadership council,” the group said.

WhatsApp API worked exactly as promised, and stole everything 23 Dec 2025, 11:38 am

Security researchers have uncovered a malicious npm package that poses as a legitimate WhatsApp Web API library while quietly stealing messages, credentials, and contact data from developer environments.

The package, identified as “lotusbail,” operates as a trojanized wrapper around a genuine WhatsApp client library and had accumulated more than 50,000 downloads by the time it was flagged by Koi Security.

“With over 56000 downloads and functional code that actually works as advertised, it is the kind of dependency developers install without a second thought,” Koi researchers said in a blog post. “The package has been available on npm for 6 months and is still live at the time of writing.”

Stolen data was encrypted and exfiltrated to attacker-controlled infrastructure, reducing the likelihood of detection by network monitoring tools. Even more concerning for enterprises is the fact that Lotusbail abuses WhatsApp’s multi-device pairing to maintain persistence on compromised accounts even after the package is removed.

Legitimate API uses a proxy for threat

According to the researchers, lotusbail initially didn’t appear to be anything more than a helpful fork of the legitimate “@whiskeysockets/baileys” library used for interacting with WhatsApp via WebSockets. Developers could install it, send messages, receive messages, and never notice anything wrong.

Further probing, however, revealed an issue.

The package wrapped the legitimate WhatsApp WebSocket client in a malicious proxy layer that transparently duplicated every operation, including the ones involving sensitive data. During authentication, the wrapper captured session tokens and keys. Every message flowing through the application was intercepted, logged, and prepared for covert transmission to attacker-controlled infrastructure.

Additionally, the stolen information was protected en route. Rather than sending credentials and messages in plaintext, the malware employs a custom RSA encryption layer and multiple obfuscation strategies, making detection by network monitoring tools harder and allowing exfiltration to proceed under the radar.

“The exfiltration server URL is buried in encrypted configuration strings, hidden inside compressed payloads,” the researchers noted. “The malware uses four layers of obfuscation: Unicode variable manipulation, LZString compression, Base-91 encoding, and AES encryption. The server location isn’t hardcoded anywhere visible.”

Backdoor sticks around even after package removal

Koi said the most significant component of the attack was its persistence. WhatsApp allows users to link multiple devices to a single account through a pairing process involving an 8-character code. The malicious lotusbail package hijacked this mechanism by embedding a hardcoded pairing code that effectively added the attacker’s device as a trusted endpoint on the user’s WhatsApp account.

Even if developers or organizations later uninstalled the package, the attacker’s linked device remained connected. This allowed the attack to persist until the WhatsApp user manually unlinked all devices from the settings panel.

Persistent access allows the attackers to continue reading messages, harvesting contacts, sending messages on behalf of victims, and downloading media long after the initial exposure.

What must developers and defenders do?

Koi’s disclosure noted that traditional safeguards, based on reputation metrics, metadata checks, or static scanning, fail when malicious logic mimics legitimate behavior.

“The malware hides in the gap between ‘this code works’ and ‘this code does only what it claims’,” the researchers said, adding that such supply-chain threats require monitoring package behavior at runtime rather than relying on static checks alone. They recommended looking for warning signs (or relying on tools that can detect them), such as custom RSA encryption routines and dozens of embedded anti-debugging mechanisms in the malicious code.

The package remains available on npm, with its most recent update published just five days ago. GitHub, which has owned npm since 2020, did not immediately respond to CSO’s request for comment.

When is an AI agent not really an agent? 23 Dec 2025, 9:00 am

If you were around for the first big wave of cloud adoption, you’ll remember how quickly the term cloud was pasted on everything. Anything with an IP address and a data center suddenly became a cloud. Vendors rebranded hosted services, managed infrastructure, and even traditional outsourcing as cloud computing. Many enterprises convinced themselves they had modernized simply because the language on the slides had changed. Years later, they discovered the truth: They hadn’t transformed their architecture; they had just renamed their technical debt.

That era of “cloudwashing” had real consequences. Organizations spent billions on what they believed were cloud-native transformations, only to end up with rigid architectures, high operational overhead, and little of the promised agility. The cost was not just financial; it was strategic. Enterprises that misread the moment lost time they could never recover.

We are now repeating the pattern with agentic AI, this time faster.

What ‘agentic’ is supposed to mean

If you believe today’s marketing, everything is an “AI agent.” A basic workflow worker? An agent. A single large language model (LLM) behind a thin UI wrapper? An agent. A smarter chatbot with a few tools integrated? Definitely an agent. The issue isn’t that these systems are useless. Many are valuable. The problem is that calling almost anything an agent blurs an important architectural and risk distinction.

In a technical sense, an AI agent should exhibit four basic characteristics:

  • Be able to pursue a goal with a degree of autonomy, not merely follow a rigid, prescripted flow
  • Be capable of multistep behavior, meaning it plans a sequence of actions, executes them, and adjusts along the way
  • Adapt to feedback and changing conditions rather than failing outright on the first unexpected input
  • Be able to act, not just chat, by invoking tools, calling APIs, and interacting with systems in ways that change state

If you have a system that simply routes user prompts to an LLM and then passes the output to a fixed workflow or a handful of hardcoded APIs, it could be useful automation. However, calling it an agentic AI platform misrepresents both its capabilities and its risks. From an architecture and governance perspective, that distinction matters a lot.
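To make the distinction concrete, here’s a deliberately simplified sketch contrasting the two shapes. The model, tool, and evaluation calls are stubs standing in for whatever you actually run; only the control flow is the point:

# Hypothetical stubs: replace with your LLM, tools, and evaluators.
def llm(prompt: str) -> str:
    return f"next action for: {prompt}"

def run_tool(action: str) -> str:
    return f"result of {action}"

def goal_reached(observations: list[str]) -> bool:
    return len(observations) >= 3

def workflow(prompt: str) -> str:
    # The "agentwashed" shape: one model call feeding a fixed, prescripted step.
    return run_tool(llm(prompt))

def agent(goal: str, max_steps: int = 10) -> list[str]:
    # The agentic shape: plan, act, observe, and adapt until the goal (or a step limit) is reached.
    observations: list[str] = []
    for _ in range(max_steps):
        action = llm(f"goal: {goal}; observations so far: {observations}")
        observations.append(run_tool(action))
        if goal_reached(observations):
            break
    return observations

The governance implication follows from that loop: the agentic version can change state many times per request, which is exactly why the label matters for risk controls.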

When hype becomes misrepresentation

Not every vendor using the word agent is acting in bad faith. Many are simply caught in the hype cycle. Marketing language is always aspirational to some degree, but there’s a point where optimism crosses into misrepresentation. If a vendor knows its system is mainly a deterministic workflow plus LLM calls but markets it as an autonomous, goal-seeking agent, buyers are misled not just about branding but also about the system’s actual behavior and risk.

That type of misrepresentation creates very real consequences. Executives may assume they are buying capabilities that can operate with minimal human oversight when, in reality, they are procuring brittle systems that will require substantial supervision and rework. Boards may approve investments on the belief that they are leaping ahead in AI maturity, when they are really just building another layer of technical and operational debt. Risk, compliance, and security teams may under-specify controls because they misunderstand what the system can and cannot do.

Whether or not this crosses the legal threshold for fraud, treat it as a fraud-level governance problem. The risk to the enterprise is similar: misallocated capital, misaligned strategy, and unanticipated exposure.

Signs of ‘agentwashing’

In practice, agentwashing tends to follow a few recognizable patterns. Be wary when you realize that a vendor cannot explain, in clear technical language, how their agents decide what to do next. They talk vaguely about “reasoning” and “autonomy,” but when pressed, everything boils down to prompt templates and orchestration scripts.

Take note if the architecture relies on a single LLM call with minimal glue code wrapped around it, especially if the slides imply a dynamic society of cooperating agents planning, delegating, and adapting in real time. If you strip away the branding, does it resemble traditional workflow automation combined with stochastic text generation?

Listen carefully for promises of “fully autonomous” processes that still require humans to monitor, approve, and correct most critical steps. There is nothing wrong with keeping humans in the loop—it’s essential in most enterprises. However, misleading language can suggest a false sense of autonomy.

These gaps between story and reality are not cosmetic. They directly affect how you design controls, structure teams, and measure success or failure.

Be laser-focused on specifics

At the time, we did not challenge cloudwashing aggressively enough. Too many boards and leadership teams accepted labels in place of architecture. Today, agentic AI cuts deeper into core business processes, draws more regulatory scrutiny, and carries more complex security and safety implications. It also carries significantly higher long-term costs if the architecture is wrong.

This time around, enterprises need to be much more disciplined.

First, name the behavior. Call it agentwashing when a product labeled as agentic is merely orchestration, an LLM, and some scripts. The language you use internally will shape how seriously people treat the issue.

Second, demand evidence instead of demos. Polished demos are easy to fake, but architecture diagrams, evaluation methods, failure modes, and documented limitations are harder to counterfeit. If a vendor can’t clearly explain how their agents reason, plan, act, and recover, that should raise suspicion.

Third, tie vendor claims directly to measurable outcomes and capabilities. That means contracts and success criteria should be framed around quantifiable improvements in specific workflows, explicit autonomy levels, error rates, and governance boundaries, rather than vague goals like “autonomous AI.”

Finally, reward vendors that are precise and honest about the technology’s actual state. Some of the most credible solutions in the market today are intentionally not fully agentic. They might be supervised automation with narrow use cases and clear guardrails. That is perfectly acceptable and, in many cases, preferable, as long as everyone is clear about what is being deployed.

Agentwashing is a red flag

Whether regulators eventually decide that certain forms of agentwashing meet the legal definition of fraud remains an open question. Enterprises do not need to wait for that answer.

From a governance, risk, or architectural perspective, treat agentwashing as a serious red flag. Scrutinize it with the same rigor you would apply to financial representations. Challenge it early, before it becomes embedded in your strategic road map. Refuse to fund it without technical proof and clear alignment with business outcomes.

Many of the most expensive lessons of the cloud era trace back to cloudwashing in the early years of adoption. We’re on a similar trajectory with agentic AI, but the potential blast radius is larger. As with cloud migrations, the enterprises that have the most success with agentic AI will insist, from the start, on technical and ethical honesty from vendors and internal staff.

This time around, it’s even more important to know what you’re buying.

Stop letting ‘urgent’ derail delivery. Manage interruptions proactively 23 Dec 2025, 4:00 am

As engineers and managers, we have all been interrupted by unplanned, time-sensitive requests that arrive outside normal planning cadences. An “urgent” Slack message, a last-minute requirement or an exec ask is enough to nuke your standard agile rituals. Beyond randomizing your sprint, it causes thrash for existing projects and leads to developer burnout. This matters even more in today’s AI-accelerated landscape, where faster development has increased overall volatility. Randomizations are no longer edge cases; they are the norm.

Google’s DORA 2025 report found that “AI’s primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.” Teams that are not equipped to manage the increased volatility end up in chaos and their engineers pay the price. The fix isn’t heroics; rather, it is simple strategies that must be applied consistently and managed head-on. 

Recognize the pitfalls and avoid them!

Existing team-level mechanisms like mid-sprint checkpoints give teams the opportunity to “course correct”; however, many external randomizations arrive with an immediacy those mechanisms cannot absorb. The result is preempted work, fragmented attention and increased delivery risk. Let’s look at how some existing team practices fail:

  • We’ll cross that bridge when we get there. I have often seen teams shoot themselves in the foot by planning to use 100% of their capacity in regular planning cycles, only to scramble when a randomization lands mid-cycle and there is no runway left for immediate triage.
  • The squeaky wheel gets the grease. Another common pitfall is that the loudest voice wins by default. Randomizations arrive through inconsistent channels: emails, chat pings, hallway conversations. Sometimes the loudest requester uses all available channels at the same time! Just because someone is the loudest does not mean their request is the top priority.
  • A self-fulfilling prophecy. Treating everything as “urgent” or as a randomization dilutes the concept. Backlog reshuffling (say, during team planning sessions), planned handoffs and expected context switches do not require teams to pivot abruptly and should not be counted as randomizations.

Here are a few ideas on how to avoid these pitfalls:

  • Reserve dedicated triage bandwidth: Be deliberate about randomizations. Manage external randomizations as a swim lane with dedicated capacity; teams that experience variable demand should reserve 5–10% of capacity as a buffer and tune it monthly.
  • Streamline intake: Teams should not spend their time reconciling competing narratives across different channels. Instead, create a single intake channel backed by a lightweight form (e.g., a Jira ticket) that captures all the information needed for triage: the change or feature requested, impact, affected customers and owner.
  • Determine priority: There are several ways to prioritize incoming work. For our team, the Eisenhower Matrix turned out to be the most effective. It classifies work by urgency (time sensitivity) and importance (business/customer impact), making prioritization decisions straightforward: items that are both urgent and important (“do now”) are handled immediately, while everything else is scheduled, delegated or dropped. The sketch after this list shows one way the intake fields and this classification can fit together.
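
Here is a minimal Python sketch of how the intake fields and the Eisenhower classification can fit together. The field names and thresholds are illustrative assumptions, not a prescribed schema; tune them to your own intake form.

    # Illustrative sketch: an intake item carrying the triage fields described
    # above, classified into Eisenhower quadrants. Field names and thresholds
    # are assumptions for illustration only.
    from dataclasses import dataclass

    @dataclass
    class IntakeItem:
        title: str
        requested_change: str
        impact: str              # business/customer impact: "high", "medium", "low"
        affected_customers: int
        owner: str
        due_in_days: int         # proxy for time sensitivity

    def eisenhower_quadrant(item: IntakeItem) -> str:
        urgent = item.due_in_days <= 2
        important = item.impact == "high" or item.affected_customers > 100
        if urgent and important:
            return "Do now"        # pull into the triage swim lane
        if important:
            return "Schedule"      # plan it into the next cycle
        if urgent:
            return "Delegate"      # time-sensitive but low impact
        return "Defer or drop"

    item = IntakeItem("Checkout failures", "Fix payment retry logic",
                      "high", 250, "on-call engineer", 1)
    print(eisenhower_quadrant(item))   # -> Do now

Even a crude classifier like this keeps triage consistent and takes the loudest-voice bias out of the decision.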

How can this be operationalized sustainably?

The ideas above form a baseline for processing randomizations as they come in. However, teams often fail to follow these practices consistently. The ideas below will help make that baseline repeatable and sustainable:

Make it intentional (cultural shift)

Let’s be clear: randomizations are part of serving evolving business priorities; they are not noise. Teams benefit from a mindset shift in which randomizations are seen not as friction to eliminate but as signals to handle with intent.

A few years back, our team’s monthly retrospectives showed job satisfaction nosediving for several months, until we traced the decline to an increase in randomizations (and the corresponding thrash). I invited an agile coach to discuss the issue, and we ultimately recognized our cultural and mechanism gaps. With that mindset shift, the team resolved the concerns by intentionally formalizing a randomization management flow: Intake → Triage → Prioritize → Execute (rinse and repeat). Where needed, we promptly communicated changes to existing commitments to leadership.

Be frugal with time (bounded execution)

Even well-triaged items can spiral into open-ended investigations and implementations that the team cannot afford. How do we manage that? Time-box it. A simple “we’ll execute for two days, then regroup” goes a long way toward avoiding rabbit holes.

A randomization is for the team to manage, not for an individual. Teams should plan for handoffs as a normal part of supporting randomizations: handoffs prevent bottlenecks, reduce burnout and keep the rest of the team moving. Well-defined stopping points, an assumptions log, reproduction steps and spike summaries all make handoffs easier.

Escalate early

When there are disagreements on priority, teams should not delay asking for leadership help. For instance, Stakeholder B brings a higher-priority ask, but Stakeholder A does not agree to having their existing task deprioritized. That does not mean the team must complete both. I have seen such delays lead to quiet overcommitment, slipped dates and avoidable burnout. The goal is not to push problems upward but to enable timely decisions so that the team works on the right business priorities. When we implemented a formal escalation mechanism on our team, it reduced the percentage of unplanned work per sprint by around 40%.

Instrument, review and improve

Without making it a heavy lift, teams should capture and periodically review health metrics. For our team, the percentage of unplanned work, interrupts per sprint, mean time to triage and a periodic sentiment survey helped a lot. Teams should review these within existing mechanisms (e.g., sprint retrospectives) for trend analysis and adjustments.
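
As a rough illustration, these metrics can be computed straight from the intake records. The record fields below are assumptions for the sketch, not a standard format.

    # Illustrative sketch: computing the health metrics mentioned above from
    # a sprint's intake records. Record fields are assumptions for illustration.
    from datetime import datetime

    def sprint_metrics(items: list[dict], planned_points: float) -> dict:
        unplanned = [i for i in items if i["unplanned"]]
        unplanned_points = sum(i["points"] for i in unplanned)
        triage_hours = [
            (i["triaged_at"] - i["reported_at"]).total_seconds() / 3600
            for i in unplanned
            if i.get("triaged_at") and i.get("reported_at")
        ]
        total_points = planned_points + unplanned_points
        return {
            "pct_unplanned_work": 100 * unplanned_points / total_points if total_points else 0.0,
            "interrupts_per_sprint": len(unplanned),
            "mean_time_to_triage_hours": (
                sum(triage_hours) / len(triage_hours) if triage_hours else 0.0
            ),
        }

    items = [
        {"points": 3, "unplanned": True,
         "reported_at": datetime(2025, 12, 1, 9, 0),
         "triaged_at": datetime(2025, 12, 1, 13, 0)},
        {"points": 5, "unplanned": False},
    ]
    print(sprint_metrics(items, planned_points=20))

Even this much, reviewed in a retrospective, makes trends visible without a heavy tooling investment.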

Thankfully, a good part of this measurement and tracking can now be automated with AI agents. Teams can use a “sprint companion” that can help classify intake, compute metrics, summarize retrospectives and prompt follow-ups to keep consistent discipline.

Final thoughts

When teams treat randomizations as a managed class of work, interrupts can be handled in hours instead of causing multi-day churn. This approach transforms chaos into clarity, protects delivery, reduces burnout and builds trust with stakeholders. I have seen it work firsthand on our teams, and I encourage you to make it part of your playbook.

This article is published as part of the Foundry Expert Contributor Network.

Microsoft previews C++ code editing tools for GitHub Copilot 22 Dec 2025, 11:02 pm

Microsoft is providing early access to C++ code editing tools for GitHub Copilot via the Visual Studio 2026 Insiders channel. The tools allow Copilot to go beyond file searches and perform context-aware refactoring that spans multiple files and code sections, according to Microsoft.

The public preview was announced December 16 in a blog post that also offers instructions on getting started with the tools. The C++ code editing tools for Copilot had previously been available in a private preview since November 12.

Microsoft said the C++ code editing tools offer rich context for any symbol in a project, enabling Copilot agent mode to view all references across a code base, understand metadata such as type, scope, and declaration, visualize class inheritance hierarchies, and trace function call chains. These capabilities help Copilot accomplish complex C++ editing tasks with greater accuracy and speed.

Future plans call for expanding support for the C++ editing tools to other GitHub Copilot surfaces, such as Visual Studio Code, to further empower agent-driven edits for C++. Microsoft is also seeking feedback on how to improve the C++ tools experience; users can report problems or suggest improvements through the Visual Studio feedback icon.
