AWS is still chasing a cohesive enterprise AI story after re:Invent 9 Dec 2025, 2:58 pm

AWS kicked off re:Invent 2025 with a defensive urgency that is unusual for the cloud leader, arriving in Las Vegas under pressure to prove it can still set the agenda for enterprise AI.

With Microsoft and Google tightening their grip on CIOs’ mindshare through integrated AI stacks and workflow-ready agent platforms, AWS CEO Matt Garman and his lieutenants rolled out new chips, models, and platform enhancements, trying to knit the updates into a tighter pitch that AWS can still offer CIOs the broadest and most production-ready AI foundation.

Analysts remain unconvinced that AWS succeeded.

“We are closer, but not done,” said David Linthicum, independent consultant and retired chief cloud strategy officer at Deloitte.

Big swing but off target

Garman’s biggest swing, at least the one that got it “closer”, came in the form of Nova Forge,  a new service with which AWS is attempting to confront one of its strategic weaknesses: the absence of a unified narrative that ties data, analytics, AI, and agents into a single, coherent pathway for enterprises to adopt.

It’s this cohesion that Microsoft has been selling aggressively to CIOs with its recently launched IQ set of offerings.

Unlike Microsoft’s IQ stack, which ties agents to a unified semantic data layer, governance, and ready-made business-context tools, Nova Forge aims to provide enterprises raw frontier-model training power in the form of a toolkit to build custom models with proprietary data, rather than a pre-wired, workflow-ready AI platform.

But it still requires too much engineering lift to adopt, analysts say.

AWS is finally positioning agentic AI, Bedrock, and the data layer as a unified stack instead of disconnected services, but according to Linthicum, “It’s still a collection of parts that enterprises must assemble.”

There’ll still be a lot of work for enterprises wanting to make use of the new services AWS introduced, said Phil Fersht, CEO of HFS Research.

“Enterprise customers still need strong architecture discipline to bring the parts together. If you want flexibility and depth, AWS is now a solid choice. If you want a fully packaged, single-pane experience, the integration still feels heavier than what some competitors offer,” he said.

Powerful tools instead of turnkey solutions

The engineering effort needed to make use of new features and services echoed across other AWS announcements, with the risk that they will confuse CIOs rather than simplify their AI roadmap.

On day two of the event, Swami Sivasubramanian’s announced new features across Bedrock AgentCore, Bedrock, and SageMaker AI to help enterprises move their agentic AI pilots to production, but still focused on providing tools that accelerate tasks for developers rather than offering “plug-and-play agents” by default, Linthicum said.

The story didn’t change when it came to AWS’s update to vibe-coding tool Kiro or the new developer-focused agents it introduced to simplify devops, said Paul Nashawaty, principal analyst at The Cube Research.

“AWS clearly wants to line up against Copilot Studio and Gemini Agents. Functionally, the gap is closing,” said Nashawaty. “The difference is still the engineering lift. Microsoft and Google simply have tighter productivity integrations. AWS is getting there, but teams may still spend a bit more time wiring things together depending on their app landscape.”

Similarly, AWS made very little progress toward delivering a more unified AI platform strategy. Analysts had looked to the hyperscaler to address complexity around the fragmentation of its tools and services by offering more opinionated MLops paths, deeper integration between Bedrock and SageMaker, and ready-to-use patterns that help enterprises progress from building models to deploying real agents at scale.

Linthicum was dissatisfied with efforts by AWS to better document and support the connective tissue between Bedrock, SageMaker, and the data plane. “The fragmentation hasn’t vanished,” he said. “There are still multiple ways to do almost everything.”

The approach taken by AWS contrasts sharply with those of Microsoft and Google to present more opinionated end-to-end stories, Linthicum said, calling out Azure’s tight integration around Fabric and Google’s around its data and Vertex AI stack.

Build or buy?

For CIOs who were waiting to see what AWS delivered before finalizing their enterprise AI roadmap, they are back at a familiar fork: powerful primitives versus turnkey platforms.

They will need to assess whether their teams have the architectural discipline, MLops depth, and data governance foundation to fully capitalize on AWS’s latest additions to its growing modular stack, said Jim Hare, VP analyst at Gartner.

“For CIOs prioritizing long-term control and customization, AWS offers unmatched flexibility; for those seeking speed, simplicity, and seamless integration, Microsoft or Google may remain the more pragmatic choice in 2026,” Hare said.

The decision, as so often, comes down to whether the enterprise wants to build its AI platform or just buy one.

This article first appeared on CIO.com.

(image/jpeg; 27.27 MB)

Spotlight: Making the most of multicloud 9 Dec 2025, 10:00 am

Download the December 2025 issue of the Enterprise Spotlight from the editors of CIO, Computerworld, CSO, InfoWorld, and Network World.

(image/jpeg; 1.34 MB)

The hidden cost of Amazon Nova 2 9 Dec 2025, 9:00 am

Amazon’s Nova 2 announcement at AWS re:Invent 2025 is exactly the type of AI offering we expected from AWS and, frankly, exactly what should make thoughtful architects nervous. Nova 2 is positioned as a frontier-grade model, tightly integrated with Amazon Bedrock. It’s part of a growing ecosystem of “frontier agents” and the AgentCore framework unveiled at re:Invent 2025. The story is compelling: better models, better tools, and a single platform to build, deploy, and scale agentic AI.

And yet, there’s a problem. It isn’t that Nova 2 is technically weak. The problem is that it is strategically strong in all the wrong ways for customers who care about independence, portability, and long-term value. AWS is not just selling you a model; the company is selling you an entire worldview where your agentic fabric, data flows, and operational patterns are deeply rooted in one cloud.

Vendor lock-in versus actual value

Lock-in is a spectrum, and the ecosystem of Nova 2, Bedrock, and AgentCore pushes you far toward the “tightly coupled” end of that spectrum. On paper, you get convenience: native integrations, managed infrastructure, observability, and security primitives that understand the agentic constructs AWS has defined. In practice, you are anchoring the core of your emerging AI capability into APIs, runtimes, and orchestration semantics that exist only within AWS.

The question I want enterprises to ask is simple: Are you optimizing for the next six quarters or the next six years? It’s likely that during the next six quarters, Nova 2 and its ecosystem will make you more productive. But during the next six years, the cost of migrating away from this ecosystem—or even meaningfully using a second cloud for AI—will rise dramatically. Your agents will be written to AWS’s tool APIs, observability model, security posture, and the way AWS wires agents to data and events. That is not theoretical lock-in; it is baked into every line of code and every workflow you build.

If you view AI as a transient experiment, this may not bother you. If you believe, as I do, that agentic systems will become the operational nervous system of most enterprises, then concentrating that critical capability inside a single vendor’s ecosystem is a strategic risk, not a feature.

Agentic fabric: native versus portable

The notion of an “agentic fabric” is useful: a mesh of agents that can reason, act, and collaborate across data sources, applications, and infrastructure. AWS’s vision is a cloud-native fabric where agents are first-class citizens inside services like Bedrock, wired to Lambda, Step Functions, EventBridge, and a growing set of AI-ready data services. The fabric is smooth—as long as you stay inside their walls.

The alternative is a cloud-portable fabric. Instead of building directly against closed, vendor-specific agent frameworks, you define agents in terms of more open abstractions: model-agnostic interfaces, cross-cloud orchestration, and data access layers that do not assume a particular vendor’s storage or event model. You might still run agents on AWS, but you can also run them on other clouds, on-premises, or at the edge without rewriting them from scratch.

Nova 2 and the surrounding tools tilt you hard toward cloud-native and away from cloud-portable. When your agents depend on AWS-specific features—say, Bedrock’s proprietary agent orchestration primitives or AWS-only plug-in patterns—your portability story collapses. The cost to move is not just “change the model endpoint”; it becomes “re-implement how the agent thinks, acts, and integrates.” That type of cost kills multicloud strategies in practice, even when they look good on PowerPoint.

Operational burden or simplification

AWS is selling Nova 2 and AgentCore as simplifying complexity and, in some respects, that is true. You get unified observability, integrated security, and pre-packaged patterns for building safe, production-grade agents. But let’s be very clear about what is happening. AWS is not removing complexity, it is encapsulating it inside black boxes you do not control.

When those black boxes malfunction, drift, or change, you are at the mercy of AWS’s release cadence and operational practices. You will still need teams who understand the behavior of your agents in detail, but you will be diagnosing problems in systems whose core behavior is defined by a vendor’s code and policies, not your own. That is a different kind of fragility. Instead of owning complexity you can see and manage, you’re renting complexity and hoping it behaves.

On top of that, operations teams now have to understand not only distributed cloud-native systems, but emergent, probabilistic agent behavior embedded within them. If your observability, governance, and control mechanisms are all bound to AWS-specific services, you lose the ability to build a unified operations view across clouds and on-prem systems. AWS wants to be your single pane of glass, but the reality is that most large enterprises need several panes, and those panes must interoperate.

Taking the long view

When you adopt Nova 2 and its ecosystem as your primary agentic platform, you are choosing a vertically integrated stack. The immediate upsides are undeniable: optimized performance, deep integrations, turnkey security patterns, and less glue code. For many teams, particularly those that are small, under-resourced, or deeply aligned with AWS already, this is a rational short-term decision.

But the downsides show up over time, and they show up at the architectural level, not in developer convenience. You lose leverage on pricing as your dependence on AWS-specific agent capabilities grows. You will find it harder to adopt innovations that emerge on other clouds or in open source communities, because your systems are built around a specific model of agents and tools. You will discover that “multicloud” has devolved into “one primary cloud for anything that matters and some residual workloads elsewhere,” which is exactly the outcome the big clouds are optimizing for.

If you want more open and portable approaches, you pay more up front. You build or adopt neutral orchestration layers, use frameworks that abstract model providers, and design observability that spans heterogeneous environments. You resist the gravitational pull of single-vendor AI fabrics, even when they look impressively polished. The payoff is flexibility: the ability to change direction when economics, regulation, or innovation demand it, without rewriting the nervous system of your enterprise.

(image/jpeg; 0.59 MB)

Amazon Q Developer: Everything you need to know 9 Dec 2025, 9:00 am

When I reviewed Amazon Q Developer in 2024, I noted that it was able to generate whole functions in common programming languages with only a few fixes. It was useful for completing lines of code, doc strings, and if/for/while/try code blocks as you type. It could also scan for vulnerabilities and help you fix code problems. However, it could not generate full functions for some use cases, but instead reverted to line-by-line suggestions.

At the time, Amazon Q Developer was “powered by Amazon Bedrock” and trained on “high-quality AWS content.” I never knew what the first really meant, but that’s now moot: Amazon Q Developer now gives you a choice of Claude Sonnet versions, including 4.5, which is competitive with GPT-5 Codex.

At this point, completing lines of code qualifies as a “gimme,” the equivalent of a two-foot putt in golf. Generating whole functions is easy, generating complex applications is moderately difficult, and fixing reported bugs in large repositories ranges from moderately difficult to difficult.

Some of the current differentiators for coding agents are their ability to call tools (e.g. read files, run applications, show edit diffs, and understand Git and GitHub) and their ability to define and use Model Context Protocol (MCP) servers. MCP servers are tools that allow AI models to interact with external services, such as databases, APIs, and enterprise systems, using a standard, unified protocol. Another differentiator is the ability to run tools and projects in an isolated environment.

At risk of giving away the punch line, Amazon Q Developer supports tools, supports MCP servers but makes it harder than necessary to configure them, and doesn’t have the best implementation of isolated cloud environments. It’s still useful, however, and we can only hope that AWS will fill in the obvious gaps.

Amazon Q Developer competes with Google Jules, OpenAI Codex Cloud and OpenAI Codex CLI, Claude Code, Windsurf, Augment Code, Amp, Cline, Zencoder, and Gemini CLI. The current Amazon Q Developer versions as of this writing are Plug-in 1.104.0 and CLI 1.19.6.

Amazon Q Developer in editors

Amazon Q Developer runs in JetBrains IDEs, Visual Studio Code, Visual Studio, Eclipse, and the command line (CLI). By the way, the Q CLI has been rebranded as Kiro CLI with a version bump to 1.20; one change from the Q CLI / Kiro CLI I’ve been testing is the addition of an “auto” model option.

Amazon Q Developer generates real-time code suggestions ranging from snippets to full functions based on your comments and existing code. It also supports inline chat directly in the code editor, and CLI completions and natural language to Bash translation in the command line. Q Developer can write unit tests, optimize code, and scan for vulnerabilities. If you enable agentic coding, Q Developer can perform tasks on your behalf by reading and writing files, generating code diffs, and running shell commands, while incorporating your feedback and providing real-time updates along the way.

You can use chat and inline chat for any programming language. Inline code suggestions work best with C, C++, C#, Dart, Go, Java, JavaScript, Kotlin, Lua, PHP, PowerShell, Python, R, Ruby, Rust, Scala, Shell, SQL, Swift, SystemVerilog, TypeScript, CDK (TypeScript, Python), HCL (Terraform), JSON, and YAML. The chat works in at least 13 natural languages and automatically responds in the language you use for your query. This is a major improvement over last year’s language capabilities.

Amazon Q Developer 01

Amazon Q information in Visual Studio Code. At the right is the plug-in description. At the left you can see the chat help text above the chat input box. Note the blue switch at the lower left, which lets you choose whether to use agentic coding for the current request. Also note the choice of model, Claude Sonnet 4.5.

Foundry

Testing Amazon Q Developer Chat

To compare Amazon Q Developer Chat with Codex Chat I tried essentially the same prompt to create a command-line program to calculate pi.

Generate a C++ program to calculate the value of pi to the number of decimal digits specified in the command line. Make sure the program will be accurate to at least 1000 digits. Use the algorithm that is most efficient at run-time, regardless of the amount of code required. Save the program in the AmazonQ folder.

I used the Claude Sonnet 4.5 model, which I expected to do well on this task. Amazon Q Developer responded with the following.

I'll create a C++ program using the Chudnovsky algorithm, which is one of the fastest known algorithms for calculating π. It uses the GMP library for arbitrary precision arithmetic.
Creating the AmazonQ directory if it doesn't exist
Creating the C++ program that uses the Chudnovsky algorithm for calculating pi to arbitrary precision
Creating a Makefile to simplify compilation of the pi calculator
Creating a README with build and usage instructions
I've created a C++ program that calculates π using the Chudnovsky algorithm, which is one of the most efficient algorithms for computing π (converges at ~14 digits per iteration).
Files created in 
/Users/martinheller/Code/AmazonQ/:
• pi_calculator.cpp
 - Main program using GMP for arbitrary precision
• Makefile
 - For easy compilation
• README.md
 - Build and usage instructions
To use:
1.	Install GMP: brew install gmp
2.	Build: make
3.	Run: ./pi_calculator 1000 (for 1000 digits)
The program accurately handles 1000+ digits and uses the fastest known algorithm for runtime efficiency.

That sounded more promising than Codex’s first try, but it didn’t compile. For background, you should know that GMP has both C and C++ libraries; I only installed the C library, but Amazon Q Developer didn’t know that. I asked it to fix the build bug after highlighting that output, but it didn’t take the context. No matter, it still fixed the problem:

Let me try to build it first to see the specific error:
Attempting to build the program to identify the specific build error
Removing the unnecessary gmpxx.h include and fixing the C constant calculation
Removing -lgmpxx from the Makefile since we don't need the C++ wrapper
Building the program again to verify the fix works
Testing the program with 100 digits to verify it works correctly
Testing the program with the correct path
Running the pi calculator to verify it works
Fixed. The build issue was the missing 
gmpxx.h header. I removed the unnecessary C++ wrapper include and simplified the constant calculation. The program now builds and runs correctly, calculating π to the specified precision.

I asked Amazon Q Developer Chat to time the program for 100 and 10,000 digits so that I could compare the efficiency with the MPFR/GMP implementation that Codex generated on its second try. Unsurprisingly, the calculation was a lot slower for 10,000 digits. I asked Q Developer how it might make this calculation more efficient for very high numbers of digits.

Yes, there are several optimizations for very high digit counts:
1. Binary Splitting - The most significant optimization. Instead of computing terms sequentially, recursively split the sum into halves and combine. This reduces complexity from O(n²) to O(n log n) for n terms.
2. Use MPFR instead of GMP - MPFR is optimized for floating-point operations and has better performance for transcendental functions.
3. Avoid Redundant Factorial Calculations - Your current code recalculates factorials from scratch each iteration. Binary splitting naturally avoids this.
4. Memory Management - Reuse mpz_t variables instead of allocating/deallocating in loops.
5. Parallel Computation - For extremely large calculations (millions of digits), parallelize the binary split recursion.
I can see you already have a more optimized version in 
Code/Codex/pi_mpfr.cpp that uses binary splitting with MPFR. That implementation will be significantly faster for large digit counts (10,000+).

Oops. I didn’t mean to leave that file open where Amazon Q Developer could see it. Nevertheless, Q Developer/Claude Sonnet 4.5 answered correctly for all five points.

Amazon Q Developer 02

Pi calculator in C++ generated by Amazon Q Developer Chat using the Claude Sonnet 4.5 model. This implementation uses the GMP library for high-precision integer arithmetic.

Foundry

Testing the Amazon Q Developer CLI

Before testing the Amazon Q Developer CLI (again) I checked its menu bar icon drop-down for updates. It was already up-to-date. Then I ran q in iTerm2 and asked it for help information.

Amazon Q Developer 03

Amazon Q Developer CLI help information display. This is as of Q 1.19.4. With version 1.20, the Q CLI is rebranded as the Kiro CLI. 

Foundry

For a first test I changed to the Cline rep directory, started the Q CLI, switched Q to the claude-sonnet-4.5 model (from the default 4.0), and asked it to explain the current repository.

Amazon Q Developer 04

Amazon Q Developer CLI with claude-sonnet-4.5 explaining the Cline repository. This is a reasonable high-level summary. It’s not as detailed as the explanation given by Codex last month, but it is easier to understand.

Foundry

Then I repeated the experiment to have Q explain the amazon-q-developer-cli repository.

Amazon Q Developer 05

Amazon Q Developer CLI explaining its own repository. Again, it gave a high-level summary that is easy to understand.

Foundry

Next, I cloned the websocket repository from my own fork of the project, switched to that directory, ran the Q CLI, switched to the claude-sonnet-4.5 model, and asked Q to pick an open issue in the websocket repo and fix it in a new branch. Q struggled a bit with some of the issues, but found a problem with the chat example that it was able to fix.

Amazon Q Developer 06

Amazon Q Developer CLI explaining a fix it made to websocket. Issue #986 is about the chat example not working due to a missing CheckOrigin implementation. It was a simple three-line addition to the example. Q created the fix and checked it in to a new branch on my laptop.

Foundry

I had to push the branch to my GitHub repository myself:

martinheller@Mac websocket % git push
fatal: The current branch fix-chat-example-checkorigin has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin fix-chat-example-checkorigin

To have this happen automatically for branches without a tracking
upstream, see 'push.autoSetupRemote' in 'git help config'.

martinheller@Mac websocket % git push --set-upstream origin fix-chat-example-checkorigin
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 12 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 686 bytes | 137.00 KiB/s, done.
Total 5 (delta 4), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
remote:
remote: Create a pull request for 'fix-chat-example-checkorigin' on GitHub by visiting:
remote:      https://github.com/meheller/websocket/pull/new/fix-chat-example-checkorigin
remote:
To https://github.com/meheller/websocket.git
 * [new branch]      fix-chat-example-checkorigin -> fix-chat-example-checkorigin
branch 'fix-chat-example-checkorigin' set up to track 'origin/fix-chat-example-checkorigin'.
martinheller@Mac websocket %

I didn’t create a pull request for the upstream gorilla/websocket repo that I had forked from, since that repo hadn’t changed in eight months, which tells me that the repo is inactive.

Amazon Q Developer CLI with claude-sonnet-4.5 didn’t do as good a job on this task as Codex Cloud. Codex Cloud operated directly on the GitHub repository and was able to fix a bug that Sonnet misunderstood and skipped.

I looked at the GitHub blame view for the new branch of my fork of websocket. The fix is in lines 38 through 40.

Amazon Q Developer 07

GitHub blame view for the fixed client.go in the chat example of my fork of gorilla/websocket.

Foundry

Amazon Q Developer agentic and MCP capabilities

The Amazon Q Developer CLI (newly renamed Kiro CLI) currently has the following built-in tools available to agents:

  • fs_read – Read files, directories, and images
  • fs_write – Create and edit files
  • execute_bash – Execute shell commands
  • use_aws – Make AWS CLI API calls
  • knowledge – Store and retrieve information across sessions
  • introspect – Provide information about Q CLI capabilities

In the built-in default agent, only fs_read can run without asking permission, but all tools are available, as well as legacy MCP servers. You can define your own agents, and set a default agent.

In addition, you can define and use MCP servers and tools, both in the CLI and the IDE. Oddly, the Amazon Q Developer plug-in for VS Code does not use any of the MCP or extensive tool capabilities of VS Code. Instead, it and the Q CLI / Kiro CLI use their own JSON MCP configuration files, which look about the same as Claude’s and the MCP specification’s config files. (If you’re interested, you can find the MCP specification here.)

Like Claude Code, the Q CLI / Kiro CLI has text commands to manage MCP server configurations; the Amazon Q Developer editor plug-ins have forms for that. Both require you to know the command that invokes the MCP server, but you can look that up for public MCP servers. The MCP standards organization maintains a registry of public MCP servers; so do GitHub (also here) and Anthropic (Claude).

Amazon Q Developer productivity effects

According to AWS, their internal developers have improved productivity using Q both quantitatively and qualitatively. One huge win was to ingest their internal Amazon knowledge repository (millions of documents) into Amazon Q Business so that developers could get answers based on information spread across those repositories. They reported “we reduced the time Amazon developers spent waiting for technical answers by over 450k hours and reduced the interruptions to ‘flow state’ of existing team members.”

AWS also unlocked “possibilities for large-scale technical modernization that previously seemed impractical,” which “fundamentally changed how [they] think about technical debt and system modernization.” Another effect was reducing the time to learn new languages and codebases. “One developer reported cutting their typical three-week ramp-up time for learning a new programming language down to just one week using Q Developer.”

With the Amazon Q Developer CLI agent, another internal developer was able to work with an unfamiliar codebase to build and implement a non-trivial feature within two days using Rust, a programming language they didn’t know, stating, “If I’d done this ‘the old fashioned way,’ I would estimate it would have taken me five to six weeks due to language and codebase ramp up time. More realistically, I wouldn’t have done it at all, because I don’t have that kind of time to devote.”

Amazon Q Developer on AWS

AWS notes that Amazon Q Developer is an expert on AWS. Q is available in the AWS Management Console (and in Microsoft Teams and Slack) to help optimize users’ cloud costs and resources, provide guidance on architectural best practices, investigate operational incidents, and diagnose and resolve networking issues. Of course, it’s easy for AWS to train its own model on its own documentation, procedures, best practices, and APIs. It might be a bit harder for customers.

Amazon Q Developer .NET porting and Java upgrades

Amazon Q Developer has agents that can help port .NET code from Windows to Linux, and help upgrade code from Java 8 to Java 17. These agents have upgraded more than 1,000 production Java applications, according to AWS.

Amazon Q Developer, data, and AI

In addition to straight code, Amazon Q Developer knows about data integration workflows. It can generate ETL scripts, troubleshoot errors, translate natural language to SQL queries, and work with data across 20+ data sources. It probably doesn’t hurt that AWS offers 20+ data sources as paid services. Q Developer can help you build machine learning models and also reduce the time to build, train, evaluate, and deploy AI models in SageMaker Studio.

Amazon Q Developer pricing

The Amazon Q Developer perpetual Free Tier gives you 50 agentic chat interactions per month. You can also transform up to 1,000 lines of code per month. The Pro Tier has expanded limits for $19 per month per user. I was able to perform this review using the free tier.

Conclusion

Amazon Q Developer has improved significantly over the last year. I can certainly recommend it to AWS customers. The free tier was good enough for me, but if I was using it all day, every day I’d most likely have to upgrade to the pro tier.

(image/jpeg; 6.84 MB)

IBM to buy Confluent to extend its data and automation portfolio 9 Dec 2025, 2:15 am

IBM has agreed to acquire cloud-native enterprise data streaming platform Confluent in a move designed to expand its portfolio of tools for building AI applications

The company said Monday in a release that it sees Confluent as a natural fit for its hybrid cloud and AI strategy, adding that the acquisition is expected to “drive substantial product synergies” across its portfolio.

Confluent connects data sources and cleans up data. It built its service on Apache Kafka, an open-source distributed event streaming platform, sparing its customers the hassle of buying and managing their own server clusters in return for a monthly fee per cluster, plus additional fees for data stored and data moved in or out. 

IBM expects the deal, which it valued at $11 billion, to close by the middle of next year.

Confluent CEO and co-founder Jay Kreps stated in an email sent internally to staff about the acquisition, “IBM sees the same future we do: one in which enterprises run on continuous, event-driven intelligence, with data moving freely and reliably across every part of the business.”

It’s a good move for IBM, noted Scott Bickley, an advisory fellow at Info-Tech Research Group. “[Confluent] fills a critical gap within the watsonx platform, IBM’s next-gen AI platform, by providing the ability to monitor real-time data,” he said, and is based on the industry standard for managing and processing real-time data streams. 

He added, “IBM already has the pieces of the puzzle required to build and train AI models; Confluent provides the connective tissue to saturate those models with continuous live data from across an organization’s entire operation, regardless of the source. This capability should pave the road ahead for more complex AI agents and applications that will be able to react to data in real time.”

Bickley also pointed out that the company is playing the long game with this acquisition, which is its largest in recent history. “IBM effectively positions itself proactively to compete against the AI-native big data companies like Snowflake and Databricks, who are all racing towards the same ‘holy grail’ of realizing AI agents that can consume, process, and react to real-time data within the context of their clients’ trained models and operating parameters,” he said, adding that IBM is betting that a full-stack vertical AI platform, watsonx, will be more appealing to enterprise buyers than a composable solution comprised of various independent components.

It is a deal that works for both parties.

The move, he noted, also complements previous acquisitions such as the $34.5 billion acquisition of Red Hat and the more recent $6.4 billion acquisition of Hashicorp, all of which are built upon dominant open source standards including Linux, Terraform/Vault, and Kafka. This allows IBM to offer a stand-alone vertical, hybrid cloud strategy with full-stack AI capabilities apart from the ERP vendor space and the point solutions currently available.

In addition, said Andrew Humphreys, senior director analyst at Gartner, with IBM MQ, IBM already competes with Confluent in the event broker market, the underpinning technology for event driven architectures. “Although there is some overlap, IBM MQ and Kakfa address different use cases and problems for customers, so IBM has the opportunity to bring these offerings together to deliver a comprehensive set of event broker offerings that address the full breadth of event driven architecture use cases,” he said.

Vital layer in the watsonx stack filled

Mitch Ashley, VP and practice lead at Futurum Research, noted that the acquisition of Confluent fills a vital layer in the watsonx stack and gives IBM an open source-based backbone for real time, governed data in motion. It also aligns IBM’s recent data acquisitions into a coherent architecture. “The value here is not just Kafka as a technology, but the ability to deliver fresh, contextual data into every part of IBM’s AI portfolio with consistency and control,” he said.

The acquisition, wrote Sanchit Vir Gogia, the chief analyst at Greyhound Research, in a report released soon after the purchase was announced, “marks a turning point that has little to do with price tags or portfolio expansion. What it truly reveals is a change in who controls the lifeblood of modern digital enterprises. That lifeblood is real-time data.”

It is not a tactical buy, he noted, it’s the strategic completion of an architecture years in the making. “For enterprise leaders, this changes the map,” he predicted. “AI no longer sits at the edge of architecture. It moves to the center, and Confluent becomes the layer that makes that center responsive, contextual, and live. This acquisition allows IBM to deliver AI that doesn’t just predict, but listens, grounded in data that is clean, connected, and always in motion.”

Added Stephen Catanzano, senior analyst, data & AI with Omdia, “all the major players are really building end to end data platforms at this point. … This is data in motion, so it really fills out the gap that [IBM] have to manage both moving data and static data, unstructured and structured.”

“People really want to apply generative AI and agentic AI with moving data and streaming data. And they (IBM) took the biggest player off the market,” he said.

In addition to all this, Bickley said, the timing was right in that Confluent has been experiencing a slowing of revenue growth and was reportedly shopping itself already.

“At the end of the day, this deal works for both parties,” he said. “IBM is now playing a high-stakes game and has placed its bet that having the best AI models is not enough; it is the control of the data flow that will matter.”

This story originally appeared on CIO.com.

(image/jpeg; 5.89 MB)

JDK 26: The new features in Java 26 9 Dec 2025, 12:52 am

Java Development Kit (JDK) 26, a planned update to standard Java due March 17, 2026, has reached an initial rampdown phase for bug fixes, with the feature set now frozen. The following 10 features are officially targeted to JDK 26: a fourth preview of primitive types in patterns, instanceof, and switch, ahead-of-time object caching, an eleventh incubation of the Vector API, second previews of lazy constants and PEM (privacy-enhanced mail) encodings of cryptographic objects, a sixth preview of structured concurrency, warnings about uses of deep reflection to mutate final fields, improving throughput by reducing synchronization in the G1 garbage collector (GC), HTTP/3 for the Client API, and removal of the Java Applet API.

A short-term release of Java backed by six months of Premier-level support, JDK 26 follows the September 16 release of JDK 25, which is a Long-Term Support (LTS) release backed by several years of Premier-level support.

The latest feature to be added, primitive types in patterns, instanceof, and switch, is intended to enhance pattern matching by allowing primitive types in all pattern contexts, and to extend instanceof and switch to work with all primitive types. Now in a fourth preview, this feature was previously previewed in JDK 23, JDK 24, and JDK 25. The goals include enabling uniform data exploration by allowing type patterns for all types, aligning type patterns with instanceof and aligning instanceof with safe casting, and allowing pattern matching to use primitive types in both nested and top-level pattern contexts. Changes in this fourth preview include enhancing the definition of unconditional exactness and applying tighter dominance checks in switch constructs. The changes enable the compiler to identify a wider range of coding errors.

With ahead-of-time object caching, the HotSpot JVM would gain improved startup and warmup times, so it can be used with any garbage collector including the low-latency Z Garbage Collector (ZGC). This would be done by making it possible to load cached Java objects sequentially into memory from a neutral, GC-agnostic format, rather than mapping them directly into memory in a GC-specific format. Goals of this feature include allowing all garbage collectors to work smoothly with the AOT (ahead of time) cache introduced by Project Leyden, separating AOT cache from GC implementation details, and ensuring that use of the AOT cache does not materially impact startup time, relative to previous releases.

The eleventh incubation of the Vector API introduces an API to express vector computations that reliably compile at run time to optimal vector instructions on supported CPUs. This achieves performance superior to equivalent scalar computations. The incubating Vector API dates back to JDK 16, which arrived in March 2021. The API is intended to be clear and concise, to be platform-agnostic, to have reliable compilation and performance on x64 and AArch64 CPUs, and to offer graceful degradation. The long-term goal of the Vector API is to leverage Project Valhalla enhancements to the Java object model.

Also on the docket for JDK 26 is another preview of an API for lazy constants, which had been previewed in JDK 25 via a stable values capability. Lazy constants are objects that hold unmodifiable data and are treated as true constants by the JVM, enabling the same performance optimizations enabled by declaring a field final. Lazy constants offer greater flexibility as to the timing of initialization.

The second preview of PEM (privacy-enhanced mail) encodings calls for an API for encoding objects that represent cryptographic keys, certificates, and certificate revocation lists into the  PEM transport format, and for decoding from that format back into objects. The PEM API was proposed as a preview feature in JDK 25. The second preview features a number of changes, such as the PEMRecord class is now named PEM and now includes a decode()method that returns the decoded Base64 content. Also, the encryptKey methods of the EncryptedPrivateKeyInfo class now are named encrypt and now accept DEREncodable  objects rather than PrivateKey objects, enabling the encryption of KeyPair and PKCS8EncodedKeySpec objects.

The structured concurrency API simplifies concurrent programming by treating groups of related tasks running in different threads as single units of work, thereby streamlining error handling and cancellation, improving reliability, and enhancing observability. Goals include promoting a style of concurrent programming that can eliminate common risks arising from cancellation and shutdown, such as thread leaks and cancellation delays, and improving the observability of concurrent code.

New warnings about uses of deep reflection to mutate final fields are intended to prepare developers for a future release that ensures integrity by default by restricting final field mutation, in other words making final mean final, which will make Java programs safer and potentially faster. Application developers can avoid both current warnings and future restrictions by selectively enabling the ability to mutate final fields where essential.

The G1 GC proposal is intended to improve application throughput and latency when using the G1 garbage collector by reducing the amount of synchronization required between application threads and GC threads. Goals include reducing the G1 garbage collector’s synchronization overhead, reducing the size of the injected code for G1’s write barriers, and maintaining the overall architecture of G1, with no changes to user interaction.

The G1 GC proposal notes that although G1, which is the default garbage collector of the HotSpot JVM, is designed to balance latency and throughput, achieving this balance sometimes impacts application performance adversely compared to throughput-oriented garbage collectors such as the Parallel and Serial collectors:

Relative to Parallel, G1 performs more of its work concurrently with the application, reducing the duration of GC pauses and thus improving latency. Unavoidably, this means that application threads must share the CPU with GC threads, and coordinate with them. This synchronization both lowers throughput and increases latency.

The HTTP/3 proposal calls for allowing Java libraries and applications to interact with HTTP/3 servers with minimal code changes. Goals include updating the HTTP Client API to send and receive HTTP/3 requests and responses; requiring only minor changes to the HTTP Client API and Java application code; and allowing developers to opt in to HTTP/3 as opposed to changing the default protocol version from HTTP/2 to HTTP/3.

HTTP/3 is considered a major version of the HTTP (Hypertext Transfer Protocol) data communications protocol for the web. Version 3 was built on the IETF QUIC (Quick UDP Internet Connections) transport protocol, which emphasizes flow-controlled streams, low-latency connection establishment, network path migration, and security among its capabilities.

Removal of the Java Applet API, now considered obsolete, is also targeted for JDK 26. The Applet API was deprecated for removal in JDK 17 in 2021. The API is obsolete because neither recent JDK releases nor current web browsers support applets, according to the proposal. There is no reason to keep the unused and unusable API, the proposal states.

(image/jpeg; 1.93 MB)

R language is making a comeback – Tiobe 8 Dec 2025, 10:41 pm

The R language for statistical computing has creeped back into the top 10 in Tiobe’s monthly index of programming language popularity.

In the December 2025 index, published December 7, R ranks 10th with a 1.96% rating. R has cracked the Tiobe index’s top 10 before, such as in April 2020 and July 2020, but not in recent years. The rival Pypl Popularity of Programming Language Index, meanwhile, has R ranked fifth this month with a 5.84% share.

“Programming language R is known for fitting statisticians and data scientists like a glove,” said Paul Jansen, CEO of software quality services vendor Tiobe, in a bulletin accompanying the December index. “As statistics and large-scale data visualization become increasingly important, R has regained popularity.”

Jansen noted that R is sometimes frowned upon by “traditional” software engineers due to an unconventional syntax and limited scalability for large production systems. But for domain experts R remains a powerful and elegant tool, and continues to thrive at universities and in research-driven industries, he added. Although data science rival Python has eclipsed R in terms of general adoption, Jansen said R has carved out a solid and enduring niche, excelling at rapid experimentation, statistical modeling, and exploratory data analysis.

“We have seen many Tiobe index top 10 entrants rising and falling,” Jansen wrote. “It will be interesting to see whether R can maintain its current position.”

The Tiobe Programming Community Index bases language popularity on a formula that assesses the number of skilled engineers worldwide, courses, and third-party vendors pertinent to a language. Popular websites including Google, Amazon, Wikipedia, Bing, and more than 20 others are used to calculate its ratings.

The Tiobe index top 10 for December 2025:

  1. Python, 23.64%
  2. C, 10.11%
  3. C++, 8.95%
  4. Java, 8.7%
  5. C#, 7.26%
  6. JavaScript, 2.96%
  7. Visual Basic, 2.81%
  8. SQL, 2.1%
  9. Perl, 1.97%
  10. R, 1.96%

The Pypl index analyzes how often language tutorials are searched on Google. The Pypl index top 10 for December 2025:

  1. Python, 26.91%
  2. C/C++, 13.02%
  3. Objective-C, 11.37%
  4. Java, 11.36%
  5. R, 5.84%
  6. JavaScript, 5.16%
  7. Swift, 3.53%
  8. C#, 3.18%
  9. PHP, 2.98%
  10. Rust, 2.6%

(image/jpeg; 11.74 MB)

Apache Tika hit by critical vulnerability thought to be patched months ago 8 Dec 2025, 8:03 pm

A security flaw in the widely-used Apache Tika XML document extraction utility, originally made public last summer, is wider in scope and more serious than first thought, the project’s maintainers have warned.

Their new alert relates to two entwined flaws, the first CVE-2025-54988 from August, rated 8.4 in severity, and the second, CVE-2025-66516 made public last week, rated 10.

CVE-2025-54988 is a weakness in the tika-parser-pdf-module used to process PDFs in Apache Tika from version 1.13 to and including version 3.2.1.  It is one module in Tika’s wider ecosystem that is used to normalize data from 1,000 proprietary formats so that software tools can index and read them.

Unfortunately, that same document processing capability makes the software a prime target for campaigns using XML External Entity (XXE) injection attacks, a recurring issue in this class of utility.

In the case of CVE-2025-54988, this could have allowed an attacker to execute an External Entity (XXE) injection attack by hiding XML Forms Architecture (XFA) instructions inside a malicious PDF.

Through this, “an attacker may be able to read sensitive data or trigger malicious requests to internal resources or third-party servers,” said the CVE. Attackers could exploit the flaw to retrieve data from the tool’s document processing pipeline, exfiltrating it via Tika’s processing of the malicious PDF.

CVE superset

The maintainers have now realized that the XXE injection flaw is not limited to this module. It affects additional Tika components, namely Apache Tika tika-core, versions 1.13 to 3.2.1, and tika-parsers versions 1.13 to 1.28.5. In addition, legacy Tika parsers versions 1.13 to 1.28.5 are also affected.

Unusually – and confusingly – this means there are now two CVEs for the same issue, with the second, CVE-2025-66516, a superset of the first. Presumably, the reasoning behind issuing a second CVE is that it draws attention to the fact that people who patched CVE-2025-54988 are still at risk because of the additional vulnerable components listed in CVE-2025-66516.

So far, there’s no evidence that the XXE injection weakness in these CVEs is being exploited by attackers in the wild. However, the risk is that this will quickly change should the vulnerability be reverse engineered or proofs-of-concept appear.

CVE-2025-66516 is rated an unusual maximum 10.0 in severity, which makes patching it a priority for anyone using this software in their environment. Users should update to Tika-core version 3.2.2, tika-parser-pdf-module version 3.2.2 (standalone PDF module), or tika-parsers versions 2.0.0 if on legacy.

However, patching will only help developers looking after applications known to be using Apache Tika. The danger is that its use might not be listed in all application configuration files, creating a blind spot whereby its use is not picked up. The only mitigation against this uncertainty would be for developers to turn off the XML parsing capability in their applications via the tika-config.xml configuration file.

This article originally appeared on CSOonline.

(image/jpeg; 6.88 MB)

AWS takes aim at the PoC-to-production gap holding back enterprise AI 8 Dec 2025, 6:58 pm

Enterprises are testing AI in all sorts of applications, but too few of their proofs of concept (PoCs) are making into production: just 12%, according to an IDC study.

Amazon Web Services is concerned about this too, with VP of agentic AI Swami Sivasubramanian devoting much of his keynote speech to it at AWS re:Invent last week.

The failures are not down to lack of talent or investment, but how organizations plan and build their PoCs, he said: “Most experiments and PoCs are not designed to be production ready.”

Production workloads, for one, require development teams to deploy not just a handful of agent instances, but often hundreds or thousands of them simultaneously — each performing coordinated tasks, passing context between one another, and interacting with a sprawling web of enterprise systems.

This is a far cry from most PoCs, which might be built around a single agent executing a narrow workflow.

Another hurdle, according to Sivasubramanian, is the complexity that agents in production workloads must contend with, including “a massive amount of data and edge cases”.  

This is unlike PoCs which operate in artificially clean environments and run on sanitized datasets with handcrafted prompts and predictable inputs — all of which hide the realities of live data, such as inconsistent formats, missing fields, conflicting records, and unexpected behaviours.

Then there’s identity and access management. A prototype might get by with a single over-permissioned test account. Production can’t.

“In production, you need rock-solid identity and access management to authenticate users, authorize which tools agents can access on their behalf, and manage these credentials across AWS and third-party services,” Sivasubramanian said.

Even if those hurdles are cleared, the integration of agents into production workloads still remains a key challenge.

“And then of course as you move to production, your agent is not going to live in isolation. It will be part of a wider system, one that can’t fall apart if an integration breaks,” Sivasubramanian said.

Typically, in a PoC, engineers can manually wire data flows, push inputs, and dump outputs to a file or a test interface. If something breaks, they reboot it and move on. That workflow collapses under production conditions: Agents become part of a larger, interdependent system that cannot fall apart every time an integration hiccups.

Moving from PoC to production

Yet Sivasubramanian argued that the gulf between PoC and production can be narrowed.

In his view, enterprises can close the gap by equipping teams with tooling that bakes production readiness into the development process itself, focusing on agility while still being accurate and reliable.

To address concerns around the agility of building agentic systems with accuracy, AWS added an episodic memory feature to Bedrock AgentCore, which lifts the burden of building custom memory scaffolding off developers.

Instead of expecting teams to stitch together their own vector stores, summarization logic, and retrieval layers, the managed module automatically captures interaction traces, compresses them into reusable “episodes,” and brings forward the right context as agents work through new tasks.

In a similar vein, Sivasubramanian also announced the serverless model customization capability in SageMaker AI to help developers automate data prep, training, evaluation, and deployment.

This automation, according to Scott Wheeler, cloud practice leader at AI and data consultancy firm Asperitas, will remove the heavy infrastructure and MLops overhead that often stall fine-tuning efforts, accelerating agentic systems deployment.

The push toward reducing MLops didn’t stop there. Sivasubramanian said that AWS is adding Reinforcement Fine-Tuning (RFT) in Bedrock, enabling developers to shape model behaviour using an automated reinforcement learning (RL) stack.

Wheeler welcomed this, saying it will remove most of the complexity of building a RL stack, including infrastructure, math, and training-pipelines.

SageMaker HyperPod also gained checkpointless training, which enables developers to accelerate the model training process.

To address reliability, Sivasubramanian said that AWS is adding Policy and Evaluations capabilities to Bedrock AgentCore’s Gateway. While Policy will help developers enforce guardrails by intercepting tool calls, Evaluations will help developers simulates real-world agent behavior to catch issues before deployment.

Challenges remain

However, analysts warn that operationalizing autonomous agents remains far from frictionless.

Episodic memory, though a conceptually important feature, is not magic, said David Linthicum, independent consultant and retired chief cloud strategy officer at Deloitte. “It’s impact is proportional to how well enterprises capture, label, and govern behavioural data. That’s the real bottleneck.”

“Without serious data engineering and telemetry work, it risks becoming sophisticated shelfware,” Linthicum said.

He also found fault with RFT in Bedrock, saying that though the feature tries to abstract complexity from RL workflows, it doesn’t remove the most complex parts of the process, such as defining rewards that reflect business value, building robust evaluation, and managing drift.

“That’s where PoCs usually die,” he said.

It is a similar story with the model customization capability in SageMaker AI.

Although it collapses MLOps complexity, it amplified Linthicum’s and Wheeler’s concerns in other areas.

“Now that you have automated not just inference, but design choices, data synthesis, and evaluation, governance teams will demand line-of-sight into what was tuned, which data was generated, and why a given model was selected,” Linthicum said.

Wheeler said that industry sectors with strict regulatory expectations will probably treat the capability as an assistive tool that still requires human review, not a set-and-forget automation: “In short, the value is real, but trust and auditability, not automation, will determine adoption speed,” he said.

(image/jpeg; 11.73 MB)

10 MCP servers for devops 8 Dec 2025, 9:00 am

Today’s AI coding agents are impressive. They can generate complex multi-line blocks of code, refactor according to internal style, explain their reasoning in plain English, and more. However, AI agents will take you only so far unless they also can interface with modern devops tools.

This is where the Model Context Protocol (MCP) comes in. MCP is a proposed universal standard for connecting AI assistants with external tools and data. Interest has heated up since the protocol’s debut in late November 2024, with major tech companies rallying MCP support within new releases, alongside strong community interest.

For devops, MCP gives AI agents new abilities across common operations: Git version control, continuous integration and delivery (CI/CD), infrastructure as code (IaC), observability, accessing documentation, and more. By linking natural language commands to multi-step, back-end processes, MCP essentially enables “chatops 2.0.”

Below, we’ll explore official MCP servers that have emerged across popular devops tools and platforms, offering a cross-section of servers that cater to different devops capabilities. Most are straightforward to configure and authorize within MCP-compatible, AI-assisted development tools that support remote servers, like Claude Code, GitHub Copilot, Cursor, or Windsurf.

GitHub MCP server

It’s rare to meet a developer who doesn’t use GitHub in some form or fashion. As such, GitHub’s official MCP server is quickly becoming a popular way for AI agents to interact with code repositories.

GitHub’s remote MCP server exposes a range of tools that let agents perform repository operations, create or comment on issues, open or merge pull requests, and retrieve project metadata on collaborators, commits, or security advisories.

It also includes endpoints for CI/CD management through GitHub Actions. For example, a command like “cancel the current running action” could invoke the cancel_workflow_run tool within the GitHub Actions tool set.

Compared to other MCP servers, GitHub’s server offers unusually rich capabilities that mirror the APIs of the GitHub platform. However, for safety, you can always configure a --read-only flag to prevent agents from performing mutations.

Notion MCP server

Although not strictly devops at its core, Notion has become commonplace for team visibility across disciplines. For devops, the official Notion MCP server can help agents surface relevant notes and process documentation.

For instance, you could instruct an agent to reference internal style guides or operational runbooks stored in Notion, or issue a command like “Add a page titled ‘MCP servers we use’ under the page ‘DevOps’,” which would trigger a corresponding action through Notion’s API.

You can call Notion’s remote MCP server from your IDE, or build it locally and run it using the official Docker image. Notion’s MCP can be treated as a low-risk server as it has configurable scopes and tokens for managing Notion pages and blocks.

Atlassian Remote MCP server

Another interesting MCP server is the Atlassian Remote MCP server, which connects IDEs or AI agent platforms with Atlassian Cloud products such as Jira, the project management tool, and Confluence, the collaboration platform.

Atlassian’s MCP server, documented here, lets external AI tools interface with Jira to create, summarize, or update issues. It can also retrieve or reference Confluence pages and chain together related actions through the MCP client, like retrieving documentation from Confluence before updating a linked Jira issue.

You could imagine telling an agent, “Update my Jira issue on user testing for the payments app based on this latest bug report,” and pointing it to relevant logs. The server would then handle the update within Jira.

Currently in beta and available only to Atlassian Cloud customers, the Atlassian MCP server supports many MCP-compatible clients and uses OAuth 2.1 authorization for secure access.

Argo CD MCP server

The Argo CD MCP server is developed by Akuity, the original creators of Argo CD, the popular open-source CI/CD tool that powers many Kubernetes-native GitOps workflows. The MCP server wraps calls to the Argo CD API, and provides tools that allow users of AI assistants to interact with Argo CD in natural language.

Akuity’s MCP server has two main tools for applications (the deployments Argo CD manages) and resources (the underlying Kubernetes objects). The application management tool lets agents retrieve application information, create and delete applications, and perform other operations. The resource management tool allows agents to retrieve resource information, logs, and events for specific applications, and run actions on specific resources.

Using the Argo CD MCP server, you can do a lot of the same things you’d typically do in the Argo CD UI or CLI, but driven by natural language. For example, Akuity shares sample prompts such as “Show me the resource tree for guestbook” or “Sync the staging app.”

For such commands to work, you’ll need to integrate the Argo CD MCP server and have access to a running Argo CD instance with the proper credentials configured.

Lastly, although Argo CD is a popular choice, it’s not the only widely used CI/CD tool. Jenkins users may be interested to know that there is a community-maintained MCP Server Plugin for Jenkins.

Grafana MCP server

Grafana, the popular data visualization and monitoring tool, is a mainstay among devops and site reliability teams. Using the official MCP server for Grafana, agents can surface observability data to inform development and operations workflows.

The Grafana MCP server lets agents query full or partial details from dashboards, which combine system performance metrics and health data monitoring from various sources. It can also fetch information on data sources, query other monitoring systems, incident details, and more.

The tool set is configurable, so you can choose what permissions the agent has. Plus, Grafana has optimized how the MCP server structures responses to minimize context window usage and reduce runaway token costs.

For example, an MCP client might call the get_dashboard_property tool to retrieve a specific portion of a dashboard by its UID.

Terraform MCP server

Although alternatives have emerged, HashiCorp’s Terraform remains a leading choice for infrastructure as code. That makes its official MCP server an intriguing option for AI agents to generate and manage Terraform configurations.

The Terraform MCP server integrates with both the Terraform Registry APIs and Terraform Enterprise/HCP services, allowing agents to query module and provider metadata, inspect workspace states, and trigger runs with human approval. It also exposes Terraform resources such as runs, registries, providers, policies, modules, variables, and workspaces.

For example, a command like “generate Terraform code for a new run” could use the create_run operation, after which the agent might validate and plan the configuration before applying it.

The Terraform MCP server ships with an AGENTS.md file, which acts as a readme for agents to interpret tools. At the time of writing, the Terraform MCP is intended only for local use, rather than remote or hosted deployments.

Alternatively, if you’re using OpenTofu for IaC, consider checking out the OpenTofu MCP server. Some advantages of OpenTofu’s MCP are that it can be run locally or deployed in the cloud, it’s globally distributed on Cloudflare Workers, and it’s 100% open source.

GitLab MCP server

Another Git version control and devops platform is GitLab, which offers an MCP server for its Premium and Ultimate customers. The GitLab MCP server, currently in beta, enables AI agents to gather project information and perform operations on GitLab APIs in a secure way.

The GitLab MCP server allows some state changes, such as creating issues or merge requests. The other functions are mainly for data retrieval: retrieving information on issues, merge requests, commits, diffs, and pipeline information. It also includes a general search tool, which can handle a request like “Search issues for ‘failed test’ across GitLab.”

GitLab’s MCP documentation is thorough, with plenty of sample natural language expressions that the MCP server can satisfy. The server supports OAuth 2.0 Dynamic Client Registration.

Snyk MCP server

Snyk, maker of the Snyk security platform for developers, provides an MCP server with the ability to scan and fix vulnerabilities in code, open source dependencies, IaC code, containers, and software bill of materials (SBOM) files. It also supports creating an AI bill of materials (AIBOM) and other security-related operations.

For AI-assisted devsecops, integrating the Snyk MCP server could let an agent automatically run security scans as part of a CI/CD workflow. These scans can even be orchestrated across other MCP servers, like fetching repository details via the GitHub MCP server before initiating a Snyk scan.

A prompt like “Scan the repo ‘Authentication Microservice’ for security vulns” could instruct an agent to locate the repository using GitHub MCP, then invoke Snyk tools such as snyk_sca_scan or snyk_code_scan to identify known vulnerabilities, injection flaws, leaked credentials, and other risks.

The Snyk MCP server runs locally and uses the Snyk CLI to execute these commands through authenticated API calls. Snyk does not offer a hosted, remote version of the MCP server.

AWS MCP servers

The cloud hyperscalers have worked quickly to release MCP servers that integrate with their ecosystems. AWS, for instance, has rolled out dozens of specialized AWS MCP servers to allow AI agents to interact with all manner of AWS services. Some are provided as fully managed services by AWS, while others can be run locally.

For instance, the Lambda Tool MCP server allows agents to list and invoke Lambda functions, while the AWS S3 Tables MCP server could be used by an agent to query S3 table buckets or create new S3 tables from CSV files. The AWS Knowledge MCP server connects agents with all of the latest AWS documentation, API references, and architectural guidance.

A query to this knowledge server, like “pull up the API reference for AWS’s managed Prometheus tool” would correspond with the correct up-to-date information, optimized for agentic consumption.

Users of Microsoft Azure might want to evaluate the Azure DevOps MCP server. Other clouds, like Alibaba, Cloudflare, and Google, are currently experimenting with MCP servers as well.

Pulumi MCP server

Pulumi, another popular option for IaC, has also launched an official MCP server. The MCP server allows agents to query a Pulumi organization’s registry, which provides access to cloud resources and infrastructure, and execute Pulumi commands.

For example, in this walk-through, Pulumi shows how a developer could use its MCP server to provision an Azure Kubernetes Service (AKS) cluster. The developer issues natural-language instructions to an AI assistant, prompting the AI to execute MCP tools that invoke Pulumi CLI commands.

MCP caveats

Just as vibe coding isn’t a fit for every project, MCP isn’t the best option for every use case either. According to MCP experts, these servers can be unnecessary when they sidestep standard CLIs.

They can also introduce major security risks. This tracks with AI use in general, as 62% of IT leaders cite security and privacy risks as the top AI concern, according to the AI in DevOps report by Enterprise Management Associates (EMA).

As such, it’s best to test out these MCP servers with low-risk permissions, like read-only capabilities, before testing write functions. And use them only with trusted LLMs and trusted MCP clients.

Also, beware of exposing high-value, long-lived privileges to MCP clients. Because AI coding agents are based on nondeterministic LLMs, their behavior can be unpredictable. Throw in autonomous control over mutable devops functions, and you could land in all kinds of trouble, ranging from broken deployments to runaway token usage.

Lastly, using the official MCPs above, as opposed to community-supported libraries, will probably guarantee longer longevity and ongoing maintenance, too.

Early MCP success stories

Although it’s still early days with MCP and agents, there’s a sense of cautious optimism as proven MCP workflows emerge.

Take Block’s journey. Through company-wide use of its MCP-compatible agent, Goose, 12,000 employees are now utilizing agents and MCP for “increasingly creative and practical ways to remove bottlenecks and focus on higher-value work,” writes Angie Jones, head of developer relations.

Other engineers report using MCP servers to enhance workflows that are devops-adjacent, like the Filesystem MCP server for accessing local files, the Linear MCP server for issue tracking, the Chrome DevTools MCP server for browser debugging, and the Playwright MCP server for continuous testing.

And beyond the official MCP servers mentioned above, many community-supported MCPs are emerging for Docker, Kubernetes, and other cloud-native infrastructure utilities.

Devops comes with toil and cost. So, the case to level it up with MCP is strong. As long as you keep controls safe, it should be fun to see how these MCP servers integrate into your work and impact your productivity. Happy MCP-opsing.

(image/jpeg; 2.07 MB)

AI memory is really a database problem 8 Dec 2025, 9:00 am

The pace at which large language models (LLMs) evolve is making it virtually impossible to keep up. Allie Miller, for example, recently ranked her go-to LLMs for a variety of tasks but noted, “I’m sure it’ll change next week.” Why? Because one will get faster or come up with enhanced training in a particular area. What won’t change, however, is the grounding these LLMs need in high-value enterprise data, which means, of course, that the real trick isn’t keeping up with LLM advances, but figuring out how to put memory to use for AI.

If the LLM is the CPU, as it were, then memory is the hard drive, the context, and the accumulated wisdom that allows an agent to usefully function. If you strip an agent of its memory, it is nothing more than a very expensive random number generator. At the same time, however, infusing memory into these increasingly agentic systems also creates a new, massive attack surface.

Most organizations are treating agent memory like a scratchpad or a feature behind an SDK. We need to start treating it as a database—and not just any database, but likely the most dangerous (and potentially powerful) one you own.

The soft underbelly of agentic AI

Not long ago, I argued that the humble database is becoming AI’s hippocampus, the external memory that gives stateless models something resembling long-term recall. That was before the current wave of agentic systems really hit. Now the stakes are higher.

As my colleague Richmond Alake keeps pointing out in his ongoing “agent memory” work, there is a crucial distinction between LLM memory and agent memory. LLM memory is really just parametric weights and a short-lived context window. It vanishes when the session ends. Agent memory is different. It is a persistent cognitive architecture that lets agents accumulate knowledge, maintain contextual awareness, and adapt behavior based on historical interactions.

Alake calls the emerging discipline “memory engineering” and frames it as the successor to prompt or context engineering. Instead of just stuffing more tokens into a context window, you build a data-to-memory pipeline that intentionally transforms raw data into structured, durable memories: short term, long term, shared, and so on.

That may sound like AI jargon, but it is really a database problem in disguise. Once an agent can write back to its own memory, every interaction is a potential state change in a system that will be consulted for future decisions. At that point, you are not tuning prompts. You are running a live, continuously updated database of things the agent believes about the world.

If that database is wrong, your agent will be confidently wrong. If that database is compromised, your agent will be consistently dangerous. The threats generally fall into three buckets:

Memory poisoning. Instead of trying to break your firewall, an attacker “teaches” the agent something false through normal interaction. OWASP (Open Worldwide Application Security Project) defines memory poisoning as corrupting stored data so that an agent makes flawed decisions later. Tools like Promptfoo now have dedicated red-team plug-ins that do nothing but test whether your agent can be tricked into overwriting valid memories with malicious ones. If that happens, every subsequent action that consults the poisoned memory will be skewed.

Tool misuse. Agents increasingly get access to tools: SQL endpoints, shell commands, CRM APIs, deployment systems. When an attacker can nudge an agent into calling the right tool in the wrong context, the result looks indistinguishable from an insider who “fat-fingered” a command. OWASP calls this class of problems tool misuse and agent hijacking: The agent is not escaping its permissions; it is simply using them for the attacker’s benefit.

Privilege creep and compromise. Over time, agents accumulate roles, secrets, and mental snapshots of sensitive data. If you let an agent assist the CFO one day and a junior analyst the next, you have to assume the agent now “remembers” things it should never share downstream. Security taxonomies for agentic AI explicitly call out privilege compromise and access creep as emerging risks, especially when dynamic roles or poorly audited policies are involved.

New words, old problems

The point is not that these threats exist. The point is that they are all fundamentally data problems. If you look past the AI wrapper, these are exactly the things your data governance team has been chasing for years.

I’ve been suggesting that enterprises are shifting from “spin up fast” to “get to governed data fast” as the core selection criterion for AI platforms. That is even more true for agentic systems. Agents operate at machine speed with human data. If the data is wrong, stale, or mislabelled, the agents will be wrong, stale, and will misbehave much faster than any human could manage.

“Fast” without “governed” is just high-velocity negligence.

The catch is that most agent frameworks ship with their own little memory stores: a default vector database here, a JSON file there, a quick in-memory cache that quietly turns into production later. From a data governance perspective, these are shadow databases. They often have no schema, no access control lists, and no serious audit trail.

We are, in effect, standing up a second data stack specifically for agents, then wondering why no one in security feels comfortable letting those agents near anything important. We should not be doing this. If your agents are going to hold memories that affect real decisions, that memory belongs inside the same governed-data infrastructure that already handles your customer records, HR data, and financials. Agents are new. The way to secure them is not.

Revenge of the incumbents

The industry is slowly waking up to the fact that “agent memory” is just a rebrand of “persistence.” If you squint, what the big cloud providers are doing already looks like database design. Amazon’s Bedrock AgentCore, for example, introduces a “memory resource” as a logical container. It explicitly defines retention periods, security boundaries, and how raw interactions are transformed into durable insights. That is database language, even if it comes wrapped in AI branding.

It makes little sense to treat vector embeddings as some distinct, separate class of data that sits outside your core database. What’s the point if your core transactional engine can handle vector search, JSON, and graph queries natively? By converging memory into the database that already holds your customer records, you inherit decades of security hardening for free. As Brij Pandey notes, databases have been at the center of application architecture for years, and agentic AI doesn’t change that gravity—it reinforces it.

Yet, many developers still bypass this stack. They spin up standalone vector databases or use the default storage of frameworks like LangChain, creating unmanaged heaps of embeddings with no schema and no audit trail. This is the “high-velocity negligence” I mentioned above. The solution is straightforward: Treat agent memory as a first-class database. In practice this means:

Define a schema for thoughts. You typically treat memory as unstructured text, but that’s a mistake. Agent memory needs structure. Who said this? When? What is the confidence level? Just as you wouldn’t dump financial records into a text file, you shouldn’t dump agent memories into a generic vector store. You need metadata to manage the life cycle of a thought.

Create a memory firewall. Treat every write into long-term memory as untrusted input. You need a “firewall” logic layer that enforces schema, validates constraints, and runs data loss prevention checks before an agent is allowed to remember something. You can even use dedicated security models to scan for signs of prompt injection or memory poisoning before the data hits the disk.

Put access control in the database, not the prompt. This involves implementing row-level security for the agent’s brain. Before an agent helps a user with “level 1” clearance (a junior analyst), it must be effectively lobotomized of all “level 2” memories (the CFO) for that session. The database layer, not the prompt, must enforce this. If the agent tries to query a memory it shouldn’t have, the database should return zero results.

Audit the “chain of thought.” In traditional security, we audit who accessed a table. In agentic security, we must audit why. We need lineage that traces an agent’s real-world action back to the specific memory that triggered it. If an agent leaks data, you need to be able to debug its memory, find the poisoned record, and surgically excise it.

Baked-in trust

We tend to talk about AI trust in abstract terms: ethics, alignment, transparency. Those concepts matter. But for agentic systems operating in real enterprises, trust is concrete.

We are at the stage in the hype cycle where everyone wants to build agents that “just handle it” behind the scenes. That is understandable. Agents really can automate workflows and applications that used to require teams of people. But behind every impressive demo is a growing memory store full of facts, impressions, intermediate plans, and cached tool results. That store is either being treated like a first-class database or not.

Enterprises that already know how to manage data lineage, access control, retention, and audit have a structural advantage as we move into this agentic era. They do not have to reinvent governance. They only have to extend it to a new kind of workload.

If you are designing agent systems today, start with the memory layer. Decide what it is, where it lives, how it is structured, and how it is governed. Then, and only then, let the agents loose.

(image/jpeg; 2.96 MB)

AI in CI/CD pipelines can be tricked into behaving badly 5 Dec 2025, 2:09 pm

AI agents embedded in CI/CD pipelines can be tricked into executing high-privilege commands hidden in crafted GitHub issues or pull request texts.

Researchers at Aikido Security have traced the problem back to workflows that pair GitHub Actions or GitLab CI/CD with AI tools such as Gemini CLI, Claude Code Actions, OpenAI Codex Actions or GitHub AI Inference. They found that unsupervised user-supplied strings such as issue bodies, pull request descriptions, or commit messages, could be fed straight into prompts for AI agents in an attack they are calling PromptPwnd.

Depending on what the workflow lets the AI do, this can lead to unintended edits to repository content, disclosure of secrets, or other high-impact actions.

“AI agents connected to GitHub Actions/GitLAb CI/CD are processing untrusted user input, and executing shell commands with access to high-privilege tokens,” the researchers wrote in a blog post about PromptPwnd. They said they reproduced the problem in a test environment, and notified the affected vendors.

The researchers recommended running a set of open-source detection rules on suspected GitHub Action .yml files, or using their free code scanner on GitHub and GitLab repos.

Aikido Security said that Google had patched the issue in Gemini CLI upon being informed; Google did not immediately respond to a request for information about this.

Why PromptPwnd works

PromptPwnd exploits become possible when two flawed pipeline configurations occur together: when AI agents operating inside CI/CD workflows have access to powerful tokens (like GITHUB_TOKEN, cloud-access keys), and their prompts embed user-controlled fields.

Prompt injection becomes easier with such a setup, the researchers explained. An attacker can simply open an issue on a public repository and insert hidden instructions or seemingly innocent comments that double as commands for the model to pick. “Imagine you are sending a prompt to an LLM, and within that prompt, you are including the commit message,” the researchers said. “If that commit message is a malicious prompt, then you may be able to get the model to send back altered data.” The model’s response, if used directly inside commands to tools within CI/CD pipelines, can manipulate those tools to retrieve sensitive information.

Aikido Security demonstrated this in a controlled environment (without real tokens) to show that Gemini CLI could be manipulated into executing attacker-supplied commands and exposing sensitive credentials through a crafted GitHub issue. “Gemini CLI is not an isolated case. The same architecture pattern appears across many AI-powered GitHub Actions,” the researchers said, adding that the list included Claude Code, OpenAI Codex, and GitHub AI Inference.

All of these tools can be tricked (via issue, pull-request description, or other user-controlled text) into producing instructions that the workflow then executes with its privileged GitHub Actions token.

Mitigation plan

Aikido has open-sourced detection rules via their “Opengrep” tool that allows developers and security teams to scan their YAML workflows automatically, revealing whether they feed untrusted inputs into AI prompts.

The researchers said that only a subset of workflows have confirmed exploit paths so far, and that it is working with several other companies to address the underlying vulnerabilities. Some workflows can only be abused with collaborator-level access, while others can be triggered by anyone who files an issue or pull request.

Developer teams are advised to restrict what AI agents can do, avoid piping untrusted user content into prompts, treat AI output as untrusted code, and contain damage from compromised GitHub tokens.

Aikido Security said its code scanner can help flag these vulnerabilities by detecting unsafe GitHub Actions configurations (including risky AI prompt flows), identifying over-privileged tokens, and surfacing insecure CI/CD patterns via infrastructure-as-code scanning.

There are other best practices for securing CI/CD pipelines that enterprises can adopt, too.

(image/jpeg; 15.73 MB)

Local clouds shape Europe’s AI future 5 Dec 2025, 9:00 am

It’s a foggy morning in Munich. Marie, CIO of a fictional, forward-thinking European healthcare startup, pores over proposals from cloud vendors. Her company is on the verge of launching AI-powered diagnostics but must keep every byte of patient data within EU borders to comply with strict regional privacy laws. On her desk are slick portfolios from Microsoft, AWS, and Google, all touting sovereign cloud options in the EU. Alongside them are proposals from national cloud providers—smaller, perhaps, but wholly grounded in local laws and run by European nationals. After consulting several legal teams, Marie chooses the local sovereign cloud, believing it’s the safer, smarter option for an EU-based company committed to secure, lawful AI.

Sovereignty is more than a checkbox

Europe has redefined digital sovereignty, emphasizing control, accountability, and operational independence. For European companies and governments, sovereignty is more than data location. Who controls access? Who is legally accountable? Do foreign governments have any claim—however remote—to sensitive business or personal information? European law is driven by values of privacy and autonomy and requires true digital self-determination beyond technical compliance.

The new “sovereign” offerings from US-based cloud providers like Microsoft, AWS, and Google represent a significant step forward. They are building cloud regions within the EU, promising that customer data will remain local, be overseen by European citizens, and comply with EU laws. They’ve hired local staff, established European governance, and crafted agreements to meet strict EU regulations. The goal is to reassure customers and satisfy regulators.

For European organizations facing tough questions, these steps often feel inadequate. Regardless of how localized the infrastructure is, most global cloud giants still have their headquarters in the United States, subject to US law and potential political pressure. There is always a lingering, albeit theoretical, risk that the US government might assert legal or administrative rights over data stored in Europe.

For companies operating in sensitive industries—healthcare, finance, government, and research—this gray area is unacceptable. Legal teams and risk officers across the continent are setting clear boundaries. For them, true sovereignty means that only nationals of their country, subject solely to their laws, can access and manage critical or sensitive data. This goes beyond data residency. They demand meaningful, enforceable autonomy with no loopholes or uncertainties.

Local cloud providers in the AI era

Enter Europe’s national and regional sovereign cloud providers. These companies might not have the global reach or the full range of advanced services that Microsoft or AWS offer; however, what they lack in size they more than compensate for with trustworthiness and compliance. Their infrastructure is entirely based and operated within the EU, often within a single country. Governance is maintained by boards made up of local nationals. Legal contracts are drafted under the authority of EU member states, not merely adapted from foreign templates to meet local rules.

This sense of ownership and local control is convincing many EU companies to choose local providers. When the stakes are high—a leak, breach, or accidental foreign intervention that could result in regulatory disaster, reputation damage, or legal action—these organizations feel they cannot risk compromise. Even the most remote possibility that a foreign government could access their sensitive data is a dealbreaker.

Some argue that only the largest cloud providers can deliver the scale and specialized services needed for ambitious artificial intelligence projects, but the European market is already demonstrating otherwise. Local sovereign cloud alliances, often built from federated national clouds, are pooling resources, investing in high-quality AI hardware, and collaborating with local universities and tech hubs to speed up machine learning research and application deployments.

The majority of European businesses are embarking on their AI journeys with applied AI, predictive analytics, or secure cloud-based automation. For these cases, the performance and scalability offered by local providers are more than sufficient. What’s more, they offer a level of transparency and adaptation to local expectations that the multinationals simply can’t match. When new rules or compliance demands emerge—inevitable in such a fast-moving regulatory landscape—European providers pivot quickly, working alongside regulators and industry leaders.

Big Cloud versus Europe’s offerings

As more European organizations pursue digital transformation and AI-driven growth, the evidence is mounting: The new sovereign cloud solutions launched by the global tech giants aren’t winning over the market’s most sensitive or risk-averse customers. Those who require freedom from foreign jurisdiction and total assurance that their data is shielded from all external interference are voting with their budgets for the homegrown players.

This puts the major cloud providers in a tricky spot. They have already built a strong sovereign cloud infrastructure. However, if corporate and government leaders remain unconvinced about the extent of their local control and security, these services may remain underused, outpaced by flexible, locally trusted providers. The cloud landscape is changing fast. True sovereignty—the kind demanded by European regulators, executives, and citizens—is about more than checklists or technology. EU laws and values are embedded at every level of digital infrastructure offered by EU providers. The companies that prioritize these things will choose providers whose roots, leadership, and accountability are all local.

In the months and years ahead, I predict that Europe’s own clouds—backed by strong local partnerships and deep familiarity with regulatory nuance—will serve as the true engine for the region’s AI ambitions. Global tech giants may continue to invest and adapt, but unless they fundamentally rethink their approach to local autonomy and legal accountability, their sovereign clouds are likely to remain on the sidelines.

For executives like the fictional Marie, the future is already clear: When it comes to sovereignty, local clouds are the best kind of cloud cover.

(image/jpeg; 5.21 MB)

All I want for Christmas is a server-side JavaScript framework 5 Dec 2025, 9:00 am

A grumpy Scrooge of a developer might complain about the wealth of options in JavaScript, calling it “tech decision overwhelm.” But the truth is, the JavaScript ecosystem works. In an ecosystem that encourages innovation, new tools are regularly introduced and naturally find their niche, and excellence is rewarded.

As developers, we get to sit back and mouse-wheel through hundreds of thousands of programmer hours of work. NPM is a vast repository of human creativity. What looks like chaos is a complex phylogeny, a family tree of code where tools evolve to find their role in the larger system.

Of course, when you are under deadline and the caffeine’s worn off, you don’t have time to explore your options. But when things are calm—perhaps during the holiday break season—it is well worth taking a deep dive into the open source gifts under the JavaScript tree.

Top picks for JavaScript readers on InfoWorld

The complete guide to Node.js frameworks
Looking for inspiration to supercharge your server side? Get a whirlwind tour of some of the most popular and powerful back-end JavaScript frameworks. We survey the range, from Express and Next to Hono, SvelteKit, and more.

Intro to Nest.js: Server-side JavaScript development on Node
If you like Angular’s architecture or the structure of Java’s Spring framework, Nest may be the Node framework for you. Decide for yourself, with this hands-on guide to building an API with Nest and TypeScript.

10 JavaScript-based tools and frameworks for AI and machine learning
Modern JavaScript has a wealth of powerful AI tools. From the wide-ranging capability of TensorFlow.js to hidden gems like Brain.js, here’s a nice rundown of JavaScript tools for building neural nets, implementing RAGs, and tapping LLMs—all with no Python required.

Node.js tutorial: Get started with Node
After all the talk about options, it’s important to know the most central piece of the whole puzzle. Node was the original, breakthrough idea that put JavaScript on the server and remains the flagship runtime.

More good reads and JavaScript updates elsewhere

Native type stripping in TypeScript 7.0
Microsoft has released the TypeScript 7 roadmap for early 2026, and it includes native type stripping. Following Node’s lead, TypeScript will aim to make the “build step” optional for development—basically, the engine will just delete the type info, making it extremely fast.

Critical security vulnerability in React server components
The React team has disclosed a catastrophic, unauthenticated remote code execution vulnerability in React server components. Developers using Next.js, React Router, Waku, or Redwood with React 19.x are advised to update now. Patches are available for Next.js 16.0.7 and React 19.2.1.

Announcing Angular v21
Angular’s renaissance continues with version 21. The biggest shift is that Zone.js is gone by default for new applications, marking the official transition to Signal-first and high-performance.

State of React Survey, 2025 is open
Head over to the latest State of React survey to do your civic duty and contribute some data points to the present and future destiny of the most downloaded chunk of JavaScript software on Earth.

(image/jpeg; 17.24 MB)

‘Futuristic’ Unison functional language debuts 4 Dec 2025, 7:34 pm

Unison, a statically typed functional language with type inference, an effect system, and advanced tooling, has reached its 1.0 release status.

Announced November 25, Unison 1.0 marks a point where the language, distributed runtime, and developer workflow have stabilized, according to Unison Computing. Billed as “a friendly programming language from the future,” Unison is purported to bring benefits in compilation and distributed system development. With Unison, a definition is identified by its actual contents, i.e. a hash of its syntax tree, not just by the human-friendly name that also referred to older versions of the definition, according to Unison Computing. As a result, each Unison definition has a unique and deterministic address. All named arguments are replaced by positionally-numbered variable references, and all dependencies are replaced by their hashes. Thus, the hash of each definition uniquely identifies its exact implementation and pins down all its dependencies, according to the company.

The Unison ecosystem leverages this core idea from the ground up. Benefits include never compiling the same code twice and limiting versioning conflicts. Further, Unison promises to simplify distributed programming. Because definitions in Unison are identified by a content hash, arbitrary computations can be moved from one location to another, with missing dependencies deployed on the fly, according to Unison Computing. Unison can be viewed as a descendant of Haskell, with similarities including type inference and pattern matching, but is smaller and simpler than Haskell, according to a Unison FAQ.

Download and installation instructions can be found for Homebrew, Windows, Linux, and MacOS at the Unison website. Unison can be used like any other general purpose language, or used in conjunction with the Unison Cloud for building distributed systems. Unison code is stored as its abstract syntax tree in a database, i.e. the “codebase,” rather than in text files. Unison has “perfect” incremental compilation, with a shared compilation cache that is part of the codebase format. Despite the strong static typing, users are almost never waiting for code to compile, Unison Computing said. Unison’s hash-based, database-backed representation also changes how code is identified, versioned, and shared. The workflow, toolchain, and deployment model emerge naturally from the language’s design, enabling better tools for working with code, according to Unison Computing.

(image/jpeg; 6.42 MB)

OpenAI to acquire AI training tracker Neptune 4 Dec 2025, 4:35 pm

OpenAI has agreed to acquire a startup specializing in tools for tracking AI training, Neptune, which promptly announced it is withdrawing its products from the market.

OpenAI said in a statement.

The ChatGPT maker has been a Neptune customer for more than a year.

Experiment tracking tools such as Neptune’s enable data science teams to monitor AI model training runs, compare results across different configurations, and identify issues during the development process. Neptune’s platform tracked metrics including loss curves, gradient statistics, and activation patterns across thousands of concurrent experiments.

Following Neptune’s withdrawal from the market, users of its SaaS version have a few months’ grace to export their data and migrate to alternative platforms during which the company will continue to provide stability and security fixes, but will add no new features, it said. “On March 4, 2026, at 10 am PST: The hosted app and API will be turned off. Any remaining hosted data will be securely and irreversibly deleted as part of the shutdown,” Neptune said on its transition hub web page.

Self-hosted customers will have been contacted by their account manager, it said.

Consolidation concerns

The move raised concerns among industry analysts about vendor consolidation in AI development tools. “Testing, experiment tracking tooling, etc., should not be linked or aligned to any vendor of tech including AI,” said Faisal Kawoosa, chief analyst at Techarc. “These should always remain third party and there should be no bias influencing the independent and neutral results of such platforms.”

Kawoosa said consolidation of tooling infrastructure is premature as the industry has yet to determine a definite course for AI development. “I think it’s too early for consolidation of tooling infrastructure as we are yet to see a definite course of AI,” he said.

However, Anshel Sag, principal analyst at Moor Insights & Strategy saw it as a natural progression in an industry that is becoming more mature.

“This very much looks like a choice OpenAI has made to ensure its favorite tools are always available for it to use,” Sag said.

OpenAI did not immediately respond to a request for comment.

Neptune provides software that tracks training metrics, surfaces issues during model development, and stores historical data from previous experiments. The platform allows organizations to compare training runs across different model architectures and monitor thousands of experiments simultaneously.

The company is focused on helping teams build models during “the iterative, messy, and unpredictable phase of model training,” Neptune CEO Piotr Niedźwiedź wrote in a blog post announcing the deal.

Migration options for affected customers

Neptune isn’t the only company offering such tools, said Sag, noting that Weights and Biases, Tensorboard and MLFlow are also active in this market.

Indeed, Neptune provided instructions for exporting its data and migrating to MLFlow or Weights and Biases.

Weights & Biases offers a managed platform with visualization and collaboration features. MLflow, an open-source platform from Databricks, handles experiment tracking as part of end-to-end ML lifecycle management.

Another option is Comet, which provides experiment tracking with deployment monitoring capabilities.

Cloud providers also offer experiment tracking through their platforms. Google’s Vertex AI includes tracking capabilities for teams using Google Cloud, while AWS SageMaker and Azure Machine Learning provide similar features within their respective ecosystems.

(image/jpeg; 4.51 MB)

The first building blocks of an agentic Windows OS 4 Dec 2025, 9:00 am

One concern many users have about AI is that often their data leaves their PC and their network, with inferencing happening in the cloud. They have big questions about data protection. That’s one of the main drivers for Microsoft’s Copilot+ PCs; the neural processing units that are built-in to the latest CPU systems on a chip run inferencing locally using small language models (SLMs) and other optimized machine-learning tools.

Uptake has not been as fast as expected, with delays to key development frameworks preventing users from seeing the benefits of local AI acceleration. However, in 2025 Microsoft has slowly taken its foot off the brake, rolling out more capabilities as part of its Win App SDK and the related Windows ML framework. As part of that acceleration, tools like Foundry Local have provided both an easy way to access local AI APIs and a way to test and examine SLM prompting.

At Ignite 2025, Microsoft announced further development of the Windows AI platform as part of its intention to deliver a local agentic AI experience. This includes a preview of support for native Model Context Protocol (MCP) servers, along with agents that work with the Windows file system and its settings. These support a private preview of a separate Agent Workspace, which uses a virtual desktop to host and run agents and applications without getting in the way of day-to-day tasks.

Microsoft sees the future of Windows as an “agentic OS” that can respond to user requests in a more flexible way, working with local and remote resources to orchestrate its own workflows on demand. Using agents on Windows, the local Copilot will be able to link applications in response to your requests.

Adding MCP support to Windows is a key building block for the future of Windows. Microsoft is giving us a feel for how it will deliver security and trustworthiness for the next generation of on-device AI.

Using MCP inside Windows

The Model Context Protocol is a standard API format that gives agents access to data and functions from applications. If you’ve used the GitHub Copilot Agent in Visual Studio Code, you’ve seen how it allows access to tools that expose your Azure cloud resources as well as service best practices. However, it requires you to find and install MCP server endpoints yourself.

That’s fine for developers who are already used to finding resources and adding them to their toolchains as needed. However, for consumers, even power users, such an approach is a non-starter. They expect Windows to keep track of the tools and services they use and manage them. An MCP server for a local agent running in Windows needs to install like any other application, with Windows managing access and security.

Microsoft is adding an MCP registry to Windows, which adds security wrappers and provides discovery tools for use by local agents. An associated proxy manages connectivity for both local and remote servers, with authentication, audit, and authorization. Enterprises will be able to use these tools to control access to MCP, using group policies and default settings to give connectors their own identities.

Registering an MCP server is handled by installing via MSIX packages, with the MCP server using the standard bundle format. Bundles are built using an NPM package, so you need to have NodeJS installed on your development system before downloading and installing the MCP bundle (mcpb) package, and then initializing and building your bundle, targeting your MCP server code. This can then be included in your application’s installer and wrapped as an MSIX file.

You can manually install MCP bundles, but using a Windows installer and MSIX makes sure that the server is registered and will run in a constrained agent session. This limits access to system resources, reducing the risks of complex prompt injection attacks. Servers need to be binaries with a valid manifest before they can be registered. They are included as a com.microsoft.windows.ai.mcpserver extension in the MSIX package manifest, which registers the server and removes it when the host application is uninstalled.

As they run in a separate session, you need to give explicit permission for file access, and they are blocked from access to the registry and from seeing what you are currently using. That doesn’t stop them from running code in their own session or from accessing the internet. Access to user files is managed by the app that hosts the MCP server, and if access is granted to one server, all the other servers that run under the same host automatically get access. The requested capabilities need to be listed in the app manifest, used by the system to prompt for access.

The link between Windows agents and MCP servers

MCP servers are only part of the Windows agent platform. They need hosts, which provide the link between your agents and registered MCP servers. Microsoft provides a sample JavaScript application to show how to build and use a host, parsing the JSON provided by a server and then connecting. You can then list its available tools and call them. The sample code can be adapted to other languages relatively easily, allowing an agent orchestration framework like Semantic Kernel to work with local MCP servers.

MCP servers provide a bridge between AI applications and other services, in many cases offering connectors that can be used for AI models to query the service. As part of its initial set of Windows agent tools, Microsoft is delivering an MCP-based connector for the Windows File Explorer, giving agents the same access to the Windows file system as users. Both users and system administrators can block access to files or specific project directories.

The connector provides agents with a set of file tools, which include basic access, modification, and file and directory creation capabilities. As there’s no specific file deletion capability, agents can use the connector to write new files and move existing ones, as well as to edit text content. These are classed as destructive operations as they change the underlying Windows file system.

Be careful when giving agents access to the Windows file system; use base prompts that reduce the risks associated with file system access. When building out your first agent, it’s worth limiting the connector to search (taking advantage of the semantic capabilities of Windows’ built-in Phi small language model) and reading text data.

This does mean you’ll need to provide your own guardrails for agent code running on PCs, for example, forcing read-only operations and locking down access as much as possible. Microsoft’s planned move to a least-privilege model for Windows users could help here, ensuring that agents have as few rights as possible and no avenue for privilege escalation.

Along with tools for building and running MCP servers in Windows, Microsoft provides a command-line tool for working with its agent registry. This will allow you to test that your own servers have been installed. The tool will also list any third-party servers that may have been registered by applications running on your PC. It’s a good idea to use this regularly to check for new servers that may have been installed by software updates.

The road to an agentic OS

Building an agentic OS is hard, as the underlying technologies work very differently from standard Windows applications. Microsoft is doing a lot to provide appropriate protections, building on its experience in delivering multitenancy in the cloud. Microsoft’s vision for an agentic OS appears to be one where each agent and its associated servers are treated as a tenant on your PC, where it operates in a restricted, locked-down environment to reduce the risk of interactions with your applications and data.

We’ve seen this before, where services like Windows log-on are kept in their own virtual machines using the Krypton hypervisor. Virtualization-based security is a key part of Windows 11, so it’s no surprise that this model is at the heart of delivering autonomous agents as part of Windows. As I noted in an earlier look at Microsoft’s agent visions, one of the showstoppers for the first generation of agent technologies was that they required running arbitrary code on remote computers. Redmond has clearly learned from the lessons of Kaleida and General Magic and is sandboxing its agent support from the very start.

It is still early, but it’s promising to see tools to help build complex agentic applications that can use a mix of local and remote resources to handle many different tasks, without leaving a secure sandbox. If Microsoft can deliver and developers can take advantage, the results could be very interesting.

(image/jpeg; 9.17 MB)

Spring AI tutorial: Get started with Spring AI 4 Dec 2025, 9:00 am

Artificial intelligence and related technologies are evolving rapidly, but until recently, Java developers had few options for integrating AI capabilities directly into Spring-based applications. Spring AI changes that by leveraging familiar Spring conventions such as dependency injection and the configuration-first philosophy in a modern AI development framework.

In this article, you will learn how to integrate AI into your Spring applications. We’ll start with a simple example that sends a request to OpenAI, then use Spring AI’s prompt templates to add support for user-generated queries. You’ll also get a first look at implementing retrieval augmented generation (RAG) with Spring AI, using a vector store to manage external documents.

What is Spring AI?

Spring AI started as a project in 2023, with its first milestone version released in early 2024. Spring AI 1.0, the general availability release, was finalized in May 2025. Spring AI abstracts the processes involved in interacting with large language models (LLMs), similar to how Spring Data abstracts database access procedures. Spring AI also provides abstractions for managing prompts, selecting models, and handing AI responses. It includes support for multiple AI providers, including OpenAI, Anthropic, Hugging Face, and Ollama (for local LLMs).

Spring AI allows you to easily switch between providers simply by changing configuration properties. As a developer, you configure your AI resources in your application.yaml or application.properties file, wire in Spring beans that provide standard interfaces, and write your code against those interfaces. Spring then handles all the details of interacting with the specific models.

Also see: Spring AI: An AI framework for Java developers.

Building a Spring app that queries OpenAI

Let’s start by building a simple Spring MVC application that exposes a query endpoint, which sends a question to OpenAI. You can download the source code for this example or head over to start.spring.io and create a new project. In the dependencies section, include the dependencies you want for your application; just be sure to scroll down to the AI section and choose “OpenAI.” I chose “Spring Web” and “OpenAI” for my example.

The first thing we want to do is configure our LLM provider. I created an application.yaml file with the following contents:

spring:
  application:
    name: spring-ai-demo
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-5
          temperature: 1

Under spring, I included an “ai” section, with an “openai” subsection. To use OpenAI, you need to specify an api-key, which I defined to use the OPENAI_API_KEY environment variable, so be sure to define that environment variable before running the example code. Additionally, you need to specify a set of options. The most important option is the model to use. I chose gpt-5, but you can choose any model listed on the OpenAI models page. By default, Spring AI uses gpt-4o-mini, which is less expensive, but gpt-5 supports structured reasoning, multi-step logic, planning, and more tokens. It doesn’t really matter which model we use for this example, but I wanted to show you how to configure the model.

There are several other configuration options, but the most common ones you’ll use are maxTokens, maxCompletionTokens, and temperature. The temperature controls the randomness of the response, where a low value, like 0.3, provides a more repeatable response and a higher value, like 0.7 allows the LLM to be more creative. When I ask a model to design a software component or perform a code review, I typically opt for a higher temperature of 0.7 because I want it to be more creative, but when I ask it to implement the code for a project, I set the temperature to 0.3 so that it is more rigid. For gpt-5, which is a reasoning model, the required temperature is 1, and Spring will throw an error if you try to set it to a different value.

Once the model is configured, we can build our service:

package com.infoworld.springaidemo.service;

import java.util.Map;

import com.infoworld.springaidemo.model.JokeResponse;
import com.infoworld.springaidemo.model.SimpleQueryResponse;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class SpringAIService {

    private final ChatClient chatClient;

    public SpringAIService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    public String simpleQueryAsString(String query) {
        return this.chatClient.prompt(query).call().content();
    }

    public SimpleQueryResponse simpleQuery(String query) {
        return this.chatClient.prompt(query).call().entity(SimpleQueryResponse.class);
    }
}

Because we have OpenAI configured in our application.yaml file, Spring will automatically create a ChatClient.Builder that we can wire into our service and then use it to create a ChatClient. The ChatClient is the main interface for interacting with chat-based models, such as GPT. In this example, we invoke its prompt() method, passing it our String query. The prompt() method also accepts a Prompt object, which you will see in a minute. The prompt() method returns a ChatClientRequestSpec instance that we can use to configure LLM calls. In this example, we simply invoke its call() method to send the message to the LLM. The call() method returns a CallResponseSpec instance. You can use that to get the text response by invoking its content() method, or you can map the response to an entity by invoking its entity() method. I provided examples of both. For the entity mapping, I passed a SimpleQueryResponse, which is a Java record:

package com.infoworld.springaidemo.model;

public record SimpleQueryResponse(String response) {
}

Now let’s build a controller so that we can test this out:

package com.infoworld.springaidemo.web;

import com.infoworld.springaidemo.model.SimpleQuery;
import com.infoworld.springaidemo.model.SimpleQueryResponse;
import com.infoworld.springaidemo.service.SpringAIService;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SpringAiController {
    private final SpringAIService springAIService;

    public SpringAiController(SpringAIService springAIService) {
        this.springAIService = springAIService;
    }

    @PostMapping("/simpleQuery")
    public ResponseEntity simpleQuery(@RequestBody SimpleQuery simpleQuery) {
        SimpleQueryResponse response = springAIService.simpleQuery(simpleQuery.query());
        return ResponseEntity.ok(response);
    }

}

This controller wires in the SpringAIService and exposes a PostMapping to /simpleQuery. It accepts a SimpleQuery as its request body, which is another Java record:

package com.infoworld.springaidemo.model;

public record SimpleQuery(String query) {
}

The simpleQuery() method passes the request body’s query parameter to the SpringAIService and then returns a response as a SimpleQueryResponse.

If you build the application, with mvn clean install, and then run it with mvn spring-boot:run, you can execute a POST request to /simpleQuery and get a response. For example, I posted the following SimpleQuery:

{
    "query": "Give me a one sentence summary of Spring AI"
}

And received the following response:

{
    "response": "Spring AI is a Spring project that offers vendor-neutral, idiomatic abstractions and starters to integrate LLMs and related AI capabilities (chat, embeddings, tools, vector stores) into Java/Spring applications."
}

Now that you know how to configure a Spring application to use Spring AI, send a message to an LLM, and get a response, we can begin to explore prompts more deeply.

Download the Spring AI tutorial source code.

Supporting user input with Spring AI prompt templates

Sending a message to an LLM is a good first step in understanding Spring AI, but it is not very useful for solving business problems. Many times, you want to control the prompt and allow the user to specify specific parameters, and this is where prompt templates come in. Spring AI supports prompt templates through the PromptTemplate class. You can define prompt templates in-line, but the convention in Spring AI is to define your templates in the src/resources/templates directory using an st extension.

For our example, we’ll create a prompt template that asks the LLM to tell us a joke, but in this case, we’ll have the user provide the type of joke, such as silly or sarcastic, and the topic. Here is my joke-template.st file:

Tell me a {type} joke about {topic}

We define the template as a String that accepts variables, which in this case are a type and a topic. We can then import this template into our class using a Spring property value. I added the following to the SpringAIService:

@Value("classpath:/templates/joke-template.st")
    private Resource jokeTemplate;

The value references the classpath, which includes the files found in the src/main/resources folder, then specifies the path to the template.

Next, I added a new tellMeAJoke() method to the SpringAIService:

public JokeResponse tellMeAJoke(String type, String topic) {
        Prompt prompt = new PromptTemplate(jokeTemplate)
                .create(Map.of("type", type, "topic", topic));
        return this.chatClient.prompt(prompt).call().entity(JokeResponse.class);
    }

This method accepts a type and a topic and then constructs a new PromptTemplate from the joke-template.st file that we wired in above. To set its values, we pass a map of the values in the PromptTemplate’s create() method, which returns a Prompt for us to use. Finally, we use the ChatClient, but this time we pass the prompt to the prompt() method instead of the raw string, then we map the response to a JokeResponse:

package com.infoworld.springaidemo.model;

public record JokeResponse(String response) {
}

I updated the controller to create a new /tellMeAJoke PostMapping:

@PostMapping("/tellMeAJoke")
    public ResponseEntity tellMeAJoke(@RequestBody JokeRequest jokeRequest) {
        JokeResponse response = springAIService.tellMeAJoke(jokeRequest.type(), jokeRequest.topic());
        return ResponseEntity.ok(response);
    }

The request body is a JokeRequest, which is another Java record:

package com.infoworld.springaidemo.model;

public record JokeRequest(String type, String topic) {
}

Now we can POST a JSON body with a type and topic and it will tell us a joke. For example, I sent the following JokeRequest to ask for a silly joke about Java:

    "type": "silly",
    "topic": "Java"
}

And OpenAI returned the following:

{
    "response": "Why do Java developers wear glasses? Because they don't C#."
}

While this is a trivial example, you can use the code here as a scaffold to build robust prompts and accept simple input from users, prompting OpenAI or another LLM to generate meaningful results.

Retrieval augmented generation with Spring AI

The examples we’ve built so far are very much “toy” examples, but they illustrate how to configure an LLM and execute calls to it with Spring AI. Now let’s look at something more useful. Retrieval augmented generation, or RAG, is important in the AI space because it allows us to leverage LLMs to answer questions they were not trained on, such as internal company documents. The process is conceptually very simple, but the implementation details can be confusing if you don’t have a good foundation in what you are doing. This section will build that foundation so you can start using RAG in your Spring AI programs.

To start, let’s say we create a prompt with the following format:

Use the following context to answer the user's question.
If the question cannot be answered from the context, state that clearly.

Context:
{context}

Question:
{question}

We provide the context, which is the information we want the LLM to use to answer the question, along with the question we want the LLM to answer. This is like giving the LLM a cheat sheet: The answer is here, and you just need to extract it to answer the question. The real challenge is how to store and retrieve the context we want the LLM to use. For example, you might have thousands of pages in a knowledge base that contains everything about your product, but you shouldn’t send all that information to the LLM. It would be very expensive to ask the LLM to process that much information. Besides, each LLM has a token limit, so you couldn’t send all of it even if you wanted to. Instead, we introduce the concept of a vector store.

A vector store is a database that contains documents. The interesting thing about these documents is that the vector store uses an embedding algorithm to create a multi-dimensional vector for each one. Then, you can create a similar vector for your question, and the vector store will compute a similarity score comparing your question to the documents in its database. Using this approach, you can take your question, retrieve the top three to five documents that are similar to your question, and use that as the context in the prompt.

Here’s a flow diagram summarizing the process of using a vector store:

Flow diagram of managing documents with a vector store in Spring AI.

Steven Haines

First, you gather all your documents, chunk them into smaller units, and add them to the vector store. There are different chunking strategies, but you can chunk the documents into a specific number of words, paragraphs, sentences, and so forth, including overlapping sections so that you don’t lose too much context. The smaller the chunk is, the more specific it is, but the less context it retains. Larger chunks retain more context, but lose a lot of specific knowledge, which makes similarity searches more difficult. Finding the right size for your data chunks is a balancing act and requires experimenting on your own dataset.

For our example, I took some text from the public Spring AI documentation and stored it in three text files included with the source code for this article. We’ll use this text with Spring AI’s SimpleVectorStore, which is an in-memory vector store that you can use for testing. Spring AI supports production-scale vector stores like Pinecone, Qdrant, Azure AI, PGvector, and more, but using SimpleVectorStore works for this example.

I added the following SpringRagConfig configuration class to the example code developed so far:

package com.infoworld.springaidemo;

import java.io.IOException;
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.vectorstore.SimpleVectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.io.support.ResourcePatternResolver;

@Configuration
public class SpringRagConfig {

    @Bean
    public SimpleVectorStore simpleVectorStore(EmbeddingModel embeddingModel) throws RuntimeException {
        // Use the builder to create and configure the SimpleVectorStore
        SimpleVectorStore simpleVectorStore = SimpleVectorStore.builder(embeddingModel)
                .build();
        try {
            ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
            Resource[] resources = resolver.getResources("classpath*:documents/**/*.txt");
            for(Resource resource : resources) {
                TextReader textReader = new TextReader(resource);
                List documents = textReader.get();
                simpleVectorStore.add(documents);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return simpleVectorStore;
    }
}

This configuration class defines a Spring bean named simpleVectorStore that accepts an EmbeddingModel, which will automatically be created by Spring when it creates your LLM. It creates a new SimpleVectorStore by invoking the SimpleVectorStore’s static builder() method, passing it the embedding model, and calling its build() method. Then, it scans the classpath for all txt files in the src/resources/documents directory, reads them using Spring’s TextReader, retrieves their content as Document instances by calling the text reader’s get() method, and finally adds them to the SimpleVectorStore.

In a production environment, you can configure the production vector store in your application.yaml file and Spring will create it automatically. For example, if you wanted to configure Pinecone, you would add the following to your application.yaml:

spring:
  ai:
    vectorstore:
      pinecone:
        apiKey: ${PINECONE_API_KEY}
        environment: ${PINECONE_ENV}
        index-name: ${PINECONE_INDEX}
        projectId: ${PINECONE_PROJECT_ID}

The SimpleVectorStore takes a little more configuration, but still keeps our test code simple. To use it, I first created a rag-template.st file:

Use the following context to answer the user's question.
If the question cannot be answered from the context, state that clearly.

Context:
{context}

Question:
{question}

Then I created a new SpringAIRagService:

package com.infoworld.springaidemo.service;

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.stereotype.Service;

@Service
public class SpringAIRagService {
    @Value("classpath:/templates/rag-template.st")
    private Resource promptTemplate;
    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public SpringAIRagService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        this.chatClient = chatClientBuilder.build();
        this.vectorStore = vectorStore;
    }

    public String query(String question) {
        SearchRequest searchRequest = SearchRequest.builder()
                .query(question)
                .topK(2)
                .build();
        List similarDocuments = vectorStore.similaritySearch(searchRequest);
        String context = similarDocuments.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n"));

        Prompt prompt = new PromptTemplate(promptTemplate)
                .create(Map.of("context", context, "question", question));

        return chatClient.prompt(prompt)
                .call()
                .content();
    }
}

The SpringAIRagService wires in a ChatClient.Builder, which we use to build a ChatClient, along with our VectorStore. The query() method accepts a question and uses the VectorStore to build the context. First, we need to build a SearchRequest, which we do by:

  • Invoking its static builder() method.
  • Passing the question as the query.
  • Using the topK() method to specify how many documents we want to retrieve from the vector store.
  • Calling its build() method.

In this case, we want to retrieve the top two documents that are most similar to the question. In practice, you’ll use something larger, such as the top three or top five, but since we only have three documents, I limited it to two.

Next, we invoke the vector store’s similaritySearch() method, passing it our SearchRequest. The similaritySearch() method will use the vector store’s embedding model to create a multidimensional vector of the question. It will then compare that vector to each document and return the documents that are most similar to the question. We stream over all similar documents, get their text, and build a context String.

Next, we create our prompt, which tells the LLM to answer the question using the context. Note that it is important to tell the LLM to use the context to answer the question and, if it cannot, to state that it cannot answer the question from the context. If we don’t provide these instructions, the LLM will use the data it was trained on to answer the question, which means it will use information not in the context we’ve provided.

Finally, we build the prompt, setting its context and question, and invoke the ChatClient. I added a SpringAIRagController to handle POST requests and pass them to the SpringAIRagService:

package com.infoworld.springaidemo.web;

import com.infoworld.springaidemo.model.SpringAIQuestionRequest;
import com.infoworld.springaidemo.model.SpringAIQuestionResponse;
import com.infoworld.springaidemo.service.SpringAIRagService;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SpringAIRagController {
    private final SpringAIRagService springAIRagService;

    public SpringAIRagController(SpringAIRagService springAIRagService) {
        this.springAIRagService = springAIRagService;
    }

    @PostMapping("/springAIQuestion")
    public ResponseEntity askAIQuestion(@RequestBody SpringAIQuestionRequest questionRequest) {
        String answer = springAIRagService.query(questionRequest.question());
        return ResponseEntity.ok(new SpringAIQuestionResponse(answer));
    }
}

The askAIQuestion() method accepts a SpringAIQuestionRequest, which is a Java record:

package com.infoworld.springaidemo.model;

public record SpringAIQuestionRequest(String question) {
}

The SpringAIQuestionRequest returns a SpringAIQuestionResponse:

package com.infoworld.springaidemo.model;

public record SpringAIQuestionResponse(String answer) {
}

Now restart your application and execute a POST to /springAIQuestion. In my case, I sent the following request body:

{
    "question": "Does Spring AI support RAG?"
}

And received the following response:

{
    "answer": "Yes. Spring AI explicitly supports Retrieval Augmented Generation (RAG), including chat memory, integrations with major vector stores, a portable vector store API with metadata filtering, and a document injection ETL framework to build RAG pipelines."
}

As you can see, the LLM used the context of the documents we loaded into the vector store to answer the question. We can further test whether it is following our directions by asking a question that is not in our context:

{
    "question": "Who created Java?"
}

Here is the LLM’s response:

{
    "answer": "The provided context does not include information about who created Java."
}

This is an important validation that the LLM is only using the provided context to answer the question and not using its training data or, worse, trying to make up an answer.

Conclusion

This article introduced you to using Spring AI to incorporate large language model capabilities into Spring-based applications. You can configure LLMs and other AI technologies using Spring’s standard application.yaml file, then wire them into Spring components. Spring AI provides an abstraction to interact with LLMs, so you don’t need to use LLM-specific SDKs. For experienced Spring developers, this entire process is similar to how Spring Data abstracts database interactions using Spring Data interfaces.

In this example, you saw how to configure and use a large language model in a Spring MVC application. We configured OpenAI to answer simple questions, introduced prompt templates to externalize LLM prompts, and concluded by using a vector store to implement a simple RAG service in our example application.

Spring AI has a robust set of capabilities, and we’ve only scratched the surface of what you can do with it. I hope the examples in this article provide enough foundational knowledge to help you start building AI applications using Spring. Once you are comfortable with configuring and accessing large language models in your applications, you can dive into more advanced AI programming, such as building AI agents to improve your business processes.

Read next: The hidden skills behind the AI engineer.

(image/jpeg; 0.45 MB)

A proactive defense against npm supply chain attacks 4 Dec 2025, 9:00 am

Open-source software has become the backbone of modern development, but with that dependency comes a widening attack surface. The npm ecosystem in particular has been a high-value target for adversaries who know that one compromised package can cascade downstream into thousands of applications.

The Shai Hulud worm, embedded in npm packages earlier this year, was a stark reminder that attackers don’t just exploit vulnerabilities, they weaponize trust in open ecosystems. For developers and security engineers, this isn’t a once-in-a-while problem. It’s a 24x7x365 risk.

Breaking down the attack vector

Malicious npm packages spread by exploiting developer trust and automation. Attackers inject harmful payloads into libraries that appear legitimate, sometimes even hijacking widely used packages via stolen maintainer credentials.

The Stairwell research team has observed common attacker behaviors, including:

  • Obfuscation with Buffer.from() and Base64 to conceal malicious payloads.
  • Exfiltration hooks to steal environment variables, API keys, or npm tokens.
  • Persistence techniques that run automatically during install (preinstall/postinstall scripts).

Once installed, these dependencies can exfiltrate credentials, establish persistence, or spread laterally across development environments.

Using YARA for detection

Originally designed for malware research, YARA has become a flexible pattern-matching tool for identifying malicious files or code fragments. When applied to the software supply chain, YARA rules can:

  • Flag suspicious or obfuscated JavaScript within npm dependencies.
  • Detect anomalous patterns like hidden credential stealers or worm propagation code.
  • Surface malware families across repos by reusing detection logic.

For example, Stairwell published a YARA rule targeting DarkCloud Stealer, which scans for tell-tale signs of data-stealing malware embedded in npm packages. Another simple detection might look for suspiciously encoded Buffer.from() payloads, which often mask malicious code.

Below is a YARA rule we put together for the chalk/debug supply chain attack.

Stairwell YARA rule

Stairwell

Integrating YARA into developer workflows

The real value comes from moving YARA out of the lab and into the pipeline. Instead of running YARA manually after an incident, it’s better to embed it directly in your CI/CD or dependency monitoring process.

Practical steps include:

  • Pre-merge scanning: Automate YARA checks on every pull request or package update.
  • Pipeline enforcement: Block builds that import dependencies matching malicious rules.
  • Rule sharing: Distribute your rule library across teams to reduce duplicated effort.

Stairwell’s approach demonstrates how this can be done at scale, turning YARA into a frontline defense mechanism rather than just a forensic tool.

Around-the-calendar protection

Supply chain attacks don’t follow a calendar, but attackers do take advantage of high-stakes moments. The holiday shopping season is a prime example: retailers, e-commerce platforms, and SaaS providers can’t afford downtime or breaches during peak traffic.

A poisoned npm dependency at the wrong time could mean: Checkout failures or outages, stolen customer data or credentials, or even reputational damage amplified by seasonal visibility. In short, when uptime is most critical, attackers know disruption is most costly.

Actionable guidance for engineers

To build resilience against npm supply chain attacks, security-minded developers should consider these four steps:

  1. Maintain an internal YARA rule library focused on package behaviors.
  2. Automate execution within CI/CD and dependency monitoring.
  3. Continuously update rules based on fresh attack patterns observed in the wild.
  4. Contribute back to the community, strengthening the broader open-source ecosystem.

The bottom line

Securing the supply chain is impossible. Organizations should balance investments. Many supply chain security tools deliver a false sense of security with claims of preventing supply chain attacks. Indeed enterprises need to have better capabilities to understand if the threat is inside their environment. While prevention is better than cure, what happens when you have a breach. When you are prepared with tools to continuously evaluate your environment, you make the breach response faster. 

The reality is that supply chain risk is unavoidable, but it’s not unmanageable. By embedding YARA into developer workflows, teams can move from reactive cleanup to proactive prevention, reducing the chance that the next compromised package ever makes it into production.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

(image/jpeg; 0.69 MB)

Microsoft steers native port of TypeScript to early 2026 release 4 Dec 2025, 12:48 am

Microsoft’s planned TypeScript 7.0 release, an effort to improve performance, memory usage, and parallelism by porting the TypeScript language service and compiler to native code, has made significant progress, Microsoft reports. A planned TypeScript 6.0 release, meanwhile, will be the last JavaScript-based version of TypeScript, bridging the current TypeScript 5.9 release to TypeScript 7.0.

In a December 2 blog post, Microsoft provided updates on TypeScript 7.0, also known as Project Corsa, a project revealed in March and based on Google’s Go language. While the effort has been a significant undertaking, big strides have been made, said blog post author Daniel Rosenwasser, Microsoft principal product manager for TypeScript. Microsoft is targeting early 2026 for the release of TypeScript 6.0 and TypeScript 7.0. The code is public and available at the TypeScript-go GitHub repository.

For the language service, most of the features that make up the existing editing experience are implemented and working well in TypeScript 7.0, though some features are still being ported, Rosenwasser said. Parts of the language service have been rearchitected to improve reliability while also leveraging shared-memory parallelism. The latest preview of the language service, for Visual Studio Code, can be accessed from the Visual Studio Code Marketplace.

The native port of the TypeScript compiler also has made significant progress, with TypeScript 7.0 type checking nearly complete. A frequent question is whether it is “safe” to use TypeScript 7.0 to validate a build, Rosenwasser said, or in other words, does the TypeScript 7.0 compiler reliably find the same errors that TypeScript 5.9 does? The answer is yes, he said. For context, there have been around 20,000 compiler test cases, of which about 6,000 produce at least one error in TypeScript 6.0. In all but 74 cases, TypeScript 7.0 also produces at least one error. Developers can confidently use TypeScript 7.0 today to type-check a project for errors, Rosenwasser said. Beyond single-pass/single-project type checking, the command-line compiler also has reached major parity. Features such as --incremental, project reference support, and --build mode are all ported over and working.

TypeScript 7.0 will remove behaviors and flags planned for deprecation in TypeScript 6.0. A list of upcoming deprecations in TypeScript 6.0 can be seen in the issue tracker.  For emit, --watch, and API capabilities, the JavaScript pipeline is not entirely complete. Developers who do not need JavaScript emit from TypeScript, running tsgo for a build will work fine, Rosenwasser said. Also, TypeScript 7.0 (Corsa) will not support the existing Strada API. The Corsa API is still a work in progress.

With TypeScript 6.0, there is no intention to produce a TypeScript 6.1 release, although there may be patch releases for TypeScript 6. “You can think of TypeScript 6.0 as a ‘bridge’ release between the TypeScript 5.9 line and 7.0,” Rosenwasser said. “6.0 will deprecate features to align with 7.0, and will be highly compatible in terms of type-checking behavior.” The intent is to ensure that TypeScript 6.0 and TypeScript 7.0 are as compatible as possible.

(image/jpeg; 1.97 MB)

Page processed in 0.455 seconds.

Powered by SimplePie 1.4-dev, Build 20170403172323. Run the SimplePie Compatibility Test. SimplePie is © 2004–2025, Ryan Parman and Geoffrey Sneddon, and licensed under the BSD License.