Clarifai AI Runners connect local models to cloud 9 Jul 2025, 12:12 am
AI platform company Clarifai has launched AI Runners, an offering designed to give developers and MLops engineers flexible options for deploying and managing AI models.
Unveiled July 8, AI Runners let users connect models running on local machines or private servers directly to Clarifai’s AI platform via a publicly accessible API, the company said. Noting the rise of agentic AI, Clarifai said AI Runners provide a cost-effective, secure solution for managing the escalating demands of AI workloads, describing them as “essentially ngrok for AI models, letting you build on your current setup and keep your models exactly where you want them, yet still get all the power and robustness of Clarifai’s API for your biggest agentic AI ideas.”
Clarifai said its platform allows developers to run their models or MCP (Model Context Protocol) tools on a local development machine, an on-premises server, or a private cloud cluster. Connection to the Clarifai API can then be done without complex networking, the company said. This means users can keep sensitive data and custom models within their own environment and leverage existing compute infrastructure without vendor lock-in. AI Runners enable serving of custom models through Clarifai’s publicly accessible API, allowing integration into any application. Users can build multi-step AI workflows by chaining local models with thousands of models available on the Clarifai platform.
AI Runners thereby simplify the development workflow, making AI development accessible and cost-effective by starting locally, then scaling to production in Kubernetes-based clusters on the Clarifai platform, the company said.
ECMAScript 2025 JavaScript standard approved 8 Jul 2025, 8:10 pm
ECMAScript 2025, the latest version of the ECMA International standard for JavaScript, has been officially approved. The specification standardizes new JavaScript capabilities including JSON modules, import attributes, new Set methods, sync iterator helpers, and regular expression modifiers.
The ECMAScript 2025 specification was finalized by ECMA International on June 25. All told, nine finished proposals on the ECMAScript development committee’s GitHub page were approved. Another proposal slated for 2025, for time duration formatting objects, appears on a different page. Development of ECMAScript is under the jurisdiction of the ECMA International Technical Committee 39 (TC39).
Note that many new JavaScript features appear in browsers even before new ECMAScript standards are approved. “One thing to note is that the vast majority of web developers are more attentive when these various features become available in their favorite browser or runtime as opposed to it being added to the JS spec, which happens afterwards,” said Ujjwal Sharma, co-chair of TC39 and co-editor of ECMA-402, the ECMAScript internationalization API specification.
For JSON modules, the proposal calls for importing JSON files as modules. This plan builds on the import attributes proposal to add the ability to import a JSON module in a common way across JavaScript environments.
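For illustration, importing a JSON module with the finalized syntax looks like this (the ./config.json path is just a placeholder):
import config from "./config.json" with { type: "json" };
console.log(config);
// The dynamic form passes the attribute in an options object:
const mod = await import("./config.json", { with: { type: "json" } });
console.log(mod.default);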
For regular expressions, the regular expression escaping proposal is intended to address a situation in which developers want to build a regular expression out of a string without treating special characters from the string as special regular expression tokens, while the regular expression pattern modifiers proposal provides the capability to control a subset of regular expression flags with a subexpression. Modifiers are especially helpful when regular expressions are defined in a context where executable code cannot be evaluated, such as a JSON configuration file or a TextMate language grammar file, the proposal states.
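As a brief sketch of both features, RegExp.escape neutralizes special characters in a string before it is embedded in a pattern, and an inline (?i:...) group applies the i flag to only part of a pattern:
const userInput = "price (USD)?";
const exact = new RegExp(RegExp.escape(userInput)); // matches the literal text, parentheses and all
const header = /^(?i:content-type):\s*text\/html/; // field name is case-insensitive, value is not
header.test("Content-Type: text/html"); // true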
Also in the “regex” vein, the duplicate named capturing groups proposal allows regular expression capturing group names to be repeated. Prior to this proposal, named capturing groups in JavaScript were required to be unique.
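As a small example, the same group name can now appear once in each alternative, so calling code reads a single groups.year no matter which branch matched:
const re = /(?<year>\d{4})-(?<month>\d{2})|(?<month>\d{2})\/(?<year>\d{4})/;
"2025-07".match(re).groups.year; // "2025"
"07/2025".match(re).groups.year; // "2025"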
The sync iterator helpers proposal introduces several interfaces to help with general usage and consumption of iterators in ECMAScript. Iterators are a way to represent large or possibly infinitely enumerable data sets.
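For example, the new helpers let an infinite generator be transformed lazily and materialized only when a result is needed:
function* naturals() { let n = 1; while (true) yield n++; }
const evenSquares = naturals()
  .map(n => n * n)
  .filter(n => n % 2 === 0)
  .take(3)
  .toArray(); // [4, 16, 36]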
Other finalized specifications for ECMAScript 2025 include:
- DurationFormat objects, an ECMAScript API specification proposal. The motivation for this proposal is that users need all types of time duration formatting depending on the requirements of their application.
- Specifications and a reference implementation for Promise.try, which allows optimistically synchronous but safe execution of a function while still letting developers work with a Promise afterward, mirroring the behavior of an async function.
- Float 16 on TypedArrays, DataView, and Math.f16round, which adds float16 (aka half-precision or binary16) TypedArrays to JavaScript. This plan would add a new kind of TypedArray, Float16Array, to complement the existing Float32Array and Float64Array. It also would add two new methods on DataView for reading and setting float16 values, as getFloat16 and setFloat16, to complement the existing similar methods for working with full and double precision floats. Also featured is Math.f16round, to complement the existing Math.fround. Among the benefits of this proposal is its usefulness for GPU operations.
- Import attributes, which provide syntax to import ECMAScript modules with assertions. An inline syntax for module import statements would pass on more information alongside the module specifier. The initial application for these attributes will be to support additional types of modules across JavaScript environments, beginning with JSON modules.
- Set methods for JavaScript, which add methods like union and intersection to JavaScript’s built-in Set class. Methods to be added include Set.prototype.intersection(other), Set.prototype.union(other), Set.prototype.difference(other), Set.prototype.symmetricDifference(other), Set.prototype.isSubsetOf(other), Set.prototype.isSupersetOf(other), and Set.prototype.isDisjointFrom(other). These methods would require their arguments to be a Set, or at least something that looks like a Set in terms of having a numeric size property as well as keys() and has() methods.
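A quick sketch of the new Set methods in action:
const a = new Set([1, 2, 3]);
const b = new Set([3, 4]);
a.union(b); // Set {1, 2, 3, 4}
a.intersection(b); // Set {3}
a.difference(b); // Set {1, 2}
a.symmetricDifference(b); // Set {1, 2, 4}
a.isSubsetOf(new Set([1, 2, 3, 4])); // true
a.isDisjointFrom(b); // false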
The development of the ECMAScript language specification started in November 1996, based on several originating technologies including JavaScript and Microsoft’s JScript. Last year’s ECMAScript 2024 specification included features such as resizing and transferring ArrayBuffers and SharedArrayBuffers and more advanced regular expression features for working with sets of strings.
InfoWorld Technology of the Year Awards 2025 nominations now open 8 Jul 2025, 4:29 pm
Welcome to the 25th annual InfoWorld Technology of the Year Awards.
The InfoWorld Technology of the Year Awards recognize the best and most innovative products in software development, cloud computing, data analytics, and artificial intelligence and machine learning (AI/ML).
Since 2001, the InfoWorld Technology of the Year Awards have celebrated the most groundbreaking products in information technology—the products that change how companies operate and how people work. Winners will be selected in 30 product categories by a panel of judges based on technology impact, business impact, and innovation.
Enter here to win.
Nominations cost:
- $99 through Friday, July 18, 2025
- $149 through Friday, August 2, 2025
- $199 through Friday, August 15, 2025
Products must be available for sale and supported in the US to be eligible for consideration.
If you have any questions about the awards program, please contact InfoWorldAwards@foundryco.com.
Products in the following categories are eligible to win:
- AI and machine learning: Governance
- AI and machine learning: MLOps
- AI and machine learning: Models
- AI and machine learning: Platforms
- AI and machine learning: Security
- AI and machine learning: Tools
- API management
- API security
- Application management
- Application networking
- Application security
- Business intelligence and analytics
- Cloud backup and disaster recovery
- Cloud compliance and governance
- Cloud cost management
- Cloud security
- Data management: Databases
- Data management: Governance
- Data management: Integration
- Data management: Pipelines
- Data management: Streaming
- DevOps: Analytics
- DevOps: CI/CD
- DevOps: Code quality
- DevOps: Observability
- DevOps: Productivity
- Software development: Platforms
- Software development: Security
- Software development: Testing
- Software development: Tools
Read about the winners of InfoWorld’s 2024 Technology of the Year Awards here.
Metadata: Your ticket to the AI party 8 Jul 2025, 1:14 pm
Agentic AI is fundamentally reshaping how software interacts with the world. New frameworks for agent-to-agent collaboration and multi-agent control planes promise a future where software acts with more autonomy and shared context than ever before. Yet amid all this excitement, one quietly persistent idea holds everything together: metadata.
Known in data management circles for decades, metadata is the foundational layer determining whether your AI goals scale with confidence—across petabytes of data and hundreds of initiatives—or stutter into chaos and unreliability.
Many teams pour energy into large models and orchestration logic but overlook a simple truth: Without a modern metadata strategy, even the most advanced AI systems struggle to find the right data, interpret it correctly, and use it responsibly.
Metadata is the key that lets every asset, model, and agent know where it is, how it’s found, and what rules apply. In this new era of autonomous workflows and dynamic reasoning, it’s no exaggeration to call metadata your ticket to the AI party.
Discover, understand, trust, and use
Modern AI needs more than raw data. It needs context that evolves as new sources appear and applications multiply. This context is reflected in four practical capabilities essential for any robust metadata infrastructure: discover, understand, trust, and use.
Discover means navigating billions of objects without tedious manual work. A modern metadata system automates metadata harvesting across diverse data stores, lakes, and third-party databases. Smart cataloging and search capabilities let anyone ask, “Where is my customer data?” and get precise, policy-safe answers instantly.
Understand turns raw schema into human-friendly context. An effective metadata strategy enriches cataloged assets with business glossaries and collaborative documentation. Generative AI can help auto-describe technical fields and align them with familiar business language. These context shells ensure people and agents can reason clearly about what the data represents.
Trust flows from continuous quality and visible lineage. Metadata infrastructure should profile and score data health, flag issues automatically, and generate quality rules that scale as your footprint grows. Lineage graphs reveal how raw feeds turn into curated data products. This is governance at work behind the scenes, ensuring consistency and reliability without the overhead.
Use is where value becomes real. When discovery, understanding, and trust are robust, reliable data products become achievable. Teams can design these products with clear service level expectations, just like application contracts. They support dashboards for analysts and APIs for agents, all backed by real-time governance that follows the data.
From classic management to agentic reality
Metadata’s role has evolved dramatically. It used to index static tables for scheduled reports. Today’s agentic AI demands an always-on metadata layer that stays synchronized across petabytes and thousands of ever-changing sources.
Take a simple natural language query. A business user might ask, “Show me my top selling products this quarter.” A well-architected metadata layer resolves vague terms, maps them to trusted data sources, applies governance rules, and returns reliable, explainable answers. This happens instantly whether the request comes from a human analyst or an agent managing supply chain forecasts in real time.
Dataplex Universal Catalog: A unified approach to metadata management
At Google Cloud, we built Dataplex Universal Catalog to turn this vision into everyday reality. Rather than cobbling together separate catalogs, policy engines, and quality checks, Dataplex Universal Catalog weaves discovery, governance, and intelligent metadata management into a single cloud-native fabric. It transforms fragmented data silos into a governed, context-rich foundation ready to power both humans and agents.
Dataplex Universal Catalog combines cataloging, quality, governance, and intelligence in a single managed fabric. There’s no need to stitch together custom scripts to sync multiple tools. It automatically discovers and classifies assets from BigQuery, Cloud Storage, and other connected sources, stitching them into a unified searchable map. Its built-in quality engine runs profiling jobs “serverlessly” and surfaces issues early, preventing downstream problems.
Logical domains add another advantage. Teams can organize data by department, product line, or any meaningful business structure while governance policies cascade automatically. Sensitive information remains protected even when data is shared broadly or crosses projects and clouds. This is autonomous governance in action, where contracts and rules follow the data rather than relying on manual enforcement.
Open formats like Apache Iceberg make this approach portable. By integrating Iceberg, Dataplex Universal Catalog ensures tables stay versioned and compatible across engines and clouds. This supports hybrid lakes and multi-cloud setups without compromising fidelity or audit trails.
Winners and losers in the metadata race
Organizations that get this right will find that agentic AI drives speed and trust, not chaos. Their teams and agents will collaborate fluidly using governed, well-described data products. Natural language queries and autonomous workflows will operate as intended, the metadata layer handling complexity behind the scenes.
Those who neglect this foundation will likely find themselves reactively fixing errors, chasing missing context, and slowing innovation. Hallucinations, compliance slips, and unreliable AI outcomes often stem from weak metadata strategy.
In this new era, the smartest AI still depends on knowing what to trust and where to find it. Metadata is that compass. Dataplex provides the fabric to make it dynamic, secure, and open, your guaranteed ticket to join the AI party with confidence.
Learn more about Google Cloud’s data to AI governance solution here.
Microsoft brings OpenAI-powered Deep Research to Azure AI Foundry agents 8 Jul 2025, 11:26 am
Microsoft has added OpenAI-developed Deep Research capability to its Azure AI Foundry Agent service to help enterprises integrate research automation into their business applications.
The integration of research automation is made possible by Deep Research API and SDK, which can be used by developers to embed, extend, and orchestrate Deep Research-as-a-service across an enterprise’s ecosystem, including data and existing systems, Yina Arenas, VP of product at Microsoft’s Core AI division, wrote in a blog post.
Developers can use Deep Research to automate large-scale, source-traceable insights, programmatically build and deploy agents as services invokable by apps, workflows, or other agents, and orchestrate complex tasks using Logic Apps, Azure Functions, and Foundry connectors, Arenas added.
Essentially, the new capability is designed to help enterprises enhance their AI agents to conduct deeper analysis of complex data, enabling better decision-making and productivity, said Charlie Dai, vice president and principal analyst at Forrester.
“All major industries will benefit from this, such as investment insights generation for finance, drug discovery acceleration for healthcare, and supply chain optimization for manufacturing,” Dai added.
How does Deep Research work?
Deep Research, at its core, uses a combination of OpenAI and Microsoft technologies, such as o3-deep-research, other GPT models, and Grounding with Bing Search, when integrated into an agent.
When a research request is received by the agent that has Deep Research integrated — whether from a user or another application — the agent taps into GPT-4o and GPT-4.1 to interpret the intent, fill in any missing details, and define a clear, actionable scope for the task.
After the task has been defined, the agent activates the Bing-powered grounding tool to retrieve a refined selection of recent, high-quality web content.
After this step, o3-deep-research initiates the research process by reasoning through the gathered information. Instead of simply summarizing content, it evaluates, adapts, and synthesizes insights across multiple sources, adjusting its approach as new data emerges.
The entire process results in a final output that is a structured report that documents not only the answer, but also the model’s reasoning path, source citations, and any clarifications requested during the session, Arenas explained.
Competition, pricing, and availability
Microsoft isn’t the only hyperscaler offering deep research capability.
“Google Cloud already provides Gemini Deep Research with its Gemini 2.5 Pro. AWS hasn’t offered cloud services on it, but it showcased Bedrock Deep Researcher as a sample application to automate the generation of articles and reports,” Dai said.
Microsoft, itself, offers the deep research capability inside its office suite of applications as Researcher in Microsoft 365 Copilot. OpenAI, too, has added the deep research capability inside its generative AI-based assistant, ChatGPT.
In terms of pricing, Deep Research inside Azure AI Foundry Agent Service will cost enterprises $10 per million input tokens and $40 per million output tokens for the o3-deep-research model alone.
Cached inputs for the model will cost $2.50 per million tokens, the company said.
Further, enterprises will incur separate charges for Grounding with Bing Search and the base GPT model being used for clarifying questions, it added.
How Deutsche Telekom designed AI agents for scale 8 Jul 2025, 9:00 am
Across 10 countries in Europe, Deutsche Telekom serves millions of users, each with their own questions, needs, and contexts. Responding quickly and accurately isn’t just good service; it builds trust, drives efficiency, and impacts the bottom line. But doing that consistently depends on surfacing the right information at the right time, in the right context.
In early 2023, I joined a small cross-functional team formed under an initiative led by our chief product officer, Jonathan Abrahamson. I was responsible for engineering and architecture within the newly formed AI Competence Center (AICC), with a clear goal: Improve customer service across our European operations. As large language models began to show real promise, it became clear that generative AI could be a turning point enabling faster, more relevant, and context-aware responses at scale.
This kicked off a focused effort to solve a core challenge: How to deploy AI-powered assistants reliably across a multi-country ecosystem? That led to the development of LMOS, a sovereign, developer-friendly platform for building and scaling AI agents across Telekom. Frag Magenta OneBOT, our customer-facing assistant for sales and service across Europe, was one of the first major products built on top of it. Today, LMOS supports millions of interactions, significantly reducing resolution time and human handover rates.
Just as important, LMOS was designed to let engineers work with tools they already know to build AI agents and has now reached a point where business teams can define and maintain agents for new use cases. That shift has been key to scaling AI with speed, autonomy, and shared ownership across the organization.
Building a sovereign, scalable agentic AI platform
Amid the urgency, there was also a quiet shift in perspective. This wasn’t just a short-term response; it was an opportunity to build something foundational — a sovereign platform, grounded in open standards, that would let our existing engineering teams build AI applications faster and with more flexibility.
In early 2023, production-ready generative AI applications were rare. Most work was still in early-stage retrieval-augmented generation (RAG) experiments, and the risk of becoming overly dependent on closed third-party platforms was hard to ignore. So instead of assembling a stack from scattered tools, we focused on the infrastructure itself, something that could grow into a long-term foundation for scalable, enterprise-grade AI agents.
It wasn’t just about solving the immediate problem. It was about designing for what would come next.
LMOS: Language Model Operating System
What started as a focused effort on chatbot development quickly surfaced deeper architectural challenges. We experimented with frameworks like LangChain, a popular framework for integrating LLMs into applications, and fine-tuned Dense Passage Retrieval (DPR) models for German-language use cases. These early prototypes helped us learn fast, but as we moved beyond experimentation, cracks started to show.
The stack became hard to manage. Memory issues, instability, and a growing maintenance burden made it clear this approach wouldn’t scale. At the same time, our engineers were already deeply familiar with Deutsche Telekom’s JVM-based systems, APIs, and tools. Introducing unfamiliar abstractions would have slowed us down.
So we shifted focus. Instead of forcing generative AI into fragmented workflows, we set out to design a platform that felt native to our existing environment. That led to LMOS, the Language Model Operating System, a sovereign PaaS for building and scaling AI agents across Deutsche Telekom. LMOS offers a Heroku-like experience for agents, abstracting away life-cycle management, deployment models, classifiers, observability, and scaling while supporting versioning, multitenancy, and enterprise-grade reliability.
At the core of LMOS is Arc, a Kotlin-based framework for defining agent behavior through a concise domain-specific language (DSL). Engineers could build agents using the APIs and libraries they already knew. No need to introduce entirely new stacks or rewire development workflows. At the same time, Arc was built to integrate cleanly with existing data science tools, making it easy to plug in custom components for evaluation, fine-tuning, or experimentation where needed.
Arc also introduced ADL (Agent Definition Language), which allows business teams to define agent logic and workflows directly, reducing the need for engineering involvement in every iteration and enabling faster collaboration across roles. Together, LMOS, Arc, and ADL helped bridge the gap between business and engineering, while integrating cleanly with open standards and data science tools, accelerating how agents were built, iterated, and deployed across the organization.
Vector search and the role of contextual retrieval
By grounding LMOS in open standards and avoiding unnecessary architectural reinvention, we built a foundation that allowed AI agents to be designed, deployed, and scaled across geographies. But platform infrastructure alone wasn’t enough. Agent responses often depend on domain knowledge buried in documentation, policies, and internal data sources, and that required retrieval infrastructure that could scale with the platform.
We built structured RAG pipelines powered by vector search to provide relevant context to agents at run time. Choosing the right vector store was essential. After evaluating various options, from traditional database extensions to full-featured, dedicated vector systems, we selected Qdrant, an open-source, Rust-based vector database that aligned with our operational and architectural goals. Its simplicity, performance, and support for multitenancy and metadata filtering made it a natural fit, allowing us to segment data sets by country, domain, and agent type, ensuring localized compliance and operational clarity as we scaled across markets.
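As a rough sketch of the kind of filtered lookup described here (not Deutsche Telekom’s actual configuration; the collection name, payload fields, and embedQuery helper are hypothetical), Qdrant’s JavaScript client can scope a similarity search to one market and domain with a metadata filter:
import { QdrantClient } from "@qdrant/js-client-rest";
const client = new QdrantClient({ url: "http://localhost:6333" });
// embedQuery() stands in for whatever embedding model produces the query vector
const queryVector = await embedQuery("Wo finde ich meine Rechnung?");
const hits = await client.search("support_documents", {
  vector: queryVector,
  limit: 5,
  filter: {
    must: [
      { key: "country", match: { value: "de" } },
      { key: "domain", match: { value: "billing" } },
    ],
  },
});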
Wurzel: Rooting retrieval in reusability
To support retrieval at scale, we also built Wurzel, an open-source Python ETL (extract, transform, load) framework tailored for RAG. Named after the German word for “root,” Wurzel enabled us to decentralize RAG workflows while standardizing how teams prepared and managed unstructured data. With built-in support for multitenancy, job scheduling, and back-end integrations, Wurzel made retrieval pipelines reusable, consistent, and easy to maintain across diverse teams and markets.
Wurzel also gave us the flexibility to plug in the right tools for the job without fragmenting the architecture or introducing bottlenecks. In practice, this meant faster iteration, shared infrastructure, and fewer one-off integrations.
Agent building with LMOS Arc and semantic routing
Agent development in LMOS starts with Arc. Engineers use its DSL to define behavior, connect to APIs, and deploy agents using microservice-style workflows. Once built, agents are deployed to Kubernetes environments via LMOS, which handles versioning, monitoring, and scaling behind the scenes.
But defining behavior wasn’t enough. Agents needed access to relevant knowledge to respond intelligently. Vector-powered retrieval pipelines fed agents with context from internal documentation, FAQs, and structured policies. Qdrant’s multi-tenant vector store provided localized, efficient, and compliant data access.
To make agent collaboration more effective, we also introduced semantic routing. Using embeddings and vector similarity, agents could classify and route customer queries (complaints, billing, sales) without relying entirely on LLMs. This brought greater structure, interpretability, and precision to how agents operated together.
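In spirit, that routing step can be as simple as comparing a query embedding against precomputed embeddings for each route; the sketch below is a generic illustration of the idea (not the LMOS implementation), with embed() standing in for whatever embedding model is plugged in:
// Returns the route (for example "billing", "complaints", or "sales") closest to the query.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
async function routeQuery(query, routes, embed) {
  const queryEmbedding = await embed(query);
  let best = { name: null, score: -Infinity };
  for (const route of routes) {
    const score = cosine(queryEmbedding, route.embedding);
    if (score > best.score) best = { name: route.name, score };
  }
  return best; // for example { name: "billing", score: 0.87 }
}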
Together, Arc, Wurzel, Qdrant, and the broader LMOS platform enabled us to build agents quickly, operate them reliably, and scale them across business domains without compromising developer speed or enterprise control.
‘Heroku’ for agents
I often describe LMOS as “Heroku for agents.” Just like Heroku abstracted the complexity of deploying web apps, LMOS abstracts the complexity of running production-grade AI agents. Engineers don’t need to manage deployment models, classifiers, monitoring, or scaling — LMOS handles all that.
Today, LMOS powers customer-facing agents, including the Frag Magenta OneBOT assistant. We believe this is one of the first multi-agent platforms to go live, with planning and deployment beginning before OpenAI released its agent SDK in early 2024. It is arguably the largest enterprise deployment of multiple AI agents in Europe, currently supporting millions of conversations across Deutsche Telekom’s markets.
The time required to develop a new agent has dropped to a day or less, with business teams now able to define and update operating procedures without relying on engineers. Handovers to human support for API-triggering Arc agents are around 30%, and we expect this to decrease as knowledge coverage, back-end integration, and platform maturity continue to improve.
Scaling sovereign AI with open source and community collaboration
Looking ahead, we see the potential applications of LMOS continuing to grow, especially as agentic computing and retrieval infrastructure mature. From the beginning, we built LMOS on open standards and infrastructure primitives like Kubernetes, ensuring portability across developer machines, private clouds, and data centers.
In that same spirit, we decided to contribute LMOS to the Eclipse Foundation, allowing it to evolve with community participation and remain accessible to organizations beyond our own. As more teams begin to understand how semantic search and structured retrieval ground AI in trusted information, we expect interest in building on LMOS to increase.
What’s guided us so far isn’t just technology. It’s been a focus on practical developer experience, interoperable architecture, and hard-won lessons from building in production. That mindset has helped shift us from model-centric experimentation toward a scalable, open, and opinionated AI stack, something we believe is critical for bringing agentic AI into the real world, at enterprise scale.
Arun Joseph is former engineering and architecture lead at Deutsche Telekom.
—
Generative AI Insights provides a venue for technology leaders to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
What you absolutely cannot vibe code right now 8 Jul 2025, 9:00 am
LinkedIn has become the new Twitter now that Twitter is… well, X. LinkedIn is a place of shockingly bold claims. One person claimed to be so confident in agentic development that they are going to generate their own operating system on the level of iOS or Android. Ever the iconoclast, I pointed out that it was not possible they would ever publish or install it.
Another pitchman promoted the idea that large language models (LLMs) are producing more and higher-quality pull requests (PRs) than humans, based on the number of PRs on a tool and their acceptance rate. I pointed out that this couldn’t possibly be true. I wasn’t motivated to write something to classify them, but I sampled about 20. It turned out that the dashboard our enthusiast was looking at was picking up mainly people’s private projects, where they are basically auto-approving whatever the LLMs send (YOLO style), and a large number of the commits are LLM-style “everything that didn’t need to be said” documentation. Or as one person accepting the merge put it, “Feels like lot of garbage added — but looks relavant [sic]. Merging it as baseline, will refine later (if applicable).”
Don’t get me wrong, I think you should learn to use LLMs in your everyday coding process. And if any statistics or reported numbers are accurate, most of you are at least to some degree. However, I also think it is essential not to misrepresent what LLMs can currently do and what is beyond their capabilities at this point.
As mentioned in previous posts, all the current LLM-based tools are somewhat limiting and, frankly, annoying. So I’m writing my own. Honestly, I expected to be able to flat-out vibe code and generate the patch system. Surely the LLM knows how to make a system to accept patches from an LLM. It turns out that nothing could be further from the truth. First of all, diffing and patching are one of those deceptively complex areas of computing. It was a lesson I forgot. Secondly, writing a patch system to accept patches from something that isn’t very good at generating clean patches is much more complicated than writing one for something that produces patches with a clean algorithm. Generating a patch system that accepts patches from multiple models, each with its own quirks, is very challenging. It was so hard that I gave up and decided to find the best one and just copy it.
Trial and errors
The best patch system is Aider AI’s patch system. They publish benchmarks for every LLM, evaluating how well they generate one-shot patches. Their system isn’t state-of-the-art; it doesn’t even use tool calls. It’s largely hand-rolled, hard-won Python. The obvious thing to do was to use an LLM to port this to TypeScript, enabling me to use it in my Visual Studio Code plugin. That should be simple. Aside from that part, Aider had already figured out it’s a bunch of string utilities. There is no Pandas. There is no MATLAB. This is simply a string replacement.
I also wanted to benchmark OpenAI’s o3 running in Cursor vs. Anthropic’s Claude Opus 4 running in Claude Code. I had both of them create plans and critique each other’s plans. To paraphrase o3, Opus’s plan was overcomplicated and destined to fail. To paraphrase Claude Opus, o3’s code was too simplistic, and the approach pushed all the hard stuff to the end and was destined to fail.
Both failed miserably. In the process, I lost faith in Claude Opus to notice a simple problem and created a command-line tool I called asko3 (which later became “o3Helper”) so that Claude could just ask o3 before it made any more mistakes. I lost faith in Cursor being able to keep their back end running and reply to any requests, so o3 in Cursor lost by default. Onward with the next combo, standalone Claude Opus 4 advised by standalone o3.
That plan also failed miserably. o3 suggested that Opus had created a “cargo cult” implementation (its term, not mine) of what Aider’s algorithm did. It suggested that the system I use for creating plans was part of the problem. Instead, I created a single document plan. Then I had o3 do most of the implementation (from inside Claude Code). It bungled it completely. I had Claude ask o3 to review its design without telling it that it was its own design. It eviscerated it. Claude called the review “brutal but accurate.”
Finally, I still needed my patch system to work and really didn’t care to hand-code the TypeScript. I had Claude copy the comments over from Aider’s implementation and create a main method that served as a unit test. Then I had Claude port each method over one at a time. When something failed, I suggested a realignment method by method. I reviewed each decision, and then we reviewed the entire process — success. This was as far from vibe coding as you can be. It wasn’t much faster than typing it myself. This was just a patch algorithm.
The fellow hoping to “generate an operating system” faces many challenges. LLMs are trained on a mountain of CRUD (create, read, update, delete) code and web apps. If that is what you are writing, then use an LLM to generate virtually all of it — there is no reason not to. If you get down into the dirty weeds of an algorithm, you can generate it in part, but you’ll have to know what you’re doing and constantly re-align it. It will not be simple.
Good at easy
This isn’t just me saying this; it is what studies show as well. LLMs fail at hard and medium-difficulty problems where they can’t stitch together well-known templates. They also have a half-life and fail when problems get longer. Despite o3’s (erroneous in this case) supposition that my planning system caused the problem, it succeeds most of the time by breaking up the problem into smaller parts and forcing the LLM to align to a design without having to understand the whole context. In short, I give it small tasks it can succeed at. However, one reason the LLMs failed is that, despite all the tools created, there are only about 50 patch systems out there in public code. With few examples to learn from, they inferred that unified diffs might be a good approach (they generally aren’t). For web apps, there are many, many examples. They know that field very well.
What to take from this? Ignore the hype. LLMs are helpful, but truly autonomous agents are not developing production-level code, at least not yet. LLMs do best at repetitive, well-understood areas of software development (which are also the most boring). LLMs fail at novel ideas or real algorithmic design. They probably won’t (by themselves) succeed anywhere there aren’t a lot of examples on GitHub.
What not to take from this? Don’t conclude that LLMs are totally useless and that you must be a software craftsman and lovingly hand-code your CSS and HTML and repetitive CRUD code like your pappy before you. Don’t think that LLMs are useless if you are working on a hard problem. They can help; they just can’t implement the whole thing for you. I didn’t have to search for the name of every TypeScript string library that matched the Python libraries. The LLM did it for me. Had I started with that as a plan, it would have gone quickly.
If you’re doing a CRUD app, doing something repetitive, or tackling a problem domain where there are lots of training materials out there, you can rely on the LLMs. If you’re writing an operating system, then you will need to know how to write an operating system and the LLM can type for you. Maybe it can do it in Rust where you did it last time in C, because you know all about how to write a completely fair scheduler. If you’re a full-stack Node.js developer, you will not be (successfully) ChatGPT-ing an iOS alternative because you are mad at Apple.
Nvidia doubles down on GPUs as a service 8 Jul 2025, 9:00 am
Nvidia’s recent initiative to dive deeper into the GPU-as-a-service (GPUaaS) model marks a significant and strategic shift that reflects an evolving landscape within the cloud computing market. As enterprises increase their reliance on artificial intelligence (AI) and machine learning (ML) technologies, the demand for high-performance computing has surged. Nvidia’s move is not only timely, but also could prove to be a game-changer, particularly as organizations aim to adopt more cost-effective GPU solutions while still leveraging public cloud resources.
Services like Nvidia’s DGX Cloud Lepton are designed to connect AI developers with a vast network of cloud service providers. Nvidia is offering access to its unparalleled GPU technology through various platforms, allowing enterprises to scale their AI initiatives without significant capital expenditures on hardware.
The crowded GPU cloud market
Nvidia’s innovations are groundbreaking, but the dominant players—Amazon Web Services, Google Cloud, and Microsoft Azure—continue to hold substantial market share. Each of these hyperscalers has developed in-house alternatives, such as AWS’s Trainium, Google’s Tensor processing units (TPUs), and Microsoft’s Maia. This competition, more than mere rivalry, also caters to the unique requirements of different workloads, prompting enterprises to carefully evaluate their GPU needs.
Organizations need to consider that although these solutions offer state-of-the-art GPU capabilities, they often come with significant costs. Accessing GPU cloud services can strain budgets, especially when the rates charged by hyperscalers tend to far exceed the purchase costs of the GPUs themselves. Therefore, it’s vital for enterprises to assess the long-term affordability of their GPU solution choices carefully.
Enterprises seeking to adopt AI and ML are driven to find more cost-effective GPU solutions, and Nvidia’s foray into GPUaaS presents an attractive alternative. Leveraging Nvidia’s technology as a cloud service allows organizations to access GPU resources on a consumption basis, eliminating the need for significant upfront investments while ensuring access to leading-edge technology.
This does not negate the necessity for organizations to evaluate their GPU consumption strategies. In an escalating trend where enterprises are drawn to the benefits of GPUaaS to streamline their operations, decisions made today will have lasting implications for 10 or more years into the future. Given the rapid pace of technology advancement and market shifts, enterprise leaders should consider a strategy that remains adaptable and financially sustainable.
Embracing a multicloud strategy
In the crowded GPU marketplace, enterprises should strongly consider a multicloud strategy. By leveraging multiple cloud providers, organizations can access a diverse range of GPU offerings. They retain the flexibility to assess and select the services and pricing that best meet their evolving needs while keeping options open for future innovation.
A multicloud approach also effectively dispels concerns over price increases or shifts in capabilities. Greater diversity in cloud resources can alleviate risks associated with relying on a single provider. With Nvidia’s DGX Cloud Lepton service and its Industrial AI Cloud tailored for specific industries, companies can harness more specialized GPU resources based on their industry needs, further enhancing their operational efficiencies.
In the pursuit of optimal performance, enterprises should prioritize a best-of-breed cloud strategy that incorporates GPU solutions. This strategy emphasizes selecting cloud providers and GPU services that offer unparalleled capabilities tailored to business needs. By critically evaluating each option based on performance, pricing, and future scalability, businesses can harness the best tools available to meet their needs.
Nvidia’s current offerings serve as a prime example of why a best-of-breed approach is essential. Their focus on specialized services for diverse industrial sectors—like the Industrial AI Cloud—demonstrates an understanding of the unique demands of various industries. As enterprises pursue digital transformation, aligning with providers that deliver tailored solutions can offer competitive advantages and help streamline operations.
Long-term implications
The transition to AI-driven frameworks and the urgency surrounding digital transformation mean that businesses stand at a crossroads where choices must be grounded in readiness for the future. The strategic implications of selecting between hyperscaler offerings and Nvidia’s innovations should not be taken lightly.
Additionally, the cost of GPUs should always be weighed against the operational needs they fulfill. Many organizations are eager to consume GPU services, but it is critical to remember that the cost of these services is often higher than purchasing the hardware outright. A fully informed decision will consider total cost of ownership, performance metrics, and long-term strategic alignment.
As enterprises increasingly turn towards AI and ML technologies, Nvidia’s strategic move into the GPUaaS landscape shapes the future of cloud computing. Although the GPU cloud market may be saturated with options, Nvidia’s moves introduce new avenues for cost-effective and tailored GPU access, positioning Nvidia as a formidable player alongside its hyperscaler competitors.
A multicloud deployment, alongside a commitment to best-of-breed cloud solutions, will empower organizations to make informed decisions that drive long-term success. Ultimately, investing time and resources into these strategic considerations today may define operational efficiency and competitiveness for a decade into the future. As the landscape continues to change, being able to adapt will be key to thriving in the new era of cloud computing.
Deno 2.4 restores JavaScript bundling subcommand 7 Jul 2025, 10:19 pm
Deno 2.4, the latest version of Deno Land’s JavaScript and TypeScript runtime, has been released with the restoration of the deno bundle subcommand for creating single-file JavaScript bundles.
Announced July 2, Deno 2.4 also stabilizes Deno’s built-in OpenTelemetry support for collecting and exporting telemetry data, and offers easier dependency management, Deno Land said. Current users of Deno can upgrade to Deno 2.4 by running the deno upgrade command in their terminal. Installation instructions for new users can be found here.
Deno 2.4 restores the deno bundle subcommand for creating single-file JavaScript bundles from JavaScript or TypeScript. This command supports both server-side and browser platforms and works with NPM and JSR (JavaScript Registry) dependencies. Automatic tree-shaking and minification are supported via the esbuild bundler. Future plans call for adding a runtime to make bundling available programmatically. Additionally, plugins are to be added for customizing how the bundler processes modules during the build process.
Also in Deno 2.4, OpenTelemetry support, which auto-instruments the collection of logs, metrics, and traces for a project, is now stable. OpenTelemetry support was introduced in Deno 2.2 in February. Improving dependency management in Deno 2.4, a new deno update subcommand lets developers update dependencies to the latest versions. The command will update NPM and JSR dependencies listed in deno.json or package.json files to the latest semver-compatible versions.
Elsewhere in Deno 2.4:
- The Deno environment now can be modified with a new --preload flag that executes code before a main script. This is useful when a developer is building their own platform and needs to modify globals, load data, connect to databases, install dependencies, or provide other APIs.
- Node global variables were added, including Buffer, global, setImmediate, and clearImmediate. An --unstable-node-globals flag is no longer needed for exposing this set of globals.
- Support for Node.js APIs has again been improved.
- A new environment variable, DENO_COMPAT=1, was introduced that will tell Deno to enable a set of flags to improve ergonomics when using Deno in package.json projects.
- fetch now works over Unix and Vsock sockets.
Advanced unit testing with JUnit 5, Mockito, and Hamcrest 7 Jul 2025, 9:00 am
In this second half of a two-part introduction to JUnit 5, we’ll move beyond the basics and learn how to test more complicated scenarios. In the previous article, you learned how to write tests using the @Test and @ParameterizedTest annotations, validate test results using JUnit 5’s built-in assertions, and work with JUnit 5 lifecycle annotations and tags. In this article, we’ll focus more on integrating external tools with JUnit 5.
You’ll learn:
- How to use Hamcrest to write more flexible and readable test cases.
- How to use Mockito to create mock dependencies that let you simulate any scenario you want to test.
- How to use Mockito spies to ensure that method calls return the correct values, as well as verify their behavior.
Using JUnit 5 with an assertions library
For most circumstances, the default assertions methods in JUnit 5 will meet your needs. But if you would like to use a more robust assertions library, such as AssertJ, Hamcrest, or Truth, JUnit 5 provides support for doing so. In this section, you’ll learn how to integrate Hamcrest with JUnit 5.
Hamcrest with JUnit 5
Hamcrest is based on the concept of a matcher, which can be a very natural way of asserting whether or not the result of a test is in a desired state. If you have not used Hamcrest, examples in this section should give you a good sense of what it does and how it works.
The first thing we need to do is add the following additional dependency to our Maven POM file (see the previous article for a refresher on including JUnit 5 dependencies in the POM):
<dependency>
    <groupId>org.hamcrest</groupId>
    <artifactId>hamcrest</artifactId>
    <version>3.0</version>
    <scope>test</scope>
</dependency>
When we want to use Hamcrest in our test classes, we need to leverage the org.hamcrest.MatcherAssert.assertThat method, which works in combination with one or more of its matchers. For example, a test for String equality might look like this:
assertThat(name, is("Steve"));
Or, if you prefer:
assertThat(name, equalTo("Steve"));
Both matchers do the same thing—the is() method in the first example is just syntactic sugar for equalTo().
Hamcrest defines the following common matchers:
- Objects: equalTo, hasToString, instanceOf, isCompatibleType, notNullValue, nullValue, sameInstance
- Text: equalToIgnoringCase, equalToIgnoringWhiteSpace, containsString, endsWith, startsWith
- Numbers: closeTo, greaterThan, greaterThanOrEqualTo, lessThan, lessThanOrEqualTo
- Logical: allOf, anyOf, not
- Collections: array (compare an array to an array of matchers), hasEntry, hasKey, hasValue, hasItem, hasItems, hasItemInArray
The following code sample shows a few examples of using Hamcrest in a JUnit 5 test class.
Listing 1. Using Hamcrest in a JUnit 5 test class (HamcrestDemoTest.java)
package com.javaworld.geekcap.hamcrest;
import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;
import java.util.ArrayList;
import java.util.List;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.*;
class HamcrestDemoTest {
@Test
@DisplayName("String Examples")
void stringExamples() {
String s1 = "Hello";
String s2 = "Hello";
assertThat("Comparing Strings", s1, is(s2));
assertThat(s1, equalTo(s2));
assertThat("ABCDE", containsString("BC"));
assertThat("ABCDE", not(containsString("EF")));
}
@Test
@DisplayName("List Examples")
void listExamples() {
// Create an empty list
List<String> list = new ArrayList<String>();
assertThat(list, isA(List.class));
assertThat(list, empty());
// Add a couple items
list.add("One");
list.add("Two");
assertThat(list, not(empty()));
assertThat(list, hasSize(2));
assertThat(list, contains("One", "Two"));
assertThat(list, containsInAnyOrder("Two", "One"));
assertThat(list, hasItem("Two"));
}
@Test
@DisplayName("Number Examples")
void numberExamples() {
assertThat(5, lessThan(10));
assertThat(5, lessThanOrEqualTo(5));
assertThat(5.01, closeTo(5.0, 0.01));
}
}
One thing I like about Hamcrest is that it is very easy to read. For example, “assert that name is Steve,” “assert that list has size 2,” and “assert that list has item Two” all read like regular sentences in the English language. In Listing 1, the stringExamples test first compares two Strings for equality and then checks for substrings using the containsString() method. An optional first argument to assertThat() is the “reason” for the test, which is the same as the message in a JUnit assertion and will be displayed if the test fails. For example, if we added the following test, we would see the assertion error below it:
assertThat("Comparing Strings", s1, is("Goodbye"));
java.lang.AssertionError: Comparing Strings
Expected: is "Goodbye"
but: was "Hello"
Also note that we can combine the not() logical method with a condition to verify that a condition is not true. In Listing 1, we check that the ABCDE String does not contain substring EF using the not() method combined with containsString().
The listExamples creates a new list and validates that it is a List.class, and that it’s empty. Next, it adds two items, then validates that it is not empty and contains the two elements. Finally, it validates that it contains the two Strings, "One" and "Two", that it contains those Strings in any order, and that it has the item "Two".
Finally, the numberExamples checks to see that 5 is less than 10, that 5 is less than or equal to 5, and that the double 5.01 is close to 5.0 with a delta of 0.01, which is similar to the assertEquals method using a delta, but with a cleaner syntax.
If you’re new to Hamcrest, I encourage you to learn more about it from the Hamcrest website.
Introduction to Mock objects with Mockito
Thus far, we’ve only reviewed testing simple methods that do not rely on external dependencies, but this is far from typical for large applications. For example, a business service probably relies on either a database or web service call to retrieve the data that it operates on. So, how would we test a method in such a class? And how would we simulate problematic conditions, such as a database connection error or timeout?
The strategy of mock objects is to analyze the code behind the class under test and create mock versions of all its dependencies, creating the scenarios that we want to test. You can do this manually—which is a lot of work—or you could leverage a tool like Mockito, which simplifies the creation and injection of mock objects into your classes. Mockito provides a simple API to create mock implementations of your dependent classes, inject the mocks into your classes, and control the behavior of the mocks.
The example below shows the source code for a simple repository.
Listing 2. Example repository (Repository.java)
package com.javaworld.geekcap.mockito;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;
public class Repository {
public List<String> getStuff() throws SQLException {
// Execute Query
// Return results
return Arrays.asList("One", "Two", "Three");
}
}
This next listing shows the source code for a service that uses this repository.
Listing 3. Example service (Service.java)
package com.javaworld.geekcap.mockito;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
public class Service {
private Repository repository;
public Service(Repository repository) {
this.repository = repository;
}
public List<String> getStuffWithLengthLessThanFive() {
try {
return repository.getStuff().stream()
.filter(stuff -> stuff.length() < 5)
.collect(Collectors.toList());
} catch (SQLException e) {
return Arrays.asList();
}
}
}
The Repository class in Listing 2 has a single method, getStuff, that would presumably connect to a database, execute a query, and return the results. In this example, it simply returns a list of three Strings. The Service class in Listing 3 receives the Repository through its constructor and defines a single method, getStuffWithLengthLessThanFive, which returns all Strings with a length less than 5. If the repository throws an SQLException, then it returns an empty list.
Unit testing with JUnit 5 and Mockito
Now let’s look at how we can test our service using JUnit 5 and Mockito. Listing 4 shows the source code for a ServiceTest class.
Listing 4. Testing the service (ServiceTest.java)
package com.javaworld.geekcap.mockito;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.Mockito;
import org.mockito.junit.jupiter.MockitoExtension;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;
@ExtendWith(MockitoExtension.class)
class ServiceTest {
@Mock
Repository repository;
@InjectMocks
Service service;
@Test
void testSuccess() {
// Setup mock scenario
try {
Mockito.when(repository.getStuff()).thenReturn(Arrays.asList("A", "B", "CDEFGHIJK", "12345", "1234"));
} catch (SQLException e) {
e.printStackTrace();
}
// Execute the service that uses the mocked repository
List<String> stuff = service.getStuffWithLengthLessThanFive();
// Validate the response
Assertions.assertNotNull(stuff);
Assertions.assertEquals(3, stuff.size());
}
@Test
void testException() {
// Setup mock scenario
try {
Mockito.when(repository.getStuff()).thenThrow(new SQLException("Connection Exception"));
} catch (SQLException e) {
e.printStackTrace();
}
// Execute the service that uses the mocked repository
List<String> stuff = service.getStuffWithLengthLessThanFive();
// Validate the response
Assertions.assertNotNull(stuff);
Assertions.assertEquals(0, stuff.size());
}
}
The first thing to notice about this test class is that it is annotated with @ExtendWith(MockitoExtension.class). The @ExtendWith annotation is used to load a JUnit 5 extension. JUnit defines an extension API, which allows third-party vendors like Mockito to hook into the lifecycle of running test classes and add additional functionality. The MockitoExtension looks at the test class, finds member variables annotated with the @Mock annotation, and creates a mock implementation of those variables. It then finds member variables annotated with the @InjectMocks annotation and attempts to inject its mocks into those classes, using either construction injection or setter injection.
In this example, MockitoExtension finds the @Mock annotation on the Repository member variable, so it creates a mock implementation and assigns it to the repository variable. When it discovers the @InjectMocks annotation on the Service member variable, it creates an instance of the Service class, passing the mock Repository to its constructor. This allows us to control the behavior of the mock Repository class using Mockito’s APIs.
In the testSuccess method, we use the Mockito API to return a specific result set when its getStuff method is called. The API works as follows:
- First, the Mockito::when method defines the condition, which in this case is the invocation of the repository.getStuff() method.
- Then, the when() method returns an org.mockito.stubbing.OngoingStubbing instance, which defines a set of methods that determine what to do when the specified method is called.
- Finally, in this case, we invoke the thenReturn() method to tell the stub to return a specific List of Strings.
At this point, we have a Service instance with a mock Repository. When the Repository’s getStuff method is called, it returns a list of five known strings. We invoke the Service’s getStuffWithLengthLessThanFive() method, which will invoke the Repository’s getStuff() method and return a filtered list of the Strings whose length is less than five. We can then assert that the returned list is not null and that its size is three. This process allows us to test the logic in the specific Service method, with a known response from the Repository.
The testException method configures Mockito so that when the Repository’s getStuff() method is called, it throws an SQLException. It does this by invoking the OngoingStubbing object’s thenThrow() method, passing it a new SQLException instance. When this happens, the Service should catch the exception and return an empty list.
Mocking is powerful because it allows us to simulate scenarios that would otherwise be difficult to replicate. For example, you may invoke a method that throws a network or I/O error and write code to handle it. But unless you turn off your WiFi or disconnect an external drive at the exact right moment, how do you know the code works? With mock objects, you can throw those exceptions and prove that your code handles them properly. With mocking, you can simulate rare edge cases of any type.
Introduction to Mockito spies
In addition to mocking the behavior of classes, Mockito allows you to verify their behavior. Mockito provides “spies” that watch an object so you can ensure specific methods are called with specific values. For example, you may want to ensure that, when you call a service, it makes a specific call to a repository. Or you might want to ensure that it does not call the repository but rather loads the item from a cache. Using Mockito spies lets you not only validate the response of a method call but ensure the method does what you expect.
This may seem a little abstract, so let’s start with a simple example that works with a list of Strings. Listing 5 shows a test method that adds two Strings to a list and then checks the size of the list after each addition. We’ll then verify that the list’s add() method is called for the two Strings, and that the size() method is called twice.
Listing 5. Testing a List with spies (SimpleSpyTest.java)
package com.javaworld.geekcap.mockito;
import static org.mockito.Mockito.atLeastOnce;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;
import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Spy;
import org.mockito.junit.jupiter.MockitoExtension;
@ExtendWith(MockitoExtension.class)
public class SimpleSpyTest {
    @Spy
    List<String> stringList = new ArrayList<>();

    @Test
    public void testStringListAdd() {
        // Add an item to the list and verify that it has one element
        stringList.add("One");
        Assertions.assertEquals(1, stringList.size());

        // Add another item to the list and verify that it has two elements
        stringList.add("Two");
        Assertions.assertEquals(2, stringList.size());

        // Verify that add was called with arguments "One" and "Two"
        verify(stringList).add("One");
        verify(stringList).add("Two");

        // Verify that add was never called with an argument of "Three"
        verify(stringList, never()).add("Three");

        // Verify that the size() method was called twice
        verify(stringList, times(2)).size();

        // Verify that the size() method was called at least once
        verify(stringList, atLeastOnce()).size();
    }
}
Listing 5 starts by defining an ArrayList of Strings and annotating it with Mockito’s @Spy annotation. The @Spy annotation tells Mockito to watch and record every method called on the annotated object. We add the String "One" to the list, assert that its size is 1, and then add the String "Two" and assert that its size is 2.
After we do this, we can use Mockito to verify everything we did. The org.mockito.Mockito.verify() method accepts a spied object and returns a version of the object that we can use to verify that specific method calls were made. If those methods were called, the test continues; if they were not, the test case fails. For example, we can verify that add("One") was called as follows:
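verify(stringList).add("One");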
If the String list’s add() method is called with an argument of "One", the test continues to the next line; if it’s not, the test fails. After verifying that "One" and "Two" are added to the list, we verify that add("Three") was never called by passing an org.mockito.verification.VerificationMode to the verify() method. VerificationModes validate the number of times that a method is invoked, with whatever arguments are specified, and include the following:
- times(n): specifies that you expect the method to be called n times.
- never(): specifies that you do not expect the method to be called.
- atLeastOnce(): specifies that you expect the method to be called at least once.
- atLeast(n): specifies that you expect the method to be called at least n times.
- atMost(n): specifies that you expect the method to be called at most n times.
Knowing this, we can verify that add("Three") is not called by executing the following:
verify(stringList, never()).add("Three");
It’s worth noting that when we do not specify a VerificationMode, it defaults to times(1), so the earlier calls verified that, for example, add("One") was called exactly once. Likewise, we verify that size() was invoked twice. Then, just to show how it works, we also verify that it was invoked at least once.
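The two modes not exercised in Listing 5, atLeast(n) and atMost(n), work the same way. Here is a minimal sketch, assuming the same spied stringList and static imports of atLeast and atMost from org.mockito.Mockito (which are not in Listing 5’s import list):
// After the two add() calls and two size() calls in Listing 5:
verify(stringList, atLeast(1)).add("One");  // passes: add("One") was called once
verify(stringList, atMost(2)).size();       // passes: size() was called exactly twice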
Now let’s test our service and repository from Listings 2 and 3 by spying on the repository and verifying that the service calls the repository’s getStuff() method. This test is shown in Listing 6.
Listing 6. Testing a spied mock object (SpyAndMockTest.java)
package com.javaworld.geekcap.mockito;
import static org.junit.jupiter.api.Assertions.fail;
import static org.mockito.Mockito.spy;
import static org.mockito.Mockito.verify;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.Mockito;
import org.mockito.junit.jupiter.MockitoExtension;
@ExtendWith(MockitoExtension.class)
public class SpyAndMockTest {
    // Create a Mock of a spied Repository
    @Mock
    Repository repository = spy(new Repository());

    // Inject the repository into the service
    @InjectMocks
    Service service;

    @Test
    public void verifyRepositoryGetStuffIsCalled() {
        try {
            // Setup mock scenario
            Mockito.when(repository.getStuff()).thenReturn(Arrays.asList("A", "B", "CDEFGHIJK", "12345", "1234"));
        } catch (SQLException e) {
            fail(e.getMessage());
        }

        // Execute the service that uses the mocked repository
        List<String> stuff = service.getStuffWithLengthLessThanFive();

        // Validate the response
        Assertions.assertNotNull(stuff);
        Assertions.assertEquals(3, stuff.size());

        try {
            // Verify that the repository getStuff() method was called
            verify(repository).getStuff();
        } catch (SQLException e) {
            fail(e.getMessage());
        }
    }
}
Most of the code in Listing 6 is the same as the test we wrote in Listing 4, but with two changes to support spying. First, when we want to spy on a mocked object, we cannot add both the @Mock and @Spy annotations to the same field, because Mockito only supports one of those annotations at a time. Instead, we can create a new Repository, pass it to the org.mockito.Mockito.spy() method, and then annotate that field with the @Mock annotation. The @Spy annotation is shorthand for invoking the spy() method, so the effect is the same. Now we have a mock object we can use to control the behavior, but Mockito will spy on all its method calls.
Next, we use the same verify() method we used to verify that add("One") was called, this time to verify that the getStuff() method is called:
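try {
    verify(repository).getStuff();
} catch (SQLException e) {
    fail(e.getMessage());
}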
We need to wrap the call in a try-catch block because the getStuff() method signature declares that it can throw an SQLException, but since the verification doesn’t actually execute the method, we never expect the exception to be thrown.
You can extend this test with any of the VerificationMode variations I listed earlier. As a practical example, your service may maintain a cache of values and only query the repository when the requested value is not in the cache. If you mocked the repository and cache and then invoked a service method, Mockito assertions would allow you to validate that you got the correct response. With the proper cache values, you could infer that the service was getting the value from the cache, but you couldn’t know for sure. Spies, on the other hand, allow you to verify absolutely that the cache is checked and that the repository call is never made. So, combining mocks with spies allows you to more fully test your classes.
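As a rough sketch of that cache scenario, verification lets you prove that the repository is never touched on a cache hit. The Cache and CachedService classes and the getValue() method here are hypothetical, not part of the listings in this article:
@Mock
Cache cache;

@Mock
Repository repository;

@InjectMocks
CachedService cachedService;

@Test
void cacheHitSkipsRepository() throws SQLException {
    // Stub a cache hit for the key "A"
    Mockito.when(cache.get("A")).thenReturn("A");

    cachedService.getValue("A");

    // The cache was checked, and the repository was never queried
    verify(cache).get("A");
    verify(repository, never()).getStuff();
}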
Mockito is a powerful tool, and we’ve only scratched the surface of what it can do. If you’ve ever wondered how you can test abnormal conditions—such as network, database, timeout, or other I/O error conditions—Mockito is the tool for you. And, as you’ve seen here, it works elegantly with JUnit 5.
Conclusion
This article was a deeper exploration of JUnit 5’s unit testing capabilities, involving its integration with external tools. You saw how to integrate and use JUnit 5 with Hamcrest, a more advanced assertions framework, and Mockito, which you can use to mock and control a class’s dependencies. We also looked at Mockito’s spying capabilities, which you can use to not only test the return value of a method but also verify its behavior.
At this point, I hope you feel comfortable writing unit tests that not only test happy-path scenarios but also leverage tools like Mockito, which let you simulate edge cases and ensure your applications handle them correctly.
Arriving at ‘Hello World’ in enterprise AI 7 Jul 2025, 9:00 am
Brendan Falk didn’t set out to become a cautionary tale. Three months after leaving AWS to build what he called an “AI-native Palantir,” he’s pivoting away from enterprise AI projects altogether. In a widely shared X thread, Falk offers some of the reasons: 18-month sales cycles, labyrinthine integrations, and post-sale maintenance that swallows margins. In other words, all the assembly required to make AI work in the enterprise, regardless of the pseudo-instant gratification that consumer-level ChatGPT prompts may return.
Just ask Johnson & Johnson, which recently culled 900 generative AI pilots, keeping only the 10% to 15% that delivered real value (though it continues to struggle to anticipate which will yield fruit). Look to data from IBM Consulting, which says just 1% of companies manage to scale AI beyond proof of concept. Worried? Don’t be. After all, we’ve been here before. A decade ago, I wrote about how enterprises struggled to put “big data” to use effectively. Eventually, we got there, and it’s that “eventually” we need to keep in mind as we get caught up in the mania of believing that AI is changing everything everywhere all at once.
Falk’s three lessons
Though Falk has solid startup experience (he cofounded and ran Fig before its acquisition by Amazon), he was unprepared for the ugly stodginess of the enterprise. His findings:
- Enterprise AI sells like middleware, not SaaS. You’re not dropping an API into Slack; you’re rewiring 20-year-old ERP systems. Procurement cycles are long, and bespoke scoping kills product velocity. Then there’s the potential for things to go very wrong. “Small deals are just as much work as larger deals, but are just way less lucrative,” Falk says. Yep.
- Systems integrators capture the upside. By the time Accenture or Deloitte finishes the rollout, your startup’s software is a rounding error on the services bill.
- Maintenance is greater than innovation. Enterprises don’t want models that drift; they want uptime, and AI’s non-deterministic “feature” is very much a bug for the enterprise. “Enterprise processes have countless edge cases that are incredibly difficult to account for up front,” he says. Your best engineers end up writing compliance documentation instead of shipping features.
These aren’t new insights, per se, but they’re easy to forget in an era when every slide deck says “GPT-4o will change everything.” It will, but it currently can’t for most enterprises. Not in the “I vibe-coded a new app; let’s roll it into production” sort of way. That works on X, but not so much in serious enterprises.
Palantir’s “told-you-so” moment
Ted Mabrey, Palantir’s head of commercial, couldn’t resist dunking on Falk: “If you want to build the next Palantir, build on Palantir.” He’s not wrong. Palantir has productized the grunt work—data ontologies, security models, workflow plumbing—that startups discover the hard way.
Yet Mabrey’s smugness masks a bigger point: Enterprises don’t buy AI platforms; they buy outcomes. Palantir succeeds when it shows an oil company how to shave days off planning the site for a new well, or helps a defense ministry fuse sensor data into targeting decisions. The platform is invisible.
Developers, not boardrooms, will mainstream AI
In prior InfoWorld columns, I’ve argued that technology adoption starts with “bottom-up” developer enthusiasm and then bubbles upward. Kubernetes, MongoDB, even AWS followed that path. Falk’s experience proves that the opposite route—“top-down AI transformation” pitched to CIOs—remains a quagmire.
The practical route looks like this:
- Start with a narrow, high-value workflow. Johnson & Johnson’s “Rep Copilot” is a sales assistant, not a moon shot. A narrow scope makes ROI obvious.
- Ship fast, measure faster. Enterprises are comfortable killing projects that don’t move KPIs. Make it easy for them.
- Expose an API, earn love. Developers don’t read Gartner reports; they copy code from GitHub. Give them something to build with and they’ll drag procurement along later.
What’s next
Falk says his team will now “get into the arena” by launching products with shorter feedback loops. That’s good. Build for developers, price like a utility, and let usage (not enterprise promises) guide the road map. The big money will come from the Fortune 500 eventually, but only after thousands of engineers have already smuggled your API through the firewall.
Enterprise AI transformation isn’t dead; it’s just repeating history. When visionary decks meet ossified org charts, physics wins. The lesson is to respect that and abstract away the integration sludge, price for experimentation, and, above all, court the builders who actually make new tech stick.
Falk’s pivot is a reminder that the fastest way into the enterprise is often through the side door of developer adoption, not the fancy lobby of the boardroom.
OutSystems Mentor: An AI-powered digital assistant for the entire SDLC 7 Jul 2025, 9:00 am
Today’s developers navigate a complex landscape marked by slow development cycles, skills gaps, talent shortages, high customization costs, and tightening budgets. The burden of maintaining legacy systems, while adapting to business evolution, adds further strain. With the pressure to drive ROI from software investments, organizations often find themselves caught in the “build vs. buy” dilemma, only to find that off-the-shelf software falls short of meeting their digital transformation needs or delivering a competitive advantage.
At the same time, the recent AI coding assistants and AI platforms that promise to generate full-stack applications often fall short. Ungoverned generative AI (genAI) generates as many risks as it does lines of code, including hallucinations, security holes, architectural flaws, and unexecutable code. In fact, 62% of IT professionals report that using genAI for coding has raised new concerns around security and governance.
It’s the perfect storm for IT teams, but there is hope. A new generation of AI-powered low-code development tools addresses these challenges head-on. Designed to help teams accelerate enterprise application development, free up time for innovation, and maintain observability and governance, offerings like OutSystems Mentor reimagine the software development life cycle (SDLC) altogether.
Introducing OutSystems Mentor
In October 2024, OutSystems introduced Mentor, an AI-powered digital assistant for the entire SDLC. Combining low-code, generative AI, and AI-driven guidance, Mentor represents a major step forward in how software is created, managed, and updated—all within the OutSystems Developer Cloud (ODC) platform.
Mentor is an AI-powered team member trained to support or complete sequential tasks and even entire processes. Integrating generative AI, natural language processing, industry-leading AI models, and machine learning, Mentor provides intelligent, context-aware support across the development workflow, accelerating the creation of full-stack enterprise applications that are easy to maintain via low-code. In doing so, Mentor helps teams maintain a competitive edge without the high costs associated with traditional coding, and without having to maintain hard-to-understand code created by other developers, AI co-pilots, or conversational app builders.
Mentor is fully integrated with the OutSystems Developer Cloud (ODC), allowing organizations to scale from prototype to enterprise-grade deployment while maintaining control and reducing technical debt. With ODC as the foundation, Mentor enables a more efficient, secure, and scalable approach to AI-driven development, supporting teams in delivering high-quality applications faster and with greater consistency.
Generate fully functional, scalable applications
From discovery to rapid prototyping, IT teams can use Mentor to validate ideas and refine initial designs, ensuring alignment before committing to full-scale development. This early-stage clarity helps save time and maximize the impact of every effort, while reducing friction between business and technical teams by creating a shared understanding from day one.
Mentor’s development process is intuitive. Developers describe the application they need, whether through a short prompt or a detailed requirements document. Mentor then combines this input with contextual awareness from the user’s environment, automatically generating an initial version of the app. This includes front-end functionality, data models, and embedded business logic tailored to the organization’s ecosystem.
Further differentiating itself from other AI coding tools, Mentor identifies relevant entities from existing applications or external systems, along with key roles, attributes, records, and workflows needed to generate the app, giving users a head start on development that’s integrated and in tune with real-world needs and organizational context.
Improve, iterate, and evolve applications with AI-powered suggestions
Creating a minimum viable product (MVP) is arguably the most challenging hurdle to overcome in application development. Constant improvement and quick iterations are critical because they ensure products, processes, and services remain relevant, competitive, and aligned with evolving customer needs.
The journey from MVP to production-ready is accelerated with Mentor’s intelligent App Editor, which analyzes an application’s structure and data model and offers AI-powered suggestions for real-time improvements. Whether optimizing performance, refining UX, or improving maintainability, developers can level up their applications with speed, confidence, and precision, reducing rework and accelerating delivery.
Developers also have the option to step outside the App Editor for any additional editing and customization using the ODC Studio, the integrated development environment (IDE) for OutSystems Developer Cloud. Any changes made to the application are reflected back in App Editor, and users can continue refining their app with AI-driven suggestions, enabling a fluid workflow.
Embed AI agents within apps to automate digital interactions
Also built on ODC, OutSystems AI Agent Builder automates tasks with customizable intelligent agents and integrates them into OutSystems Developer Cloud apps to leverage generative AI. This enables IT teams to quickly and securely infuse apps with genAI agents, bringing conversational agentic interactions to life in minutes, without any coding or deep AI expertise.
AI Agent Builder streamlines the process of AI integration, making powerful functionality accessible to a broader range of teams. With a combination of large language models and retrieval-augmented generation, the AI Agent Builder allows users to create intelligent agents (backed by the organization’s proprietary knowledge) via simple, natural language inputs. By enhancing applications with intelligent, conversational interfaces, the platform drives real outcomes, such as when an AI customer service agent can automatically address and resolve customer issues.
IT teams using OutSystems AI Agent Builder gain full control and access to models, proprietary data, and agents in a single, scalable platform. With built-in monitoring tools, the AI Agent Builder enables continuous performance tracking and optimization, ensuring agents not only adapt and improve over time but also deliver accurate, trustworthy results aligned with evolving business goals. Organizations can scale AI capabilities confidently, without sacrificing oversight or accountability.
Validate and maintain applications through AI-powered code reviews
Before any application is deployed to production, Mentor validates and maintains applications via AI-powered code reviews across every layer of the stack. Doing so ensures all applications meet the highest possible standards for development, security, performance, architecture, and long-term maintainability. Catching issues early and enforcing consistent standards also helps teams avoid costly rework and technical debt.
Taking a proactive approach to quality assurance, Mentor continuously scans applications every 12 hours, flagging potential issues or security risks within a unified dashboard. This ongoing monitoring ensures that quality and compliance are embedded into the development life cycle, helping teams stay ahead of issues and deliver software with confidence.
Shaping the future of AI app and agent development
At this inflection point, software development is undergoing a transformative shift, enabling faster, more efficient, and smarter processes. With the integration of AI-driven low-code tools like OutSystems Mentor, development teams can harness the power of automation, intelligent insights, and SDLC recommendations, accelerating innovation and staying ahead in an ever-evolving digital landscape.
AI isn’t replacing developers—it’s handling the grunt work, from troubleshooting code to enabling rapid app creation, freeing developers to be creative orchestrators and grow into leadership roles. But this is only the beginning.
As teams grow more comfortable integrating AI into their development processes, the next step is embracing AI agents. OutSystems is actively building towards this future, where autonomous AI agents take on more complex tasks and interact with systems independently, driving business processes and decision-making.
At OutSystems, our vision is to empower every company to innovate through software. OutSystems Mentor and our forthcoming AI-powered products are testaments to this vision, revolutionizing how applications are built and orchestrated, driving faster, smarter innovation at scale.
Luis Blando is chief product and technology officer at OutSystems.
—
New Tech Forum provides a venue for technology leaders to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Risk management in the public cloud is your job 4 Jul 2025, 9:00 am
I was excited to speak at a regional cloud computing summit hosted by one of the major cloud providers. My presentation focused on the many opportunities of public cloud and the essential need for risk management. Just before the event, I received an email stating that three of my slides, which discussed cloud outages and the risks of over-reliance on providers, had to be removed. Mentioning failures didn’t align with the host’s narrative of reliability.
Frustrated but not surprised, I removed the slides. During my presentation, I highlighted the importance of preparing for outages, disruptions, and other potential risks. I shared real-life incidents, such as major outages at top providers, that demonstrated how businesses unprepared for third-party failures can face significant financial, operational, and reputational damage. The audience’s response was mixed. Some nodded, clearly understanding the risks. Others, including event organizers at the back, appeared uneasy. Unsurprisingly, I haven’t been invited again.
Here’s the truth: Managing risk isn’t about doubting the effectiveness of cloud providers—it’s about ensuring resilience when the unexpected happens. If sharing that message makes people uncomfortable, I know I’m doing my job.
Reality does not care about your bias
Here’s another truth: Cloud outsourcing doesn’t eliminate risk; it simply shares it with the provider. The shared responsibility model of cloud governance clarifies certain aspects of risk management. A public cloud provider guarantees the reliability of their infrastructure, but the responsibility for the operating environment—applications, data, and workflows—still rests with the customer organization.
For example, providers will ensure their data centers meet uptime requirements and can withstand disasters at the physical or infrastructure level. However, they cannot control how a business organizes its data, enforces access policies, or mitigates the ripple effects of service provider outages on critical workflows. Businesses still bear the responsibility of maintaining continuity in the event of unexpected technical incidents.
Public cloud providers excel at scalability and innovation, but they aren’t immune to outages, latency issues, or cybersecurity risks. Organizations that fail to prepare for such possibilities become vulnerable to operational, financial, and reputational damage.
High-profile cloud incidents
Recent history provides clear examples of the risks associated with over-reliance on public cloud providers.
- AWS outage (December 2021): The reliability of one of the world’s largest cloud providers came into question during this outage. Many businesses, including cloud-dependent logistics firms and e-commerce platforms, experienced service disruptions that halted deliveries and hampered operations during the critical holiday season.
- Azure downtime (2022): A system failure in Microsoft Azure impacted SaaS applications and global enterprises alike, with financial services and regulated industries experiencing significant disruptions. These setbacks exposed the risks of relying heavily on a single provider.
- Google Cloud outage (2020): This incident disrupted major platforms such as Gmail and YouTube, as well as third-party applications operating on Google Cloud. Businesses without backup plans faced revenue losses.
Such incidents underscore the primary risks associated with relying on third-party cloud vendors. Despite their technological sophistication, the providers are not infallible, and their failures can have a direct impact on dependent businesses.
The ripple effect of third-party failures
When third-party providers face disruptions, the impact can be extensive. Public cloud providers are the foundation of many industries today, so any failure creates a ripple effect across organizations, markets, and consumers.
- Operational delays: Interruptions to essential services lead to productivity losses and, in some cases, operational paralysis. This is especially noticeable in industries such as healthcare or finance, where downtime can have serious real-world consequences for customers or lead to regulatory noncompliance.
- Financial losses: The cost of cloud-induced downtime can reach staggering levels. In highly regulated industries, losses can surpass millions of dollars per hour, considering missed business opportunities, regulatory fines, and remediation efforts.
- Regulatory and compliance risks: Certain industries are subject to stringent compliance standards. An outage caused by a third-party provider could prevent organizations from meeting these requirements, resulting in significant penalties and legal risks.
- Reputational damage: Customers and stakeholders often associate poor service with the business even if the issue lies with the cloud provider. Recovering from reputational loss is an expensive, extended process that can impact long-term business viability.
- Concentration risks: Relying too heavily on a single cloud service creates a single point of failure. If that provider goes down, operations in the dependent organization could come to a complete halt.
Risk management remains critical
Migrating systems to public cloud platforms does not exempt organizations from the need to build strong risk management frameworks. Viewing public cloud providers as strategic partners rather than infallible utilities helps businesses safeguard themselves against downstream risks.
- Thoroughly evaluate vendors: Look beyond their current service offerings to document their resiliency plans, security practices, and compliance certifications.
- Diversify cloud investments: Many organizations now adopt multicloud or hybrid solutions that combine services from multiple providers. This minimizes the risks of relying on a single provider and increases flexibility during incident recovery.
- Develop incident response plans for cloud disruption: Business continuity strategies should cover potential cloud outages, simulate disruptions, and establish clear action plans for rerouting workloads or restoring operations.
- Monitor cloud vendor dependencies: Consider active monitoring solutions to identify vulnerabilities or performance issues within your cloud ecosystem before they lead to outages.
- Engage in contractual risk protections: Contracts with public cloud providers should clearly define recovery expectations, contingency plans, and resolution timelines to ensure effective risk management. Auditing rights and regular performance evaluations must also be included in these agreements.
- Prioritize data and infrastructure backups: Data replicas and backup systems independent of your primary cloud service lower the risk of business stagnation during a disaster.
Outsourcing to the public cloud provides enterprises with opportunities to become more efficient and flexible; however, the inherent nature of cloud services requires careful oversight. The public cloud connects a business to global ecosystems where minor disruptions can lead to much larger problems. Effective use of cloud services doesn’t mean outsourcing responsibility. It involves taking proactive steps to reduce risks from the start. Only then can organizations fully realize the benefits of the public cloud, without compromising operational security or long-term success.
Developing JavaScript apps with AI agents 4 Jul 2025, 9:00 am
There is a phenomenon in low-code and 4GL systems called the inner platform effect. Essentially, you take a tool and build an abstraction on top designed to make it simpler and end up creating a less powerful version of the same underlying system.
Artificial intelligence is producing something similar at the level of learning. We begin by using AI to control the underlying technology by telling it what we want. Then, we come to the gradual realization that we need to understand those underlying technologies and AI’s role in using them. We try to build an “inner platform” of understanding inside AI, only to discover that we must assume the work of learning ourselves, with AI as only part of that understanding.
With that in mind, this month’s report features the latest news and insights to fuel your JavaScript learning journey.
Top picks for JavaScript readers on InfoWorld
Putting agentic AI to work in Firebase Studio
Firebase Studio is among the best-in-class tools for AI-powered development, and it still has a few wrinkles to work out. Here’s a first look at building a full-stack JavaScript/TypeScript app with agentic AI.
Better together: Developing web apps with Astro and Alpine
Astro gives you next-generation server-side flexibility, and Alpine provides client-side reactivity in a tightly focused package. Put them together, and you get the best of both worlds.
10 JavaScript concepts you need to succeed with Node
Node is half of JavaScript’s universe. It’s also a major part of the enterprise infrastructure landscape. This article brings together some of the most important JavaScript concepts to understand when using Node, Bun, and Deno.
More good reads and JavaScript updates elsewhere
Vite reaches 7.0
As seen in the most recent State of JavaScript report, Vite is now a central component of JavaScript’s configuration ecosystem. An exciting sidebar in the latest release announcement is the connections to the VoidZero project and Rolldown, a Rust-based next-generation bundler that is part of the push to modernize Vite’s core. To check it out, just replace the default vite package with rolldown-vite.
What’s coming to JavaScript
This is a great overview from Deno of JavaScript proposals at all stages of development. One of the more exciting soon-to-be official updates is explicit resource management with using, which lets you declare a resource that will automatically be cleaned up when the block completes. Another is a new async version of Array.from, and much more.
Deno keeps up the fight over JavaScript trademark
You might not know that Deno is involved in a dispute with Oracle over Oracle’s use of the JavaScript trademark. This is an important area of IP that many JavaScript users will find interesting. In this blog post, Deno and Node creator Ryan Dahl asserts that JavaScript should not be a trademarked brand.
V8 deep dive on optimizations and Wasm
Here’s a great nerd-out on JavaScript engine internals and their relationship to WebAssembly. This piece is both a close look into the implementation of JavaScript in the real world and a bracing reminder of how much work and mind-power goes into the tools we use in our daily lives. I get the same feeling sometimes on a long road trip, when I suddenly realize: Hey, somebody built all this.
AWS adds incremental and distributed training to Clean Rooms for scalable ML collaboration 3 Jul 2025, 11:59 am
AWS has rolled out new capabilities for its Clean Rooms service, designed to accelerate machine learning model development for enterprises while addressing data privacy concerns.
The updates, including incremental and distributed training, are designed to help enterprises, particularly in regulated industries, analyze shared datasets securely without copying or exposing sensitive information.
Analysts say the enhancements come amid rising demand for predictive analytics and personalized recommendations.
“The need for secure data collaboration is more critical than ever, with the need to protect sensitive information, yet share data with partners and other collaborators to improve machine learning models with collective data,” said Kathy Lange, research director at IDC.
“Often, enterprises cannot collect enough of their own data to cover a broad spectrum of outcomes, particularly in healthcare applications, disease outbreaks, or even in financial applications, like fraud or cybersecurity,” Lange added.
Incremental training to help with agility
The incremental training ability added to Clean Rooms will help enterprises build upon existing model artifacts, AWS said.
Model artifacts are the key outputs from the training process — such as files and data — that are required to deploy and operationalize a machine learning model in real-world applications.
The addition of incremental training “is a big deal,” according to Keith Townsend, founder of The Advisor Bench — a firm that provides consulting services for CIOs and CTOs.
“Incremental training allows models to be updated as new data becomes available — for example, as research partners contribute fresh datasets — without retraining from scratch,” Townsend said.
Seconding Townsend, Everest Group analyst Ishi Thakur said that the ability to update models with incremental data will bring agility to model development.
“Teams on AWS clean rooms will now be able to build on existing models, making it easier to adapt to shifting customer signals or operational patterns. This is particularly valuable in sectors like retail and finance where data flows continuously and speed matters,” Thakur said.
Typically, AWS Clean Rooms in the machine learning model context is used by enterprises for fraud detection, advertising, and marketing, said Bradley Shimmin, lead of the data and analytics practice at The Futurum Group.
“The service is focused on building lookalike models, which is a predictive ML model of the training data that can be used to find similar users in other datasets. So, something specific to advertising use cases or fraud detection,” Shimmin added.
Distributed training to help scale model development
The distributed training ability added to Clean Rooms will help enterprises scale model development, analysts said.
“This capability helps scale model development by breaking up complex training tasks across multiple compute nodes, which is a crucial advantage for enterprises grappling with high data volumes and compute-heavy use cases,” Thakur said.
Explaining how distributed training works, IDC’s Lange pointed out that AWS Clean Rooms ML — a feature inside the Clean Rooms service — uses Docker images that are SageMaker-compatible and stored in Amazon Elastic Container Registry (ECR).
“This allows users to leverage SageMaker’s distributed training capabilities, such as data parallelism and model parallelism, across multiple compute instances, enabling scalable, efficient training of custom ML models within AWS Clean Rooms,” Lange said, adding that other AWS components, such as AWS Glue — a serverless data integration service, are also involved.
Further, The Advisor Bench’s Townsend pointed out that AWS Clean Rooms’ distributed training feature will specifically help in use cases when one partner of the two-stakeholder enterprises doesn’t have deep expertise in distributed machine learning infrastructure.
Vendors such as Microsoft, Google, Snowflake, Databricks, and Salesforce also offer data clean rooms.
While Microsoft offers Azure Confidential Clean Room as a service designed to facilitate secure, multi-party data collaboration, Google offers BigQuery Clean Room — a tool that is built on BigQuery’s Analytics Hub and is focused on multi-party data analysis where data from a variety of contributors can be combined, with privacy protections in place, without the need to move or expose raw data.
Salesforce’s clean rooms feature is expected to be added to its Data Cloud by the end of the year, Shimmin said.
Demand for clean rooms expected to grow
The demand for clean rooms is expected to grow in the coming months.
“I expect we’ll see increased interest in and adoption of Clean Rooms as a service in the next 12-18 months,” said Shelly Kramer, founder and principal analyst at Kramer & Company, pointing out the deprecation of third-party cookies and increasingly challenging privacy regulations. “In data-driven industries, solutions for first-party data collaboration that can be done securely are in demand. That’s why we are seeing Clean Rooms as a service quickly becoming a standard. While today the early adopters are in some key sectors, the reality is that all enterprises today are, or should be, data-driven.”
On the other hand, IDC’s Lange pointed out that demand for clean rooms is growing specifically due to the rise in data volumes and data variety that are being stored and analyzed for patterns.
However, Kramer pointed out that enterprises may have challenges around the adoption of clean rooms.
“Integrating with existing workflows is a key challenge, as clean rooms don’t naturally fit within standard campaign planning and measurement processes. Therefore, embedding and operationalizing them effectively can require some effort,” Kramer said.
Alibaba Cloud launches Eigen+ to cut costs and boost reliability for enterprise databases 3 Jul 2025, 11:33 am
Alibaba Cloud has developed a new cluster management system called Eigen+ that achieved a 36% improvement in memory allocation efficiency while eliminating Out of Memory (OOM) errors in production database environments, according to research presented at the recent SIGMOD conference.
The system addresses a fundamental challenge facing cloud providers: how to maximize memory utilization to reduce costs while avoiding catastrophic OOM errors that can crash critical applications and violate Service Level Objectives (SLOs).
The development, detailed in a research paper titled “Eigen+: Memory Over-Subscription for Alibaba Cloud Databases,” represents a significant departure from traditional memory over-subscription approaches used by major cloud providers, including AWS, Microsoft Azure, and Google Cloud Platform.
The system has been deployed in Alibaba Cloud’s production environment. The research paper claimed that in online MySQL clusters, Eigen+ “improves the memory allocation ratio of an online MySQL cluster by 36.21% (from 75.67% to 111.88%) on average, while maintaining SLO compliance with no OOM occurrences.”
For enterprise IT leaders, these numbers can translate into significant cost savings and improved reliability. The 36% improvement in memory allocation means organizations can run more database instances on the same hardware while actually reducing the risk of outages.
Alibaba Cloud’s Eigen+ takes a classification-based memory management approach, whereas peers AWS, Microsoft Azure, and Google Cloud primarily rely on prediction-based memory management strategies, which, while effective, may not fully prevent OOM occurrences, explained Kaustubh K, practice director at Everest Group. “This difference in approach can position Alibaba Cloud’s Eigen+ with a greater technical differentiation in the cloud database market, potentially influencing future strategies of other hyperscalers.”
The technology is currently deployed across thousands of database instances in Alibaba Cloud’s production environment, supporting both online transaction processing (OLTP) workloads using MySQL and online analytical processing (OLAP) workloads using AnalyticDB for PostgreSQL, according to Alibaba researchers.
The memory over-subscription risk
Memory over-subscription — allocating more memory to virtual machines than physically exists — has become standard practice among cloud providers because VMs rarely use their full allocated memory simultaneously. However, this practice creates a dangerous balancing act for enterprises running mission-critical databases.
“Memory over-subscription enhances resource utilization by allowing more instances per machine, it increases the risk of Out of Memory (OOM) errors, potentially compromising service availability and violating Service Level Objectives (SLOs),” the researchers noted in their paper.
The stakes are particularly high for enterprise databases. “The figure clearly demonstrates that service availability declines significantly, often falling below the SLO threshold as the number of OOM events increases,” the researchers wrote, referring to a figure in the paper.
Traditional approaches attempt to predict future memory usage based on historical data, then use complex algorithms to pack database instances onto servers. But these prediction-based methods often fail catastrophically when workloads spike unexpectedly.
“Eliminating Out of Memory (OOM) errors is critical for enterprise IT leaders, as such errors can lead to service disruptions and data loss,” Everest Group’s Kaustubh said. “While improvements in memory allocation efficiency are beneficial, ensuring system stability and reliability remains paramount. Enterprises should assess their cloud providers’ real-time monitoring capabilities, isolation mechanisms to prevent cross-tenant interference, and proactive mitigation techniques such as live migration and memory ballooning to handle overloads without service disruption. Additionally, clear visibility into oversubscription policies and strict adherence to Service Level Agreements (SLAs) are essential to maintain consistent performance and reliability.”
The Pareto Principle solution
Rather than trying to predict the unpredictable, Alibaba Cloud’s research team discovered that database OOM errors follow the Pareto Principle—also known as the 80/20 rule. “Database instances with memory utilization changes exceeding 5% within a week constitute no more than 5% of all instances, yet these instances lead to more than 90% of OOM errors,” the team said in the paper.
Instead of trying to forecast memory usage patterns, Eigen+ simply identifies which database instances are “transient” (prone to unpredictable memory spikes) and excludes them from over-subscription policies.
“By identifying transient instances, we can convert the complex problem of prediction into a more straightforward binary classification task,” the researchers said in the paper.
Eigen+ employs machine learning classifiers trained on both runtime metrics (memory utilization, queries per second, CPU usage) and operational metadata (instance specifications, customer tier, application types) to identify potentially problematic database instances.
The system uses a sophisticated approach that includes Markov chain state transition models to account for temporal dependencies in database behavior. “This allows it to achieve high accuracy in identifying transient instances that could cause OOM errors,” the paper added.
For steady instances deemed safe for over-subscription, the system employs multiple estimation methods, including percentile analysis, stochastic bin packing, and time series forecasting, depending on each instance’s specific usage patterns.
Quantitative SLO modeling
Perhaps most importantly for enterprise environments, Eigen+ includes a quantitative model for understanding how memory over-subscription affects service availability. Using quadratic logistic regression, the system can determine precise memory utilization thresholds that maintain target SLO compliance levels.
“Using the quadratic logistic regression model, we solve for the machine-level memory utilization (𝑋) corresponding to the desired 𝑃target,” the paper said.
This gives enterprise administrators concrete guidance on safe over-subscription levels rather than relying on guesswork or overly conservative estimates.
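To make the “solve for X” step concrete, one plausible form (an assumption on our part; the paper’s exact parameterization is not reproduced in this article) is a logistic model whose log-odds are quadratic in machine-level memory utilization:
\[
\log\frac{P(\text{SLO violation})}{1 - P(\text{SLO violation})} = \beta_0 + \beta_1 X + \beta_2 X^2
\]
Setting the left-hand side to \(\log\frac{P_{\text{target}}}{1 - P_{\text{target}}}\) and solving the resulting quadratic for \(X\) (keeping the root in \([0, 1]\)) yields the utilization threshold that holds the expected violation probability at or below the target.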
Recognizing that no classification system is perfect, Eigen+ includes reactive live migration capabilities as a fallback mechanism. When memory utilization approaches dangerous levels, the system automatically migrates database instances to less loaded servers.
During production testing, the researchers reported, “Over the final two days, only five live migrations were initiated, including mirror databases. These tasks, which minimally impact operational systems, underscore the efficacy of Eigen+ in maintaining performance stability without diminishing user experience.”
Industry implications
The research suggests that cloud providers have been approaching memory over-subscription with unnecessarily complex prediction models when simpler classification approaches may be more effective. The paper stated that approaches used by Google Autopilot, AWS Aurora, and Microsoft Azure all rely on prediction-based methods that can fail under high utilization scenarios.
For enterprise IT teams evaluating cloud database services, Eigen+ represents a potential competitive advantage for Alibaba Cloud in markets where database reliability and efficient resource utilization are critical factors.
6 techniques to reduce cloud observability cost 3 Jul 2025, 11:15 am
Cloud observability is essential for most modern organizations: it provides the deep visibility needed to keep applications functioning, surface problems and the little bumps in the road along the way, and maintain a smooth overall user experience. Meanwhile, the telemetry data that keeps piling up, such as logs, metrics, and traces, becomes costlier by the minute. But one thing is clear: You do not have to compromise visibility just to reduce costs.
This post covers strategies and best practices that can help you optimize your cloud observability spend so that you derive value from your monitoring investments without breaking the bank.
Recognizing the drivers of observability cost
Before going into solutions, let’s understand what could actually make observability costs skyrocket.
- Data ingestion volume. This is one of the biggest causes of costs. The more logs, metrics and traces you ingest, the more expensive it gets. This includes data from applications, infrastructure, networks and third-party services.
- Data retention. Keeping huge historical data for long periods makes costs higher.
- High cardinality metrics. Metrics with many unique labels or dimensions can translate to an explosion in data points and storage requirements.
- Overcollection. Collecting data that is never used for real-time monitoring, alerting, or analysis.
- Tool sprawl. Using different observability tools that do not connect creates duplications in data ingestion and managing those overheads.
- Lack of cost awareness. Teams provisioning resources without understanding the financial implications of their observability choices.
Key techniques to reduce cloud observability cost
Now, let’s explore actionable techniques to bring your observability costs under control:
1. Optimize data ingestion at the sources
This one is perhaps the most impactful area in which we can reduce costs. Make certain that only the data that really counts is collected.
- Filter and whitelist the data:
- Logs. Aggressive filtering must be applied at the source to get rid of debug logs, information that is not useful or data coming in from non-critical services. Several observability platforms allow you to filter logs before ingesting them.
- Metrics. Focus on those metrics that impact how the application performs, user experience and utilization of resources (such as application response times, CPU/memory usage, error rates). Dismiss low-value or unused metrics.
- Traces. Focus on business-critical transactions and distributed traces that help the organization understand service dependencies.
- Strategic sampling. For high-volume data streams (especially traces and logs), consider intelligent sampling methods that capture a statistically significant subset of data, reducing volume while still allowing for anomaly detection and trend analysis (see the sketch after this list).
- Scrape intervals. Consider the periodicity of metric collection. Do you really need to scrape metrics every 10 seconds, when every 60 seconds would have been enough to get a view of this service? Adjusting these intervals can greatly reduce the number of data points.
- Data transformation rules. Transform raw data into a more compact and efficient format before ingesting it. This may involve parsing logs to extract only relevant fields and ignoring the rest.
- Compression techniques: Most observability platforms incorporate certain compression techniques that considerably minimize the volume of data for storage.
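As a vendor-neutral illustration of the sampling idea referenced above, the sketch below keeps a fixed fraction of events before they are exported; the class name and rate are assumptions for illustration, not part of any particular observability SDK.
import java.util.concurrent.ThreadLocalRandom;

// Probabilistic (head-based) sampler: keeps roughly sampleRate of all events.
class ProbabilisticSampler {
    private final double sampleRate;

    ProbabilisticSampler(double sampleRate) {
        this.sampleRate = sampleRate; // e.g., 0.1 keeps ~10% of events
    }

    boolean shouldKeep() {
        return ThreadLocalRandom.current().nextDouble() < sampleRate;
    }
}
In a pipeline, each log line or span would pass through shouldKeep() before being forwarded to the backend, so only the sampled subset is ingested and billed.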
2. Intelligent data retention policies
The retention of data is an enormously expensive affair. Therefore, create a pathway to tiered storage with intelligent data retention policies.
- Tiered short-term and long-term storage. High-granularity data should be retained for shorter periods (7 to 30 days for detailed troubleshooting), while older, less frequently accessed data should be archived to long-term storage (such as S3 or Glacier) for compliance and historical analysis.
- Retention by type of data. Not all data needs the same retention period. Some data, such as application logs for immediate debugging, may need only a few days, while other data, such as audit logs, could require several years of retention.
- Automated archiving and deletion. Automatically archive or delete data according to the retention policies you define.
3. Right-sizing and resource optimization
Observability tools help you identify inefficiencies in your cloud infrastructure, leading to cost savings.
- Find idle and unutilized resources. Observability data can help you find idle or underutilized resources (EC2 instances, databases, load balancers, etc.) that should be stopped or right-sized.
- Autoscaling. Utilize autoscaling to automatically scale the compute capacity proportional to demand so that you pay only for what you actually use. This removes over-allocation of resources in the low usage times.
- Spot Instances/Savings Plans/Reserved Instances. For predictable workloads, check out the discounts offered by cloud providers in the form of Reserved Instances or Savings Plans. For fault-tolerant and interruptible workloads, Spot Instances offer a considerable discount.
- Storage optimizations. Optimize using different classes of storage (e.g., S3 Standard, S3 Intelligent-Tiering, S3 Glacier) driven by data access patterns and retention requirements.
4. Decentralized and distributed observability
Consider strategies that reduce reliance on a single, expensive observability platform for all data:
- Open-source solutions (self-hosting). Organizations with the expertise can consider self-hosting open-source tools like Grafana, Prometheus, Loki, and Jaeger so that costs go only toward infrastructure. Do keep in mind the operational overhead.
- Mixed-mode approaches. Use commercial observability platforms like Middleware, Datadog, etc., for mission-critical applications, and rely on open-source or native cloud logging solutions for less critical data and use cases.
- Native cloud observability tools. Use the monitoring/logging services provided by your cloud provider (e.g., AWS CloudWatch, Google Cloud Monitoring, Azure Monitor). These are usually the least expensive options for ingesting and storing basic telemetry.
5. Foster a FinOps and cost-conscious culture
Cost optimization of observability is not only a technical challenge but a cultural one.
- Education for the teams. Train developers and operations teams about the cost implications of the observability choices they make. Set a ‘cost-aware’ development culture.
- Set budgets and alerts. Set a clear budget for observability expenditures and create alerts when teams approach or exceed that budget.
- Cost allocation and chargeback. Tagging and labeling should be put in place so observability costs can be fairly charged to teams, projects or business units. This creates accountability.
- Conduct regular reviews of observability spending. Identify high-cost areas, analyze usage patterns, and look for further optimization opportunities. Cost management dashboard tools can be really helpful here.
6. Utilization of AI and machine learning
More cost optimization is performed with the help of AI and ML:
- Anomaly detection. Identify any strange spikes in terms of data ingestion or resource utilization, which might indicate inefficiency or misconfiguration.
- Predictive analytics. Allow observability requirements and costs to be predicted on the basis of historical trends, enabling subsequent proactive optimization.
- Automated remediation: Some platforms can automate actions (e.g., reduce resources) based on detected anomalies, which will help eliminate more wastage.
The essential question with cloud observability is not whether to spend, but whether that spending must be unlimited. It doesn’t have to be. Organizations can reduce cloud observability costs by strategically optimizing data ingestion, managing retention effectively, and embracing automation, all while preserving the level of visibility needed to maintain resilient, high-performing cloud environments. The key is to be proactive and analytical, and to keep fine-tuning your observability approach in line with operational requirements as well as budget constraints.
This article is published as part of the Foundry Expert Contributor Network.
Working with Microsoft 365’s new Copilot APIs 3 Jul 2025, 9:00 am
Microsoft has been adding AI features to its Microsoft 365 productivity platform for some time now, even renaming its desktop Office portal app “Microsoft 365 Copilot.” Under the hype is a mix of useful tools that gives you new ways of working with enterprise content.
That’s not surprising, considering that much of Microsoft 365 has been built on Microsoft’s existing enterprise content management platform, SharePoint. It’s an ideal foundation for AI tools as it offers role-based access to a mix of structured and unstructured data, all stored in hierarchical user- and team-defined repositories. That data helps ground large language models (LLMs) too, providing the necessary framework for retrieval-augmented generation (RAG) and other orchestration techniques.
Having a ground truth for Microsoft 365’s Copilot helps reduce risks associated with Microsoft’s AI applications, but that still limits what can be done with the services Microsoft has been building, especially when we want to build them into our own applications and workflows. With the increasing importance of agent-powered workflows in business process automation, having access to the Microsoft 365 Copilot platform would help speed up application development.
It’s clear there is not a one-size-fits-all approach to AI applications. Microsoft’s various Copilots only go part of the way to supporting businesses, with much more of a focus on the individual user. But Microsoft is a platform company, and eventually what it builds for its own applications becomes another piece of its developer story.
Introducing the Microsoft 365 Copilot APIs
It was good to see the announcement of a series of Microsoft 365 Copilot APIs at Build 2025, breaking out key pieces of its functionality while maintaining the essential security needed to build AI applications that comply with appropriate regulations and ensure only authorized users have access to data. Five different APIs are in the first set: Retrieval, Interactions Export, Change Notifications, Meeting Insights, and Chat. The first four are available as public previews, and the Chat API is currently in private preview.
Retrieval and Interactions Export are the APIs likely to be of most interest to anyone building AI workflows around Microsoft 365. Public previews are intended to test new code, as there can be breaking changes between releases, and so they are only available through the Microsoft Graph’s beta endpoint. Like all Microsoft Graph APIs, if you have the right tenant permissions, you can use the web-based Graph Explorer to build and test Copilot API requests.
Using the Interactions API for compliance
The Interactions API fills an interesting gap, as it builds on existing Teams compliance tools to use the Microsoft 365 Graph to download user interactions for analysis. The outputs include the initial user prompt and the response from the service—not only from the standalone Copilot app, but also from inside tools such as Word and Outlook.
The Interactions API isn’t a compliance tool, but it can be used to build one, offering a way to see prompts and responses that have been used in the Microsoft 365 Copilot. It helps find common prompts, allowing you to build them into Teams applications or Office plug-ins. The resulting prebuilt prompts can then be watched to ensure there’s minimal output drift or that updated content added to Microsoft 365 doesn’t distort responses. If users are using prompts where there’s no data, you can use this as a signal to create content or add sources.
Access is via a familiar HTTP call to the Copilot graph endpoint. You can get the entire interaction history for your organization in one call or you can apply filters to bring in a subset of the data. For example, you can filter by date or user ID. With a standard Microsoft Graph call structure, you have access to all the filter statements you’ve used in other applications, as well as basic Booleans and SQL-like queries.
Regular sampling of queries and responses can watch for leaks of personally identifiable information, as well as ensure that users’ role-based access controls only give access to authorized data. You can chain queries, using data from one to drive another; for example, a Microsoft Graph query can first get a user ID from an email address before querying to see how that user is using Copilot in Word.
Responses are returned in JSON format, and you can use the Microsoft Graph SDK to build and parse requests. Alternatively, tools like Kiota generate libraries for specific endpoints, giving you the tools to build your own Copilot usage analytics applications and dashboards.
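As a rough sketch of what such a call looks like from code, here is a minimal Java example using java.net.http.HttpClient against the Microsoft Graph beta endpoint. The endpoint path, the filter property, the user placeholder, and the environment-variable token are illustrative assumptions based on the description above, not a documented reference; check the Microsoft Graph documentation for the exact shape of the call.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CopilotInteractionsSample {
    public static void main(String[] args) throws Exception {
        String accessToken = System.getenv("GRAPH_ACCESS_TOKEN"); // token acquired elsewhere

        // Illustrative beta endpoint and $filter; verify the path and supported
        // filter properties before relying on this shape.
        String url = "https://graph.microsoft.com/beta/copilot/users/someone@example.com"
                + "/interactionHistory/getAllEnterpriseInteractions"
                + "?$filter=createdDateTime%20ge%202025-06-01T00:00:00Z";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Bearer " + accessToken)
                .header("Accept", "application/json")
                .GET()
                .build();

        // The JSON response contains prompts and responses, ready for
        // compliance sampling or usage analytics.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}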
Ground your AI applications with enterprise data
The latest endpoint to get a public preview is the Retrieval API, which simplifies the process of bringing your enterprise content—and the resulting enterprise knowledge—into your AI applications using tools like Semantic Kernel or Copilot Studio. Like the other APIs, it’s designed to work inside the Microsoft Graph security perimeter, ensuring that users get answers based on their own authorizations.
You can use the API to build AI applications without adding complexity; there’s no need for vector indexes or a separate query environment. Instead, you build on Microsoft’s own semantic index, which powers the enterprise search features built into the platform. This reduces the workload needed to build RAG connections, so you can concentrate on the content you want to surface in your applications without having to spend time thinking about nearest-neighbor search algorithms.
By building on the Microsoft Graph’s SharePoint heritage, you’re able to quickly surface relevant content, reducing the risk of generating hallucinations and errors by focusing operations on specific content and in specific domains. An AI application for the legal team can build on data in the libraries and lists they use, while the sales team will be able to work with historical bids and terms.
Like other Microsoft Graph calls, the Retrieval API call is an HTTP POST with the request embedded in the JSON body. The request includes a query string of up to 1,500 characters, a list of SharePoint or connector data sources, Kusto Query Language (KQL) format filters, and the required number of results. Filters are a powerful tool for choosing documents with specific metadata, for example, from a single author or associated with a specific project or customer. Filter expressions use KQL and work with any document property.
Responses are returned in a JSON document. This holds links to source documents along with relevant text extracts. You can define the metadata that’s returned, too, for example, providing any sensitivity labels. The same procedure works for SharePoint and for Copilot connectors to external data sources, so you can bring in data from tools like ServiceNow or Jira. Working with connectors helps bring in other knowledge sources and allows you to tie agent workflows to service tickets and the like. Filters can target specific sites and services, locking down to one or selecting several sources.
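To make the shape of such a request concrete, here is a minimal Java sketch that POSTs a retrieval query with a KQL-style filter. The endpoint path and the JSON property names (queryString, dataSource, filterExpression, maximumNumberOfResults) are assumptions drawn from the description above, as is the sample filter; verify them against the Microsoft Graph beta documentation before use.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CopilotRetrievalSample {
    public static void main(String[] args) throws Exception {
        String accessToken = System.getenv("GRAPH_ACCESS_TOKEN");

        // Illustrative request body: a natural-language query, a SharePoint data source,
        // a KQL-style filter on document metadata, and a cap on the number of results.
        String body = """
            {
              "queryString": "What are the standard liability terms in our consulting contracts?",
              "dataSource": "sharePoint",
              "filterExpression": "author:\\"Legal Team\\" AND filetype:docx",
              "maximumNumberOfResults": 5
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://graph.microsoft.com/beta/copilot/retrieval"))
                .header("Authorization", "Bearer " + accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The JSON response holds links to the source documents plus relevant text extracts.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}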
One useful aspect of the retrieval API is its support for JSON batching. A single call can embed multiple queries, up to 20. This approach allows you to mix SharePoint and connector queries in a single call, as well as different permutations of the same query. Requests are given separate IDs, and responses are assigned to the same IDs. If you’ve used Microsoft Graph, this approach will be familiar, as it uses the existing JSON batch feature.
Build queries in C#
If you don’t want to handcraft API calls, Microsoft is developing an open source client library with versions for C#, TypeScript, and Python. Current versions of all three are available on GitHub, where you can report any issues. They will eventually be part of the Microsoft 365 Agents SDK and will be available via NuGet. The current beta release of the .NET tool can be installed via NuGet using the .NET CLI or PowerShell.
Once installed, the client library works with Azure’s identity provider to authorize access to the Microsoft Graph. You can define tenants, data sources, and query strings. The response will be stored in a Results object and can then be read back and used as needed. You can use that data as part of an LLM orchestration with your choice of tools.
Using data stored in the Microsoft Graph to both ground and personalize your AI applications makes a lot of sense. SharePoint has long been a place where libraries and lists let us structure unstructured data, while OneDrive is an often-overlooked source of corporate knowledge. Combining these new APIs with LLM orchestration should provide both essential security and effective grounding, using your own data to power agents and conversational user interfaces.
AiSuite: An open-source AI gateway for unified LLM access 3 Jul 2025, 9:00 am
The proliferation of large language models (LLMs) has given developers a range of choices. While developers now have access to cutting-edge models from OpenAI, Anthropic, Google, AWS, and numerous other providers, each comes with its own unique API structures, authentication mechanisms, and response formats. This fragmentation has led developers to wrestle with different APIs, provider-specific documentation, and integration requirements. The result is increased development complexity, extended project timelines, and substantial technical debt as teams struggle to maintain multiple provider integrations simultaneously.
AiSuite has emerged as a revolutionary solution to this fragmentation, offering developers what can best be described as a “universal adapter for the LLM world.” By functioning as a thin wrapper around existing Python client libraries, AiSuite transforms the chaotic landscape of multiple LLM providers into a streamlined, unified experience that prioritizes developer productivity and application flexibility.
Project overview – AiSuite
AiSuite is an open-source Python library created by Andrew Ng and his team to simplify the integration of various AI models from different providers. As of June 2025, the project’s GitHub repository has garnered over 12,000 stars, reflecting its growing popularity in the AI development community.
At its core, AiSuite provides a unified interface that enables developers to interact with multiple large language models through a standardized API similar to OpenAI’s. This approach allows developers to easily switch between models from different providers without having to rewrite their code, making it an invaluable tool for those working with multiple AI services.
The project currently supports a wide range of LLM providers including OpenAI, Anthropic, AWS, Azure, Cerebras, Groq, Hugging Face, Mistral, Ollama, Sambanova, and Watsonx. By offering this comprehensive support, AiSuite addresses a significant pain point in the AI development workflow: the fragmentation of APIs across different providers.
What problem does AiSuite solve?
Developers working with multiple LLM providers often face significant challenges due to the fragmented nature of the AI ecosystem. Each provider has its own API structure, authentication mechanisms, and response formats, which can complicate development and extend project timelines.
The current landscape of LLM integration is inefficient and often requires developers to write custom code for each provider they wish to use. This leads to several pain points:
- Managing different API formats and authentication methods for each provider
- Difficulty in comparing performance across different models
- Increased development time when switching between providers
- Code maintenance challenges when providers update their APIs
These limitations particularly impact developers, AI researchers, and companies building LLM-powered applications. Organizations seeking to leverage multiple LLM providers are constrained by the complexity of managing various integrations and the lack of standardization across the ecosystem.
AiSuite addresses these challenges by providing a single, consistent interface that abstracts away the differences between providers. This allows developers to focus on building their applications rather than managing the intricacies of multiple APIs.
A closer look at AiSuite
AiSuite is designed to be both flexible and powerful. At its heart is the ability to translate all API calls into a familiar format, regardless of the underlying provider. This means developers can switch between models by simply changing a string in their code, such as from openai:gpt-4o to anthropic:claude-3-7-sonnet.
The library follows an interface similar to OpenAI’s, making it easy for developers already familiar with that API to adopt AiSuite. This design choice ensures a smooth transition for teams looking to expand beyond a single provider.
One of AiSuite’s key features is its simple installation process. Developers can install just the base package or include specific provider libraries based on their needs:
pip install aisuite # Installs just the base package
pip install 'aisuite[anthropic]' # Installs aisuite with Anthropic support
pip install 'aisuite[all]' # Installs all provider-specific libraries
Setting up AiSuite is straightforward, requiring only the API keys for the providers you intend to use. These keys can be set as environment variables or passed directly to the AiSuite client constructor.
Here’s a simple example of using AiSuite to generate responses from different models:
import aisuite as ai

client = ai.Client()

messages = [
    {"role": "system", "content": "Respond in Pirate English."},
    {"role": "user", "content": "Tell me a joke."}
]

# Using OpenAI's model
response = client.chat.completions.create(
    model="openai:gpt-4o",
    messages=messages,
    temperature=0.75
)
print(response.choices[0].message.content)

# Using Anthropic's model
response = client.chat.completions.create(
    model="anthropic:claude-3-5-sonnet-20240620",
    messages=messages,
    temperature=0.75
)
print(response.choices[0].message.content)
This example demonstrates how easily developers can switch between different providers by simply changing the model parameter. The rest of the code remains identical, showcasing AiSuite’s unified interface.
Key use cases for AiSuite
AiSuite excels in several key use cases that highlight its versatility and value in AI development workflows.
Multi-provider integration
AiSuite enables developers to integrate and compare multiple LLM providers in their applications easily. This allows teams to:
- Use different models for specific tasks based on their strengths
- Implement A/B testing across providers to determine optimal performance
- Create fallback mechanisms to ensure high availability
Simplified development workflow
By providing a consistent API across different LLM providers, AiSuite supports a more streamlined development process. Developers can:
- Quickly prototype with different models without changing code
- Easily switch between models for testing and comparison
- Reduce the learning curve for team members working with new providers
Educational and research applications
AiSuite’s unified interface makes it an excellent tool for educational and research purposes. Users can:
- Compare responses from different models to the same prompt
- Evaluate performance across providers for specific tasks
- Experiment with different parameters across models
A recent addition to AiSuite is enhanced function calling capabilities, which simplify the implementation of agentic workflows. This feature allows developers to define functions that LLMs can call, making it easier to build complex AI applications that interact with external tools and services.
Bottom line – AiSuite
AiSuite represents a significant advancement in the field of AI development tools. By providing a unified interface to multiple LLM providers, it addresses a critical pain point in the current AI ecosystem: the fragmentation of APIs and the complexity of working with multiple models.
The project’s open-source license (MIT), active community, and comprehensive provider support make it an attractive option for developers seeking to build flexible, robust AI applications. As the AI landscape continues to evolve, tools like AiSuite will play an increasingly important role in enabling developers to leverage the best models for their specific needs without being locked into a single provider.
With a simple installation process, familiar interface, and growing feature set, AiSuite is well-positioned to become a standard tool in the AI developer’s toolkit. Whether you’re building a simple chatbot or a complex AI system, AiSuite’s streamlined approach to working with multiple LLM providers can significantly reduce development time and complexity.
What you need to know about Java wrapper classes 3 Jul 2025, 9:00 am
Have you ever wondered how Java seamlessly combines its primitive data types with object-oriented programming? Enter wrapper classes, an important but often overlooked Java feature. These special classes bridge the gap between primitive types (like int and double) and objects, enabling you to store numbers in collections, handle null values, use generics, and even process data in modern features like pattern matching.
Whether you’re working with a List or parsing a Double from a string, Java’s wrapper classes make it all possible. In this article, we’ll catch up with wrapper classes in Java 21, the current LTS (long-term support) version of Java. I’ll also provide tips, examples, and traps to avoid when working with wrapper classes in equals() and hashCode().
Before we dive into what’s new with wrapper classes in Java 21, let’s do a quick review.
Definition and purpose of wrapper classes
Java wrapper classes are final, immutable classes that “wrap” primitive values inside objects. Each primitive type has a corresponding wrapper class, listed in the table later in this article.
These wrapper classes serve multiple purposes:
- Enabling primitives to be used where objects are required (for example, in collections and generics), as shown in the short example after this list.
- Providing utility methods for type conversion and manipulation.
- Supporting null values, which primitives cannot do.
- Facilitating reflection and other object-oriented operations.
- Enabling consistent handling of data through object methods.
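Here is a small, self-contained illustration (the values and class name are arbitrary) of a wrapper standing in for a primitive inside a generic collection and carrying a null where an int could not:

import java.util.ArrayList;
import java.util.List;

public class WrapperBasics {
    public static void main(String[] args) {
        // Generic collections can only hold objects, so int values are autoboxed to Integer.
        List<Integer> scores = new ArrayList<>();
        scores.add(42);            // autoboxing: int -> Integer
        int first = scores.get(0); // unboxing: Integer -> int

        // A wrapper can be null -- useful for "no value yet" -- while a primitive cannot.
        Integer pendingScore = null;

        System.out.println(first);                        // 42
        System.out.println(pendingScore == null);         // true
        System.out.println(Integer.parseInt("123") + 1);  // utility method: 124
    }
}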
The evolution of wrapper classes through Java versions
Wrapper classes have undergone significant evolution throughout Java’s history:
- Java 1.0 through Java 1.4 introduced basic wrapper classes with manual boxing and unboxing.
- Java 5 added autoboxing and unboxing, dramatically simplifying code.
- Java 8 enhanced wrapper classes with new utility methods and functional interface compatibility.
- Java 9 deprecated wrapper constructors in favor of factory methods.
- Java 16 through 17 strengthened deprecation warnings and prepared for the removal of wrapper constructors.
- Java 21 improved pattern matching with wrappers and further optimized their performance for virtual threads.
This evolution reflects Java’s ongoing balance between backward compatibility and integrating modern programming paradigms.
Wrapper classes in Java 21’s type system
Starting in Java 21, wrapper classes have played an increasingly sophisticated role in Java’s type system:
- Enhanced pattern matching for switch and instanceof works seamlessly with wrapper types.
- Natural integration with record patterns for cleaner data manipulation.
- Optimized interaction between wrapper types and the virtual thread system.
- Improved type inference for wrappers in lambda expressions and method references.
Wrapper behavior was also refined as groundwork for Project Valhalla’s value types.
Wrapper classes in Java 21 maintain their fundamental bridge role while embracing modern language features, making them an essential component of contemporary Java development.
Primitive data types and wrapper classes in Java 21
Java provides a wrapper class for each primitive type, creating a complete object-oriented representation of the language’s fundamental values. Here’s a quick review of primitive types and their corresponding wrapper class, along with a creation example:
Primitive type | Wrapper class | Example creation |
boolean | java.lang.Boolean | Boolean.valueOf(true) |
byte | java.lang.Byte | Byte.valueOf((byte)1) |
char | java.lang.Character | Character.valueOf('A') |
short | java.lang.Short | Short.valueOf((short)100) |
int | java.lang.Integer | Integer.valueOf(42) |
long | java.lang.Long | Long.valueOf(10000L) |
float | java.lang.Float | Float.valueOf(3.14F) |
double | java.lang.Double | Double.valueOf(2.71828D) |
Each wrapper class extends Object and implements interfaces like Comparable and Serializable. Wrapper classes provide additional functionality beyond their primitive counterparts, enabling them to be compared with the equals() method.
Wrapper class methods
Java’s wrapper classes provide a rich set of utility methods beyond their primary role of boxing primitives. These methods offer convenient ways to parse strings, convert between types, perform mathematical operations, and handle special values.
Type conversion methods
- String parsing: Integer.parseInt("42"), Double.parseDouble("3.14")
- Cross-type conversion: intValue.byteValue(), intValue.doubleValue()
- Base conversion: Integer.parseInt("2A", 16), Integer.toString(42, 2)
- Unsigned operations: Integer.toUnsignedLong() (a short demonstration of these conversions follows this list)
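The following short sketch exercises a few of these conversion methods; the values are arbitrary and chosen only to show the results.

public class WrapperConversions {
    public static void main(String[] args) {
        // String parsing
        int answer = Integer.parseInt("42");
        double pi = Double.parseDouble("3.14");

        // Cross-type conversion via the Number methods
        Integer boxed = Integer.valueOf(answer);
        byte asByte = boxed.byteValue();
        double asDouble = boxed.doubleValue();

        // Base conversion: hex string to int, int to binary string
        int fromHex = Integer.parseInt("2A", 16);   // 42
        String binary = Integer.toString(42, 2);    // "101010"

        // Unsigned view of a negative int
        long unsigned = Integer.toUnsignedLong(-1); // 4294967295

        System.out.println(answer + " " + pi + " " + asByte + " " + asDouble);
        System.out.println(fromHex + " " + binary + " " + unsigned);
    }
}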
Utility methods
- Min/max functions: Integer.min(a, b), Long.max(x, y)
- Comparison: Double.compare(d1, d2)
- Math operations: Integer.sum(a, b), Integer.divideUnsigned(a, b)
- Bit manipulation: Integer.bitCount(), Integer.reverse()
- Special value checking: Double.isNaN(), Float.isFinite() (see the sketch after this list)
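A quick sketch of these utility methods in action, with arbitrary sample values:

public class WrapperUtilities {
    public static void main(String[] args) {
        System.out.println(Integer.min(3, 7));              // 3
        System.out.println(Long.max(10L, 99L));             // 99
        System.out.println(Double.compare(1.5, 2.5));       // negative: 1.5 < 2.5
        System.out.println(Integer.sum(20, 22));            // 42
        System.out.println(Integer.divideUnsigned(-2, 3));  // treats -2 as a large unsigned value
        System.out.println(Integer.bitCount(255));          // 8 set bits
        System.out.println(Integer.reverse(1));             // bit pattern reversed: -2147483648
        System.out.println(Double.isNaN(0.0 / 0.0));        // true
        System.out.println(Float.isFinite(1.0f / 0.0f));    // false: positive infinity
    }
}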
valueOf()
Another important method to know about is valueOf(). Constructors were deprecated in Java 9 and marked for removal in Java 16. One way to manage without constructors is to use factory methods instead; for example, Integer.valueOf(42) rather than new Integer(42). The advantages of valueOf() include:
- Memory-efficient caching for primitive wrappers (-128 to 127 for Integer, Short, Long, and Byte; 0 to 127 for Character; TRUE/FALSE constants for Boolean). Float and Double don’t cache due to their floating-point value range.
- Some factory methods have well-defined behavior for null inputs. (A short sketch of valueOf() in use follows this list.)
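Here is a brief sketch contrasting the factory method with the deprecated constructor and with parsing; the cache observations assume a default JVM configuration.

public class ValueOfExamples {
    public static void main(String[] args) {
        // Preferred: factory method (may return a cached instance for -128 to 127)
        Integer a = Integer.valueOf(42);

        // Deprecated since Java 9 and flagged for removal: always allocates a new object
        // Integer b = new Integer(42);

        // parseInt returns a primitive int; valueOf(String) returns an Integer object
        int primitive = Integer.parseInt("42");
        Integer boxed = Integer.valueOf("42");

        System.out.println(a.equals(boxed));                               // true: same value
        System.out.println(Integer.valueOf(100) == Integer.valueOf(100));  // true: cached range
        System.out.println(Integer.valueOf(1000) == Integer.valueOf(1000)); // false: outside the default cache
        System.out.println(primitive);
    }
}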
Wrapper class updates for pattern matching and virtual threads
Wrapper classes in Java 21 were optimized for pattern matching and virtual threads. Pattern matching in Java allows you to test the structure and types of objects while simultaneously extracting their components. Java 21 significantly enhances pattern matching for switch
statements, particularly in wrapper classes. As the following example shows, enhanced pattern matching enables more concise and type-safe code when handling polymorphic data:
public String describeNumber(Number n) {
return switch (n) {
case Integer i when i < 0 -> "Negative integer: " + i;
case Integer i -> "Positive integer: " + i;
case Double d when d.isNaN() -> "Not a number";
case Double d -> "Double value: " + d;
case Long l -> "Long value: " + l;
case null -> "No value provided";
default -> "Other number type: " + n.getClass().getSimpleName();
};
}
Key improvements for pattern matching include:
- Null handling: An explicit case for null values prevents unexpected NullPointerExceptions.
- Guard patterns: The when clause enables sophisticated conditional matching.
- Type refinement: The compiler now understands the refined type within each case branch.
- Nested patterns: Pattern matching now supports complex patterns involving nested wrapper objects.
- Exhaustiveness checking: You can now get a compiler verification that all possible types are covered.
These features make wrapper class handling more type-safe and expressive, particularly in code that processes mixed primitive and object data.
Java 21’s virtual threads feature also interacts with wrapper classes in several important ways. For one, boxing overhead has been reduced in concurrent contexts, as shown here:
// Efficiently processes a large stream of numbers using virtual threads
void processNumbers(List numbers) {
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
numbers.forEach(num ->
executor.submit(() -> processNumber(num))
);
}
}
Additional updates for virtual threads include:
- The JVM optimizes thread communication involving wrapper classes, which reduces overhead in virtual thread scheduling and handoffs.
- Thread-local caching was also improved. Wrapper class caches (-128 to 127 for Integer, etc.) are maintained per carrier thread rather than per virtual thread, preventing unnecessary memory usage in high-concurrency scenarios.
- Identity preservation has also been added. Within a single virtual thread, wrapper identity is maintained appropriately for synchronization and identity-sensitive operations.
Finally, wrapper classes were optimized to improve their performance with virtual threads:
- Virtual threads use stack walking for various operations. Wrapper classes optimize these interactions.
- Wrapper classes in virtual thread scheduler queues benefit from memory efficiency improvements.
- Thread pinning risks are reduced through optimized unboxing operations.
- Structured concurrency patterns work seamlessly with wrapper class value compositions.
The integration between wrapper classes and virtual threads ensures wrapper classes maintain their usefulness in the new concurrent programming models introduced in Java 21. The changes described here ensure that wrapper classes continue to function in Java without the performance penalties that might otherwise occur in high-throughput, virtual thread-intensive applications.
Equals and hashcode implementation in wrapper classes
Wrapper classes override the equals()
method to perform value-based comparisons rather than the reference comparison used by Object.equals()
. In a value-based comparison, two wrapper objects are equal if they contain the same primitive value, regardless of being distinct objects in memory. This type of comparison has the advantages of type specificity and null safety:
- Type specificity: The comparison only returns true if both objects are of the exact same wrapper type.
- Null safety: All wrapper implementations safely handle null comparisons.
In the following example, Integer.equals()
checks whether the argument is an Integer
and has the same int
value:
public boolean equals(Object obj) {
    if (obj instanceof Integer) {
        return value == ((Integer)obj).intValue();
    }
    return false;
}
There are a couple of exceptional cases to note:
- Float and Double: These wrappers handle special values like NaN consistently. (NaN equals NaN in equals(), unlike the primitive comparison.)
- Autoboxing: When comparing with == instead of equals(), autoboxing can lead to unexpected behavior due to the caching of certain values. (Both cases are demonstrated in the sketch after this list.)
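Both exceptional cases are easy to demonstrate; this short sketch (with values chosen to trigger the behavior on a default JVM) shows each one:

public class WrapperEqualsEdgeCases {
    public static void main(String[] args) {
        // NaN: the primitive comparison says "not equal", but wrapper equals() says "equal"
        double nan = Double.NaN;
        System.out.println(nan == Double.NaN);                       // false
        System.out.println(Double.valueOf(nan).equals(Double.NaN));  // true

        // Autoboxing plus ==: identity comparison, affected by the Integer cache
        Integer small1 = 127, small2 = 127;
        Integer big1 = 128, big2 = 128;
        System.out.println(small1 == small2);   // true: cached instances
        System.out.println(big1 == big2);       // false: distinct objects
        System.out.println(big1.equals(big2));  // true: value comparison
    }
}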
Wrappers in hash-based collections
Wrapper classes implement hashCode() in a way that directly corresponds to their primitive values, ensuring consistent behavior in hash-based collections. This implementation is critical for collections like HashMap, HashSet, and ConcurrentHashMap. Consider the following implementation details, then we’ll look at a few examples.
- Integer, Short, and Byte: Return the primitive value directly as the hash code.
- Long: XOR the lower 32 bits with the upper 32 bits: ((int)(value ^ (value >>> 32))).
- Float: Convert to raw bits with Float.floatToIntBits() to handle special values like NaN.
- Double: Convert to raw bits, then use the Long strategy on the resulting bits.
- Character: Return the Unicode code point as the hash code.
- Boolean: Return 1231 for true or 1237 for false (arbitrary but consistent values).
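The following short sketch prints a few of these hash codes so you can see the rules at work; the values in the comments follow directly from the formulas above.

public class WrapperHashCodes {
    public static void main(String[] args) {
        System.out.println(Integer.valueOf(42).hashCode());    // 42: the value itself
        System.out.println(Long.valueOf(42L).hashCode());      // 42: upper 32 bits are zero here
        System.out.println(Long.hashCode(1L << 32));           // 1: upper bits folded in via XOR
        System.out.println(Character.valueOf('A').hashCode()); // 65: the Unicode code point
        System.out.println(Boolean.TRUE.hashCode());           // 1231
        System.out.println(Boolean.FALSE.hashCode());          // 1237
        System.out.println(Double.hashCode(Double.NaN));       // a single consistent value for NaN
    }
}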
Using wrappers in hash-based collections has several advantages:
- Performance: Hash-based collections rely on well-distributed hash codes for O(1) lookup performance.
- Consistency: The hashCode() contract requires equal objects to produce equal hash codes, which wrapper classes guarantee.
- Special value handling: Proper handling of edge cases like NaN in floating-point types (two NaN wrappers produce the same hash code and are equal via equals(), even though NaN != NaN for the primitive comparison).
- Distribution: The implementations are designed to minimize hash collisions for common value patterns.
- Immutability: Since wrapper objects are immutable, their hash codes can be safely cached after the first calculation, improving performance.
This careful implementation ensures that wrapper classes function reliably as keys in hash-based collections, a common use case in Java applications.
The == versus .equals() wrapper trap
I’ve seen many bugs caused by comparing wrapper objects with ==
instead of .equals()
. It’s a classic Java gotcha that bites even experienced developers. You can see here what makes it so tricky:
Integer a = 100;
Integer b = 100;
System.out.println(a == b); // Prints: true
Integer c = 200;
Integer d = 200;
System.out.println(c == d); // Prints: false (Wait, what?)
The confusing behavior happens because Java internally caches Integer
objects for common values (typically -128 to 127). Within this range, Java reuses the same objects, and outside of the cache range, you get new objects.
This is why the golden rule is simple: Always use .equals()
when comparing wrapper objects. This method consistently checks for value equality rather than object identity:
// This works reliably regardless of caching
if (wrapperA.equals(wrapperB)) {
// Values are equal
}
The null unboxing trap
Developers waste a lot of time trying to understand the origin of confusing NullPointerExceptions like the one shown here:
Integer wrapper = null;
int primitive = wrapper; // Throws NullPointerException at runtime
This seemingly innocent code compiles without warnings but crashes at runtime. When Java attempts to unbox a null wrapper to its primitive equivalent, it tries to call methods like intValue()
on a null reference, resulting in a NullPointerException
.
This issue is particularly dangerous because it passes compilation silently, the error only surfaces during execution, and it commonly occurs with method parameters, database results, and collection processing. To protect your code, you can use the following defensive strategies:
- Explicit null checking; e.g.,
int primitive = (wrapper != null) ? wrapper : 0;
- Java 21 pattern matching; e.g.,
int value = (wrapper instanceof Integer i) ? i : 0;
- Provide default values; e.g.,
int safe = Optional.ofNullable(wrapper).orElse(0);
Always be cautious when converting between wrapper objects and primitives, especially when working with data that may contain null values from external sources or database queries.
To learn more about the null unboxing trap, you can check out my Java challenger here.
Wrapper constants (don’t reinvent the wheel)
Every Java developer has probably written code like “if (temperature > 100)
” at some point. But what about when you need to check if a value exceeds an integer’s maximum capacity? Hard-coding 2147483647
is a recipe for bugs.
Instead, you can use wrapper classes with built-in constants:
// This is clear and self-documenting
if (calculatedValue > Integer.MAX_VALUE) {
logger.warn("Value overflow detected!");
}
The most useful constants fall into two categories.
Numeric limits help prevent overflow bugs:
- Integer.MAX_VALUE and Integer.MIN_VALUE.
- Long.MAX_VALUE when you need bigger ranges.
Floating-point specials handle edge cases:
- Double.NaN for “not a number” results.
- Double.POSITIVE_INFINITY when you need to represent ∞.
I’ve found these particularly helpful when working with financial calculations or processing scientific data where special values are common.
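Here is a small sketch of these constants guarding real calculations; the overflow threshold check mirrors the earlier example, and the other lines show why Double.isNaN() is the right way to test for NaN.

public class WrapperConstantsDemo {
    public static void main(String[] args) {
        // Guard against overflow before narrowing a long result to int
        long total = 3_000_000_000L;
        if (total > Integer.MAX_VALUE) {
            System.out.println("Value overflow detected: " + total);
        }

        // NaN never equals itself with ==, so always test with Double.isNaN()
        double ratio = 0.0 / 0.0;
        System.out.println(ratio == Double.NaN);  // false, even though ratio is NaN
        System.out.println(Double.isNaN(ratio));  // true

        // Division by zero on doubles yields infinity rather than an exception
        System.out.println(1.0 / 0.0 == Double.POSITIVE_INFINITY); // true
    }
}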
Memory and performance impact of wrapper classes
Understanding the memory and performance implications of wrappers is crucial. To start, each wrapper object carries about 16 bytes of overhead: 12 bytes for the object header plus 4 bytes for the object reference that points to it. We also must account for the actual primitive value storage (e.g., 4 bytes for Integer, 8 for Long, etc.). Finally, object references in collections add another layer of memory usage, and using wrapper objects in large collections significantly increases memory consumption compared to primitive arrays.
There also are performance considerations. For one, despite JIT optimizations, repeated boxing and unboxing in tight loops can impact performance. On the other hand, wrappers like Integer
cache commonly used values (-128 to 127 by default), reducing object creation. Additionally, modern JVMs can sometimes eliminate wrapper allocations entirely when they don’t “escape” method boundaries. Project Valhalla aims to address these inefficiencies by introducing specialized generics and value objects.
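A classic illustration of the boxing cost in a tight loop: the boxed accumulator below unboxes and re-boxes a Long on every iteration, while the primitive version performs plain arithmetic. This is a sketch, not a benchmark; actual timings depend on the JVM and JIT behavior.

public class BoxingOverheadSketch {
    public static void main(String[] args) {
        int n = 10_000_000;

        // Boxed accumulator: each += unboxes sum, adds, then boxes the result again
        Long boxedSum = 0L;
        for (long i = 0; i < n; i++) {
            boxedSum += i;
        }

        // Primitive accumulator: no object churn
        long primitiveSum = 0L;
        for (long i = 0; i < n; i++) {
            primitiveSum += i;
        }

        System.out.println(boxedSum.equals(primitiveSum)); // true: same value either way
    }
}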
Consider the following best practice guidelines for reducing the performance and memory impact of wrapper classes:
- Use primitive types for performance-critical code and large data structures.
- Leverage wrapper classes when object behavior is needed (e.g., collections and nullability).
- Consider specialized libraries like Eclipse Collections for large collections of “wrapped” primitives.
- Be cautious about identity comparisons (==) on wrapper objects.
- Always use the equals() method to compare wrapper values.
- Profile before optimizing, as JVM behavior with wrappers continues to improve.
While wrapper classes incur overhead compared to primitives, Java’s ongoing evolution continues to narrow this gap while maintaining the benefits of the object-oriented paradigm.
General best practices for wrapper classes
Understanding when to use primitive types versus wrapper classes is essential for writing efficient and maintainable code in Java. While primitives offer better performance, wrapper classes provide flexibility in certain scenarios, such as handling null values or working with Java’s generic types. Generally, you can follow these guidelines:
Use primitives for:
- Local variables
- Loop counters and indices
- Performance-critical code
- Return values (when null is not meaningful)
Use wrapper classes for:
- Class fields that can be null
- Generic collections (e.g., List)
- Return values (when null has meaning)
- Type parameters in generics
- When working with reflection
Conclusion
Java wrapper classes are an essential bridge between primitive types and Java’s object-oriented ecosystem. From their origins in Java 1.0 to enhancements in Java 21, these immutable classes enable primitives to participate in collections and generics while providing rich utility methods for conversion and calculation. Their careful implementations ensure consistent behavior in hash-based collections and offer important constants that improve code correctness.
While wrapper classes incur some memory overhead compared to primitives, modern JVMs optimize their usage through caching and JIT compilation. Best practices include using factory methods instead of deprecated constructors, employing .equals()
for value comparison, and choosing primitives for performance-critical code. With Java 21’s pattern-matching improvements and virtual thread integration, wrapper classes continue to evolve while maintaining backward compatibility, cementing their importance in Java development.