Anthropic’s Claude Opus 4.5 pricing cut signals a shift in the enterprise AI market 25 Nov 2025, 12:59 pm
Anthropic has launched Claude Opus 4.5 with a 67% price cut that repositions its flagship model from a boutique offering to a production-ready enterprise tool.
The new pricing of $5 per million input tokens and $25 per million output tokens — down from $15 and $75 — brings Anthropic closer to OpenAI and Google while maintaining a premium position.
The launch comes a week after Google released Gemini 3 and less than two weeks after OpenAI launched GPT-5.1, underscoring the rapid pace of competition in enterprise AI. For context, OpenAI’s GPT-5.1 costs $1.25 per million input tokens and $10 per million output tokens, while Google’s Gemini 3 Pro runs $2 to $4 per million input tokens.
“Opus 4.5 is a meaningful step forward in what AI systems can do,” the company said in an announcement on its website.
The benchmark debate
Anthropic claimed Opus 4.5 achieved 80.9% on SWE-bench Verified, a software engineering benchmark, outperforming OpenAI’s GPT-5.1-Codex-Max at 77.9%, Google’s Gemini 3 Pro at 76.2%, and its own Sonnet 4.5 at 77.2%. The company also said the model scored higher on its internal two-hour performance engineering assessment than any human candidate who has taken the exam.
But analysts cautioned that benchmark scores tell only part of the story. “Benchmark scores often look impressive but mean very little once a model enters production,” said Sanchit Vir Gogia, chief analyst at Greyhound Research. “They are clean, simple, and run in isolation. Most enterprise systems are layered with legacy software, inconsistent workflows, and regulatory overhead.”
Leslie Joseph, principal analyst at Forrester, noted that while Anthropic’s lead is statistically significant, “the gap between these models has narrowed to the point where winner-takes-all metrics are less relevant than considerations of best architectural fit.”
The real evaluation criteria for enterprises, according to Gogia, are different: “Can the model work with an organization’s legacy tools? Will it maintain accuracy across long sequences? Is it stable under load? These are the details that matter more than a score on a public leaderboard.”
Strategic pricing shift
Beyond performance claims, Anthropic’s pricing strategy marks a significant repositioning that goes further than cost competition. “Previously, the Opus line was viewed as a ‘boutique’ model, too expensive for general automated workflows,” Joseph said. “By slashing prices significantly, Anthropic is undercutting the specialized enterprise models of its rivals, signaling that ‘frontier intelligence’ is no longer a scarce resource.”
The company said prompt caching can reduce costs by up to 90% and batch processing offers 50% savings. These optimization features matter for enterprise deployments where predictable costs and resource management drive adoption decisions.
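To see how those discounts compound, consider a minimal TypeScript sketch. The per-token rates and discount ceilings come from the figures above; the workload volumes, the cache-hit rate, and the assumption that caching and batch discounts stack are all illustrative guesses:

```typescript
// Opus 4.5 list prices from the article; workload numbers are assumptions.
const INPUT_PER_M = 5;   // $ per million input tokens
const OUTPUT_PER_M = 25; // $ per million output tokens

const inputM = 500;  // assumed: 500M input tokens per month
const outputM = 100; // assumed: 100M output tokens per month

const listCost = inputM * INPUT_PER_M + outputM * OUTPUT_PER_M; // $5,000

// Assume 80% of input tokens are cache hits billed at ~10% of list
// (the "up to 90%" reduction), and the whole job uses 50%-off batching.
const cachedShare = 0.8;
const inputCost =
  inputM * cachedShare * INPUT_PER_M * 0.1 +  // cached input
  inputM * (1 - cachedShare) * INPUT_PER_M;   // uncached input
const discounted = (inputCost + outputM * OUTPUT_PER_M) * 0.5; // $1,600

console.log(`list: $${listCost}, with caching + batching: $${discounted}`);
```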
“Anthropic’s updated pricing is not about joining a race to the bottom,” Gogia said. “The new rates bring Claude closer to competitors, but the value proposition remains clear. This model is built for enterprises that care more about stability and trust than token volume.”
The model offers a 200,000-token context window — roughly 150,000 words — and is available through Anthropic’s API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. According to the announcement, Opus 4.5 is “built for professional software engineering, complex agentic workflows, and high-stakes enterprise tasks.”
Where it fits in enterprise workflows
The technical specifications translate to specific enterprise use cases. Internal testers at Anthropic reported that Opus 4.5 completed tasks that were difficult for Sonnet 4.5 and handled ambiguous requirements without extensive guidance, according to the announcement. The company acknowledged, however, that its performance engineering test doesn’t measure collaboration, communication, or professional judgment developed through experience.
Analysts identified specific use cases where the model’s capabilities align with enterprise needs. “Given the model’s specific enhancements in agentic coding and reliable document synthesis, software engineering and document-heavy departments such as legal or compliance are clear beneficiaries,” Joseph said.
Gogia emphasized the model’s suitability for precision-focused roles: “Legal teams, software architects, policy writers, and compliance officers will see the most impact. These are the areas where accuracy matters more than speed.”
Deepika Giri, head of research for big data and AI at IDC Asia/Pacific, said the combination of competitive pricing and efficiency is accelerating enterprise adoption. “Advanced safety and auditability features make Anthropic particularly well-suited for compliance-focused industries and regulated use cases,” Giri said.
Developer tools and integrations
Beyond the core model improvements, Anthropic expanded its developer toolset, adding an enhanced plan mode to Claude Code, its terminal-based development environment. “Plan Mode now builds more precise plans and executes more thoroughly — Claude asks clarifying questions upfront, then builds a user-editable plan.md file before executing,” the company said in the announcement.
The tool is now available in Anthropic’s desktop application. Claude Opus 4.5 is also available in GitHub Copilot for Pro, Pro+, Business, and Enterprise users, according to a GitHub changelog.
For paid users, “long conversations no longer hit a wall — Claude automatically summarizes earlier context as needed, so you can keep the chat going,” the announcement said. Claude for Chrome, which operates across browser tabs, is now available to all Max subscribers. Claude for Excel moved to general availability for Max, Team, and Enterprise users, the announcement added.
What enterprises should weigh
For organizations evaluating AI platforms, the decision extends beyond per-token pricing. “Procurement teams must weigh ecosystem fit,” Joseph said. “For instance, a company deeply integrated into Google Workspace might find Gemini’s multimodal capabilities offer more practical value, even with a slightly lower coding score.”
Gogia noted that cost considerations in regulated industries differ from pure API economics. “In industries where a single misstep can trigger compliance issues or downstream rework, the true cost is not the API bill. It is the cleanup. For those teams, paying more for fewer problems is a trade that makes sense.”

The launch marks Anthropic’s third major model release in two months, following Sonnet 4.5 in September and Haiku 4.5 in October.
Microsoft’s Fara-7B brings AI agents to the PC with on-device automation 25 Nov 2025, 10:59 am
Microsoft is pushing agentic AI deeper into the PC with Fara-7B, a compact computer-use agent (CUA) model that can automate complex tasks entirely on a local device.
The experimental release, aimed at gathering feedback, provides enterprises with a preview of how AI agents might run sensitive workflows without sending data to the cloud, while still matching or outperforming larger models like GPT-4o in real UI navigation tasks.
“Unlike traditional chat models that generate text-based responses, Computer Use Agent (CUA) models like Fara-7B leverage computer interfaces, such as a mouse and keyboard, to complete tasks on behalf of users,” Microsoft said in a blog post. “With only 7 billion parameters, Fara-7B achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems that depend on prompting multiple large models.”
Fara-7B processes screenshots and interprets on-screen elements at the pixel level, enabling it to navigate interfaces even when the underlying code is complex or unavailable.
In internal benchmarks, Fara-7B posted a 73.5% success rate on the WebVoyager test, surpassing GPT-4o when both were evaluated as computer-use agents. Microsoft said the model also tends to finish tasks in far fewer steps than earlier 7B-class systems, which could translate to faster and more predictable automation on the desktop.
Microsoft has also built a “Critical Points” safeguard into the model, requiring the agent to pause and request user approval before performing irreversible actions such as sending emails or completing financial transactions.
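Microsoft has not published Fara-7B’s control flow, but the “Critical Points” safeguard maps onto a familiar pattern: gate irreversible actions behind explicit user approval inside the agent loop. The following TypeScript sketch is entirely hypothetical; none of the type or function names come from Fara-7B:

```typescript
// Hypothetical computer-use-agent loop with a "critical point" approval gate.
// All names are invented for illustration; this is the pattern, not the product.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "sendEmail"; to: string }         // irreversible
  | { kind: "submitPayment"; amount: number } // irreversible
  | { kind: "done" };

const IRREVERSIBLE = new Set(["sendEmail", "submitPayment"]);

async function runTask(
  goal: string,
  proposeAction: (goal: string, screenshot: Buffer) => Promise<Action>,
  takeScreenshot: () => Promise<Buffer>,
  execute: (a: Action) => Promise<void>,
  askUser: (a: Action) => Promise<boolean>,
): Promise<void> {
  for (;;) {
    // The model reads the screen and proposes the next UI action.
    const action = await proposeAction(goal, await takeScreenshot());
    if (action.kind === "done") return;
    // Critical point: pause for explicit approval before anything
    // that cannot be undone.
    if (IRREVERSIBLE.has(action.kind) && !(await askUser(action))) {
      continue; // user declined; let the model re-plan
    }
    await execute(action);
  }
}
```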
The shift to local models
Analysts note that the move toward compact, local models such as Fara-7B reflects a broader shift in enterprise AI architecture.
Cloud-based systems continue to dominate for large-scale reasoning and organization-wide search. Still, many day-to-day enterprise workflows involve copying data between internal applications on a laptop, where information cannot leave the device.
“Edge-based models solve three big problems with cloud AI: compute cost, data leaving the device, and latency,” said Pareekh Jain, CEO of Pareekh Consulting. “Most enterprise tasks happen across internal apps on a laptop, and a local agent is a much better fit for that.”
Charlie Dai, VP and principal analyst at Forrester, said Fara-7B shows how lightweight, device-resident agents will become more important as organizations accelerate their adoption of agentic AI.
“For enterprises, this signals a gradual decentralization of AI workloads, lowering dependency on hyperscale infrastructure while demanding new strategies for edge governance and model lifecycle management,” Dai added.
The trend also reflects a broader move toward hybrid AI architectures, where local agents handle privacy-sensitive workflows and cloud systems continue to provide scale, according to Tulika Sheel, a senior VP at Kadence International.
By keeping data local and reducing reliance on hyperscale compute, small on-device agents offer a practical way to automate sensitive or repetitive desktop tasks without exposing information to external systems.
Practicality and governance challenges
Pixel-level agents promise broader compatibility because they can work across many applications without custom integrations, but they also bring operational risks. Jain compared this approach to an AI-enhanced version of robotic process automation, where the agent mimics mouse and keyboard inputs to move data between systems.
Snowflake to acquire Select Star to enhance its Horizon Catalog 25 Nov 2025, 10:17 am
Snowflake has signed a definitive agreement to acquire San Francisco-based startup Select Star’s team and context metadata platform to enhance its Horizon Catalog offering, the company said in a statement.
Horizon Catalog is a unified data discovery, management, and governance suite inside the cloud-based data warehouse provider’s Data Cloud offering.
Data and governance catalogs such as Snowflake’s Horizon and Databricks’ Unity have been gaining popularity among enterprises as they offer a unified control plane to manage the sprawl of data spread across multiple clouds and applications.
These catalogs also offer a unified contextual view of an enterprise’s data estate, which is increasingly becoming a staple need for enterprises developing AI-driven applications and agents, since those systems require clean, well-documented, and traceable inputs to perform reliably.
Snowflake plans to use Select Star’s context metadata platform to expand Horizon’s data access capabilities, giving its users more options to contextualize data for AI-driven applications and agents.
Select Star already has integrations with database systems like PostgreSQL and MySQL, business intelligence tools like Tableau and Power BI, and data pipeline/orchestration tools like dbt and Airflow.
Race to become the AI-native foundation for enterprises
The expansion of Horizon’s capabilities aligns with Snowflake’s continued efforts to outpace rivals in garnering more data and analytics workloads buoyed by the demand for AI-driven applications and agents.
“Snowflake knows that the battle for AI workloads will be won in metadata, lineage, and trust, not raw storage. Horizon Catalog is a strong foundation, but Select Star brings what Snowflake lacks today, real automated discovery, column level lineage, usage intelligence, and UX that reduces the grunt work data analysts still do,” said Phil Fersht, CEO of HFS Research.
“There is a clear market gap for full stack metadata intelligence that is deeply embedded in the platform, not bolted on. At the same time, Databricks keeps stretching the gap in governance and lineage with Unity Catalog. Snowflake knows it must respond with real acceleration,” Fersht said, adding that the inorganic approach to acquiring these capabilities works well for Snowflake.
Snowflake is also locked in a battle with hyperscalers such as Google, AWS, and Microsoft for dominance in data analytics workloads for AI applications, said ISG Software Research’s executive director David Menninger.
“AI is the hot topic in the market right now, and AI is entirely dependent on data. Our research shows that making data usable for AI is the most common challenge enterprises face with their data,” Menninger said.
The intensity of the battle that Menninger is hinting at is reflected in Snowflake’s previous acquisitions this year.
In June, Snowflake announced its intent to acquire Crunchy Data, a US-based provider of cloud PostgreSQL databases, in an effort to give developers an easier way to build AI-based applications with a PostgreSQL database, to be dubbed Snowflake Postgres, in its AI Data Cloud.
Although the timing of the deal suggested that Snowflake was responding to Databricks’ acquisition of open source serverless Postgres company Neon, analysts say the two vendors are vying to become “the” AI-native data foundation unifying analytics, operational storage, and machine learning.
Earlier this month, Snowflake acquired Datometry to boost SnowConvert AI, one of its existing free migration tools, sharpening its pitch to enterprises that want to shift legacy database workloads to the cloud without the usual pain, cost, or uncertainty of large-scale rewrites.
Where AI meets cloud-native computing 25 Nov 2025, 9:00 am
In the past decade, we’ve seen two major advances in software development: cloud-native architecture and artificial intelligence. The first redefined how we build, deploy, and manage applications, and the second is becoming a mainstream utility. Now, the two are converging, prompting developers to reevaluate both their skill sets and architectural strategies. This convergence isn’t just future talk. It’s today’s competitive reality.
The intersection of AI and cloud-native technology is much broader than just combining Kubernetes with machine learning or simply wrapping a chatbot in a container. It’s about fundamentally rethinking how applications deliver value at scale, in real time, with agility and resilience that only a cloud-native foundation can offer. The journey is complex, and the main issue is a knowledge gap that could slow innovation or, in the worst case, lead to fragile, unscalable architectures.
A new way to design AI systems
Cloud-native development is centered on containers, orchestration (such as Kubernetes), and microservices. It has become the standard for building scalable, resilient applications. Meanwhile, AI’s business value is undisputed, whether it’s predictive analytics that accelerate logistics or generative models that power customer experiences. If organizations want to make AI truly production-ready, resilient, and adaptable, it is vital that these new AI systems inherit cloud-native qualities.
Here’s the core issue: Most AI projects start with the model. Data scientists build something compelling on a laptop, perhaps wrap it in a Flask app, and then throw it over the wall to operations. As any seasoned cloud developer knows, solutions built outside the context of modern, automated, and scalable architecture patterns fall apart in the real world when they’re expected to serve tens of thousands of users, with uptime service-level agreements, observability, security, and rapid iteration cycles. The need to “cloud-native-ify” AI workloads is critical to ensure that these AI innovations aren’t dead on arrival in the enterprise.
In many CIO discussions, I hear pressure to “AI everything,” but real professionals focus on operationalizing practical AI that delivers business value. That’s where cloud-native comes in. Developers must lean into pragmatic architectures, not just theoretical ones. A cutting-edge AI model is useless if it can’t be deployed, monitored, or scaled to meet modern business demands.
A pragmatic cloud-native approach to AI means building modular, containerized microservices that encapsulate inference, data preprocessing, feature engineering, and even model retraining. It means leveraging orchestration platforms to automate scaling, resilience, and continuous integration. And it requires developers to step out of their silos and work closely with data scientists and operations teams to ensure that what they build in the lab actually thrives in the wild.
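As a concrete, deliberately tiny illustration of that pattern, here is a sketch of an inference microservice in TypeScript with Express. The model call is stubbed out and the route names are assumptions, but it shows the shape of a containerizable service, including the health probe an orchestrator such as Kubernetes expects:

```typescript
import express from "express";

// Stub standing in for a real model; in practice this would call an
// inference runtime or a hosted model API.
async function runInference(input: string): Promise<string> {
  return `prediction for: ${input}`;
}

const app = express();
app.use(express.json());

// Liveness/readiness probe for the orchestrator.
app.get("/healthz", (_req, res) => res.status(200).send("ok"));

// Inference endpoint: preprocessing, inference, and response shaping live
// behind one versioned route so the model can evolve independently.
app.post("/v1/predict", async (req, res) => {
  const started = Date.now();
  try {
    const result = await runInference(String(req.body.input ?? ""));
    // Basic observability: log latency for every request.
    console.log(JSON.stringify({ path: "/v1/predict", ms: Date.now() - started }));
    res.json({ result });
  } catch {
    res.status(500).json({ error: "inference failed" });
  }
});

app.listen(8080, () => console.log("inference service on :8080"));
```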
Three truths developers must embrace
First, cloud-native is not a shortcut. Complexity is the price of admission. Many developers imagine that containers and orchestration will magically solve all deployment headaches. These tools provide immense flexibility and scalability, but they introduce their own operational complexities in everything from networking and service discovery to security policies and resource optimization. It’s imperative for developers to invest sufficient time to understand these new abstractions. Skipping this step often leads to brittle, unmanageable architectures.
Second, data is at the heart of both AI and cloud-native, and the challenges multiply when you bring them together. Unlike stateless web applications, AI models often require stateful data pipelines for training, inference, retraining, and more. Orchestrating and versioning data flows across microservices and container boundaries is not trivial. Developers need to master robust data versioning, lineage, and governance patterns or risk building systems that produce unreliable predictions or that can’t be audited for compliance.
Third, observability is no longer optional, especially for AI-enabled systems in production. Microservices architectures splinter functionality across numerous services, each potentially using different models or data pipelines. When things go wrong (and they inevitably do), it’s crucial to have deep, end-to-end visibility across the stack. Developers must build monitoring, logging, tracing, and model performance tracking into the very bones of their applications. This effort pays dividends, not just in uptime but in the ability to quickly iterate and improve models based on real usage.
Bridging both worlds
For developers and enterprises intent on AI-powered innovation, meeting the challenge means going all-in on cloud-native principles. This does not mean abandoning the latest in machine learning and generative models. Rather, it requires taking a step back to ensure that these advanced capabilities are operationalized within scalable, resilient cloud-native architectures. The payoff will be systems that are innovative in the lab and transformative in the market.
Cloud-native technologies act as a force multiplier, transforming AI from experimental projects into enterprise-ready solutions. Developers who dedicate time to understanding the intersection, navigating complexity with pragmatism, and focusing on data and observability will become true change agents in a world where AI is increasingly a business imperative. The merging of AI and cloud-native is not just a trend; it’s a fundamental shift. Those who embrace this challenge and master the necessary tools, discipline, and mindset will position themselves and their organizations to catch the next wave of digital innovation.
Anthropic releases Claude Opus 4.5 25 Nov 2025, 1:19 am
Anthropic has introduced Claude Opus 4.5, a hybrid reasoning model for coding, agents, and computer use. The company said this new version of the Claude Opus model offers better vision, reasoning, and mathematics than its predecessors.
Claude Opus 4.5 is meaningfully better at everyday tasks such as deep research and working with slides and spreadsheets, Anthropic said.
Claude Opus 4.5 was introduced November 24. It is available now in Anthropic’s consumer applications, via its API, and on the three major cloud platforms: Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Developers can access the model by specifying claude-opus-4-5-20251101 via the Claude API.
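As an example, a minimal call pinning that model ID with Anthropic’s TypeScript SDK (@anthropic-ai/sdk) might look like the following; the prompt and token cap are arbitrary, and an ANTHROPIC_API_KEY must be set in the environment:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();

async function main() {
  const message = await client.messages.create({
    model: "claude-opus-4-5-20251101", // model ID from the announcement
    max_tokens: 1024,                  // arbitrary cap for this example
    messages: [
      { role: "user", content: "Summarize the tradeoffs of hybrid reasoning models." },
    ],
  });
  console.log(message.content);
}

main();
```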
According to Anthropic, testers of Claude Opus 4.5 found that the model handles ambiguity and reasons about tradeoffs without hand-holding, and that tests that proved near-impossible for Sonnet 4.5 only a few weeks ago were now within reach of Opus 4.5. Further, with Claude Opus 4.5, Anthropic has made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior, the company said.
Claude Opus 4.5 uses dramatically fewer tokens than its predecessors to reach similar or better outcomes, Anthropic said. Pricing for the new model is $5 per million tokens of input and $25 per million tokens of output.
Anthropic also announced updates to the Claude Developer Platform, Claude Code, and its consumer apps.
New Shai-Hulud worm spreading through npm, GitHub 25 Nov 2025, 12:24 am
A new version of Shai-Hulud, a credentials-stealing, self-propagating worm, is expanding through the open npm registry, posing a threat that developers who download packages from the repository need to address immediately.
Researchers at Wiz Inc. said Monday that in the early stages of the campaign late last week, a thousand new GitHub repositories containing harvested victim data were being added every 30 minutes. And researchers at JFrog identified 181 compromised packages.
The current campaign introduces a new variant, which Wiz researchers dub Shai-Hulud 2.0, that executes malicious code during the preinstall phase, “significantly increasing potential exposure in build and runtime environments.”
The threat leverages compromised package maintainer accounts to publish trojanized versions of legitimate npm packages. Once installed, the malware exfiltrates developer and CI/CD secrets to GitHub repositories, and also inserts the malicious payload into all of the users’ available npm packages. Threat actors could also use the exfiltrated secrets to break into and install more malware in victims’ IT systems.
JFrog said this new variant generates randomized repository names for exfiltration, making it harder for security teams to hunt down and scrub the leaked secrets. JFrog also said the new payload contains new functionality, including privilege escalation, DNS hijacking, and the ability to delete data from the victim’s machine.
Multiple popular packages used by developers, including those from Zapier, ENS Domains, PostHog, and Postman, have been compromised.
Researchers at ReversingLabs also noted the list of compromised packages includes AsyncAPI-related packages, including @asyncapi/specs, which has had more than 100 million lifetime downloads and an average of 1.4 million weekly downloads. This package in particular is also believed to be the ‘patient zero,’ or first known infected package, for this wave of attacks, the researchers added.
Second wave is bigger and faster
Developers and security teams looking for indicators of compromise should note that the new variant adds two new payload files: setup_bun.js and bun_environment.js.
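Because those two filenames are known, a quick local sweep of node_modules is straightforward. The Node/TypeScript sketch below only detects the reported payload names, so a clean result is not proof of safety; the filenames come from the researchers, everything else is illustrative:

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Payload filenames reported for the Shai-Hulud 2.0 variant.
const IOC_FILES = new Set(["setup_bun.js", "bun_environment.js"]);

// Recursively walk a directory tree collecting paths that match known IoCs.
function scan(dir: string, hits: string[] = []): string[] {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    try {
      if (statSync(path).isDirectory()) scan(path, hits);
      else if (IOC_FILES.has(name)) hits.push(path);
    } catch {
      /* ignore unreadable entries */
    }
  }
  return hits;
}

const hits = scan("node_modules");
console.log(
  hits.length ? `Possible IoCs:\n${hits.join("\n")}` : "No known payload files found.",
);
```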
“The re-emergence of the worm indicates that this remains a current and serious threat to the npm ecosystem,” said Johannes Ullrich, dean of research at the SANS Institute. “CSOs must address this threat by monitoring the components used in their software and hardening their CI/CD pipelines to increase resilience in the event that malicious code is executed.”
Shai-Hulud first emerged in September, revealed by the discovery that dozens of npm libraries, including a color library with over 2 million downloads a week, had been replaced with malicious versions.
The initial Shai-Hulud wave was already one of the most severe JavaScript supply-chain attacks Wiz has seen, Merav Bar, a company threat researcher and co-author of the report, told CSO. “This new wave is bigger and faster: more than 25,000 attacker-created repos across roughly 350 GitHub users, growing by about 1,000 repos every 30 minutes, with malware that steals developer and cloud credentials and runs in the preinstall phase, touching dev machines and CI/CD pipelines alike. That combination of scale, speed, and access makes it a high-impact campaign.”
Assume compromise
If an individual had pulled any of the affected packages during the November 21–23 window, she said, they should assume their environment is exposed. Remedies include clearing the npm cache on their workstation, removing node_modules, reinstalling from clean versions or pinning to versions published before the malicious releases, and rotating any tokens or secrets that were present (GitHub PATs, npm tokens, SSH keys, cloud credentials).
Enabling strong MFA on GitHub/npm and watching for unexpected new repos or workflow files in the developer’s personal account is also critical, she added.
Shai-Hulud’s second wave isn’t surprising, said Brad LaPorte, cybersecurity advisor at Morphisec. “The first attack showed how easily preinstall scripts could be weaponized. This wave proves what happens when those warnings aren’t acted on: larger scale, more destructive payloads, and automation that infects thousands of repositories in hours.”
Recommendations for repositories
To stop attackers from easily uploading malicious packages, npm needs to tighten its publishing process, LaPorte said, including verifying the identity of new accounts before they are allowed to publish packages; implementing rate limits to prevent attackers from uploading multiple malicious packages in a short period; and monitoring package maintainers for suspicious activity such as sudden spikes in publishing or packages with significant unexplained changes.
“From what we’ve observed in the Shai-Hulud incidents,” said Bar of Wiz, “the core issue isn’t npm specifically. It’s that open-source registries now function as high-impact distribution hubs, and attackers are taking advantage of how much trust developers place in them. In that context, the most important thing is continuing to strengthen the guardrails around how packages are published and updated. That includes making it harder for compromised maintainer accounts to push malicious versions, increasing visibility into unusual publishing behavior, and helping downstream users quickly understand when a package version may be unsafe. Those are ecosystem-level defenses rather than criticisms of any single registry, but they reflect the direction the entire open-source community will need to move as attacks like Shai-Hulud become more automated and far-reaching.”
Ensar Seker, CISO at SOCRadar, cautioned that Shai‑Hulud isn’t what he called typical package compromise. “It’s a worm embedded into the dev supply chain. It signals that attackers are shifting from targeting compiled binaries and runtime environments toward the very processes developers use to build and ship software. No organization should assume, ‘We don’t use npm, so we’re safe’, because even downstream dependencies or dev toolchains can become the launch pad.”
So far npm has focused on ensuring that package authors are properly authenticated and that packages are not altered after being published, said Ullrich of the SANS Institute. But, he added, this does not prevent a malicious actor from publishing malicious packages. Recently, npm further restricted the default access token lifetimes and started to revoke legacy ‘classic tokens,’ he acknowledged. “npm may need to implement some form of automated scanning for obvious malicious content, but it will be difficult to implement a meaningful solution.”
Recommendations for security teams, developers
Wiz says security teams in organizations with application developer teams that use npm – and individual developers using npm and GitHub – should:
- clear each developer’s npm cache;
- pin dependencies to known clean versions or roll back to pre-November 21 builds;
- revoke and regenerate npm tokens, GitHub PATs, SSH keys and cloud provider credentials;
- enforce phishing-resistant multifactor authentication for developer and CI/CD (continuous integration/continuous delivery) accounts.
Within GitHub and CI/CD environments, they should search for newly created repositories with ‘Shai-Hulud’ in the description, review unauthorized workflows or suspicious commits referencing hulud, and monitor for new npm publishes under their organization.
For long-term protection, CSOs and IT leaders are urged to restrict or disable lifecycle scripts (postinstall, preinstall) in CI/CD environments, limit outbound network access from build systems to trusted domains only, and make sure developers use only short-lived, scoped automation login tokens.
Angular v21 debuts Signal Forms, stabilizes MCP server 24 Nov 2025, 11:38 pm
Google’s Angular team has released Angular v21, the latest version of the TypeScript-based web framework. This update introduces experimental Signal Forms, a reactive forms experience built on the Signals application state tracker, and launches a developer preview of Aria, a modern library for common user interface patterns that emphasizes accessibility. Angular v21 also brings Angular’s local Model Context Protocol (MCP) server to stable.
Angular v21 was unveiled November 20. Instructions for installing Angular can be found at angular.dev.
The Angular CLI MCP server, launched with the Angular v20.2 point release on August 20, has reached stable status in Angular v21. The server provides tools that give AI agents context about modern Angular and the developer’s application, and promises to help users become better developers, according to the Angular team. It can be used to gain general context, find up-to-date information, update an application, and teach Angular, the team said. The MCP server follows the May 30 Angular v20 release, which supported AI development through the introduction of an llms.txt file for large language models (LLMs).
Signal Forms, an experimental library for managing form state with signals, provides a new scalable, composable, and reactive forms experience built on Signals. With Signal Forms, the form model is defined by a signal that automatically syncs with the form fields bound to it. This allows for an ergonomic developer experience with full type safety for accessing form fields. Centralized schema-based validation logic is built in.

The Angular team added that, with signals driving modern Angular state management, zone.js is no longer needed for change detection. Zoneless change detection, introduced experimentally in Angular v18, progressed through developer preview in Angular v20 and reached stability in Angular v20.2. Angular traditionally used zone.js to track changes in applications, but zone.js has performance drawbacks, the Angular team said. Zoneless change detection offers benefits including better core web vitals, native async-await, ecosystem compatibility, reduced bundle size, easier debugging, and better control, the team said.
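The experimental Signal Forms API may still change, but the underlying idea (a form model defined by a signal that stays in sync with whatever is bound to it) can be sketched with the stable signals primitives from @angular/core. This is a conceptual illustration, not the Signal Forms API itself:

```typescript
import { Component, signal, computed } from "@angular/core";

// Conceptual sketch only: plain signals standing in for the experimental
// Signal Forms API, to show a signal-backed form model with derived validity.
@Component({
  selector: "app-signup",
  template: `
    <input [value]="email()" (input)="email.set($any($event.target).value)" />
    <button [disabled]="!valid()">Sign up</button>
  `,
})
export class SignupComponent {
  // The form model is a signal; the template stays in sync with it.
  email = signal("");
  // Validation derived from the model, recomputed whenever email changes.
  valid = computed(() => this.email().includes("@"));
}
```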
Also featured in Angular v21 is a developer preview of Angular Aria, a library for common UI patterns. The library is a collection of headless, accessible directives that implement common WAI-ARIA patterns. These directives handle keyboard interactions, ARIA attributes, focus management, and screen reader support, according to the Angular team. Developers just have to provide the HTML structure, business logic, and CSS styling. To start, developers have access to eight UI patterns encompassing 13 components that are completely unstyled and can be customized. The patterns include Accordion, Combobox, Grid, Listbox, Menu, Tabs, Toolbar, and Tree.

Elsewhere in Angular v21, the Vitest testing framework has been made the default test runner and has been promoted to stable status.
AWS open-sources Agent SOPs to simplify AI agent development 24 Nov 2025, 12:23 pm
AWS is open-sourcing a new markdown format called Agent SOPs, designed to make it easier to build AI agents by applying lessons from the shortcomings of its earlier model-driven agent development approach.
Hyperscalers and other vendors, including AWS, have been promoting LLM-driven agent development as a faster way for enterprises to scale agents in production workloads, as it uses an LLM’s reasoning to generate a workflow for agents to follow, in contrast to a developer having to write hundreds of lines of custom code to define a workflow.
Earlier this year, AWS open-sourced an SDK called Strands Agents that it used internally to build agents using LLMs.
However, AWS claims that during its internal use of Strands Agents, its developers encountered issues while deploying agents built with the SDK.
The SDK’s reliance on model-driven reasoning, according to AWS, often produced unpredictable outcomes once agents hit production workloads, leading to inconsistent results, misinterpreted instructions, and high-maintenance prompt engineering — all of which were impediments to adoption at scale.
To address these challenges while still avoiding custom workflow code, AWS came up with Agent SOPs, or Standard Operating Procedures: standardized natural language instructions combined with RFC 2119 keywords, such as “MUST”, “SHOULD”, and “MAY”, that offer a way for developers to guide agents to generate a desired workflow.
Essentially, the instructions, parameters, and keywords in SOPs create a structure that acts as a scaffold around which the agent thinks, ensuring that it generates the desired workflow.
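AWS’s GitHub repository is the authoritative reference for the format, but based on the description above, a hypothetical SOP fragment might look something like the following. The task, parameters, and step wording are invented for illustration; only the RFC 2119 keyword convention comes from the source:

```markdown
# SOP: Pull request review

## Parameters
- repository_url: the repository containing the open pull request

## Steps
1. You MUST fetch the full diff of the pull request before commenting.
2. You MUST flag any change that deletes a test without an explanation.
3. You SHOULD recommend splitting the change when the diff exceeds 500 lines.
4. You MAY suggest style improvements, but you MUST NOT block the review on them.
```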
During internal use, AWS stated that its teams successfully utilized the SOPs to perform tasks ranging from code reviews and documentation generation to incident response and system monitoring, without needing to write complex custom code to generate workflows.
Building on that success, the hyperscaler has released the code and repositories for Agent SOPs on GitHub, allowing other developers to adopt the same patterns for their own use cases.
Adoption of SOPs, according to AWS, becomes easier as the markdown format works across LLMs, vibe coding platforms, and other agentic frameworks.
“Agent frameworks like Strands can embed SOPs as system prompts, development tools like Kiro and Cursor can use them for structured workflows, and AI models like Claude and GPT-4 can execute them directly,” AWS executives wrote in a blog post.
Further, the hyperscaler said that SOPs can be chained together to execute complex, multi-phase workflows.
7 ways AI is changing software testing 24 Nov 2025, 9:00 am
The integration of artificial intelligence in software testing isn’t just changing the workflow for testers; it’s reshaping how developers approach testing throughout the development life cycle. While much of the discussion around AI focuses on code generation, an equally powerful force is emerging in testing workflows, where AI is solving real bottlenecks that have plagued development teams for years.
That said, the reality is a bit messier than what you’ve likely read. Today’s tools work best when you treat them as starting points, rather than complete solutions. They may generate test cases that miss critical edge cases, struggle with complex code bases, and ignore existing patterns in your system. At this time, they demand careful human oversight to catch mistakes.
What does this look like in practice? Here are seven ways these tools are changing day-to-day testing workflows, along with the reality of what’s working, what isn’t, and where you’re likely to see the biggest impact on your own development process.
Test case generation from code changes
One of the most immediate applications of AI in testing is the generation of automated test cases. Tools can now analyze commit messages alongside the actual code changes to derive comprehensive test cases. Instead of writing “test the login functionality” after implementing OAuth integration, automated analysis of your code diff can generate specific scenarios: testing with valid tokens, expired tokens, malformed requests, and other edge cases you might not have considered.
This eliminates the friction between implementing a feature and defining how to test it. Previously, developers either wrote their own test cases — adding to their workload — or handed off incomplete testing specifications to QA teams. Now the test cases emerge directly from the implementation, maintaining consistency between what was built and what gets tested.
For many teams, this is also the best place to start. Feeding your existing code base to an AI model can quickly surface essential workflows and problematic input scenarios, even if not every suggestion is perfect. The key is to treat AI as a collaborative partner: review its output, refine the requests, and build iteratively on its suggestions rather than expecting complete solutions up front.
Visual testing through screenshots
Perhaps more significantly, new visual analysis capabilities in large language models (LLMs) are opening entirely new testing approaches. You can now take screenshots of your running application and use them for automated assessment. This means programmatic evaluation of UI layouts, color consistency, button placement, and interaction patterns — tasks that previously required manual review.
For full-stack developers, this represents a major shift. Back-end developers who occasionally touch front-end code now can get meaningful feedback on UI implementation without relying on design reviews. AI can flag when buttons are misaligned, when color schemes are inconsistent, or when the layout doesn’t match expected patterns, all at the speed of automated testing rather than human review cycles.
Eliminating manual test script writing
For teams that require developers to write Selenium, Cypress, or Playwright automation scripts alongside their features, AI is removing this secondary coding burden entirely. Instead of maintaining two code bases — your actual feature and the automation code to test it — you can describe the test scenario and let AI handle the automation implementation.
This is particularly valuable for developers who find themselves responsible for both feature development and test automation. Rather than context-switching between product code and test scripts, you can focus on the core implementation while AI handles the mechanical work of translating test cases into executable automation. Of course, developers need to validate the correctness of these generated test scripts, but there is a huge time savings from not authoring the implementation.
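For instance, given a plain-English scenario such as “a wrong password shows an error,” an AI assistant might emit a Playwright test along these lines. The URL, labels, and selectors are hypothetical, and as noted above, any generated script still needs human validation:

```typescript
import { test, expect } from "@playwright/test";

// The kind of script an AI assistant might generate from a described
// scenario. Selectors and URL are invented for illustration.
test("login rejects an invalid password", async ({ page }) => {
  await page.goto("https://example.com/login");
  await page.getByLabel("Email").fill("user@example.com");
  await page.getByLabel("Password").fill("wrong-password");
  await page.getByRole("button", { name: "Sign in" }).click();
  await expect(page.getByText("Invalid credentials")).toBeVisible();
});
```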
Accelerating the planning/thinking phase
In addition to accelerating the code-writing process, AI is helping to compress the thinking phase that precedes coding. Previously, developers might spend an hour analyzing a feature request, understanding component relationships, and planning the implementation before writing any code. AI can shorten this planning phase dramatically.
For complex changes, like adding event-based triggers to an existing time-based scheduling system, you can feed your entire code base context to an AI model and get assistance with impact analysis. The AI can identify which files need changes, suggest where new fields should be added, and flag potential conflicts with existing functionality. In some cases, what once took an hour of analysis can now be reduced to 10 minutes.
However, this capability does require breaking problems into manageable chunks. AI still struggles with deduplication and holistic system understanding, so the most effective approach involves iterative refinement: first getting help with the overall plan, then diving into specific implementation details, rather than asking for a complete solution up front. That “hour-to-10-minute” acceleration is something perhaps only the top 1% of dev teams are achieving today. For most developers, the gains are still more modest.
Over time, however, more developers and teams will improve their ability to use AI during the thinking and planning phases.
Improved developer communication
AI’s content generation capabilities are reshaping how developers communicate about their work. Pull request descriptions, code review comments, and release notes can be generated automatically by analyzing code changes and commit messages.
This addresses a common developer pain point: translating technical implementations into clear explanations for different audiences. AI can take the same code change and generate a technical summary for engineering review, a feature description for product management, and user-facing release notes, each tailored to the appropriate audience.
For developers who struggle with communication or documentation, this opens up new opportunities to grow their skills. You can produce professional, comprehensive descriptions of your work without spending substantial time on writing and formatting.
Testing as a feedback mechanism
Beyond verification, testing serves as a critical feedback loop during development. When you test your changes locally, you often discover not just bugs but opportunities for improvement — edge cases you hadn’t considered, user experience issues, or integration points that need refinement.
AI can accelerate this feedback cycle by automatically running through test scenarios and providing qualitative assessments. Rather than manually clicking through workflows, you can get AI-generated insights about potential issues, suggested test cases you haven’t covered, and questions about your implementation approach.
Data transformation for testing
AI also excels at converting unstructured or semi-structured data into usable test inputs. If you capture API calls during a web session, AI can transform that pseudo-structured data into clean JSON for your test harness. Similarly, scraped web content can be converted into structured test data, and existing test data sets can be modified programmatically, turning positive numbers negative, generating variations on existing scenarios, or expanding test coverage without manual data manipulation.
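The “variations” idea is easy to make concrete. Here is a minimal TypeScript sketch that takes an existing JSON fixture and flips every positive number negative, producing a second data set for edge-case coverage; the fixture shape is made up for illustration:

```typescript
// Generate a negative-number variant of an existing JSON test fixture.
// The fixture shape is hypothetical; the transform itself is generic.
function negateNumbers(value: unknown): unknown {
  if (typeof value === "number") return value > 0 ? -value : value;
  if (Array.isArray(value)) return value.map(negateNumbers);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, negateNumbers(v)]),
    );
  }
  return value;
}

const fixture = { orderId: 42, items: [{ qty: 3, price: 9.99 }] };
console.log(JSON.stringify(negateNumbers(fixture)));
// {"orderId":-42,"items":[{"qty":-3,"price":-9.99}]}
```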
The operational takeaway
AI is reshaping software testing in distinct ways — from generating test cases and transforming test data to accelerating planning and improving communication. Together, these shifts reduce friction across the development life cycle, allowing teams to move faster without compromising quality.
Of course, the technology isn’t without constraints. AI models can struggle with large, complex requests and often create new solutions rather than reusing existing code. The most effective approach involves breaking large problems into smaller, focused tasks and maintaining human oversight throughout the process.
The most significant change isn’t technological — it’s operational. By embracing these technologies thoughtfully, teams can streamline testing workflows while developers expand their role beyond coding into strategy, quality assessment, and cross-functional communication. Those are the skills that will matter most as AI takes on more of the repetitive mechanics of testing and coding.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Anatomy of an AI agent knowledge base 24 Nov 2025, 9:00 am
AI agent fervor has permeated the software development world. But we’re no longer talking about a singular, all-knowing AI. Rather, emerging agentic workflows rely on multiple specialized agents working together.
So-called “agentic AI” has a strong business case, but it raises a big unanswered question: How should agents talk to each other, retain memory, and share knowledge?
That’s where a shared agentic knowledge base comes in. A knowledge base for AI agents is like a meta system prompt that all agents can access. “Think of it as a way to fine-tune the agent,” says Christian Posta, global field CTO at Solo.io, a provider of cloud operations software.
As agents multiply and interconnected behaviors grow more complex, a shared knowledge base (or knowledge graph) keeps them aligned.
“An internal knowledge base is essential for coordinating multiple AI agents,” says James Urquhart, field CTO and technology evangelist at Kamiwaza AI, maker of a distributed AI orchestration platform. “When agents specialize in different roles, they must share context, memory, and observations to act effectively as a collective.”
Designed well, a knowledge base ensures agents have access to up-to-date and comprehensive organizational knowledge. Ultimately, this improves the consistency, accuracy, responsiveness, and governance of agentic responses and actions.
The benefits are clear. But what actually goes into such a repository? Below, we’ll look at the core content within an AI agent knowledge base, explore implementation approaches and retrieval methods, and consider the bottlenecks.
What an agentic knowledge base contains
A knowledge base for AI agents can hold many things: documentation, policies, style guides, sample code, workflows, compliance rules, and more. “A knowledge base for AI agents contains the full spectrum of a company’s operational reality,” says Igor Beninca, data science manager at Indicium, a data and AI services firm.
Because enterprise data varies widely, a knowledge base will combine structured, semi-structured, and unstructured data. It should span everything from static rules to dynamic chat conversations. Really, any data that can be vectorized is fair game. That said, some common content types shine through for AI agent use cases.
Procedures and policies
Most knowledge bases include procedures and policies for agents to follow, such as style guides, coding conventions, and compliance rules. They might also document escalation paths, defining how to respond to user inquiries.
“The content mirrors what you’d find in a senior employee’s mental toolkit, but structured for machine consumption,” says AJ Sunder, chief information and chief product officer at Responsive, a provider of AI-powered response management software.
Structured data
Structured data, often formatted in JSON, YAML, or CSV, includes databases, sample code, API documentation, schemas, and service-level agreements. A specific example is a machine-readable product table that lists prices, packages, or configurations.
“A good knowledge base would look a bit like Wikipedia—a structured data catalog that is easily searchable,” says Ankit Jain, CEO and co-founder of Aviator, a provider of developer workflow automation tools.
Semi-structured data includes internal wikis, workflow guides, and detailed runbooks. Another tactic is to capture data relationships using custom field mappings, which are schemas that specify how internal data is mapped to external fields, so agents can interpret these relationships.
Unstructured data
Next up is unstructured data. This includes text and media such as images, audio, PDFs, or video. Meeting notes, recordings, and diagrams that visualize decision-making are common examples. Text-based cues, or broadly defined relationships between concepts, can also supply helpful directions.
“Successful knowledge bases include ‘negative examples,’ what not to say or do, and contextual decision trees that help agents navigate edge cases,” says Responsive’s Sunder.
Memory and relationships
Lastly, persistent memory helps agents retain context across sessions. Access to past prompts, customer interactions, or support tickets helps continuity and improves decision-making, because it enables agents to recognize patterns. But importantly, most experts agree you should make explicit connections between data, instead of just storing raw data chunks.
Sunder cites service-level agreements (SLAs) as an example. Instead of stating “Our SLA is 24 hours,” a richer model would specify, “Our SLA applies to enterprise customers, except during maintenance windows, unless escalated by account managers.”
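Sunder’s example translates naturally into a machine-readable record that an agent can reason over. A hypothetical TypeScript shape follows; the field names and the four-hour escalation figure are invented for illustration:

```typescript
// Hypothetical schema capturing the conditions in Sunder's SLA example,
// rather than the flat statement "Our SLA is 24 hours."
interface SlaRule {
  responseHours: number;
  appliesTo: "enterprise" | "all";
  suspendedDuring: string[]; // conditions that pause the SLA
  overrides: { escalatedBy: string; responseHours: number }[];
}

const sla: SlaRule = {
  responseHours: 24,
  appliesTo: "enterprise",
  suspendedDuring: ["maintenance-window"],
  overrides: [{ escalatedBy: "account-manager", responseHours: 4 }],
};
```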
Implementing the knowledge base
At the core of an agentic knowledge base are two main components: an object store and a vector database for embeddings. Whereas a vector database is essential for semantic search, an object store checks multiple boxes for AI workloads: massive scalability without performance bottlenecks, rich metadata for each object, and immutability for auditability and compliance.
Beyond these fundamentals, organizations don’t necessarily need to buy new SaaS applications or infrastructure. The better option is to extend what you already have. “The pragmatic approach is to build a layer on top of existing systems, with the right connectors to make data accessible to agents,” says Rotem Weiss, founder and CEO of Tavily, maker of a real-time search engine for large language models (LLMs).
Still, unifying multiple data sources may require an abstraction layer. “The most effective strategy is to create an abstraction layer that exposes data from various sources to agents via APIs,” says Indicium’s Beninca. “This allows businesses to leverage existing knowledge management systems like Confluence, tap into data warehouses for real-time structured information, and integrate vector databases for semantic search.”
Others agree that knowledge bases don’t need to be built from scratch, but maintenance challenges remain. “Most of the existing knowledge bases can be retrofitted to support AI agents,” says Aviator’s Jain. He adds that maintaining a knowledge base is a lot harder than creating one. To solve this, agents themselves should capture new information and keep the knowledge base up-to-date.
Given the technical nuances, experts suggest starting small and expanding on early successes. “Try to focus on measured proof-of-concept projects where unique organizational knowledge and data can be curated and surfaced to agents via tools,” says Greg Jennings, VP of engineering, AI, at Anaconda, the provider of a platform that helps organizations build secure AI with open source.
Connecting to the knowledge base
Now comes actually connecting to the data, which is more complex than you might think, given that there are many schools of thought for data retrieval in AI.
The consensus is that agent knowledge bases benefit from a multi-modal retrieval strategy: vector search finds semantically similar concepts, graph traversal identifies relationships between data, and keyword search pinpoints exact matches.
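A dependency-free sketch of the first and third legs (vector similarity plus keyword matching, blended into one score) is shown below. A production system would use a real vector database and embedding model, and the 70/30 weighting is an arbitrary assumption:

```typescript
// Toy hybrid retrieval: cosine similarity over pre-computed embeddings
// plus a keyword bonus. Embeddings and weights are illustrative only.
interface Doc { id: string; text: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function search(query: string, queryEmbedding: number[], docs: Doc[]): Doc[] {
  const terms = query.toLowerCase().split(/\s+/);
  return docs
    .map((d) => {
      const semantic = cosine(queryEmbedding, d.embedding);
      const keyword =
        terms.filter((t) => d.text.toLowerCase().includes(t)).length / terms.length;
      // Arbitrary 70/30 blend of semantic and keyword relevance.
      return { doc: d, score: 0.7 * semantic + 0.3 * keyword };
    })
    .sort((a, b) => b.score - a.score)
    .map((r) => r.doc);
}
```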
“AI agents generally connect to knowledge bases through APIs or retrieval-augmented generation (RAG) pipelines,” says Neeraj Abhyankar, VP, data and AI at R Systems, a software engineering consultancy. He adds that Model Context Protocol (MCP) will likely play a role as the leading standard for how agents access tools and data.
Others agree that MCP changes the game by standardizing agentic connections. “Instead of building custom integrations for each knowledge source, agents can plug into any MCP-compatible system,” says Sunder, noting this could even allow agents to communicate across organizational boundaries.
Beyond these methods, Solo.io’s Posta suggests a concept he calls “RAG on the wire,” in which LLM calls are intercepted by an agent gateway that performs a RAG-style look-up. “This way, the guidelines or conventions are enforced regardless of who’s calling.”
Additional retrieval techniques are emerging, including hierarchical search, which narrows broad queries into precise ones, and GraphRAG, which represents knowledge as a graph. “In my opinion, agents will make GraphRAG more popular,” says Keith Pijanowski, AI solutions engineer at MinIO, provider of an open-source object storage server.
“GraphRAG provides agents with ‘multi-node’ knowledge, showing how knowledge is related to other knowledge,” says Pijanowski. “This more accurately represents the real world, enabling agents to perform more complex reasoning and actions. Standard RAG relies on a flat document structure.”
No ‘one-size-fits-all’
Some best practices for AI agent knowledge bases are materializing across industries. These are mainly around technical execution: version control, retrieval strategies, memory of past chats, access controls, prompt chaining, embedding, and data-refresh processes.
Still, while infrastructure and design patterns may be transferable, each knowledge base will inevitably reflect an organization’s custom domain logic and workflows. As Indicium’s Beninca says, “customization is not an optional extra—it is a fundamental requirement for achieving a positive return on investment.”
Responsive’s Sunder agrees that knowledge bases are not one-size-fits-all. “The infrastructure patterns are emerging, but the ontologies remain highly specialized,” Sunder says. “I am not seeing convergence yet. Every industry has its own conceptual vocabulary and regulatory requirements.”
The data and intended use cases will be highly industry-dependent. “Vertical customization is non-negotiable,” says R Systems’s Abhyankar, who notes that healthcare will need HIPAA-aware schemas, while agents in retail may prioritize inventory logic.
Each organization’s data moat, and in turn its knowledge base, will mirror its unique business logic.
“Everyone’s using similar vector databases, embedding models, and search technologies,” says Aviator’s Jain. “However, the knowledge schemas, validation rules, and business logic remain highly customized. The ‘how’ is becoming standardized while the ‘what’ stays wildly different.”
Keep the knowledge fresh
According to Microsoft’s 2025 Work Trend Index, 46% of business leaders say their companies are already using agents to automate workflows or processes, with a growing share exploring multi-agent systems as well. As consultancies like Deloitte double down on multi-agent approaches, the momentum is expected to continue.
Software engineering offers a clear example of how agents accelerate existing processes. Over 90% of developers now use AI coding tools, saving an average of 3.6 hours per week, according to DX’s AI-Assisted Engineering Q4 Impact Report, which analyzed data from nearly 60,000 developers. Yet despite faster throughput, code quality remains inconsistent, underscoring the need for stronger baselines and shared context among AI agents.
The same need for shared understanding is true for agents assisting end users in other contexts. But the key here is ongoing maintenance, because “shared understanding” could become “shared misconception” really fast.
Since organizational knowledge is always evolving, updating the system to keep data fresh, without duplicating knowledge or breaking agentic behavior, will be the major hurdle. As Aviator’s Jain says, “the biggest challenge is maintaining the knowledge base, more specifically, maintaining the quality and freshness of the data.”
Sunder agrees. “Freshness, or lack thereof, is the silent killer of AI knowledge systems.”
Software development has a ‘996’ problem 24 Nov 2025, 9:00 am
There is a persistent, dangerous myth in software development that output equals outcome. It’s the idea that if we just throw more hours or more lines of code at the problem, we will inevitably win.
Gergely Orosz of The Pragmatic Engineer fame recently dismantled this myth with surgical precision. Orosz made a damning observation regarding the “996” work culture, the schedule of working 9 a.m. to 9 p.m., six days a week, popularized by Chinese tech giants: “I struggle to name a single 996 company that produces something worth paying attention to that is not a copy or rehash of a nicer product launched elsewhere.” The schedule and pace aren’t merely inhumane. They’re counterproductive.
Brute force gives you volume but rarely gives you differentiation and (perhaps) never gives you innovation.
Now, before we start tsk-tsking China for such 996 practices, we’d do well to examine our own. Founders sell it as being “hardcore,” “all in,” or “grind culture,” but it’s the same idea: Crush people with hours and hope something brilliant falls out the other side. And now we’re trying to instantiate this idea in code or, rather, GPUs. Some assume that if we can just get large language models (LLMs) to work the equivalent of thousand-hour weeks, generating code at superhuman speeds, we’ll magically get better software.
We won’t. We will just get more of what we already have: derivative, bloated, and increasingly unmanageable code.
The high cost of code churn
I’ve been sounding this alarm for a while. Recently I wrote about how the internet is being choked by low-value, high-volume content because we’ve made it frictionless to produce. The same is happening to our software.
We have data to back this up. As I noted when covering GitClear’s 2024 analysis of 153 million lines of code, “code churn,” or lines that are changed or thrown away within two weeks, is spiking. The research showed more copy-pasted code and significantly less refactoring.
In other words, AI is helping us code faster (up to 55% faster, according to GitHub’s analysis), but it isn’t helping us build better. We are generating more code, understanding it less, and fixing it more often. The real risk of AI isn’t that it writes code, but that it encourages us to write too much code. Bloated codebases are harder to secure, harder to reason about, and much harder for humans to own. Less code is better.
This is the 996 trap transferred to machines. The 996 mindset assumes that the constraint on innovation is the number of hours worked. The “AI-native” mindset assumes the constraint is the number of characters typed. Both are wrong. The constraint has always been and will always be clarity of thought.
Code is a liability, not an asset
Let’s get back to first principles. As any senior engineer knows, software development is not a typing contest. It is a decision-making process. The job is less about writing code and more about figuring out what code not to write. As Honeycomb founder and CTO Charity Majors puts it, being a senior software engineer “has far more to do with your ability to understand, maintain, explain, and manage a large body of software in production over time, as well as the ability to translate business needs into technical implementation.”
Every line of code you ship is a liability. Every line must be secured, debugged, maintained, and eventually refactored. When we use AI to brute-force the “construction” phase of software, we maximize this liability. We create vast surface areas of complexity that might solve the immediate Jira ticket but mortgage the future stability of the platform.
Orosz’s point about 996 companies producing copies is telling. Innovation requires the “slack” to think without the constant interruptions of meetings. Given a quiet moment, you might realize that the feature you were about to build is actually unnecessary. If your developers are spending their days reviewing an avalanche of AI-generated pull requests, they have no slack. They are not architects; they are janitors cleaning up after a robot that never sleeps.
None of this is to suggest that AI is bad for software development. Quite the opposite is true. As Harvard professor (and longtime open source luminary) Karim Lakhani stressed, “AI won’t replace humans,” but we increasingly will see that “humans with AI will replace humans without AI.” AI is an effective tool, but only if we use it as a tool and not as a club to replicate the false promise of the 996 culture.
The human part of the stack
So, how do we avoid building a 996 culture on silicon? We need to stop treating AI as a “developer replacement” and start treating it as a tool to buy back the one thing 996 culture destroys: time.
If AI can handle the drudgery—the unit tests, the boilerplate, the documentation updates—that should not be an excuse to jam more features into the sprint. It should be an opportunity to slow down and focus on the “human” parts of the stack, such as:
- Framing the problem. “What are we actually trying to do?” sounds simple, but it is where most software projects fail. Choosing the right problem is a high-context, high-empathy task. An LLM can give you five ways to build a widget; it cannot tell you if the widget is the wrong solution for the customer’s workflow.
- Editing ruthlessly. If AI makes writing code nearly free, the most valuable skill becomes deleting it. Humans must own the “no.” We need to reward developers not for the velocity of their commits, but for the simplicity of their designs. We need to celebrate the “negative code” commits: the ones that remove complexity rather than adding to it.
- Owning the blast radius. When things break (and they will!) it’s your name on the incident report, not the LLM’s. Understanding the system deeply enough to debug it during an outage is a skill that degrades if you never write the code yourself. We need to ensure that “AI-assisted” doesn’t become “human-detached.” I’ve stressed the importance of ensuring that junior developers don’t default to whatever an LLM gives them. We need to ensure adequate training so that engineers of all skill levels can effectively use AI.
The rebellion against robot drivel is not about Luddism. It is about quality.
Orosz’s critique of 996 is that it produces exhausted people and forgettable products. If we aren’t careful, our adoption of AI will produce the exact same thing: exhausted humans maintaining a mountain of forgettable, brittle code generated by machines.
We don’t need more code. We need better code. And better code comes from human minds that have the quiet, uncluttered space to invent it. Let the AI handle the brute force, freeing up people to innovate.
It’s the end of vibe coding, already 21 Nov 2025, 9:00 am
In the early days of generative AI, AI-driven programming seemed to promise endless possibility, or at least a free pass to vibe code your way into quick wins. But now that era of freewheeling experimentation is coming to an end. As AI works its way deeper into the enterprise, a more mature architecture is taking shape. Risk-aware engineering, golden paths, and AI governance frameworks are quickly becoming the new requirements for AI adoption. This month is all about the emerging disciplines that make AI predictable, responsible, and ready to scale.
Top picks for generative AI readers on InfoWorld
What is vibe coding? AI writes the code so developers can think big
Curious about the vibe shift in programming? Hear from developers who’ve been letting AI tools write their code for them, with sometimes great and sometimes disastrous results.
The hidden skills behind the AI engineer
Vibe coding only gets you so far. As AI systems scale, the real work shifts to evaluation loops, model swaps, and risk-aware architecture. The role of AI engineer has evolved into a discipline built on testing, adaptability, and de-risking—not just clever AI prompts.
Building a golden path to AI
Your team members may not be straight-up vibe coding, but they’re almost certainly using AI tools that management hasn’t signed off on, which is like shadow IT on steroids. The best way to fight it isn’t outright bans, but guardrails that nudge developers in the right direction.
Boring governance is the path to real AI adoption
Big companies in heavily regulated industries like banking need internal AI governance policies before they’ll go all-in on the technology. Getting there quickly enough to stay ahead of the curve is the trick.
How to start developing a balanced AI governance strategy
They say the best defense is a good offense, and when it comes to AI governance, organizations need both. Get expert tips for building your AI governance strategy from the ground up.
More good reads and generative AI updates elsewhere
Why AI breaks bad
One of the biggest barriers to corporate AI adoption is that the tools aren’t deterministic—it’s impossible to predict exactly what they’ll do, and sometimes they go inexplicably wrong. A branch of AI research called mechanistic interpretability aims to change that, making digital minds more transparent.
MCP doesn’t move data. It moves trust
The Model Context Protocol extends AI tools’ ability to access real-world data and functionality. The good news is that it acts as a trust layer, allowing LLMs to make those tool calls safely without needing to see credentials, touch systems, or improvise network behavior.
Anthropic says Chinese hackers used its AI in online attack
While details are scarce, Anthropic claims that Chinese hackers made extensive use of its Claude Code tool in a coordinated cyberattack program. The company says it’s working to develop classifiers that will flag such malicious activity.
PHP 8.5 enables secure URI and URL parsing 21 Nov 2025, 1:23 am
PHP 8.5 has been released, adding an extension for securely parsing URIs and URLs to the now-30-year-old server-side scripting language.
Described as a major update, PHP 8.5 was released November 20 and can be accessed at PHP.net. The URI extension featured in the update is always available and provides APIs to securely parse and modify URIs and URLs based on the RFC 3986 and WHATWG (Web Hypertext Application Technology Working Group) URL standards. PHP 8.5 also features a pipe operator that allows chaining function calls together without dealing with intermediary variables. The pipe operator enables replacing many “nested calls” with a chain that can be read forward, instead of inside-out. Additionally in version 8.5, developers can update properties during object cloning by passing an associative array to the clone() function. This enables straightforward support of the “with-er” pattern for read-only classes.
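As a minimal sketch of the pipe operator and clone() behavior described above, based on the syntax in the PHP 8.5 release notes (the class and values are invented for illustration):

<?php
// Pipe operator: a left-to-right chain instead of nested calls
$normalized = '  Hello, PHP 8.5!  '
    |> trim(...)
    |> strtolower(...);

// clone() with property updates: the "with-er" pattern for readonly classes
final class Point
{
    public function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}
}

$origin = new Point(0, 0);
$moved = clone($origin, ['x' => 10]); // new instance; y carries over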
Developers using PHP 8.5 can also take advantage of a #[\NoDiscard] attribute. By adding this attribute to a function, PHP will check whether the returned value is consumed and emit a warning if it is not. The associated (void) cast can be used to indicate that a value is intentionally unused. Additionally in PHP 8.5, static closures and first-class callables now can be used in constant expressions. This includes attribute parameters, default values of properties and parameters, and constants, according to the update documents.
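A quick sketch of how the attribute behaves (the function name and message are invented for illustration):

<?php
#[\NoDiscard('the filtered array is returned, not modified in place')]
function withoutNulls(array $values): array
{
    return array_filter($values, fn($v) => $v !== null);
}

withoutNulls([1, null, 2]);          // warning: return value not consumed
(void) withoutNulls([1, null, 2]);   // explicit opt-out, no warning
$clean = withoutNulls([1, null, 2]); // consumed, no warning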
PHP 8.5 also features persistent cURL handles. Unlike curl_share_init(), handles built by curl_share_init_persistent() will not be destroyed at the end of a PHP request. If a persistent share handle with the same set of share options is found, it will be reused, avoiding the cost of initializing cURL handles every time. The array_first() and array_last() functions in PHP 8.5, meanwhile, return the first or last value of an array, respectively. If the array is empty, null is returned, making it easy to compose with the ?? operator. Additional features in PHP 8.5 include the following:
- A #[\DelayedTargetValidation] attribute can suppress compile-time errors from core and extension attributes used on invalid targets.
- Fatal errors, such as an exceeded maximum execution time, now include a backtrace.
- Attributes now can target constants.
- The #[\Override] attribute can be applied to properties.
- The #[\Deprecated] attribute can be used on constants and traits.
- Static properties support asymmetric visibility.
- Properties can be marked as final via constructor property promotion.
- The backtick operator as an alias for shell_exec() has been deprecated.
F# 10 features scoped warning suppression 20 Nov 2025, 10:53 pm
The newest version of Microsoft’s multi-paradigm language features a much-sought ability to suppress warnings in specified code sections.
With the scoped warning suppression capability, the compiler now supports the #warnon directive, which is paired with #nowarn to disable or enable warnings within a specific code span. The F# 10 update was introduced along with .NET 10 on November 11. Developers can get F# 10 by downloading .NET 10 or by accessing Visual Studio 2026 Insiders. A November 17 blog post introducing F# 10 notes that some changes to improve the consistency of #nowarn/#warnon directives were breaking changes, which could affect a codebase when updating to the new version.
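A minimal sketch of the paired directives (the warning number is illustrative; FS0025 is the incomplete-match warning):

// Suppress the incomplete-match warning only for this span
#nowarn 25
let firstOrFail (xs: int list) =
    match xs with
    | x :: _ -> x
#warnon 25
// From here on, incomplete matches warn again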
F# 10 also allows developers to apply distinct access modifiers to individual property accessors, specifying access levels for the getter and setter of a property inline. This enables common patterns such as publicly readable but privately mutable state without verbose boilerplate. Another new capability in F# 10 enables optional parameters to use a struct-based ValueOption representation. By applying the [<Struct>] attribute to an optional parameter, developers can instruct the compiler to use ValueOption instead of the reference-based option type. This avoids a heap allocation for the option wrapper, which is beneficial in performance-critical code. Other improvements available in F# 10 include the following:
- Computation-expression builders now can opt into tail-call optimizations.
- A long-standing inconsistency in type annotation syntax for computation expression bindings has been resolved. Developers can now add type annotations on let!, use!, and and! bindings without wrapping the identifier in parentheses.
- The discard pattern (_) now works in use! bindings within computation expressions. F# 10 allows using _ directly when binding asynchronous resources whose values are only needed for lifetime management. There is no need to provide a named identifier.
- Structural validation has been tightened to reject misleading module placement within types. F# 10 now raises an error when a module declaration appears indented at the same structural level inside a type definition, thus preventing a common source of confusion about module scoping.
- The compiler memoizes the results of type relationship checks, reducing redundant computations and improving compiler and tooling performance.
- In the FSharp.Core library, support has been added for and! in the task computation expression. Using task is a popular way to work with asynchronous workflows in F#, particularly when interoperability with C# is required.
- Three features—graph-based type checking, parallel IL code generation, and parallel optimization—are grouped together under the ParallelCompilation project property.
- When publishing with trimming enabled (PublishTrimmed=true), the F# build now auto-generates a substitutions file that targets the tooling-only F# resources. This results in smaller output by default, less boilerplate, and one less maintenance hazard.
How pairing SAST with AI dramatically reduces false positives in code security 20 Nov 2025, 6:15 pm
The promise of static application security testing (SAST) has always been the “shift-left” dream: catching vulnerabilities before they ever hit production. But for too long, that promise has been undermined by a frustrating reality: an overwhelming volume of alerts and high false-positive rates. This noise can lead to alert fatigue, wasted developer time, and a loss of trust in the very tools designed to protect our codebase.
Meanwhile, large language models (LLMs) have emerged as powerful code analysis tools, capable of pattern recognition and code generation. Yet they suffer from their own weaknesses: slow processing, inconsistency, and the potential for hallucination.
In our opinion, the path to next-generation code security is not choosing one over the other, but integrating their strengths. So, along with Kiarash Ahi, founder of Virelya Intelligence Research Labs and co-author of the framework, I decided to do exactly that. Our novel hybrid framework combines the deterministic rigor and speed of traditional SAST with the contextual reasoning of a fine-tuned LLM to deliver a system that doesn’t just find vulnerabilities, but also validates them. The results we achieved were stark: a 91% reduction in false positives compared to standalone SAST tools, transforming security from a reactive burden into an integrated and more efficient process.
The core problem: Context vs. rules
Traditional SAST tools, as we know, are rule-bound; they inspect code, bytecode, or binaries for patterns that match known security flaws. While effective, they often fail when it comes to contextual understanding, missing vulnerabilities in complex logical flaws, multi-file dependencies, or hard-to-track code paths. This gap is why their precision rates (the percentage of true vulnerabilities among all reported findings) remain low. In our empirical study, the widely used SAST tool Semgrep reported a precision of just 35.7%.
Our LLM-SAST mashup is designed to bridge this gap. LLMs, pre-trained on massive code datasets, possess pattern recognition capabilities for code behavior and a knowledge of dependencies that deterministic rules lack. This allows them to reason about the code’s behavior in the context of the surrounding code, relevant files, and the entire code base.
A two-stage pipeline for intelligent triage
Our framework operates as a two-stage pipeline, leveraging a SAST core (in our case, Semgrep) to identify potential risks and then feeding that information into an LLM-powered layer for intelligent analysis and validation.
- Stage 1, initial SAST findings: The Semgrep SAST engine runs and identifies all potential security risks. For each flag, it extracts the intermediate representations, such as the data flow path from source to sink.
- Stage 2, LLM-powered intelligent triage: This is the critical step for filtering noise. The framework embeds the relevant code snippet, the data flow path and surrounding contextual information into a structured JSON prompt for a fine-tuned LLM. We fine-tuned Llama 3 8B on a high-quality dataset of vetted false positives and true vulnerabilities, specifically covering major flaw categories like those in the OWASP Top 10 to form the core of the intelligent triage layer. Based on the relevant security issue flagged, the prompt then asks a clear, focused question, such as, “Does this user input lead to an exploitable SQL injection?”
By analyzing the context that traditional SAST rules miss, the LLM can reliably determine if a finding is truly exploitable, acting as an intelligent triage layer. This is the key mechanism that allows the framework to convert a mountain of alerts into a handful of verified, actionable findings.
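As a rough illustration of the triage step, the sketch below shows how a finding might be wrapped into a structured JSON prompt. The field names and the llm client are hypothetical stand-ins, not our production code.

import json

def build_triage_prompt(finding: dict) -> str:
    # Field names are illustrative; a real finding carries the snippet,
    # source-to-sink flow, and surrounding context extracted in Stage 1
    payload = {
        "rule_id": finding["rule_id"],
        "code_snippet": finding["snippet"],
        "data_flow_path": finding["flow"],
        "context": finding["surrounding_code"],
        "question": (
            "Does this user input lead to an exploitable "
            + finding["vulnerability_class"]
            + "? Answer true_positive or false_positive with a brief reason."
        ),
    }
    return json.dumps(payload)

def triage(finding: dict, llm) -> bool:
    # `llm` stands in for a client wrapping the fine-tuned Llama 3 8B;
    # its complete() interface is purely illustrative
    verdict = llm.complete(build_triage_prompt(finding))
    return "true_positive" in verdict.lower()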
The metrics: from noise to actionable intelligence
The following empirical results validate our hybrid approach. Our test dataset comprised 25 diverse open source projects, selected for active development and language diversity (Python, Java, JavaScript), with 170 ground-truth vulnerabilities sourced from public exploit databases and manual expert verification.
- Precision: In our implementation, we found the precision jumped to 89.5%. This is a massive leap not only over Semgrep’s baseline of 35.7%, but also over a purely LLM-based approach (GPT-4), which achieved 65.5%.
- False positive reduction: Semgrep generated a total of 225 false positives. Our framework filtered this down to just 20, representing an approximately 11x improvement in the signal-to-noise ratio.
- Time to triage: This reduction in noise translated directly to developer efficiency, reducing the average triage time for security analysts by a stunning 91%.
Furthermore, the contextual reasoning of the LLM layer enabled the discovery of complex vulnerability types that traditional scanners miss, such as multi-file dataflow bugs.
Beyond detection: validation and remediation
The LLM’s role doesn’t stop at filtering. It transforms the final output into actionable intelligence.
- Automated exploit generation: For vulnerabilities confirmed as exploitable, our framework automatically generates a proof-of-concept (PoC) exploit. This capability is crucial for verifying existence and providing concrete evidence to developers. In our evaluation, our framework successfully generated valid PoCs for approximately 70% of exploitable findings, significantly reducing the manual verification burden on security analysts.
- Dynamic remediation suggestion: LLMs, with their ability to understand code and generate text, produce comprehensive, human-readable bug descriptions and concrete repair suggestions. This streams raw security findings directly into the developer workflow, accelerating the time to fix and minimizing the window of vulnerability.
A SAST and LLM synergy marks a necessary evolution in static code security. By integrating deterministic analysis with intelligent, context-aware reasoning, we can finally move past the false-positive crisis and equip developers with tools that provide high-signal security feedback at the pace of modern development.
This article is published as part of the Foundry Expert Contributor Network.
Want to join?
How to alienate cloud customers and undermine revenue 20 Nov 2025, 9:00 am
Big Tech has entered an arms race to dominate the artificial intelligence space, with Amazon, Google, and Microsoft investing more than $600 billion into AI development between 2023 and 2025. This massive surge in AI spending reflects their belief that AI will revolutionize the tech industry and reshape the global economy.
Executives and investors may be prioritizing AI at the expense of traditional systems that millions of people use every day. This unrelenting focus on AI risks undercutting providers’ ability to serve current customers and sustain stable revenue streams. It doesn’t require a crystal ball to foresee lower customer satisfaction, reduced revenues, and damaged reputations.
The cost of overinvesting in AI
The hyperscalers are financially strong, but AI spending leaves a noticeable dent in anyone’s balance sheet. We’re not just talking about an incremental investment—it’s a seismic shift in resource allocation. Google, Microsoft, and Amazon alone are spending astronomically to build AI infrastructure, train advanced AI models, and acquire talent or smaller AI-focused firms. They view AI as a transformative technology, but such heavy investment carries an inherent gamble.
To justify these costs, AI must deliver returns that match or exceed the opportunity cost of neglecting other technologies. However, this outcome is far from guaranteed. Many AI innovations are years from reaching full potential, especially at enterprise scale. Developing and deploying AI requires time and organizational change while cautious businesses weigh adopting unproven tech.
If AI fails to deliver meaningful financial returns soon, the money spent on it could drag down profitability. The obvious path to recover the lost revenue would be to raise prices elsewhere or cut budgets for essential services, both of which would reduce customer satisfaction.
Ignoring today’s tech and its users
Both AWS and Microsoft Azure have redirected major parts of their R&D and leadership focus toward AI and machine learning. This shift risks neglecting the optimization of existing tools such as traditional cloud computing services, enterprise software, cybersecurity solutions, and infrastructure products that millions of businesses rely on every day. The result could be lower performance, weaker security, and fewer customer service enhancements as companies pursue AI breakthroughs.
The risk here is obvious: Current customers who generate stable, predictable revenues might feel overlooked. Clients could start looking elsewhere if essential services decline or stagnate because resources were devoted to AI development. This isn’t hypothetical; businesses rely on reliable, well-supported tools to achieve their operational and financial goals. Any perception that the big providers are favoring moonshot AI projects over maintaining and improving core technologies will hurt customer relationships and weaken trust.
AI is not an immediate payoff
One of the biggest misconceptions driving this AI gold rush is that revolutionary outcomes are just around the corner. The tech industry loves to pitch rapid innovation cycles, but actual enterprise AI adoption is far slower. Implementing advanced AI in highly regulated, risk-averse sectors such as healthcare, government, or finance is a process measured in years, not quarters. Companies require rigorous testing, integration with legacy systems, and buy-in across multiple layers of leadership—none of which happens overnight.
Additionally, many businesses lack the expertise or infrastructure to fully leverage advanced AI capabilities today. Enterprises that have only recently transitioned to cloud computing, for example, are unlikely to have the technical infrastructure or highly skilled personnel to support cutting-edge AI systems. This presents a paradox for vendors. Even as they develop generational innovations in AI, the enterprises paying for these services may not be positioned to adopt them at scale. If that market inertia remains in place (and there’s little reason to assume it will vanish quickly), the revenue potential for AI in the near term may fall far short of the sky-high projections.
Ironically, the aggressive drive for AI dominance also reveals weaknesses in the business models of Microsoft, Google, Amazon, and others. These companies are signaling to investors that their futures depend on technological innovation rather than their historic successes. This poses a dilemma: Any slowdown in AI deployment or poor adoption rates could undermine their leadership strategies and shake investor confidence. Moreover, shifting priorities might allow smaller competitors to target market segments that feel neglected. For instance, mid-tier cloud providers or enterprise software companies that focus solely on high-quality, reliable services could attract customers frustrated by Big Tech’s AI-driven goals.
A balanced path forward
To avoid long-term fallout, the hyperscalers must strike a better balance between innovation and stability. While AI unquestionably represents an important step forward for the industry, it shouldn’t overshadow the bread-and-butter technologies that drive current income and build customer loyalty. Enterprise customers expect more than just cutting-edge technologies—they rely on consistent, supportive partnerships to accomplish their goals. The danger in ignoring this is alienating the very businesses that generate the revenue fueling these AI ambitions.
Investing wisely involves recognizing AI’s importance but not overemphasizing it. Maintaining open communication with clients, supporting their needs, and bridging current and future technologies are crucial. Slow progress may not generate headlines, but it’s less likely to disappoint or destabilize loyal customers.
The tech industry is all about pursuing the next big thing. It’s in our DNA. However, the balance between existing cloud services (the bird in the hand) and AI (the bird in the bush) is where risk versus reward can quickly get off track. Tech leaders would be wise to remember that although new frontiers in technology are thrilling, they are meaningless without the support of the enterprises and systems that bring those possibilities to life.
Azure HorizonDB: Microsoft goes big with PostgreSQL 20 Nov 2025, 9:00 am
Enterprises need data, and data needs to be stored, with a flexible, portable environment that scales from developers’ laptops to global clouds. That storage also needs to be able to run on any OS and any cloud without breaking the bank.
There aren’t many options. You might be able to use MySQL or any of its forks for most purposes, but it struggles to support large databases that run over multiple data centers. Plus there are licensing issues with proprietary tools like Microsoft’s SQL Server, with its hyperscale Azure SQL variant only available on Microsoft’s own cloud.
PostgreSQL is everywhere
It’s not surprising that developers have become increasingly dependent on the open source PostgreSQL, the nearly 30-year-old successor to the University of California Berkeley’s Ingres (hence its name). It’s a flexible tool that works well across a wide range of platforms and, thanks to an extensible architecture, can support most workloads.
Microsoft has been supporting PostgreSQL on Azure since 2017, with its 2019 acquisition of Citus Data bringing significant experience with scaling and performance. Since then, Microsoft has begun to build out a family of PostgreSQL platform-as-a-service implementations, with a hyperscale version as part of its Cosmos DB platform and a managed flexible server for most day-to-day operations. It even supports you running your own PostgreSQL instances on Azure VMs.
Microsoft has adopted PostgreSQL as a key part of its growing data platform, and the company has been a major contributor and sponsor of the open source project. There are 19 contributors who work for Microsoft, and code is already being delivered for the next major release in 2026.
Introducing Azure HorizonDB
This week at Ignite 2025, Microsoft announced the latest member of its PostgreSQL family: Azure HorizonDB. Designed to be a scale-out, high-performance database, it’s intended to be a place for a new generation of PostgreSQL workloads, for when you need an operational database that’s fast and can scale automatically without requiring complex sharding operations.
In advance of Ignite, I spoke to Shireesh Thota, CVP Databases at Microsoft, about the new service. He described the rationale for a new PostgreSQL variant:
I think increasingly what we notice is that people either go into the bucket of, “I want to lift and shift my PostgreSQL that’s working in the community version on-premises, or maybe another cloud.” They want to move it to Azure. They want 100% Postgres. They want all extensions working. They just want something that really has the flexibility of performance and speed. Then Azure Database for PostgreSQL, the existing version is perfect. Somebody who wants to build an AI-native, cloud-native kind of a workload that may need a lot of storage, wants really fast latencies, significantly higher IOPS. Then you go to HorizonDB.
Certainly, the published performance data for Azure HorizonDB is impressive: Microsoft is claiming a three-times increase in throughput over the open source release when running transactional workloads. You can scale up to 3072 cores, with 128TB of storage and sub-millisecond commits. HorizonDB builds on Azure’s multiregion architecture with data replicated in multiple availability zones and automated maintenance and backups with minimal impact on operations. Such performance is needed for AI applications and for large-scale Kubernetes. As Thota notes, “These cloud-native workloads can really succeed on HorizonDB.”
Key to the performance boost are changes to the architecture of the database, separating compute and storage and allowing them to scale independently. If you need more compute, HorizonDB will give it to you. If you need more read replicas, it’ll provision them.
Using Azure HorizonDB for AI
On top of compatibility with most standard PostgreSQL features, Microsoft has added its own features that support modern AI applications, with fast DiskANN-based vector search as part of retrieval-augmented generation (RAG) applications and model tuning. Using DiskANN’s new advanced filtering will give you a significant performance boost over PostgreSQL’s standard vector search, and its hybrid in-memory and disk search allows you to work with the largest vector indexes without significant performance impairments. Also, using the new filtered graph traversals makes queries up to three times faster.
Microsoft’s tools help you bring AI models inside your queries, using Microsoft Foundry to conduct AI operations as part of a SQL query. Managed models let you pick and choose from a list of default models in the Azure Portal, or you can bring your own Microsoft Foundry models. This allows you to do things like generating embeddings for query results as you write them to a vector index table without leaving the database. Other options let you use AI-based semantic searches or summarize results and provide insights into customer comments. Thota describes the process as a simple one: “You keep your SQL structure and invoke our semantic operators in the right places.”
It has built-in integration with Azure’s enterprise tools, adding support for encryption, Entra ID, and private endpoints so that cloud-hosted data can only be accessed by your own systems and applications. Added security comes from support in Azure Defender for Cloud to keep sensitive data protected. “Our core cohort of our customers are enterprises, and I want to make sure that we build something for both enterprises as well as developers,” Thota says. HorizonDB will initially be available in a small number of Azure regions, with limited access to the preview release.
Managing PostgreSQL in VS Code
Outside the database, Microsoft has released a general availability version of its Visual Studio Code PostgreSQL extension. This adds database development and management tools to your development environment, connecting to on-premises and in-cloud PostgreSQL instances, including HorizonDB. It’s important to note that this is a tool for any and all PostgreSQL implementations. You’re not limited to working only in Azure; you can use it with any database that implements the PostgreSQL APIs.
Tools in the extension allow you to visualize database schema, drilling into tables and displaying joins. You can display complex data structures and zoom in to specific tables as needed. Another set of visualizations delivers a server dashboard that drills down into various metrics to help you understand how your database is running and where you can improve performance.
With Microsoft positioning Visual Studio Code as its AI development tool, both for creating AI applications and using AI, the PostgreSQL tool provides an agent for its GitHub Copilot tools. Natural language queries help you refine queries and design databases, and they use the same metrics as the server dashboard to help improve operations.
Bringing Oracle to PostgreSQL
HorizonDB’s performance improvements make it a good target for migration from existing relational databases, which can reduce licensing costs—especially for databases that have a per-core licensing model. Tools in the Visual Studio Code PostgreSQL extension help migrate Oracle schemas to Azure-hosted PostgreSQL, using AI tools to handle transformations based on best practices. To avoid problems, it allows you to validate its output in a Scratch database before you deploy the resulting database structure.
The tool works with more than just databases; it also helps you update application code to work with the new schema. Not everything will be updated automatically. To reduce the risks of hallucinations, it flags elements and code that can’t be migrated so you can perform manual updates. Not all Oracle features will migrate, as proprietary SQL extensions may not map to PostgreSQL’s standards-based approach.
Mirroring in Fabric
Data is increasingly important for businesses, with the growing capabilities of analytical platforms like Microsoft’s Fabric. HorizonDB and other operational databases are part of this approach, as they will mirror their tables into Fabric without affecting your applications. This brings near-real-time business data into an analytical platform for use in dashboards and AI applications. There’s no need for complex ETL to go from a row-based store to a column-based one, as it’s all handled by the platform. Microsoft won’t detail a timeline for bringing HorizonDB to Fabric, but it is part of the road map.
PostgreSQL is an important part of Microsoft’s data platform. Its open source foundations make it easy to develop outside Azure and then configure as part of deploying an application. HorizonDB takes it further, with support for at-scale cloud-native applications and for embedded AI. At the same time, mirroring operational, transactional data from PostgreSQL into Fabric ensures that your analytic applications have access to up-to-date information, making it easier to make business decisions without having to wait for data.
Improving annotation quality with machine learning 20 Nov 2025, 9:00 am
Data science and machine learning teams face a hidden productivity killer: annotation errors. Recent research from Apple analyzing production machine learning (ML) applications found annotation error rates averaging 10% across search relevance tasks. Even ImageNet, computer vision’s gold standard benchmark, contains a 6% error rate that MIT CSAIL discovered in 2024—errors that have skewed model rankings for years.
The impact extends beyond accuracy metrics. Computer vision teams spend too much of their time on data preparation and annotation, with quality issues creating development bottlenecks where engineers spend more time fixing errors than building models. Teams implementing manual quality control report five to seven review cycles before achieving production-ready data sets, with each cycle requiring coordination across annotators, domain experts, and engineers.
The financial implications follow the 1x10x100 rule: annotation errors cost $1 to fix at creation, $10 during testing, and $100 after deployment when factoring in operational disruptions and reputational damage.
Why current annotation tools fall short
Existing annotation platforms face a fundamental conflict of interest that makes quality management an afterthought rather than a core capability. Enterprise solutions typically operate on business models that incentivize volume—they profit by charging per annotation, not by delivering performant downstream models. This creates incentives to annotate ever increasing amounts of data with little motivation to prevent errors that would reduce billable work. Their black-box operations provide minimal visibility into QA processes while demanding $50,000+ minimum engagements, making it impossible for teams to understand or improve their annotation quality systematically.
Open-source alternatives like Computer Vision Annotation Tool (CVAT) and Label Studio focus on labeling workflows but lack the sophisticated error detection capabilities needed for production systems. They provide basic consensus mechanisms—multiple annotators reviewing the same samples—but don’t offer prioritization of which samples actually need review or systematic analysis of error patterns.
These shortcomings force a telling statistic: 45% of companies now use four or more annotation tools simultaneously, cobbling together partial solutions that still leave quality gaps. The result is a costly, multi-step process where teams cycle through initial annotation, extensive manual QA, correction rounds, and re-validation. Each step adds weeks to development timelines because the underlying tools lack the intelligence to identify and prevent quality issues systematically.
Modern ML development demands annotation platforms that understand data, not just manage labeling workflows. Without this understanding, teams remain trapped in reactive quality control cycles that scale poorly and consume engineering resources that should be focused on model innovation.
A data-centric annotation solution
Voxel51’s flagship product, FiftyOne, fundamentally reimagines annotation quality management by treating it as a data understanding problem rather than a labeling workflow challenge. Unlike traditional platforms that focus on creating labels, FiftyOne helps teams work smarter by identifying which data actually needs annotation attention and where errors are most likely to occur.
Our data-centric approach represents a paradigm shift from reactive quality control to proactive data intelligence. Instead of blindly labeling entire data sets or reviewing random samples, the platform uses ML-powered analysis to prioritize high-impact data, automatically detect annotation errors, and focus human expertise where it matters most.
FiftyOne leverages machine learning to identify specific, actionable quality issues. This methodology recognizes that annotation errors aren’t random—they follow patterns driven by visual complexity, ambiguous edge cases, and systematic biases that can be detected and corrected algorithmically.
This intelligence transforms annotation from a cost center into a strategic capability. Rather than accepting 10% error rates as inevitable, teams can systematically drive down error rates while reducing the time and cost required to achieve production-quality data sets. FiftyOne is backed by an open-source community with three million installs and teams from Microsoft, Google, Bosch, Ford, Raytheon, Berkshire Grey, and more.
Automated error detection with mistakenness scoring
FiftyOne’s compute_mistakenness() capability identifies potential annotation errors by analyzing disagreement between ground truth labels and model predictions. This ML-powered approach ranks errors by likelihood and impact, transforming weeks of manual review into hours of targeted correction.
import fiftyone.brain as fob
# Automatically detect likely annotation errors
fob.compute_mistakenness(dataset, "predictions", label_field="ground_truth")
The system generates several error indicators:
- mistakenness: Likelihood that a label is incorrect (0-1 scale)
- possible_missing: High-confidence predictions with no ground truth match
- possible_spurious: Unmatched ground truth objects likely to be incorrect
from fiftyone import ViewField as F
# Show most likely annotation mistakes first
mistake_view = dataset.sort_by("mistakenness", reverse=True)
# Find highly suspicious labels (>95% error likelihood)
high_errors_view = dataset.filter_labels("ground_truth", F("mistakenness") > 0.95)
# Identify samples with missing annotations
missing_objects_view = dataset.match(F("possible_missing") > 0)
FiftyOne’s interactive interface enables immediate visual verification of flagged errors. Teams can quickly confirm whether detected issues represent actual annotation mistakes or model limitations, focusing human expertise on genuine problems rather than reviewing random samples.
This intelligent prioritization typically achieves significantly faster convergence to accurate labels compared to random sampling approaches, with customers like SafelyYou reporting a 77% reduction in images sent for manual verification.
Patch embedding-based pattern discovery
FiftyOne’s patch embedding visualization exposes quality issues invisible to traditional metrics. The platform’s similarity analysis projects samples into semantic space, revealing clusters of similar images with inconsistent annotations.
In other words, embedding analysis finds groups of similar objects that should be labeled the same way but aren’t (consistency-driven error detection).
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
# Path to BDD100k dataset files
source_dir = "/path/to/bdd100k-dataset"
# Load dataset
dataset = foz.load_zoo_dataset("bdd100k", split="validation", source_dir=source_dir)
# Compute patch embeddings using pre-trained model
model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")
gt_patches = dataset.to_patches("detections")
gt_patches.compute_patch_embeddings(
    model=model,
    patches_field='detections',
    embeddings_field='patch_embeddings',
)
# Generate embedding visualization
results = fob.compute_visualization(
    gt_patches, embeddings='patch_embeddings', brain_key="img_viz"
)
# Launch interactive visualization
session = fo.launch_app(gt_patches)
Clusters can be used to identify vendor-specific annotation errors invisible to statistical quality metrics—errors that only became apparent when visualizing the semantic similarity of misclassified samples.
Similarity search for quality control
Once you find one problematic annotation, similarity search becomes a powerful tool to find all related errors. Click on a mislabeled sample and instantly retrieve the most similar images to check if they have the same systematic labeling problem.
FiftyOne’s similarity search transforms “find more like this” from manual tedium into instant discovery. Index your data set once, then instantly retrieve visually similar samples through point-and-click or programmatic queries.
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
# Load dataset
dataset = foz.load_zoo_dataset("quickstart")
# Index images by similarity
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_sim",
)
# Sort by most likely to contain annotation mistakes
mistake_view = dataset.sort_by("mistakenness", reverse=True)
# Query the first sample and find 10 most similar images
query_id = mistake_view.take(1).first().id
similar_view = dataset.sort_by_similarity(query_id, k=10, brain_key="img_sim")
# Launch App to view similar samples and for point-and-click similarity search
session = fo.launch_app(dataset)
Key capabilities include instant visual search through the App interface, object-level similarity indexing for detection patches, and scalable back ends that switch from sklearn to Qdrant, Pinecone, or other vector databases for production.
Remove problematic samples before they’re sent to annotators
FiftyOne’s Data Quality workflow scans data sets for visual issues that commonly lead to annotation mistakes. The built-in analyzer detects problematic samples—overly bright/dark images, excessive blur, extreme aspect ratios, and near-duplicates—that annotators often label inconsistently.
How the Data Quality workflow prevents annotation errors:
- Brightness/blur detection: Identifies low-quality images where annotators guess labels
- Near-duplicate finder: Reveals inconsistent annotations across visually identical samples
- Extreme aspect ratios: Flags distorted images that confuse annotators about object proportions
- Interactive thresholds: Adjusts sensitivity to explore borderline cases where quality degrades
Teams like Berkshire Grey achieved 3x faster investigations by using the tagging system to quarantine problematic samples, preventing bad annotations from contaminating model training. This transforms quality control from reactive debugging into proactive prevention.
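A minimal sketch of that quarantine pattern, using the Brain API’s near-duplicate detection on a sample dataset (the tag name is illustrative):

import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# Index the dataset and flag visually redundant samples
index = fob.compute_near_duplicates(dataset)
dup_ids = index.duplicate_ids

# Quarantine them with a tag before they reach annotators
dataset.select(dup_ids).tag_samples("possible-duplicate")
clean_view = dataset.match_tags("possible-duplicate", bool=False)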
Works with existing annotation tools and pipelines
Rather than forcing teams to abandon existing annotation infrastructure, FiftyOne can integrate seamlessly with any platform including CVAT, Labelbox, Label Studio, and V7 Darwin. The platform’s annotate() API uploads samples directly to these services while maintaining complete provenance tracking. After correction, load_annotations() imports updated labels back into FiftyOne for validation.
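For example, a round trip to CVAT might look like the following sketch (the annotation key is illustrative, and it assumes compute_mistakenness() has already populated the mistakenness field as shown earlier):

import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# Send the 100 most suspicious labels out for correction
view = dataset.sort_by("mistakenness", reverse=True).limit(100)
view.annotate("fix_mistakes", backend="cvat", label_field="ground_truth")

# ...later, once annotators finish, merge the corrections back in
dataset.load_annotations("fix_mistakes")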
This integration extends throughout the platform. FiftyOne works natively with PyTorch, TensorFlow, and Hugging Face, enabling quality assessment within existing ML pipelines. Moreover, FiftyOne’s plugins architecture enables rapid development of custom functionality tailored to specific workflows.
FiftyOne’s data-centric approach offers automated error detection that reduces quality assessment time by 80%, improves model accuracy by 15% to 30%, and delivers up to 50% operational efficiency gains. By emphasizing understanding and improving data set quality through ML-powered analysis, FiftyOne differentiates itself from traditional labeling platforms—all while maintaining an open-core foundation that ensures transparency and flexibility.
For engineering teams drowning in annotation quality issues, the solution isn’t better labeling tools—it’s better data understanding. FiftyOne transforms annotation quality from a manual bottleneck into an automated, intelligent process that scales with modern ML development needs.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Microsoft rolls out Agent 365 ‘control plane’ for AI agents 20 Nov 2025, 12:09 am
Microsoft Agent 365 has been introduced as a control plane to help organizations deploy and manage AI agents at scale.
Unveiled November 18, Agent 365 is available through Microsoft’s Frontier program for early access to AI technologies. Agent 365 lets users manage an organization’s agents at scale, regardless of where these agents are built or acquired, Microsoft said. Agent 365 integrates with Microsoft productivity and business tools, allowing employees to work smarter, faster, and more creatively with agents, according to Microsoft.
Agent 365 integrates with Microsoft’s security solutions including Microsoft Defender, Microsoft Entra, and Microsoft Purview to protect and govern agents; integrates with the Microsoft 365 productivity apps including Word, Excel, Outlook, and the Work IQ intelligence layer to provide work context and accelerate productivity; and integrates with the Microsoft 365 admin center to manage agents. The suite delivers unified observability across an entire agent fleet through telemetry, dashboards, and alerts, allowing IT organizations to track every agent being used, built, or brought into the organization, Microsoft said.
IT administrators can get early access to Agent 365 by signing up online. Microsoft said that Agent 365 unlocks five capabilities intended to make enterprise-scale AI possible:
- Registry, to view all agents in an organization, including agents with agent ID, agents registered by the user, and shadow agents.
- Access control, to bring agents under management and limit access only to needed resources.
- Visualization, to explore connections between agents, people, and data, and monitor agent performance.
- Interoperability, by equipping agents with applications and data to simplify human-agent workflows. Agents are connected to Work IQ to gain work context and onboard into business processes.
- Security, to protect agents from threats and vulnerabilities and remediate attacks that target agents.
Microsoft Fabric IQ adds ‘semantic intelligence’ layer to Fabric 19 Nov 2025, 5:46 pm
With Fabric IQ, Microsoft is adding new semantic intelligence capabilities to its unified data and analytics platform, Fabric, that it says will help enterprises maintain a common data model and automate operational decisions.
We’ll be hearing a lot more about “IQ” from Microsoft, which has also just introduced Work IQ, the semantic intelligence layer for Microsoft 365, and Foundry IQ, a managed knowledge system for grounding AI agents via multiple knowledge repositories, that together with Fabric IQ form what Microsoft calls a “shared intelligence layer” for the enterprise.
Fabric IQ will offer five integrated capabilities, Microsoft said: the ability to create an ontology or shared model of business entities, relationships, rules, and objectives; a semantic model extending business intelligence definitions beyond analytics to AI and operations; a native graph engine to enable multi-hop reasoning with data; virtual analysts called data agents that can answer business questions; and autonomous operations agents that can reason, learn, and act in real time.
The ontology is the foundation of all of this: a living structure capturing key business concepts, the company said. Microsoft contrasted it with traditional data modelling, saying an ontology can be created and evolved by business experts using a no-code tool (still in preview) without support from engineers, while still offering IT managers the control to secure, approve, version, and manage it.
Constellation Research principal analyst Michael Ni was skeptical: “There is upfront work for IT. Ontologies don’t build themselves,” he said.
IT teams may be able to capitalize on work they have already done: Organizations using Power BI can import its data models as the basis for their ontology, Microsoft said.
From ontology to autonomy
For organizations not already invested in the Fabric ecosystem, adoption of IQ will likely take longer, warned Suhas AR, associate practice leader at HFS Research. “Key hurdles include agreeing on shared business definitions, ensuring that data permissions are carried through to AI agents, and maintaining clean and reliable real-time data feeds for automation. Teams also need some new skills and processes to govern these agents as they evolve.”
The endgame of all this describing of data is to ensure that workers, both human and AI, have a shared understanding of what the data means in the real world, so that they can analyze and act on it together.
Operations agents “monitor the business in real time, reason over live conditions, evaluate trade-offs, and take actions automatically to advance business outcomes,” Microsoft said in a blog post announcing Fabric IQ.
Such agents are capable of much more than just alerting or simple workflow automation, Microsoft said, enabling decisions at scale in seconds, without the need for interminable meetings.
Who benefits?
While successfully implementing Fabric IQ is likely to involve IT teams in some up-front heavy lifting, they’ll benefit from a longer-term reduction of operational effort, analysts say.
Microsoft-first enterprises with strong central governance stand to gain the most, Suhas said, cautioning enterprises with low data-maturity, or not committed to Microsoft’s data stack, to “wait and watch.”
Constellation Research’s Ni sees good reasons for adopting Fabric IQ: “These benefits include consistent semantics, fewer one-off models, less duplicated logic, and a shared decision layer that lowers downstream maintenance as enterprises ramp up iteration on decision automation and AI-driven automation,” he said.
While Stephanie Walter, practice leader of AI Stack at HyperFrame Research, doesn’t expect IT teams’ data modelling workload to disappear with the introduction of Fabric IQ’s ontology, she does see it shifting toward controlling security and approving the changes made by business users.
Other analysts have reservations, fearing complex, time-intensive deployments and vendor lock-in.
While Fabric IQ’s ontology will provide a clear basis for communication between employees and autonomous agents, it will also, according to Moor Insights and Strategy principal analyst Robert Kramer, tie the enterprise to it.
“The more an enterprise builds on this semantic layer, the harder it becomes to move that logic elsewhere,” Kramer said.
Fabric costs
Suhas, too, pointed to the heavy switching costs enterprises would face if they wanted to move to another platform that didn’t support the Fabric IQ ontology.
And if, after spending on the creation and governance of the ontology, and all the attendant Fabric IQ services, an enterprise was unable to drive meaningful agent adoption, then all that investment would be for nothing.
Predicting or measuring that investment will be a challenge in itself. Fabric IQ is treated as a workload in Microsoft Fabric, just like Data Factory, Analytics, Databases, Real-Time Intelligence, and Power BI, the company said.
“It uses the same unified Fabric licensing model and runs on Fabric capacity. There is no separate SKU or add-on fee,” said Yitzhak Kesselman, Microsoft CVP for Messaging and Real-Time Analytics Platform.
Those costs are tracked via a bewildering array of Fabric capacity invoice meters for underlying infrastructure usage.
Microsoft hasn’t yet published the billing meters for the Ontology item, but plans to do so later this week, with billing beginning in the first half of 2026, he said. Billing for other Fabric items will remain unchanged.