Who profits from AI? Not OpenAI, says think tank | InfoWorld

Technology insight for the enterprise

Who profits from AI? Not OpenAI, says think tank 30 Jan 2026, 2:15 am

Findings from a new study by Epoch AI, a non-profit research institute, appear to poke major holes in the notion that AI firms, and specifically OpenAI, will eventually become profitable.

The research paper  written by Jaime Sevilla, Hannah Petrovic and Anson Ho, suggests that while running an AI model may generate enough revenue to cover its own R&D costs, any profit is outweighed by the cost of developing the next big model. So, it said, “despite making money on each model, companies can lose money each year.”

The paper seeks to answer three questions: How profitable is running AI models? Are models profitable over their lifecycle? Will AI models become profitable?

To answer question one, researchers created a case study they called the GPT-5 bundle, which they said included all of OpenAI’s offerings available during GPT-5’s lifetime as the flagship model, including GPT-5 and GPT-5.1, GPT-4o, ChatGPT, and the API, and estimated the revenue from and costs of running the bundle. All numbers gathered were based on sources of information that included claims by OpenAI and its staff, and reporting by media outlets, primarily The Information, CNBC, and the Wall Street Journal.

The revenue estimate, they said, “is relatively straightforward”. Since the bundle included all of OpenAI’s models, it was the company’s total revenue over GPT-5’s lifetime from August to December last year: $6.1 billion.

And, they pointed out, “at first glance, $6.1 billion sounds healthy, until you juxtapose it with the costs of running the GPT-5 bundle.” These costs come from four main sources, the report said, the first of which is inference compute at a cost of $3.2 billion. That number is based on public estimates of OpenAI’s total inference compute spend in 2025, and assumes that the allocation of compute during GPT-5’s tenure was proportional to the fraction of the year’s revenue generated in that period.

The other costs are staff compensation ($1.2 billion), sales and marketing ($2.2 billion) and legal, office, and administrative costs: $0.2 billion.

It’s all in the calculation

As for options for calculating profit, the paper stated, “one option is to look at gross profits. This only counts the direct cost of running a model, which in this case is just the inference compute cost of $3.2 billion. Since the revenue was $6.1 billion, this leads to a profit of $2.9 billion, or gross profit margin of 48%, and in line with other estimates. This is lower than other software businesses, but high enough to eventually build a business on.”

In short, they stated, “running AI models is likely profitable in the sense of having decent gross margins.”

However, that’s not the full story.

The paper stated that by buying the argument that gross margins only should be considered when looking at profitability, “on those terms, it was profitable to run the GPT-5 bundle. But was it profitable enough to recoup the costs of developing it? In theory, yes — you just have to keep running them, and sooner or later you’ll earn enough revenue to recoup these costs. But in practice, models might have too short a lifetime to make enough revenue. For example, they could be outcompeted by products from rival labs, forcing them to be replaced.”

The trick, the authors stated, revolves around comparing gross profits and comparing the nearly $3 billion to the firm’s R&D costs: “To evaluate AI products, we need to look at both profit margins in inference as well as the time it takes for users to migrate to something better. In the case of the GPT-5 bundle, we find that it’s decidedly unprofitable over its full lifecycle, even from a gross margin perspective.”

As for the big question of whether AI models will become profitable, the paper stated, “the most crucial point is that these model lifecycle losses aren’t necessarily cause for alarm. AI models don’t need to be profitable today, as long as companies can convince investors that they will be in the future. That’s standard for fast-growing tech companies.”

The bottom line, said the trio of authors, is that profitability is very possible because “compute margins are falling, enterprise deals are stickier, and models can stay relevant longer than the GPT-5 cycle suggests.”

Asked whether the markets will stay irrational for long enough for OpenAI to become solvent, Jason Andersen, VP and principal analyst at Moor Insights & Strategy, said, “it’s possible, but there is no guarantee. I believe in 2026 you will see refinements in strategy from these firms. In my brain, there are three levers that OpenAI and other general-purpose AIs can use to improve their financial position (or at least slow the burn).” 

The first, he said, is pacing, “and I think that is happening already. We saw major model drops at a slower pace last year. So, by slowing down a bit, they can reduce some of their costs or at the very least spread them out better. Frankly, customers need to catch up anyway, so they can plausibly slow down, so the market can catch up to what they already have.”

The second, said Andersen, is to diversify their offerings, and the third involves capturing revenue from other software vendors.

As to whether OpenAI and others can keep going long enough for AI to become truly effective, he said, “OpenAI and Anthropic have the best chance of going long and staying independent. But, that said, I also want to be cautious about what ‘truly effective’ means. If you mean truly effective means achieving AGI, it’s theoretical, so probably not without major breakthroughs in hardware and energy. But if ‘effective’ means reaching profitability over a period of years, then yes, those two have a shot.”

The trick on the road to profits, he said, “will be finding a way to compete and win against companies that have welded their future to AI. Notably, Google, Microsoft, and X have now made their models inextricable to their other products and offerings. So, is there enough time and diversification opportunities to compete with them? My guess is that a couple pure plays will do well and maybe even disrupt the market, but many others won’t make it.”

Describing the paper’s findings as “very linear” and based on short-term analysis,  Scott Bickley, advisory fellow at Info-Tech Research Group, said that OpenAI has been “pretty open about the fact they are not profitable currently. What they pivot to is this staggering chart of how revenues are going to grow exponentially over the next three plus years, and that’s why they are trying to raise $200 billion now to build up infrastructure that’s going to support hundreds of billions of dollars of business a year.”

Many fortunes tied to OpenAI

He estimated that OpenAI’s overall financial commitments, as a result of agreements with Nvidia and hyperscalers as well as data center buildouts, now total $1.4 trillion, and said, “They’re trying to make themselves too big to fail, to buy the long runway they’re going to need for these investments to hopefully pay off over the course of years, or even decades.”

Right now, he said, the company is “shoring up the balance sheet. They’re trying to build everything they can to buy runway ahead. But either they wildly succeed beyond any of our imagination, and they come up with applications that I can’t envision are realistic today, or they fail miserably, and they’re guaranteed that everyone can buy a chunk of the empire for pennies on the dollar or something to that effect. But I think it’s either boom or bust. I don’t see a middle road.”

As it currently stands, said Bickley, all major vendors have “tied their fortunes to OpenAI, which is exactly what Sam Altman wanted to have happen. He’s going to force the biggest players in the space to help him be successful.”

In the event the company did end up failing, he predicted the impact on companies buying AI initiatives developed by it will be minimal. “Regardless of what happens to the commercial entity of OpenAI, the intellectual property that’s been developed, the models that are there, are going to be there. They’ll fall under someone’s control and continue to be used. They’re not in any danger of not being available.”

The article originally appeared on Computerworld.

(image/jpeg; 1.88 MB)

Microsoft previews GitHub Copilot app modernization for C++ 30 Jan 2026, 1:38 am

Microsoft has launched a public preview of GitHub Copilot app modernization for C++. The company had previewed C++ code editing tools for GitHub Copilot in December. Both previews are available via the Visual Studio 2026 Insiders channel.

GitHub Copilot app modernization for C++ helps developers upgrade C++ projects to newer MSVC Build Tools versions. The public preview was announced January 27. App modernization for C++ previously became available in a private preview in November, with the launch of the Visual Studio 2026 IDE. After receiving feedback from private preview participants, Microsoft has added support for CMake projects, reduced hallucinations, removed several critical failures, and improved Copilot’s behavior when encountering an internal compiler error. Microsoft also reinforced Copilot’s understanding of when project files need to be modified to do the upgrade.

With app modernization for C++, GitHub Copilot can reduce toil incurred when adopting newer versions of MSVC, Microsoft said. GitHub Copilot will first examine a project to determine whether it can update its settings to use the latest MSVC version. Microsoft described a three-step process of assessment, planning, and execution that GitHub Copilot follows for app modernization. After updating the project settings, Copilot will do an initial build to assess if there are any issues blocking the upgrade. After confirming the accuracy of the assessment with the user, Copilot will propose solutions to any issues that need to be addressed. Once the user approves the plan, the agent completes a sequence of tasks and validates that its changes resolved the identified problems. If there remains work to be done, the agent continues iterating until the problems are resolved or the conversation is discontinued.

(image/jpeg; 6.84 MB)

Suse offers cloud sovereignty assessment tool 29 Jan 2026, 9:07 pm

Suse has unveiled a Cloud Sovereignty Framework Self Assessment tool, with the intention of helping customers understand the gaps in their compliance with the 2025 EU Cloud Sovereignty Framework.

Launched January 29, the tool is a web-based, self-service discovery platform designed to evaluate an organization’s cloud infrastructure against the EU Cloud Sovereignty Framework. The assessment provides an objective Sovereignty Effective Assurance Levels (SEAL) score, which measures the organization’s sovereignty on the eight objectives defined by the framework. The tool also provides a roadmap for closing the compliance gap by leveraging Suse solutions and its European partner ecosystem.

Key features of Suse’s tool include:

  • The SEAL benchmark, which maps the organization to one of five SEAL levels, from No Sovereignty to Full Digital Sovereignty.
  • Weighted risk analysis, which weighs eight sovereignty objectives (SOVs), prioritizing supply chain (20%) and operational autonomy (15%), showing where the most critical vulnerabilities lie.
  • Trust-based engagement, with results stored in the user’s browser.
  • Consultative roadmap, a concrete improvement plan that can be downloaded as a PDF.

In explaining the assessment, Suse noted that industry research firm Forrester expects digital and AI sovereignty to drive a private cloud renaissance with doubled year-on-year growth in 2026. With the 2025 EU Cloud Sovereignty Framework now introduced, organizations risk contract ineligibility without proven digital sovereignty. The Cloud Sovereignty Framework Self Assessment simplifies this journey, Suse said.

(image/jpeg; 1.43 MB)

New PDF compression filter will save space, need software updates 29 Jan 2026, 8:03 pm

Brotli is one of the most widely used but least-known compression formats ever devised, long incorporated into all major browsers and web content delivery networks (CDNs). Despite that, it isn’t yet used in the creation and display of PDF documents, which since version 1.2 in 1996 have relied on the FlateDecode filter also used to compress .zip and .png files.

That is about to change, though, with the PDF Association moving closer to publishing a specification this summer that developers can use to add Brotli to their PDF processors. The hope is that Brotli will then quickly be incorporated in an update of the official PDF 2.0 standard, ISO 32000-2, maintained by the International Organization for Standardization.

With PDF file sizes steadily increasing, and the number stored in enterprise data lakes ballooning by billions each year, the need for a more efficient compression method has never been more pressing.

The pay-off for using Brotli compression will be smaller PDFs. This will translate into an average of 10% to 25% reduction in file size, depending on the type of content being encoded, according to a 2025 test by PDF Association member Artifex Software.

Unfortunately, for enterprises this is where the work begins. As PDFs written using Brotli compression start to circulate, anyone who hasn’t updated their applications and library dependencies to support it will be unable to decompress and open the new-format files. For PDFs, this would be a first: While the format has added numerous features since becoming an ISO standard in 2008, none have stopped users from opening PDFs.

The most visible software requiring an upgrade to support Brotli includes proprietary PDF creators and readers such as Adobe Acrobat, Foxit PDF Editor, and Nitro PDF. PDF readers integrated into browsers also fall into this category.

Beyond this, however, lies a sizable ecosystem of less-visible open-source utilities, libraries, and SDKs which are used inside enterprises as part of PDF workflows and automated batch processing. Finding and updating these components, often buried deep inside third-party libraries, promises to be time consuming.

If enterprises delay updating, then they risk encountering PDFs created using newer software supporting Brotli that will no longer open on their older, non-updated programs. IT teams will most likely come face to face with this when users contact them to report that they can’t open a file.

Building Brotli support

To kick off adoption, developers need encouragement, said Guust Ysebie, a software engineer with document processing developer Apryse. “Somebody has to jump first and make some noise so other products jump on the bandwagon,” he said.

It’s a challenge because, as he explained in a post about the move to Brotli on the PDF Association’s website, Brotli’s adoption has been slowed because the PDF specification requires consensus across hundreds of stakeholders.

The transition can be eased in three ways, he suggested, the simplest of which is to publicize the need to upgrade across multiple information sources as part of an awareness campaign.

A more radical suggestion is that Brotli-enabled PDFs could be formatted such that, rather than cause older readers to crash, they could show a “not supported” error message encouraging customers to upgrade as a placeholder for the compressed content.

A final tactic is for likeminded developers to take it upon themselves to upgrade open-source libraries. Ysebie said he’s added Brotli support to several libraries, including the iText SDK from Apryse.

“This is how adoption works in real life: Create the feature unofficially, then early adopters implement it, and this causes bigger products to also adopt it,” said Ysebie. The critical moment for adoption of Brotli-enabled software would be its appearance in Adobe Reader. This will happen at some point, but when is still unclear, he said.

The good news is that because there are only a limited number of software libraries to upgrade, adding support to this software should be straightforward, said Ysebie. However, organizations will still have to apply those updated images to their current applications.

As to when Brotli will be added to the ISO PDF 2.0 specification (ongoing since 2015), Ysebie agreed this has a way to go. But the industry had to move on from old technology at some point. “We need to push the ecosystem forward. It will be a little chaotic in the beginning but with a lot of potential for the future.”

This article first appeared on Computerworld.

(image/jpeg; 0.42 MB)

Apiiro’s Guardian Agent guards against insecure AI code 29 Jan 2026, 6:35 pm

Apiiro has launched Guardian Agent, an AI agent that helps prevent coding agents from generating vulnerable or non-compliant code by rewriting developer prompts into secure prompts, according to the company.

Introduced January 28, Guardian Agent is now in a private preview stage. Describing the technology as introducing a fundamentally new paradigm for securing software in the era of AI-driven development, Apiiro said Guardian replaces traditional appsec approaches built around detecting and fixing vulnerabilities after code is written. Guardian Agent replaces this reactive model with a preventive one, stopping risk before code is generated by guarding AI coding agents in real time, according to Apiiro. Guardian Agent operates in real time directly from the developer’s IDE and CLI tools. The agent is powered by Apiiro’s code analysis technology and a software graph that “deeply understands” the customer’s software architecture and adapts to its changes, the company said.

Elaborating on the inspiration behind Guardian Agent, Apiiro said AI coding agents are breaking the physics of application security. Enterprises generate four times more code after adopting AI coding agents and expand the application attack surface by six times. This expansion is driven by rapid generation of new APIs, duplicated open source technologies and dependencies, and other resources, reshaping the software architecture with each code change, Apiiro said. Much of the code is generated without developers being fully aware of it. By preventing vulnerabilities before code exists, security outcomes are improved and developer productivity is increased, Apiiro stressed.

(image/jpeg; 7.36 MB)

What is prompt engineering? The art of AI orchestration 29 Jan 2026, 9:00 am

Prompt engineering is the process of crafting inputs, or prompts, to a generative AI system that lead to the system producing better outputs. That sounds simple on the surface, but because LLMs and other gen AI tools are complex, nondeterministic “black box” systems, it’s a devilishly tricky process that involves trial and error and a certain degree of guesswork — and that’s before you even consider that the question of what constitutes “better output” is itself difficult to answer.

Almost every advance in computer science since COBOL has been pitched as a means for ordinary people to unlock the power of computers without having to learn any specialized languages or skills, and with natural language AI chatbots, it might seem that we’ve finally achieved that goal. But it turns out that there are a number of techniques — some intuitive, some less so — that can help you get the most from a gen AI system, and learning those techniques is quickly becoming a key skill in the AI age.

Why is prompt engineering important?

Most people’s experience with gen AI tools involves directly interacting with ChatGPT or Claude or the like. For those folks, prompt engineering techniques represent a way to get better answers out of those tools. Those tools are becoming increasingly built into business software and processes, so that’s a strong motivation to improve your prompts, just as the first generation of web users learned quirks and tricks of Google and other search engines.

However, prompt engineering is even more important for developers who are building an ecosystem around AI tools in ways that hopefully relieve some of the burden from ordinary users. Enterprise AI applications increasingly include an orchestration layer between end users and the underlying AI foundation model. This layer includes system prompts and retrieval augmented generation (RAG) tools that enhance user inputs before they’re sent to the AI system.

For instance, a medical AI application could ask its doctor and nurse users to simply input a list of patient symptoms; the application’s orchestration layer would then turn that list into a prompt, informed by prompt engineering techniques and enhanced by information derived from RAG, that will hopefully produce the best diagnosis.

For developers, this orchestration layer represents the next frontier of professional work in the AI age. Just as search engines were originally aimed at ordinary users but also spawned a multibillion dollar industry in the form of search engine optimization, so too is prompt engineering becoming a vital and potentially lucrative skill.

Prompt engineering types and techniques

Prompt engineering approaches vary in sophistication, but all serve the same goal: to guide the model’s internal reasoning and reduce the model’s tendency toward ambiguity or hallucination. The techniques fall into a few major categories:

Zero-shot prompting is the simplest and, in many cases, the default: you give the model an instruction — “Summarize this article,” “Explain this API,” “Draft a patient note” — and the system relies entirely on its general training to produce an answer. This is referred to as direct or (for reasons we’ll discuss in a moment) zero-shot prompting; it’s useful for quick tasks, but it rarely provides the consistency or structure needed for enterprise settings, where outputs must follow predictable formats and meet compliance or quality constraints.One-shot and few-shot prompting add examples to the instruction to demonstrate the format, reasoning style, or output structure the system should follow. Here’s an example of one-shot prompting with ChatGPT:

example of one-shot prompting with ChatGPT

 One-shot prompting with ChatGPT

Foundry


This is a one-shot prompt because it involves a single example, but you can add more to produce few-shot (or indeed many-shot) prompts. Direct prompts that don’t include examples were retroactively named zero-shot as a result.

Prompts of this type can be used to provide in-context learning with examples that steer the model to better performance. For instance, a model that struggles with a zero-shot instruction like “Extract key risks from this report” may respond much more reliably if given a few examples of the kinds of risks you’re talking about. In production systems, these examples are often embedded as part of the system prompt or stored in an internal prompt template library rather than visible to the end user.

Chain-of-thought prompting takes things further, encouraging the model to break down a problem into intermediate steps. It was first developed in a 2022 paper that used the following example:

Image of chain-of-thought prompts

Source: “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Wei et al., 2022.

Foundry

Chain-of-thought prompts can involve elaborate demonstrations of your desired reasoning, as the example demonstrates; however, it’s worth noting that contemporary LLMs are prone to engage in chain of thought reasoning on their own with even gentle nudge, like adding “show your work” to the prompt. This technique is particularly effective for reasoning tasks—anything involving classification, diagnostics, planning, multi-step decision-making, or rules interpretation.

The way these engineered prompts work reveals something about the nature of gen AI that’s important to keep in mind. While ChatGPT and other LLM chat interfaces create the illusion that you’re having a conversation with someone, the underlying model is fundamentally a machine for predicting the next token in sequence.

When you’re “talking” with it in a natural conversational style, it’s doing its best to predict, based on its training data, what the most likely next bit of dialogue in the exchange would be. But as our examples indicate, you can prompt it with a multi-“character” dialogue scaffolds, with both the Qs and the As, and then ask it to predict the next A, or indeed even the next Q; it’s perfectly happy to do so and doesn’t necessarily “identify” with either “character,” and can even switch back and forth if you prompt it correctly. Good prompt engineering techniques can make use of this rather than trying to coax an LLM into doing what you want as if it were a person.

Zero-shot and few-shot examples can be embedded as system-level templates, and chain-of-thought reasoning can be enforced by the software layer rather than left to user discretion. More elaborate dialogue scaffolds can shape model behavior in ways that reduce risk and improve consistency. Collectively, these techniques form the core of production-grade prompting that sits between end users and the model.

Prompt engineering challenges

Prompt engineering remains a rapidly evolving discipline, and that brings real challenges. One issue is the fragility of prompts: even small changes in wording can cause large shifts in output quality. Prompts tuned for one model version do not always behave identically in a newer version, meaning organizations face ongoing maintenance simply to keep outputs stable as models update.

A related problem is opacity. Because LLMs are black-box systems, a strong prompt does not guarantee strong reasoning; it only increases the likelihood that the model interprets instructions correctly. Studies have highlighted the gap between well-engineered prompts and trustworthy outputs. In regulated industries, a model that merely sounds confident can be dangerous if the underlying prompt does not constrain it sufficiently. (We’ve already compared prompt engineering to SEO, and fragility and opacity are problems familiar to SEO practitioners.)

Enterprise teams also face scalability issues. Due to LLMs’ nondeterministic nature, a prompt that works for a single request may not perform consistently across thousands of queries, each with slightly different inputs. As businesses move toward broader deployments, this inconsistency can translate into productivity losses, compliance risks, or increased human review needs.

Security risk is another emerging challenge. Prompt-injection attacks, where malformed user input or retrieval content manipulates the internal prompt templates, are now practical threats.

Prompt engineering courses

One more challenge in the prompt engineering landscape: The skills gap remains significant. Enterprises understand the importance of prompt engineering, but the technology and techniques are so new that few professionals have hands-on experience building robust prompt pipelines. This gap is driving demand for the growing list of prompt engineering courses and certifications.

Companies themselves are increasingly offering internal training as they roll out generative AI. Citi, for example, has made AI prompt training mandatory for roughly 175,000–180,000 employees who can access its AI tools, framing it as a way to boost AI proficiency across the workforce. Deloitte’s AI Academy similarly aims to train more than 120,000 professionals on generative AI and related skills.

Prompt engineering jobs

There’s rising demand for professionals who can design prompt templates, build orchestration layers, and integrate prompts with retrieval systems and pipelines. Employers increasingly want practitioners with AI skills who understand not just prompting, but how to integrate them with retrieval systems and tool-use.

These roles often emphasize hybrid responsibilities: evaluating model updates, maintaining prompt libraries, testing output quality, implementing safety constraints, and embedding prompts into multi-step agent workflows. As companies deploy AI deeper into customer support, analytics, and operations, prompt engineers must collaborate with security, compliance, and UX teams to prevent hallucination, drift or unexpected system behavior.

Despite some skepticism about the longevity of “prompt engineer” as a standalone title, the underlying competencies—structured reasoning, workflow design, prompt orchestration, evaluation and integration—are becoming core to broader AI engineering disciplines. Demand for talent remains strong, and compensation for AI skills continues to rise.

Prompt engineering guides

Readers interested in going deeper into practical techniques have several authoritative guides available:

These resources can help you get started in this rapidly expanding field—but there’s no substitute for getting hands on with prompts yourself.

(image/jpeg; 1.67 MB)

Why your next microservices should be streaming SQL-driven 29 Jan 2026, 9:00 am

“It is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” — Abraham Maslow, 1966

Microservices provide many benefits for building business services, such as independent tools, technologies, languages, release cycles, and full control over your dependencies. However, microservices aren’t the solution for every problem. As Mr. Maslow observed, if we limit our tool choices, we end up using less-than-ideal methods for fixing our problems. In this article, we’ll take a look at streaming SQL, so you can expand your toolbox with another option for solving your business problems. But first, how does it differ from a regular (batch) SQL query?

SQL queries on a traditional database (e.g. Postgres) are bounded queries, operating over the finite set of data within the database. The bounded data includes whatever was present at the point in time of the query’s execution. Any modifications to the data set occurring after the query execution are not included in the final results. Instead, you would need to issue another query to include that new data.

Streaming SQL 01

A bounded SQL query on a database table. 

Confluent

In contrast, a streaming SQL query operates on an unbounded data set—most commonly one or more event streams. In this model, the streaming SQL engine consumes events from the stream(s) one at a time, ordering them according to timestamps and offsets. The streaming SQL query also runs indefinitely, processing events as they arrive at the inputs, updating state stores, computing results, and even outputting events to downstream streams.

Streaming SQL 02

An unbounded SQL query on an event stream. 

Confluent

Apache Flink is an excellent example of a streaming SQL solution. Under the hood is a layered streaming framework that provides low-level building blocks, a DataStream API, a higher level Table API, and at the top-most level of abstraction, a streaming SQL API. Note that Flink’s SQL streaming syntax may vary from other SQL streaming syntaxes, since there is no universally agreed upon streaming SQL syntax. While some SQL streaming services may use ANSI standard SQL, others (including Flink) add their own syntactical variations for their streaming frameworks.

There are several key benefits for building services with streaming SQL. For one, you gain access to the powerful streaming frameworks without having to familiarize yourself with the deeper underlying syntax. Secondly, you get to offload all of the tricky parts of streaming to the framework including repartitioning data, rebalancing workloads, and recovering from failures. Third, you gain the freedom to write your streaming logic in SQL instead of the framework’s domain-specific language. Many of the full-featured streaming frameworks (like Apache Kafka Streams and Flink) are written to run on the Java Virtual Machine (JVM) and offer limited support for other languages.

Finally, it’s worth mentioning that Flink uses the TABLE type as a fundamental data primitive, as evidenced by the SQL layer built on top of the Table API in this four-layer Flink API diagram. Flink uses both streams and tables as part of its data primitives, enabling you to materialize a stream into a table and similarly turn a table back into a stream by appending the table updates to an output stream. Visit the official documentation to learn more.

Let’s shift gears now to look at a few common patterns of streaming SQL use, to get an idea of where it really shines.

Pattern 1: Integrating with AI and machine learning

First on the list, streaming SQL supports direct integration with artificial intelligence and machine learning (ML) models, directly from your SQL code. Accessing an AI or ML model through streaming SQL is easier than ever. Given the emergence of AI as a contender for many business workloads, streaming SQL gives you the capability to utilize that model in an event-driven manner, without having to spin up, run, and manage a dedicated microservice. This pattern works similarly to the user-defined function (UDF) pattern: You create the model, register it for use, and then call it inline in your SQL query.

The Flink documentation goes into greater detail on how you’d hook up a model:

CREATE MODEL sentiment_analysis_model 
INPUT (text STRING COMMENT 'Input text for sentiment analysis') 
OUTPUT (sentiment STRING COMMENT 'Predicted sentiment (positive/negative/neutral/mixed)')
COMMENT 'A model for sentiment analysis of text'
WITH (
    'provider' = 'openai',
    'endpoint' = 'https://api.openai.com/v1/chat/completions',
    'api-key' = '',
    'model'='gpt-3.5-turbo',
    'system-prompt' = 'Classify the text below into one of the following labels: [positive, negative, neutral, mixed]. Output only the label.'
);

Source: Flink Examples

The model declaration is effectively a bunch of wiring and configurations that enable you to use the model in your streaming SQL code. For example, you can use this model declaration to evaluate sentiments about the body of text contained within the event. Note that ML_PREDICT requires both the specific model name used and the text parameter:

INSERT INTO my_sentiment_results 
    SELECT text, sentiment 
    FROM input_event_stream, LATERAL TABLE(ML_PREDICT('sentiment_analysis_model', text));

Pattern 2: Bespoke business logic as functions

While streaming SQL offers many functions natively, it’s simply not possible to include everything into the language syntax. This is where user-defined functions come in—they’re functions, defined by the user (you), that your program can execute from the SQL statements. UDFs can call external systems and create side effects that aren’t supported by the core SQL syntax. You define the UDF by implementing the function code in a standalone file, then upload it to the streaming SQL service before you run your SQL statement.

Let’s take a look at a Flink example.

// Declare the UDF in a separate java file
import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.ScalarFunction;
import static org.apache.flink.table.api.Expressions.*;

//Returns 2 for high risk, 1 for normal risk, 0 for low risk.
public static class DefaultRiskUDF extends ScalarFunction {
  public Integer eval(Integer debt, 
    Integer interest_basis_points, 
    Integer annual_repayment,
    Integer timespan_in_years) throws Exception {

    int computed_debt = debt;

   	 for (int i = 0; i = debt )
return 2;
else if ( computed_debt  debt / 2)
return 1;
else
return 0;
}

Next, you’ll need to compile the Java file into a JAR file and upload it to a location that is accessible by your streaming SQL framework. Once the JAR is loaded, you’ll need to register it with the framework, and then you can use it in your streaming SQL statements. The following gives a very brief example of registration and invocation syntax.

-- Register the function.
CREATE FUNCTION DefaultRiskUDF
  AS 'com.namespace.SubstringUDF'
  USING JAR '';

-- Invoke the function to compute the risk of:
-- 100k debt over 15 years, 4% interest rate (400 basis points), and a 10k annual repayment rate
SELECT UserId, DefaultRiskUDF(100000, 400, 10000, 15) AS RiskRating
FROM UserFinances
WHERE RiskRating >= 1;

This UDF lets us compute the risk that a borrower may default on a loan, returning 0 and 1 for low and normal risk respectively, and a value of 2 for high-risk borrowers or accounts at risk. Though this UDF is a pretty simple and contrived example, you can do far more complex operations using almost anything within the standard Java libraries. You won’t have to build a whole microservice just to get some functionality that isn’t in your streaming SQL solution. You can just upload whatever’s missing and call it directly inline from your code.

Pattern 3: Basic filters, aggregations, and joins

This pattern takes a step back to the basic (but powerful) functions built right into the streaming SQL framework. A popular use case for streaming SQL is that of simple filtering, where you keep all records that meet a criteria and discard the rest. The following shows a SQL filter that only returns records where total_price > 10.00, which you can then output to a table of results or into a stream as a sequence of events.

SELECT *
FROM orders
WHERE total_price > 10.00;

Windowing and aggregations are another powerful set of components for streaming SQL. In this security example, we’re counting how many times a given user has attempted to log in within a one-minute tumbling window (you can also use other window types, like sliding or session).

SELECT 
    COUNT(user_id) AS login_count, 
    TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start
FROM login_attempts
GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE);

Once you have how many login attempts a user has in the window, you can filter for a higher value (say > 10), triggering business logic inside a UDF to lock them out temporarily as an anti-hacking feature.

Finally, you can also join data from multiple streams together with just a few simple commands. Joining streams as streams (or as tables) is actually pretty challenging to do well without a streaming framework, particularly when accounting for fault tolerance, scalability, and performance. In this example, we’re joining Product data on Orders data with the product ID, returning an enriched Order + Product result.

SELECT * FROM Orders
INNER JOIN Product
ON Orders.productId = Product.id

Note that not all streaming frameworks (SQL or otherwise) support primary-to-foreign-key joins. Some only allow you to do primary-to-primary-key joins. Why? The short answer is that it can be quite challenging to implement these types of joins when accounting for fault tolerance, scalability, and performance. In fact, you should investigate how your streaming SQL framework handles joins, and if it can support both foreign and primary key joins, or simply just the latter.

So far we’ve covered some of the basic functions of streaming SQL, though currently the results from these queries aren’t powering anything more complex. With these queries as they stand, you would effectively just be outputting them into another event stream for a downstream service to consume. That brings us to our next pattern, the sidecar.

Pattern 4: Streaming SQL sidecar

The streaming SQL sidecar pattern enables you to leverage the functionality of a full featured stream processing engine, like Flink or Kafka Streams, without having to write your business logic in the same language. The streaming SQL component provides the rich stream processing functionality, like aggregations and joins, while the downstream application processes the resulting event stream in its own independent runtime.

Streaming SQL 03

Connecting the streaming SQL query to an event-driven service via an internal stream.

Confluent

In this example, INTERNAL_STREAM is a Kafka topic where the SQL sidecar writes its results. The event-driven service consumes the events from the INTERNAL_STREAM, processes them accordingly, and may even emit events to the OUTPUT_STREAM.

Another common use of the sidecar is to prepare data to serve using a web service to other applications. The consumer consumes from the INPUT_STREAM, processes the data, and makes it available for the web service to materialize into its own state store. From there, it can serve request/response queries from other services, such as REST and RPC requests.

Streaming SQL 04

Powering a web service to serve REST / RPC requests using the sidecar pattern. 

Confluent

While the sidecar pattern gives you a lot of room for extra capabilities, it does require that you build, manage, and deploy the sidecar service alongside your SQL queries. A major benefit of this pattern is that you can rely on the streaming SQL functionality to handle the streaming transformations and logic without having to change your entire tech stack. Instead, you just plug the streaming SQL results into your existing tech stack, building your web services and other applications using the same tools that you always have.

Additional use cases

Each of these patterns shows operations in isolation, which may be enough for simpler applications. For more complex business requirements, there’s a good chance that you’re going to chain together multiple patterns, with each feeding its results into the next. For example, first you filter some data, then apply a UDF, then make a call to an ML or AI model using ML_PREDICT, as shown in the first half of the example below. The streaming SQL then filters the results from the first ML_PREDICT, applies a UDF, and then sends those results to a final ML model before writing to the OUTPUT_STREAM.

Streaming SQL 05

Chain multiple SQL streaming operations together for more complex use cases. 

Confluent

Streaming SQL is predominantly offered as a serverless capability by numerous cloud vendors, billed as a quick and easy way to get streaming data services up and running. Between its built-in functions, UDFs, materialized results, and integrations with ML and AI models, streaming SQL has proven to be a worthy contender for consideration when building microservices.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

(image/jpeg; 9.07 MB)

Get started with Angular: Introducing the modern reactive workflow 29 Jan 2026, 9:00 am

Angular is a cohesive, all-in-one reactive framework for web development. It is one of the larger reactive frameworks, focused on being a single architectural system that handles all your web development needs under one idiom. While Angular was long criticized for being heavyweight as compared to React, many of those issues were addressed in Angular 19. Modern Angular is built around the Signals API and minimal formality, while still delivering a one-stop-shop that includes dependency injection and integrated routing.

Angular is popular with the enterprise because of its stable, curated nature, but it is becoming more attractive to the wider developer community thanks to its more community engaged development philosophy. That, along with its recent technical evolution, make Angular one of the most interesting projects to watch right now.

Why choose Angular?

Choosing a JavaScript development framework sometimes feels like a philosophical debate, but it should be a practical decision. Angular is unique because it is strongly opinionated. It doesn’t just give you a view layer; it provides a complete toolkit for building web applications.

Like other reactive frameworks, Angular is built around its reactive engine, which lets you bind state (variables) to the view. But if that’s all you needed, one of the smaller, more focused frameworks would be more than enough. What Angular has that some of these other frameworks don’t is its ability to use data binding to automatically synchronize data from your user interface (UI) with your JavaScript objects. Angular also leverages dependency injection and inversion of control to help structure your application and make it easier to test. And it contains more advanced features like server-side rendering (SSR) and static-site generation (SSG) within itself, rather than requiring you to engage a meta-framework for either style of development.

While Angular might not be your top choice for every occasion, it’s an excellent option for larger projects that require features you won’t get with a more lightweight framework.

Also see: Catching up with Angular 19.

Getting started with Angular

With those concepts in mind, let’s set up Angular in your development environment. After that, we can run through developing a web application with Angular. To start, make sure you have Node and NPM installed. From the command line, enter:

$ node -v
$ npm -v

Next, you can use the Angular CLI to launch a new app:

$ ng new iw-ng

You can use the defaults in your responses to the interactive prompts shown here:

A screenshot of a new project setup in the Angular command-line interface.

Matthew Tyson

We now have a basic project layout in the new directory, which you can import into an IDE (such as VS Code) or edit directly.

Looking at the project layout, you might notice it is fairly lean, a break from Angular projects of the past. The most important parts are:

  • src/main.ts: This is the main entry point. In older versions of Angular, this file had to bootstrap a module, which then bootstrapped a component. Now, it avoids any verbose syntax, calling bootstrapApplication with your root component directly.
  • src/index.html: The main HTML page that hosts your application. This is the standard index.html that serves all root requests in a web page and contains the tag where your Angular component will render. It is the “body” that the “spirit” of your code animates.
  • src/app/app.ts: The root component of your application. This single file defines the view logic and the component metadata. In the new “standalone” world, it manages its own imports, meaning you can see exactly what dependencies it uses right at the top of the file. (This is the root element that appears in src/index.html.)
  • src/app/app.config.ts: This file is new in modern Angular and replaces the old AppModule providers array. It is where you configure global services, like the router or HTTP client.
  • angular.json: The configuration file for the CLI itself. It tells the build tools how to process your code, though you will rarely need to touch this file manually anymore.

Here is the basic flow of how the engine renders these components:

  1. The arrival (HTML): The browser receives index.html. The tag is there, but it’s empty.
  2. The unpacking (JavaScript): The browser sees the tags at the bottom of the HTML and downloads the JavaScript bundles (your compiled code) from src/app/app.ts.
  3. The assembly (Bootstrap): The browser runs that JavaScript. The code “wakes up,” finds the tag in the DOM, and dynamically inserts your title, buttons, and lists.

This flow will be different if you are using server-side rendering (SSR), but we’ll leave that option aside for now. Now that you’ve seen the basic architecture, let’s get into the code.

Developing your first web app in Angular

If you open src/app/app.ts (more info here) the component definition looks like this:

import { Component, signal } from '@angular/core';
import { RouterOutlet } from '@angular/router';

@Component({
  selector: 'app-root',
  imports: [RouterOutlet],
  templateUrl: './app.html',
  styleUrl: './app.css'
})
export class App {
  protected readonly title = signal('iw-ng');
}

Before we dissect the code, let’s run the app and see what it produces:

$ ng serve

You should see a page like this one at localhost:4200:

A screenshot of a Hello, World! app built with Angular.

Matthew Tyson

Returning to the src/app.ts component, notice that there are three main parts of the definition: the class, the metadata, and the view. Let’s unpack these separately.

The class (export class App)

Export class App is vanilla TypeScript that holds your component’s data and logic. In our example, title = signal(‘iw-ng’) defines a piece of reactive state. Unlike older versions of Angular where data was just a plain property, here we use a signal. Signals are wrappers around values that notify the template precisely when they change, enabling fine-grained performance.

The metadata (@Component)

The @Component decorator tells Angular it is dealing with a component, not just a generic class. There are several elements involved in the decorator’s communication with the engine:

  • selector: 'app-root': Defines the custom HTML tag associated with any given component. Angular finds in your index.html and renders the component there.
  • imports: In the new Angular era, dependencies are explicit. You list exactly what a component needs (like RouterOutlet or other components) here, rather than hiding them in a separate module file.
  • templateUrl: Points to the external HTML file that defines the view.

The view (the template)

This is the visual part of the component, defined in app.html. It combines standard HTML with Angular’s template syntax. (JSX handles this part for React-based apps.)

We can modify src/app/app.html to see how these three elements work together. To start, delete the default content and add the following:

Hello, {{ title() }}

The double curly braces {{ }} are called interpolation. Notice the parentheses in title(). We are reading the “title” signal value by calling its function. If you were to update that signal programmatically (e.g., this.title.set('New Value')), the text on the screen would update instantly.

Angular’s built-in control flow

Old-school Angular required “structural directives” like *ngIf and *ngFor logic control. These were powerful but required importing CommonModule and learning a specific micro-syntax. Modern Angular uses a built-in control flow that looks like standard JavaScript (similar to other Reactive platforms).

To see the new control flow in action, let’s add a list to our component. Update src/app/app.ts as follows, leaving the rest of the file the same:

export class App {
  protected readonly title = signal('iw-ng');
  protected readonly frameworks = signal(['Angular', 'React', 'Vue', 'Svelte']);
  protected showList = signal(true);

  toggleList() {
    this.showList.update(v => !v);
  }
}

While we’re at it, let’s also update src/app/app.html to render this new list (don’t worry about for now; it just tells Angular where to render the framing template):



@if (showList()) {
  
    @for (tech of frameworks(); track tech) {
  • {{ tech }}
  • }
} @else {

List is hidden

}

The app will now display a list that can be toggled for visibility:

Screenshot of a list that can be toggled on and off for visibility.

Matthew Tyson

This syntax is cleaner and easier to read than the old *ngFor loops:

  • @if conditionally renders the block if the signal’s value is true.
  • @for iterates over the array. The track keyword is required for performance (it tells Angular how to identify unique items in the list).
  • (click) is an event binding. It lets us run code (the toggleList method) when the user interacts with the button.

Services: Managing business logic in Angular

Components focus on the view (i.e., what you see). For the business logic that backs the application functionality, we use services.

A service is just a class that can be “injected” into a component that needs it. This is Angular’s famous dependency injection system. It allows you to write logic once and reuse it anywhere. It’s a slightly different way of thinking about how an application is wired together, but it gives you real organizational benefits over time.

To generate a service, you can use the CLI:

$ ng generate service frameworks

This command creates a src/app/hero.ts file. In modern Angular, we define services using the @Injectable decorator. Currently, the src/app/hero.ts file just has this:

import { Injectable } from '@angular/core';

@Injectable({
  providedIn: 'root',
})
export class Frameworks {
  
}

Open the file and add a simple method to return our data:

import { Injectable } from '@angular/core';

@Injectable({
  providedIn: 'root', // Available everywhere in the app
})
export class Frameworks {
  getList() {
    return ['Angular', 'React', 'Vue', 'Svelte'];
  }
}

The providedIn: 'root' metadata is important, it tells Angular to create a single, shared instance of this service for the entire application (you might recognize this as an instance of the singleton pattern).

Using the service

In the past, we had to list dependencies in the constructor. Modern Angular offers a cleaner way: the inject() function. Subsequently, we can refactor our src/app/app.ts to get its data from the service instead of hardcoding it:

import { Component, inject, signal } from '@angular/core';
import { RouterOutlet } from '@angular/router';
import { Frameworks } from './frameworks'; // Import the service

@Component({
  selector: 'app-root',
  imports: [RouterOutlet],
  templateUrl: './app.html',
  styleUrl: './app.css'
})
export class App {
  private frameworksService = inject(Frameworks); // Dependency Injection
  
  protected readonly title = signal('iw-ng');
  
  // Initialize signal with data directly from the service
  protected readonly frameworks = signal(this.frameworksService.getList());
  protected showList = signal(true);

  toggleList() {
    this.showList.update(v => !v);
  }
}

Dependency injection is a powerful pattern. The component doesn’t need to know where the list came from (it could be coming from an API, a database, or a hard-coded array); it just asks the service for what it needs. This pattern adds a bit of extra work up front, but it delivers a more flexible, organized codebase as the app grows in size and complexity.

Routers and routes

Once your application grows beyond a single view, you need a way to navigate between different screens. In Angular, we use the built-in router for this purpose. In our example project, src/app/app.routes.ts is the dedicated home for the router config. Let’s follow the steps for creating a new route.

First, we define the route. When you open src/app/app.routes.ts, you will see an exported routes array. This array contains the available routes for your app. Each string name resolves to a component that handles rendering that route. In effect, this is the map of your application’s landscape.

In a real application, you’d often have “framing template” material in the root of the app (like the navbar) and then the routes fill in the body content. (Remember that by default, Angular is designed for single-page apps, where navigation does reload the screen, but swaps content.)

For now, let’s just get a sense of how the router works. First, create a new component so we have a destination to travel to. In your terminal, run:

$ ng generate component details

This will generate a simple details component in the src/app/details directory.

Now we can update src/app/app.routes.ts to include this new path. We will also add a “default” path that redirects empty requests to the home view, ensuring the user always lands somewhere:

import { Routes } from '@angular/router';
import { App } from './app'; // Matches src/app/app.ts
import { Details } from './details/details'; // Matches src/app/details/details.ts

export const routes: Routes = [
  { path: '', redirectTo: '/home', pathMatch: 'full' },
  { path: 'home', component: App },
  { path: 'details', component: Details },
];

Now if you visit localhost:4200/home, you’ll get the message from the details component: “Details works!”

Next, we’ll use the routerLink directive to move between views without refreshing the page. In src/app/app.html, we create a navigation bar that sits permanently at the top of the page (the “stationary” element), while the router swaps the content below it (the “impermanent” element):




And with that, the application has a navigation flow. The user clicks, the URL updates, and the content transforms, all without the jarring flicker of a browser reload.

Parametrized routes

The last thing we’ll look at is handling route parameters, where the route accepts variables in the path. To manage this kind of dynamic data, you define a route with a variable, marked by a colon. Open src/app/app.routes.ts and add a dynamic path:

export const routes: Routes = [
  // ... existing routes
  { path: 'details/:id', component: Details }, 
];

The :id is a placeholder. Whether the URL is /details/42 or /details/108, this router will receive it because it matches the path. Inside the details component, we have access to this parameter (using the ActivatedRoute service or the new withComponentInputBinding). We can use that value to retrieve the data we need (like using it to recover a detail item from a database).

Conclusion

We have seen the core elements of modern Angular: Setting up the environment, building reactive components with signals, organizing logic with services, and tying it all together with interactive routing.

Deploying these pieces together is the basic work in Angular. Once you get comfortable with it, you have an extremely powerful platform at your fingertips. And, when you are ready to go deeper, there is a whole lot more to explore in Angular, including:

  • State management: Beyond signals, Angular has support for managing complex, application-wide state.
  • Forms: Angular has a robust system for handling user input.
  • Signals: We only scratched the surface of signals here. Signals offer a powerful, fine-grained way to manage state changes.
  • Build: You can learn more about producing production builds.
  • RxJS: Takes reactive programming to the next level.

(image/jpeg; 3.02 MB)

Crooks are hijacking and reselling AI infrastructure: Report 29 Jan 2026, 12:39 am

For years, CSOs have worried about their IT infrastructure being used for unauthorized cryptomining. Now, say researchers, they’d better start worrying about crooks hijacking and reselling access to exposed corporate AI infrastructure.

In a report released Wednesday, researchers at Pillar Security say they have discovered campaigns at scale going after exposed large language model (LLM) and MCP endpoints – for example, an AI-powered support chatbot on a website.

“I think it’s alarming,” said report co-author Ariel Fogel. “What we’ve discovered is an actual criminal network where people are trying to steal your credentials, steal your ability to use LLMs and your computations, and then resell it.”

“It depends on your application, but you should be acting pretty fast by blocking this kind of threat,” added co-author Eilon Cohen. “After all, you don’t want your expensive resources being used by others. If you deploy something that has access to critical assets, you should be acting right now.”

Kellman Meghu, chief technology officer at Canadian incident response firm DeepCove Security, said that this campaign “is only going to grow to some catastrophic impacts. The worst part is the low bar of technical knowledge needed to exploit this.”

How big are these campaigns? In the past couple of weeks alone, the researchers’ honeypots captured 35,000 attack sessions hunting for exposed AI infrastructure.

“This isn’t a one-off attack,” Fogel added. “It’s a business.” He doubts a nation-state it behind it; the campaigns appear to be run by a small group.

The goals: To steal compute resources for use by unauthorized LLM inference requests, to resell API access at discounted rates through criminal marketplaces, to exfiltrate data from LLM context windows and conversation history, and to pivot to internal systems via compromised MCP servers.

Two campaigns

The researchers have so far identified two campaigns: One, dubbed Operation Bizarre Bazaar, is targeting unprotected LLMs. The other campaign targets Model Context Protocol (MCP) endpoints. 

It’s not hard to find these exposed endpoints. The threat actors behind the campaigns are using familiar tools: The Shodan and Censys IP search engines.

At risk: Organizations running self-hosted LLM infrastructure (such as Ollama, software that processes a request to the LLM model behind an application; vLLM, similar to Ollama but for high performance environments; and local AI implementations) or those deploying MCP servers for AI integrations.

Targets include:

  • exposed endpoints on default ports of common LLM inference services;
  • unauthenticated API access without proper access controls;
  • development/staging environments with public IP addresses;
  • MCP servers connecting LLMs to file systems, databases and internal APIs.

Common misconfigurations leveraged by these threat actors include:

  • Ollama running on port 11434 without authentication;
  • OpenAI-compatible APIs on port 8000 exposed to the internet;
  • MCP servers accessible without access controls;
  • development/staging AI infrastructure with public IPs;
  • production chatbot endpoints (customer support, sales bots) without authentication or rate limiting.

George Gerchow, CSO at Bedrock Data and an IANS faculty member, said Operation Bizarre Bazaar “is a clear sign that attackers have moved beyond ad hoc LLM abuse and now treat exposed AI infrastructure as a monetizable attack surface. What’s especially concerning isn’t just unauthorized compute use, but the fact that many of these endpoints are now tied to the Model Context Protocol (MCP), the emerging open standard for securely connecting large language models to data sources and tools. MCP is powerful because it enables real-time context and autonomous actions, but without strong controls, those same integration points become pivot vectors into internal systems.”

Defenders need to treat AI services with the same rigor as APIs or databases, he said, starting with authentication, telemetry, and threat modelling early in the development cycle. “As MCP becomes foundational to modern AI integrations, securing those protocol interfaces, not just model access, must be a priority,” he said.

In an interview, Pillar Security report authors Eilon Cohen and Ariel Fogel couldn’t estimate how much revenue threat actors might have pulled in so far. But they warn that CSOs and infosec leaders had better act fast, particularly if an LLM is accessing critical data.

Their report described three components to the Bizarre Bazaar campaign:

  • the scanner: a distributed bot infrastructure that systematically probes the internet for exposed AI endpoints. Every exposed Ollama instance, every unauthenticated vLLM server, every accessible MCP endpoint gets cataloged. Once an endpoint appears in scan results, exploitation attempts begin within hours;
  • the validator: Once scanners identify targets, infrastructure tied to an alleged criminal site validates the endpoints through API testing. During a concentrated operational window, the attacker tested placeholder API keys, enumerated model capabilities and assessed response quality;
  • the marketplace: Discounted access to 30+ LLM providers is being sold on a site called The Unified LLM API Gateway. It’s hosted on bulletproof infrastructure in the Netherlands and marketed on Discord and Telegram.

So far, the researchers said, those buying access appear to be people building their own AI infrastructure and trying to save money, as well as people involved in online gaming.

Threat actors may not only be stealing AI access from fully developed applications, the researchers added. A developer trying to prototype an app, who, through carelessness, doesn’t secure a server, could be victimized through credential theft as well.

Joseph Steinberg, a US-based AI and cybersecurity expert, said the report is another illustration of how new technology like artificial intelligence creates new risks and the need for new security solutions beyond the traditional IT controls.

CSOs need to ask themselves if their organization has the skills needed to safely deploy and protect an AI project, or whether the work should be outsourced to a provider with the needed expertise.

Mitigation

Pillar Security said CSOs with externally-facing LLMs and MCP servers should:

  • enable authentication on all LLM endpoints. Requiring authentication eliminates opportunistic attacks. Organizations should verify that Ollama, vLLM, and similar services require valid credentials for all requests;
  • audit MCP server exposure. MCP servers must never be directly accessible from the internet. Verify firewall rules, review cloud security groups, confirm authentication requirements;
  • block known malicious infrastructure.  Add the 204.76.203.0/24 subnet to deny lists. For the MCP reconnaissance campaign, block AS135377 ranges;
  • implement rate limiting. Stop burst exploitation attempts. Deploy WAF/CDN rules for AI-specific traffic patterns;
  • audit production chatbot exposure. Every customer-facing chatbot, sales assistant, and internal AI agent must implement security controls to prevent abuse.

Don’t give up

Despite the number of news stories in the past year about AI vulnerabilities, Meghu said the answer is not to give up on AI, but to keep strict controls on its usage. “Do not just ban it, bring it into the light and help your users understand the risk, as well as work on ways for them to use AI/LLM in a safe way that benefits the business,” he advised.

“It is probably time to have dedicated training on AI use and risk,” he added. “Make sure you take feedback from users on how they want to interact with an AI service and make sure you support and get ahead of it. Just banning it sends users into a shadow IT realm, and the impact from this is too frightening to risk people hiding it. Embrace and make it part of your communications and planning with your employees.”

This article originally appeared on CSOonline.

(image/jpeg; 19 MB)

Google’s LiteRT adds advanced hardware acceleration 28 Jan 2026, 11:04 pm

LiteRT, Google’s “modern” on-device inference framework evolved from TensorFlow Lite (TFLite), has introduced advanced acceleration capabilities, based on a ”next-generation GPU engine” called ML Drift.

Google said that this milestone, announced January 28, solidifies LiteRT as a universal on-device framework and represents a significant leap over its predecessor, TFLite. LiteRT delivers 1.4x faster GPU performance than TFLite, provides a unified workflow for GPU and NPU acceleration across edge platforms, supports superior cross-platform deployment for generative AI models, and offers first-class PyTorch/JAX support through seamless model conversion, Google said. The company previewed LiteRT’s new acceleration capabilities last May.

Found on GitHub, LiteRT powers apps used every day, delivering low latency and high privacy on billions of devices, Google said. Via the new ML Drift GPU engine, LiteRT supports OpenCL, OpenGL, Metal, and WebGPU, allowing developers to deploy models across, mobile, desktop, and web. For Android, LiteRT automatically prioritizes when available for peak performance, while falling back to OpenGL for broader device coverage. In addition, LiteRT provides a unified, simplified NPU deployment workflow that abstracts away low-level, vendor-specific SDKs and handles fragmentation across numerous SoC (system on chip) variants, according to Google.

LiteRT documentation can be found at ai.google.dev.

(image/jpeg; 7.91 MB)

Teradata unveils enterprise AgentStack to push AI agents into production 28 Jan 2026, 9:17 am

Teradata has expanded the agent-building capabilities it launched last year into a full-blown toolkit, which it says will help enterprises address the challenge of moving AI agents beyond pilots into production-grade deployments.

Branded as Enterprise AgentStack, the expanded toolkit layers AgentEngine and AgentOps onto Teradata’s existing Agent Builder, which includes a user interface for building agents with the help of third-party frameworks such as LangGraph, and a context intelligence capability.

While AgentEngine is an execution environment for deploying agents across hybrid infrastructures, AgentOps is a unified interface for centralized discovery, monitoring, and lifecycle management of agents across a given enterprise.

The AgentEngine is a critical piece of Enterprise AgentStack as it sits between agent design and real-world operations, saidHyperFRAME Research’s practice leader of AI stack Stephanie Walter.

“Without an execution engine, enterprises often rely on custom glue code to coordinate agents. The Agent Engine standardizes execution behavior and gives enterprises a way to understand agent performance, reliability, and risk at scale,” Walter said, adding that AgentEngine-like capabilities are what enterprises need for moving agents or agentic systems into production.

However, analysts say Teradata’s approach to enterprise agent adoption differs markedly from that of rivals such as Databricks and Snowflake.

While Snowflake has been leaning on its Cortex and Native App Framework to let enterprises build AI-powered applications and agents closer to governed data, Databricks has been focusing on agent workflows through Mosaic AI, emphasizing model development, orchestration, and evaluation tied to its lakehouse architecture, Robert Kramer, principal analyst at Moor Insights and Strategy, said.

Seconding Kramer, Walter pointed out that Teradata’s differentiation lies in positioning Enterprise AgentStack as a vendor-agnostic execution and operations layer designed to work across hybrid environments, rather than anchoring agents tightly to a single cloud or data platform.

That positioning can be attributed to Teradata’s reliance on third-party frameworks such as Karini.ai, Flowise, CrewAI, and LangGraph, which give enterprises and their developers flexibility to evolve their agent architectures over time without being locked onto platforms from Snowflake and Databricks that tend to optimize for end-to-end control within their own environments, Walter added.

However, the analyst cautioned that, although Enterprise AgentStack’s architecture aligns well with enterprise needs, its litmus test will be to continue maintaining deep integrations with third-party frameworks.

“Customers will want to see concrete evidence of AgentStack supporting complex, long-running, multi-agent deployments in production,” Walter said.

Kramer, too, pointed out that enterprises and developers should try to understand the depth of usability before implementation.

“They need to check how easy it is to apply policies consistently, run evaluations after changes, trace failures end-to-end, and integrate with existing security and compliance tools. Openness only works if it doesn’t shift complexity back onto the customer,” Kramer said.

Enterprise AgentStack is expected to be made available in private preview on the cloud and on-prem between April and June this year.

(image/jpeg; 2.54 MB)

Is code a cow path? 28 Jan 2026, 9:00 am

When the automobile was first invented, many looked like—and were indeed called—horseless carriages.

Early websites for newspapers were laid out just like the paper versions. They still are to some extent.

Our computers have “desktops” and “files”—just like an office from the 1950s.

It is even said that the width of our railroad tracks is a direct result of the width between the wheels of a Roman chariot (though that claim is somewhat dubious).

This phenomenon—using new technology in the old technology way—is often called “paving the cow paths.” Cows are not known for understanding that the shortest distance between two points is a straight line, and it doesn’t always make sense to put pavement down where they wear tracks in the fields.

This notion was formalized by the great Peter Drucker, who said “There is surely nothing quite so useless as doing with great efficiency what should not be done at all.” 

All of this got me thinking about AI writing all of our code now. 

Is code necessary?

We developers spend years honing our craft. We read books about clean code, write blogs about the proper way to structure code, and tell ourselves, rightly, that code is meant to be read and maintained as much as it is to be written.

AI coding could change all of that. Your coding agent doesn’t need to see all that great stuff that we humans do. Comments, good variable names, cleanly constructed classes, and the like are all things we do for humans. Shoot, code itself is a human construct, a prop we created to make it easier to reason about the software we design and build. 

I was recently using Claude Code to build an application, and I insisted that he (I can’t help but think of Claude Code as a person) code against interfaces and not implementations, that he design everything with small classes that do one thing, etc. I wanted the code Claude created to be what we developers always shoot for—well-written, easy to maintain, and decoupled. You know the drill. 

And then it occurred to me—are we all merely paving cowpaths? Should agentic AI be concerned with the same things we humans care about when constructing software? Claude wrote comments all over the place—that was for me, not for him. He wrote the code the way that I wanted him to. Does he have a better idea about how to make the software work? 

For that matter, who needs code anyway? It’s not inconceivable that coding agents will eventually just render machine code—i.e., they could compile your native language directly into a binary. (That’s one way to end the language wars!)

Right now we have the process of writing code, reviewing it, compiling it, and running it. We’ve added an extra layer—explaining our intentions to an agent that translates them into code. If that is a cow path—and the more I think about it, the more it does seem a rather indirect way to get from point A to point B—then what will be the most direct way to get from here to there?

The straightest path to software

Every day, our coding agents get better. The better they get, the more we’ll trust them, and the less we’ll need to review their code before committing it. Someday, we might expect, agents will review the code that agents write. What happens to code when humans eventually don’t even read what the agents write anymore? Will code even matter at that point?

Will we write unit tests—or have our agents write unit tests—only for our benefit? Will coding agents even need tests? It’s not hard to imagine a future where agents just test their output automatically, or build things that just work without testing because they can “see” what the outcome of the tests would be. 

Ask yourself this: When is the last time that you checked the output of your compiler? Can you even understand the output of your compiler? Some of you can, sure. But be honest, most of you can’t. 

Maybe AI will come up with a way of designing software based on our human language inputs that is more direct and to the point—a way that we haven’t conceived of yet. Code may stop being the primary representation of software.

Maybe code will become something that, as Peter Drucker put it, should not be done at all.

(image/jpeg; 0.22 MB)

CPython vs. PyPy: Which Python runtime has the better JIT? 28 Jan 2026, 9:00 am

PyPy, an alternative runtime for Python, uses a specially created JIT compiler to yield potentially massive speedups over CPython, the conventional Python runtime.

But PyPy’s exemplary performance has often come at the cost of compatibility with the rest of the Python ecosystem, particularly C extensions. And while those issues are improving, the PyPy runtime itself often lags in keeping up to date with the latest Python releases.

Meanwhile, the most recent releases of CPython included the first editions of a JIT compiler native to CPython. The long-term promise there is better performance, and in some workloads, you can already see significant improvements. CPython also has a new alternative build that eliminates the GIL to allow fully free-threaded operations—another avenue of significant performance gains.

Could CPython be on track to displace PyPy for better performance? We ran PyPy and the latest JIT-enabled and no-GIL CPython builds side by side on the same benchmarks, with intriguing results.

PyPy still kills it at raw math

CPython has always performed poorly in simple numerical operations, due to all the indirection and abstraction required. There’s no such thing in CPython as a primitive, machine-level integer, for instance.

As a result, benchmarks like this one tend to perform quite poorly in CPython:


def transform(n: int):
    q = 0
    for x in range(0, n * 500):
        q += x
    return q


def main():
    return [transform(x) for x in range(1000)]

main()

On a Ryzen 5 3600 with six cores, Python 3.14 takes about 9 seconds to run this benchmark. But PyPy chews through it in around 0.2 seconds.

This also isn’t the kind of workload that benefits from Python’s JIT, at least not yet. With the JIT enabled in 3.14, the time drops only slightly, to around 8 seconds.

But what happens if we use a multi-threaded version of the same code, and throw the no-GIL version of Python at it?


def transform(n: int):
    q = 0
    for x in range(0, n * 500):
        q += x
    return q


def main():
    result = []
    with ThreadPoolExecutor() as pool:
        for x in range(1000):
            result.append(pool.submit(transform, x))
    return [_.result() for _ in result]

main()

The difference is dramatic, to say the least. Python 3.14 completes this job in 1.7 seconds. Still not the sub-second results of PyPy, but a big enough jump to make using threads and no-GIL worth it.

What about PyPy and threading? Ironically, running the multithreaded version on PyPy slows it down drastically, with the job taking around 2.1 seconds to run. Blame that on PyPy still having a GIL-like locking mechanism, and therefore no full parallelism across threads. Its JIT compilation is best exploited by running everything in a single thread.

If you’re wondering if swapping a process pool for a thread pool would help, the answer is, not really. A process pool version of the above does speed things up a bit—1.3 seconds on PyPy—but process pools and multiprocessing on PyPy are not as optimized as they are in CPython.

To recap: for “vanilla” Python 3.14:

  • No JIT, GIL: 9 seconds
  • With JIT, GIL: 8 seconds
  • No JIT, no-GIL: 9.5 seconds

The no-GIL build is still slightly slower than the regular build for single-threaded operations. The JIT helps a little here, but not much.

Now, consider the same breakdown for Python 3.14 and a process pool:

  • No JIT, GIL: 1.75 seconds
  • With JIT, GIL: 1.5 seconds
  • No JIT, no-GIL: 2 seconds

How about for Python 3.14, using other forms of the script?

  • Threaded version with no-GIL: 1.7 seconds
  • Multiprocessing version with GIL: 2.3 seconds
  • Multiprocessing version with GIL and JIT: 2.4 seconds
  • Multiprocessing version with no-GIL: 2.1 seconds

And here’s a summary of how PyPy fares:

  • Single-threaded script: 0.2 seconds
  • Multithreaded script: 2.1 seconds
  • Multiprocessing script: 1.3 seconds

The n-body problem

Another common math-heavy benchmark vanilla Python is notoriously bad at is the “n-body” benchmark. This is also the kind of problem that’s hard to speed up by using parallel computation. It is possible, just not simple, so the easiest implementations are single-threaded.

If I run the n-body benchmark for 1,000,000 repetitions, I get the following results:

  • Python 3.14, no JIT: 7.1 seconds
  • Python 3.14, JIT: 5.7 seconds
  • Python 3.15a4, no JIT: 7.6 seconds
  • Python 3.15a4, JIT: 4.2 seconds

That’s an impressive showing for the JIT-capable editions of Python. But then we see that PyPy chews through the same benchmark in 0.7 seconds—as-is.

Computing pi

Sometimes even PyPy struggles with math-heavy Python programs. Consider this naive implementation to calculate digits of pi. This is another example of a task that can’t be parallelized much, if at all, so we’re using a single-threaded test.

When run for 20,000 digits, here’s what came out:

  • Python 3.14, no JIT: 13.6 seconds
  • Python 3.14, JIT: 13.5 seconds
  • Python 3.15, no JIT: 13.7 seconds
  • Python 3.15, JIT: 13.5 seconds
  • PyPy: 19.1 seconds

It’s uncommon, but hardly impossible, for PyPy’s performance to be worse than regular Python’s. What’s surprising is to see it happen in a scenario where you’d expect PyPy to excel.

CPython is getting competitive for other kinds of work

Another benchmark I’ve used often with Python is a variant of the Google n-gram benchmark, which processes a multi-megabyte CSV file and generates some statistics about it. That makes it more I/O-bound than the previous benchmarks, which were more CPU-bound, but it’s still possible to use it for useful information about the speed of the runtime.

I’ve written three incarnations of this benchmark: single-threaded, multi-threaded, and multi-process. Here’s the single-threaded version:


import collections
import time
import gc
import sys
try:
    print ("JIT enabled:", sys._jit.is_enabled())
except Exception:
    ...

def main():
    line: str
    fields: list[str]
    sum_by_key: dict = {}

    start = time.time()

    with open("ngrams.tsv", encoding="utf-8", buffering=2 

Here’s how Python 3.14 handles this benchmark with different versions of the script:

  • Single-threaded, GIL: 4.2 seconds
  • Single-threaded, JIT, GIL: 3.7 seconds
  • Multi-threaded, no-GIL: 1.05 seconds
  • Multi-processing, GIL: 2.42 seconds
  • Multi-processing, JIT, GIL: 2.4 seconds
  • Multi-processing, no-GIL: 2.1 seconds

And here’s the same picture with PyPy:

  • Single-threaded: 2.75 seconds
  • Multi-threaded: 14.3 seconds (not a typo!)
  • Multi-processing: 8.7 seconds

In other words, for this scenario, the CPython no-GIL multithreaded version beats even PyPy at its most optimal. As yet, there is no build of CPython that enables the JIT and uses free threading, but such a version is not far away and could easily change the picture even further.

Conclusion

In sum, PyPy running the most basic, unoptimized version of a math-heavy script still outperforms CPython. But CPython gets drastic relative improvements from using free-threading and even multiprocessing, where possible.

While PyPy cannot take advantage of those built-in features, its base speed is fast enough that using threading or multiprocessing for some jobs isn’t really required. For instance, the n-body problem is hard to parallelize well, and computing pi can hardly be parallelized at all, so it’s a boon to be able to run single-threaded versions of those algorithms fast.

What stands out most from these tests is that PyPy’s benefits are not universal, or even consistent. They vary widely depending on the scenario. Even within the same program, there can be a tremendous variety of scenarios. Some programs can run tremendously fast with PyPy, but it’s not easy to tell in advance which ones. The only way to know is to benchmark your application.

Something else to note is that one of the major avenues toward better performance and parallelism for Python generally—free-threading—isn’t currently available for PyPy. Multiprocessing doesn’t work well in PyPy either, due to it having a much slower data serialization mechanism between processes than CPython does.

As fast as PyPy can be, the benchmarks here show the benefits of true parallelism with threads in some scenarios. PyPy’s developers might find a way to implement that in time, but it’s unlikely they’d be able to do it by directly repurposing what CPython already has, given how different PyPy and CPython are under the hood.

(image/jpeg; 8.16 MB)

Gemini Flash model gets visual reasoning capability 28 Jan 2026, 3:20 am

Google has added an Agentic Vision capability to its Gemini 3 Flash model, which the company said combines visual reasoning with code execution to ground answers in visual evidence. The capability fundamentally changes how AI models process images, according to Google.

Introduced January 27, Agentic Vision is available via the Gemini API in the Google AI Studio development tool and Vertex AI in the Gemini app.

Agentic Vision in Gemini Flash converts image understanding from a static act into an agentic process, Google said. By combining visual reasoning andcode execution, the model formulates plans to zoom in, inspect, and manipulate images step-by-step. Until now, multimodal models typically processed the world in a single, static glance. If they missed a small detail—like a serial number or a distant sign—they were forced to guess, Google said. By contrast, Agentic Vision converts image understanding into an active investigation, introducing an agentic, “think, act, observe” loop into image understanding tasks, the company said.

Agentic Vision allows a model to interact with its environment by annotating images. Instead of just describing what it sees, Gemini 3 Flash can execute code to draw directly on the canvas to ground reasoning. Also, Agentic Vision can parse high-density tables and execute Python code to visualize findings. Future plans for Agentic Vision including adding more implicit code-driven behaviors, equipping Gemini models with more tools, and delivering the capability in more model sizes, extending it beyond Flash.

(image/jpeg; 9.21 MB)

OpenSilver 3.3 runs Blazor components inside XAML apps 27 Jan 2026, 10:37 pm

Userware has released OpenSilver 3.3, an update to the open-source framework for building cross-platform applications using C# and XAML. OpenSilver 3.3 lets Blazor components for web development run directly inside XAML applications, streamlining the process of running these components.

Userware unveiled OpenSilver 3.3 on January 27. OpenSilver SDKs for Microsoft’s Visual Studio and Visual Studio Code can be downloaded from opensilver.net.

With the Blazor boost in OpenSilver 3.3, Blazor components run directly inside an XAML visual tree, sharing the same DOM and the same runtime. Developers can drop a MudBlazor data grid, a DevExpress rich text editor, or any Blazor component directly into their XAML application without requiring JavaScript bridges or interop wrappers, according to Userware. Because OpenSilver runs on WebAssembly for browsers and .NET MAUI Hybrid for native apps, the same code deploys to Web, iOS, Android, Windows, macOS, and Linux.

The company did warn, though, that Razor code embedded inside XAML will currently show errors at design time but will compile and run correctly. Workarounds include wrapping the Razor code in CDATA, using separate .razor files, or filtering to “Build Only” errors.

Open source OpenSilver is a replacement for Microsoft Silverlight, a rich Internet application framework that was discontinued in 2021 and is no longer supported. For developers maintaining a Silverlight or Windows Presentation Foundation app, Blazor integration offers a way to modernize incrementally. Users can identify controls that need updating, such as an old data grid or a basic text editor, and replace them with modern Blazor equivalents.

    (image/jpeg; 0.24 MB)

    Anthropic integrates third‑party apps into Claude, reshaping enterprise AI workflows 27 Jan 2026, 12:47 pm

    Anthropic has added a new capability inside its generative AI-based chatbot Claude that will allow users to directly access applications inside the Claude interface.

    The new capability, termed as interactive apps, is based on a new extension — MCP Apps — of the open source Model Context Protocol (MCP).

    MCP Apps, first proposed in November and subsequently developed with the help of OpenAI’s Apps SDK, expands MCP’s capabilities to allow tools to be included as interactive UI components directly inside the chat window as part of a conversation, instead of just text or query results.

    “Claude already connects to your tools and takes actions on your behalf. Now those tools show up right in the conversation, so you can see what’s happening and collaborate in real time,” the company wrote in a blog post.

    Currently, the list of interactive apps is limited to Amplitude, Asana, Box, Canva, Clay, Figma, Hex, monday.com, and Slack. Agentforce 360 from Salesforce will be added soon, the company said, adding that the list will be expanded to include other applications.

    Easier integration into workflows

    Claude’s evolution from a chatbot into an integrated execution environment for applications is expected to help enterprises move agentic AI systems toward broader, production-scale deployments, analysts say.

    The ability to access applications directly within Claude’s interface lowers integration friction, making it simpler for enterprises to deploy Claude as an agentic system across workflows, said Akshat Tyagi, associate practice leader at HFS Research.

    “Most pilots fail in production because agents are unpredictable, hard to govern, and difficult to integrate into real workflows. Claude’s interactive apps change that,” Tyagi noted.

    For enterprise developers, the reduction in integration complexity could also translate into faster iteration cycles and higher productivity, according to Forrester Principal Analyst Charlie Dai.

    The approach provides a more straightforward path to building multi-step, output-producing workflows without the need for extensive setup or custom plumbing, Dai said.

    Targeting more productivity

    According to Tyagi, productivity gains will not only be limited to developers, but business teams will stand to benefit as well, and teams don’t need to move between systems, copy outputs, or translate AI responses into actions due to the integration of multiple applications within Claude’s interface.

    MCP Apps and Anthropic’s broader approach to productivity underscore a widening architectural split in the AI landscape, according to Avasant research director Chandrika Dutt, even as vendors pursue the same underlying goal of boosting productivity by embedding agents directly into active workflows.

    While Anthropic and OpenAI are building models in which applications run inside the AI interface, other big tech vendors, including Microsoft and Google, are focused on embedding AI directly into their productivity suites, such as Microsoft 365 Copilot and Google Gemini Workspace, Dutt said.

    “These strategies represent two paths toward a similar end state. As enterprise demand for agent-driven execution grows, in a separate layer, it is likely that these approaches will converge, with Microsoft and Google eventually supporting more interactive, app-level execution models as well,” Dutt added.

    Further, the analyst pointed out that Claude’s new interactive apps will also facilitate easier governance and trust — key facets of scaling agentic AI systems in an enterprise.

    “Claude operates on the same live screen, data, and configuration the user is viewing, allowing users to see exactly what changes are made, where they are applied, and how they affect tasks, files, or design elements in real time, without having to cross-check across different tools,” Dutt said.

    Increased burden of due diligence

    However, analysts cautioned that the unified nature of the interface may exponentially increase the risk from a security standpoint, especially as more applications are added.

    More so because running UIs of these interactive applications means running code that enterprises didn’t write themselves and that will increase the burden of due diligence before connecting to an interactive application, said Abhishek Sengupta, practice director at Everest Group.

    MCP Apps itself, though, offers several security features, in the form of sandboxing UIs, the ability for enterprises to review all templates before rendering, and audit messages between the application server and their Claude client.

    The interactive app feature is currently available to all paid subscribers of Claude, including Pro, Max, Team, and Enterprise subscribers.

    It is expected to be added to Claude Cowork soon, the company said.

    (image/jpeg; 2.6 MB)

    Alibaba’s Qwen3-Max-Thinking expands enterprise AI model choices 27 Jan 2026, 11:25 am

    Alibaba Cloud’s latest AI model, Qwen3-Max-Thinking, is staking a claim as one of the world’s most advanced reasoning engines after posting benchmark results that delivered competitive results against leading models from Google and OpenAI.

    In a blog post, Alibaba said the model was trained using expanded capacity and large-scale computing resources, including reinforcement learning, which led to improvements in factual accuracy, reasoning, instruction following, alignment with human preferences, and agent-style capabilities.

    “On 19 established benchmarks, it demonstrates performance comparable to leading models such as GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro,” the company said.

    Alibaba said it has added two key upgrades to Qwen3-Max-Thinking: adaptive tool use that lets the model retrieve information or run code as needed, and test-time scaling techniques that it says deliver stronger reasoning performance than Google’s Gemini 3 Pro on selected benchmarks.

    Analysts offer a cautious approach to the announcement. Benchmark results evaluate the performance under specific conditions, “but enterprise IT leaders may be deploying foundation models across various use cases under different IT environments,” said Lian Jye Su, chief analyst at Omdia.

    “As such, while Qwen models have shown themselves to be legitimate alternatives to Western mainstream models, their performance still needs to be evaluated in domain-specific tasks, along with their adaptability and customization,” Su said. “It is also critical to assess scalability and efficiency when these models run on Alibaba Cloud infrastructure, which operates differently from Google Cloud Platform and Azure.”

    More options for vendor diversification

    The launch of Qwen3-Max-Thinking is likely to add momentum to AI model diversification strategies within enterprises.

    “Now that Qwen models have demonstrated themselves as legit alternatives to Western models, CIOs should consider them when evaluating pricing models, licensing terms, and the total cost of ownership of their AI projects,” Su said.  “Running on Alibaba Cloud, the cost of ownership is likely more efficient, especially in the Asia Pacific, which is great news for global companies looking to make inroads into the Chinese market or China-friendly markets.”

    Competitive reasoning scores from Qwen models expand the pool of viable suppliers, making diversification more attractive, according to Charlie Dai, principal analyst at Forrester.

    “For CIOs managing digital sovereignty and cost efficiency, strong alternatives change the strategic equation, and rising model parity increases the viability of mixed portfolios that balance sovereignty, compliance, and innovation speed,” Dai said.

    Others said benchmark momentum is also influencing how CIOs think about multi-model strategies.

    “These benchmarks are a good yardstick not just to monitor performance, but also to assess which companies are serious and consistent in investing in foundation model capabilities and adoption,” said Neil Shah, VP for research at Counterpoint Research. “This is shaping how CIOs look at diversifying to multi-model strategies to avoid putting all their eggs in one basket, while weighing performance, cost efficiency, and geopolitical headwinds.”

    That said, CIOs will need to consider the availability of these models outside of APAC alongside other factors such as export controls and compliance with local regulations.

    “The bigger question is how CIOs adopt US versus non-US models based on AI use cases,” Shah said. “Where reliability and compliance are critical, enterprises, especially in Western markets, will favor proprietary US models, while highly capable Chinese models may be used for non-critical workloads.”

    More governance and compliance challenges

    Geopolitical tensions are adding another layer of complexity for enterprises evaluating models such as Qwen3-Max-Thinking. According to Dai, this requires closer scrutiny of operational details, particularly around system logs, model update mechanisms, and how data moves across borders.

    He added that enterprise evaluations should go beyond performance testing to include red-team exercises, strict isolation of sensitive data, and alignment with internal risk and compliance frameworks.

    “Enterprises evaluating Alibaba Cloud-hosted models need to scrutinize how AI safety controls, data isolation, and auditability are implemented in practice, not just on paper,” Su said. “While most cloud providers now offer in-region or on-premise deployments to address sovereignty rules, CIOs still need to assess whether those controls meet internal risk thresholds, particularly when sensitive IP or regulated data is involved.”

    (image/jpeg; 2.39 MB)

    The private cloud returns for AI workloads 27 Jan 2026, 9:00 am

    A North American manufacturer spent most of 2024 and early 2025 doing what many innovative enterprises did: aggressively standardizing on the public cloud by using data lakes, analytics, CI/CD, and even a good chunk of ERP integration. The board liked the narrative because it sounded like simplification, and simplification sounded like savings. Then generative AI arrived, not as a lab toy but as a mandate. “Put copilots everywhere,” leadership said. “Start with maintenance, then procurement, then the call center, then engineering change orders.”

    The first pilot went live quickly using a managed model endpoint and a retrieval layer in the same public cloud region as their data platform. It worked and everyone cheered. Then invoices started arriving. Token usage, vector storage, accelerated compute, egress for integration flows, premium logging, premium guardrails. Meanwhile, a series of cloud service disruptions forced the team into uncomfortable conversations about blast radius, dependency chains, and what “high availability” really means when your application is a tapestry of managed services.

    The final straw wasn’t just cost or downtime; it was proximity. The most valuable AI use cases were those closest to people who build and fix things. Those people lived near manufacturing plants with strict network boundaries, latency constraints, and operational rhythms that don’t tolerate “the provider is investigating.” Within six months, the company began shifting its AI inference and retrieval workloads to a private cloud located near its factories, while keeping model training bursts in the public cloud when it made sense. It wasn’t a retreat. It was a rebalancing.

    AI changed the math

    For a decade, private cloud was often framed as a stepping-stone or, worse, a polite way to describe legacy virtualization with a portal. In 2026, AI is forcing a more serious reappraisal. Not because public cloud suddenly stopped working, but because the workload profile of AI is different from the workload profile of “move my app server and my database.”

    AI workloads are spiky, GPU-hungry, and brutally sensitive to inefficient architecture. They also tend to multiply. A single assistant becomes dozens of specialized agents. A single model becomes an ensemble. A single department becomes every department. AI spreads because the marginal utility of another use case is high, but the marginal cost can be even higher if you don’t control the fundamentals.

    Enterprises are noticing that the promise of elasticity is not the same thing as cost control. Yes, public cloud can scale on demand. But AI often scales and stays scaled because the business immediately learns to depend on it. Once a copilot is embedded into an intake workflow, a quality inspection process, or a claims pipeline, turning it off is not a realistic lever. That’s when predictable capacity, amortized over time, becomes financially attractive again.

    Cost is no longer a rounding error

    AI economics are exposing a gap between what people think the cloud costs and what the cloud actually costs. When you run traditional systems, you can hide inefficiencies behind reserved instances, right-sizing tools, and a few architectural tweaks. With AI, waste has sharp edges. Overprovision GPUs and you burn money. Underprovision and your users experience delays that make the system feel broken. Keep everything in a premium managed stack, and you may pay for convenience forever with little ability to negotiate the unit economics.

    Private clouds are attractive here for a simple reason: Enterprises can choose where to standardize and where to differentiate. They can invest in a consistent GPU platform for inference, cache frequently used embeddings locally, and reduce the constant tax of per-request pricing. They can still use public cloud for experimentation and burst training, but they don’t have to treat every inference call like a metered microtransaction.

    Outages are changing risk discussions

    Most enterprises know complex systems fail. The outages in 2025 did not show that cloud is unreliable, but they did reveal that relying on many interconnected services leads to correlated failure. When your AI experience depends on identity services, model endpoints, vector databases, event streaming, observability pipelines, and network interconnects, your uptime is the product of many moving parts. The more composable the architecture, the more failure points.

    Private cloud won’t magically eliminate outages, but it does shrink the dependency surface area and give teams more control over change management. Enterprises that run AI close to core processes often prefer controlled upgrades, conservative patching windows, and the ability to isolate failures to a smaller domain. That’s not nostalgia; it’s operational maturity.

    Proximity matters

    The most important driver I’m seeing in 2026 is the desire to keep AI systems close to the processes and people who use them. That means low-latency access to operational data, tight integration with Internet of Things and edge environments, and governance that aligns with how work actually happens. A chatbot in a browser is easy. An AI system that helps a technician diagnose a machine in real time on a constrained network is a different game.

    There’s also a data gravity issue that rarely receives the attention it deserves. AI systems don’t just read data; they generate it. Feedback loops, human ratings, exception handling, and audit trails become first-class assets. Keeping those loops close to the business domains that own them reduces friction and improves accountability. When AI becomes a daily instrument panel for the enterprise, architecture must serve the operators, not just the developers.

    Five steps for private cloud AI

    First, treat unit economics as a design requirement, not a postmortem. Model the cost per transaction, per employee, or per workflow step, and decide which are fixed costs and which are variable, because AI that works but is unaffordable at scale is just a demo with better lighting.

    Second, design for resilience by reducing dependency chains and clarifying failure domains. A private cloud can help, but only if you deliberately choose fewer, more reliable components, build sensible fallbacks, and test degraded modes so the business can keep moving when a component fails.

    Third, plan for data locality and the feedback loop as carefully as you plan for compute. Your retrieval layer, embedding life cycle, fine-tuning data sets, and audit logs will become strategic assets; place them where you can govern, secure, and access them with minimal friction across the teams that improve the system.

    Fourth, treat GPUs and accelerators as a shared enterprise platform with precise scheduling, quotas, and chargeback policies. If you don’t operationalize accelerator capacity, it will be captured by the teams who are the loudest but not necessarily the most critical. The resulting chaos will appear to be a technology problem when it’s really a governance problem.

    Fifth, make security and compliance practical for builders, not performative for documents. That means identity boundaries that align with real roles, automated policy enforcement in pipelines, strong isolation for sensitive workloads, and a risk management approach that recognizes that AI is software but also something new: software that talks, recommends, and occasionally hallucinates.

    (image/jpeg; 14.94 MB)

    What is the future for MySQL? 27 Jan 2026, 9:00 am

    In May of 2025, MySQL celebrated its 30th anniversary. Not many technology projects go strong for three decades, let alone at the level of use that MySQL enjoys. MySQL is listed at #2 on the DB-Engines ranking, and it is listed as the most deployed relational database by technology installation tracker 6sense.

    Yet for all its use, MySQL is seen as taking a back seat to PostgreSQL. Checking the Stack Overflow Developer Survey for 2025, 55.6% of developers use PostgreSQL compared to 40.5% that use MySQL. And when you look at the most admired technologies, PostgreSQL is at 46.5% while MySQL languishes at 20.5%. Whereas developers clearly think highly of PostgreSQL, they do not view MySQL as positively.

    Both databases are excellent options. PostgreSQL is a reliable, scalable, and functionality-rich database, but it can be beyond the needs of simple application projects. MySQL is fast to deploy, easy to use, and both scalable and effective when implemented in the right way. But PostgreSQL has fans and supporters where MySQL does not.

    This is not a question of age. PostgreSQL is older than MySQL, as development work started in 1986, though the first version of PostgreSQL wasn’t released until 1995. What is different is that the open source community is committed to PostgreSQL and celebrates the development and diversity taking place. The sheer number of companies and contributors around PostgreSQL makes it easier to adopt.

    In comparison, the MySQL community is … quiet. Although Oracle has been a great steward for MySQL since the company acquired Sun in 2010, the open source MySQL Community Edition has received less love and attention than the paid MySQL Enterprise Edition or cloud versions, at least in terms of adding innovative new features.

    For example, while Oracle’s MySQL HeatWave boasts innovations like vector search, which is essential for AI projects, MySQL Community Edition lacks this capability. Although MySQL Community Edition can store vector data, it cannot perform an index-based search or approximate nearest neighbour search for that data.

    A shock to the community

    In other open source communities, we have seen a “big shock” that led to change. For example, when Redis changed its software license to be “source available,” the community created Valkey as an alternative. When HashiCorp changed its license for Terraform, it led to the creation of OpenTofu. These projects joined open source foundations and saw an increase in the number of companies that provided contributions, support, and maintenance around the code.

    Having avoided such a big shock to the system, the MySQL community has been in stasis for years, continuing with the status quo. Yet in an industry where technology companies are like sharks, always moving forward to avoid death at the hands of the competition, this stasis is detrimental to the community and to the project as a whole.

    However, a big shock may have finally arrived. The loss of many Oracle staffers has impacted the speed of MySQL development. Looking at the number of bug fixes released in each quarterly update, the number of issues fixed has dropped to a third of what it was previously. Compared to Q1 2025 (65 fixes) and Q2 2025 (again, 65 fixes), MySQL 8.4.7 saw just 21 bug fixes released. While straight bug numbers are not a perfectly representative metric, the drop itself does show how much emphasis has been taken off MySQL.

    In response, companies that are behind MySQL are coming together. Rather than continuing with things as they are, these companies recognize that developing a future path for MySQL is essential. What this will lead to will depend on decisions outside the community. Will this act as a spur for a fork of MySQL that has community support, similar to PostgreSQL? Or will this lead to MySQL moving away from the control of a single vendor, as has been the case since it was founded?

    Whatever happens, MySQL as an open source database is still a valid and viable option for developers today. MySQL has a huge community around it, and there is a lot of passion around what the future holds for the database. The challenge is how to direct that passion and get MySQL back to where it should be. MySQL is a great database that makes it easy to implement and run applications, and it is a useful option where PostgreSQL is not a good fit or overkill for an application deployment.

    Now is the time to get involved in the events being organized by the MySQL community, to join the Foundation for MySQL Slack channel, and to help build that future for the community as a whole, and to get excited about the future for MySQL again.

    New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

    (image/jpeg; 8.99 MB)

    From devops to CTO: 8 things to start doing now 27 Jan 2026, 9:00 am

    I was promoted to CTO in my late twenties, and while it is common to see young CTOs leading startups these days, it was unusual in the ‘90s. I was far less experienced back then, and still developing my business acumen. While I was a strong software developer, it wasn’t my architecture and coding skills that helped me transition to a C-level role.

    Of all the technical skills I had back then, my devops skills were the most critical. Of course, we didn’t call it devops, as the term hadn’t been invented yet. We didn’t yet have CI/CD pipelines or infrastructure-as-code capabilities. Nonetheless, I automated our builds, scripted the deployments, standardized infrastructure configurations, and monitored systems performance.

    Developing that scaffolding enabled our development teams to focus on building and testing applications while operations managed infrastructure improvements. With automation in place and a team focused on the technology, I was able to focus on higher-level tasks such as understanding customer needs, partnering with product managers, learning marketing objectives, and learning about sales operations. When our CTO left for another opportunity, I was given the chance to step into the leadership role.

    In my book, Digital Trailblazer, I elaborate on my journey from developer to CTO and CIO. Since the book came out, many readers have asked me for advice about how to accelerate their career trajectories. In this article, I focus on how high-potential employees in devops roles—including developers and engineers—can start making moves toward a CTO role.

    Lead AI programs that deliver business value

    Studies show that a significant number of generative AI experiments fail to get deployed into production. According to the recent MIT State of AI in business report, 95% of organizations are getting zero return on their AI investments. Experimentation is an essential stage in the learning experience, especially when adopting new technologies and AI models. But the C-suite is pressuring IT departments to demonstrate better return on investment (ROI) from AI initiatives.

    Devops leaders have the opportunity to make a difference in their organization and for their careers. Lead a successful AI initiative, deploy to production, deliver business value, and share best practices for other teams to follow. Successful devops leaders don’t jump on the easy opportunities; they look for the ones that can have a significant business impact.

    Recommendation: Look for opportunities with clearly defined vision statements, active sponsors, and a dedicated team committed to the objectives. Take on the role of agile delivery leader and partner with a product manager who specifies targeted user personas, priorities, and success criteria for the AI program.

    Establish development standards for using AI tools effectively

    Another area where devops engineers can demonstrate leadership skills is by establishing standards for applying genAI tools throughout the software development lifecycle (SDLC). Advanced tools and capabilities require effective strategies to extend best practices beyond early adopters and ensure that multiple teams succeed. Some questions to consider:

    “The most relevant engineers will be the ones who treat AI as a collaborator and leadership as a craft,” says Rukmini Reddy, SVP of engineering at PagerDuty. “Resolve to deepen your automation skills, but also strengthen how you communicate, mentor, and create safety across both technical systems and human processes. Resilient operations depend just as much on how teams work together as on the automation that ships our software.”

    Recommendation: The key for devops leaders is to first find the most effective ways to apply AI in the SDLC and operations, then take a leadership role in drafting and communicating standards that teams readily adopt.

    Develop platforms teams want to use

    If you want to be recognized for promotions and greater responsibilities, a place to start is in your areas of expertise and with your team, peers, and technology leaders. However, shift your focus from getting something done to a practice leadership mindset. Develop a practice or platform your team and colleagues want to use and demonstrate its benefits to the organization.

    Devops engineers can position themselves for a leadership role by focusing on initiatives that deliver business value. Look to deliver small, incremental wins and guide solutions that help teams make continuous improvements in key areas.

    Another important area of work is reviewing platform engineering approaches that improve developer experience and creating self-service solutions. Leaders seeking recognition can also help teams adopt shift-left security and improve continuous testing practices.

    Recommendation: Don’t leave it to chance that leadership will recognize your accomplishments. Track your activities, adoption, and impacts in technology areas that deliver scalable and reusable patterns.

    Shift your mindset to tech facilitator and planner

    One of the bigger challenges for engineers when taking on larger technical responsibilities is shifting their mindset from getting work done today to deciding what work to prioritize and influencing longer-term implementation decisions. Instead of developing immediate solutions, the path to CTO requires planning architecture, establishing governance, and influencing teams to adopt self-organizing standards.

    Martin Davis, managing partner at Dunelm Associates, says to become a CTO, engineers must shift from tactical problem-solving to big-picture, longer-term strategic planning. He suggests the following three questions to evaluate platforms and technologies and shift to a more strategic mindset:

    • How will these technologies handle future expansion, both business and technology?
    • How will they adapt to changing circumstances?
    • How will they allow the addition and integration of other tools?

    “There are rarely right and wrong answers, and technology changes fast, so be pragmatic and be prepared to abandon previous decisions as circumstances change,” recommends Davis.

    Recommendation: One of the hardest mindset transitions for CTOs is shifting from being the technology expert and go-to problem-solver to becoming a leader facilitating the conversation about possible technology implementations. If you want to be a CTO, learn to take a step back to see the big picture and engage the team in recommending technology solutions.

    Develop data governance and science expertise

    Many CTOs come up the ranks as delivery leaders focused on building APIs, applications, and now AI agents. Some will have data management skills and understand architecture decisions behind data warehouses, data lakes, and data fabrics.

    But fewer CTOs have a background in data engineering, dataops, data science, and data governance. Therein lies an opportunity for devops engineers who want to become CTOs one day: Get hands-on with the challenges faced by data specialists tasked with building governed data products, which are typically composed of reusable data assets that serve multiple business needs.

    A good area to dive into is improving data quality and ensuring data is AI-ready. It’s an underappreciated function that’s key to building accurate data products and AI agents.

    Camden Swita, head of AI and ML at New Relic, says to prioritize understanding how your datasets can be used by an AI system and sussing out poor data quality. “It’s one thing for a human to recognize poor data and work around it, but AI agents are still not great at it, and using poor or outdated data will lead to undesirable outcomes. Cleaning and improving data will help address common issues like hallucinations, bad recommendations, and other issues,” says Swita.

    Recommendation: Devops engineers have many opportunities to deepen their knowledge and skills in data practices. Consider getting involved in answering some of these 10 data management questions around building trust, monitoring AI models, and improving data lineage. Also, review the 6 data risks CIOs and business leaders should be paranoid about, including intellectual property and third-party data sources.

    Extend your technology expertise across disciplines

    To ascend to a leadership role, gaining expertise in a handful of practices and technologies is insufficient. CTOs are expected to lead innovation, establish architecture patterns, oversee the full SDLC, and collaborate on and sometimes manage aspects of IT operations.

    “If devops professionals want to be considered for the role of CTO, they need to take the time to master a wide range of skills,” says Alok Uniyal, SVP and head of IT process consulting practice at Infosys. “You cannot become a CTO without understanding areas such as enterprise architecture, core software engineering and operations, fostering tech innovation, the company’s business, and technology’s role in driving business value. Showing leadership that you understand all technology workstreams at a company as well as key tech trends and innovations in the industry is critical for CTO consideration.”

    Devops professionals seeking to develop a deep and wide breadth of technology knowledge and expertise recognize it requires a commitment to lifelong learning. You can’t easily invest all the time required to dive into technology expertise, take classes in every technology, or wait for the right opportunities to join programs and teams where you can develop new skills. The most successful candidates find efficient ways to learn through reading, learning from peers, and finding mentors.

    Recommendation: Add learning to your sprint commitments and chronicle your best practices in a journal or blog. Writing helps with retention and adds an important CTO skill of sharing and teaching.

    Embrace experiences outside your comfort zone

    In Digital Trailblazer, I recommend that leadership requires getting out of your comfort zone and seeking experiences beyond your expertise.

    My devops career checklist includes several recommendations for embracing transformation experiences and seeking challenges that will train you to listen, question how things work today, and challenge people to think differently. For example, consider volunteering to manage an end-to-end major incident response to better understand being under pressure and finding problem root causes. That certainly will grow your appreciation of why observability is important and the value of monitoring systems.

    However, to be a CTO, the more important challenge is to lead efforts that require participation from stakeholders, customers, and business teams. Seek out opportunities to experience change leadership:

    • Lead a journey mapping exercise to document the end-user flows through a critical transaction and discover pain points.
    • Participate in a change management program and learn the practices required to accelerate end-user adoption of a new technology.
    • Go on a customer tour or spend time with operational teams to learn firsthand how well—or not well—the provided technology is working for them.

    “One of the best ways I personally achieved an uplift in the value I brought to a business came from experiencing change,” says Reggie Best, director of product management at IBM. “Within my current organization, that usually happened by changing projects or teams—gaining new experiences, developing an understanding of new technologies, and working with different people.”

    John Pettit, CTO at Promevo, says to rise from devops professional to CTO, embrace leadership opportunities, manage teams, and align with your organization’s strategic goals. “Build business acumen by understanding how technology impacts company performance. Invest in soft skills like communication, negotiation, and strategic thinking.”

    Petit recommends that aspiring CTOs build relationships across departments, read books on digital transformation, mentor junior engineers, develop a network by attending events, and find a mentor in a non-tech C-level leadership role.

    Recommendation: The path to CTO requires spending more time with people and less time working with technology. Don’t wait for experience opportunities—seek them out and get used to being uncomfortable: it’s a key aspect of learning leadership.

    Develop a vision and deliver results

    CTOs see their roles beyond delivering technology, architecture, data, and AI capabilities. They learn the business, customers, and employees while developing executive relationships that inform their technology strategies and roadmaps.

    Martin Davis of Dunelm Associates recommends, “Think strategically, think holistically. Always look at the bigger picture and the longer term and how the decisions you make now play out as the organization builds, grows, and develops.”

    My recent research of top leadership competencies of digital leaders includes strategic thinking, value creation, influencing, and passion for making a difference. These are all competencies that aspiring CTOs develop over time by taking on more challenging assignments and focusing on collaborating with people over technical problem solving.

    Beyond strategies and roadmaps, the best CTOs are vision painters who articulate a destiny and objectives that leaders and employees embrace. They then have the leadership chops to create competitive, differentiating technical, data, and AI capabilities while reducing risks and improving security.

    You can’t control when a CTO opportunity will present itself, but if technology leadership is your goal, you can take steps to prepare. Start by changing your mindset from doing to leading, then look for opportunities to guide teams and increase collaboration with business stakeholders.

    (image/jpeg; 14.41 MB)

    Page processed in 0.068 seconds.

    Powered by SimplePie 1.4-dev, Build 20170403172323. Run the SimplePie Compatibility Test. SimplePie is © 2004–2026, Ryan Parman and Geoffrey Sneddon, and licensed under the BSD License.