The ‘toggle-away’ efficiencies: Cutting AI costs inside the training loop 20 Mar 2026, 10:00 am
“A single training run can emit as much CO₂ as five cars do in a year.”
That finding from the University of Massachusetts, Amherst, has become the defining statistic of the generative AI era. But for the engineers and data scientists staring at a terminal, the problem isn’t just carbon, it’s the cloud bill.
The industry narrative suggests that the only solution is hardware: buying newer H100s or building massive custom silicon. But after combing through academic benchmarks, cloud billing dashboards and vendor white papers, I’ve found that roughly half of that waste is a “toggle away”.
Training efficiency isn’t about squeezing GPUs harder; it’s about spending smarter for the same accuracy. The following methods focus on training-time cost levers, changes inside the loop that cut waste without touching your model architecture.
(Note: All code examples below are available in the accompanying Green AI Optimization Toolkit repository.)
The compute levers: Taking weight off the chassis
The easiest way to speed up a race car is to take weight off the chassis. In deep learning, that weight is numerical precision.
For years, 32-bit floating point (FP32) was the default. Today, switching to mixed-precision math (FP16/INT8) is the highest-ROI change a practitioner can make. On hardware with dedicated tensor units, such as NVIDIA Ampere/Hopper, AMD RDNA 3 or Intel Gaudi 2, mixed precision can increase throughput by 3x or more.
However, this isn’t a magic wand for everyone. If you are running on pre-2019 GPUs (like the Pascal architecture) that lack Tensor Cores, you might see almost no speed gain while risking numerical instability. Similarly, compliance workloads in finance or healthcare that require bit-exact reproducibility may need to stick to FP32.
But for the roughly 90% of use cases involving memory-bound models (ResNet-50, GPT-2, Stable Diffusion), the shift is essential. It also unlocks gradient accumulation, which lets you train massive models on smaller, cheaper cards by simulating larger batch sizes.
The implementation: Here is how to implement mixed precision and gradient accumulation in PyTorch. This setup simulates a batch size of 64 on a GPU that can only fit 8 samples at once.
python
# From 'green-ai-optimization-toolkit/01_mixed_precision.py'
import torch
from torch.cuda.amp import autocast, GradScaler

# Simulate a batch size of 64 using a micro-batch of 8
eff_batch_size = 64
micro_batch = 8
accum_steps = eff_batch_size // micro_batch

scaler = GradScaler()  # Prevents gradient underflow in FP16

for i, (data, target) in enumerate(loader):
    # 1. The toggle: run the forward pass in FP16
    with autocast():
        output = model(data)
        loss = criterion(output, target)
        loss = loss / accum_steps  # Normalize loss across micro-batches

    # 2. Scale gradients and accumulate
    scaler.scale(loss).backward()

    # 3. Step only after N micro-batches
    if (i + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
The data levers: Feeding the beast
If your GPU utilization is hovering around 40%, you aren’t training a model; you are burning cash. The bottleneck is almost always the data loader.
A common mistake is treating data preprocessing as a per-epoch tax. If you use expensive text tokenizers (like Byte-Pair Encoding) or complex image transforms, cache pre-processed data. Tokenize or resize once, store the result and feed it directly.
Furthermore, look at your file formats. Reading millions of small JPEG or CSV files over a network file system kills I/O throughput due to metadata overhead. Instead, stream data from archives: sharding your dataset into POSIX tar files or binary formats like Parquet/Avro allows the OS to read ahead, keeping the GPU fed.
Watch out for:
- Storage ballooning: Caching pre-processed data can triple your storage footprint. You are trading storage cost (cheap) for compute time (expensive).
- Over-pruning: While data deduplication is excellent for web scrapes, be careful with curated medical or legal datasets. Aggressive filtering might discard rare edge cases that are critical for model robustness.
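To make the cache-once idea concrete, here is a minimal sketch of the preprocessing cache; the `tokenize_fn` stand-in, the cache path and pickle format are illustrative, and a real pipeline would use an actual BPE tokenizer and a columnar or memory-mapped format instead:

```python
import os
import pickle

def cached_tokenize(texts, cache_path, tokenize_fn):
    """Tokenize once, then reuse the cached result on every later epoch or run."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)  # Cheap read instead of re-tokenizing
    tokens = [tokenize_fn(t) for t in texts]  # Pay the expensive cost exactly once
    with open(cache_path, "wb") as f:
        pickle.dump(tokens, f)
    return tokens

# Usage with a toy stand-in tokenizer (a real BPE tokenizer goes here)
toy_tokenizer = lambda s: s.lower().split()
corpus = ["The GPU is hungry", "Feed the beast"]
first = cached_tokenize(corpus, "/tmp/green_ai_tok_cache.pkl", toy_tokenizer)
second = cached_tokenize(corpus, "/tmp/green_ai_tok_cache.pkl", toy_tokenizer)  # Served from cache
```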
The operational levers: Safety and scheduling
The most expensive training run is the one that crashes 99% of the way through and has to be restarted.
In the cloud, spot instances (or pre-emptible VMs) offer discounts of up to 90%. To use them safely, you must implement robust checkpointing. Save the model state frequently (every epoch or N steps) so that if a node is reclaimed, you lose minutes of work, not days.
Open-source orchestration frameworks like SkyPilot have become essential here. SkyPilot abstracts away the complexity of spot instances, automatically handling the recovery of reclaimed nodes and allowing engineers to treat disparate clouds (AWS, GCP, Azure) as a single, cost-optimized resource pool.
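A minimal sketch of the checkpoint/resume pattern, using pickle and an atomic rename so a reclaimed node cannot leave behind a half-written file. The path is illustrative; a real PyTorch run would store `model.state_dict()` and `optimizer.state_dict()` via `torch.save` instead:

```python
import os
import pickle

CKPT = "/tmp/run_checkpoint.pkl"  # Illustrative path

def save_checkpoint(state, path=CKPT):
    # Write to a temp file, then rename atomically: a preempted node
    # can never leave a torn checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path=CKPT):
    if not os.path.exists(path):
        return {"epoch": 0, "model_state": None}  # Fresh start
    with open(path, "rb") as f:
        return pickle.load(f)

save_checkpoint({"epoch": 7, "model_state": {"w": [0.1, 0.2]}})
resumed = load_checkpoint()  # On a replacement node, training resumes at epoch 7
```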
You should also implement early stopping. There is no ROI in “polishing noise”. If your validation loss plateaus for 3 epochs, kill the run. This is especially potent for fine-tuning tasks, where most gains arrive in the first few epochs. However, be cautious if you are using curriculum learning, where loss might naturally rise before falling again as harder examples are introduced.
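The plateau rule above can be sketched as a small helper. The patience of 3 epochs matches the text; `min_delta` is an illustrative tolerance for what counts as an improvement:

```python
def should_stop(val_losses, patience=3, min_delta=1e-4):
    """Stop when the best validation loss hasn't improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])   # Best loss before the window
    recent_best = min(val_losses[-patience:])   # Best loss inside the window
    return recent_best > best_before - min_delta  # No meaningful improvement

# Loss plateaued over the last 3 epochs: kill the run, stop polishing noise
history = [0.90, 0.70, 0.60, 0.59, 0.61, 0.60, 0.60]
print(should_stop(history))  # True
```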
The “smoke test” protocol
Finally, never launch a multi-node job without a dry run. A simple script that runs two batches on a CPU can catch shape mismatches and OOM bugs for pennies.
python
# From 'green-ai-optimization-toolkit/03_smoke_test.py'
def smoke_test(model, loader, device='cpu', steps=2):
    """
    Runs a dry run on CPU to catch shape mismatches
    and OOM bugs before the real run starts.
    """
    print(f"💨 Running Smoke Test on {device}...")
    model.to(device)
    model.train()
    try:
        for i, (data, target) in enumerate(loader):
            if i >= steps:
                break
            data, target = data.to(device), target.to(device)
            output = model(data)
            loss = output.sum()
            loss.backward()
        print("✅ Smoke Test Passed. Safe to launch expensive job.")
        return True
    except Exception as e:
        print(f"❌ Smoke Test Failed: {e}")
        return False
The rapid-fire checklist: 10 tactical quick wins
Beyond the major architectural shifts, there is a long tail of smaller optimizations that, when stacked, yield significant savings. Here is a rapid-fire checklist of tactical wins.
1. Dynamic batch-size auto-tuning
- The tactic: Have the framework probe VRAM at launch and automatically choose the largest safe batch size.
- Best for: Shared GPU clusters (Kubernetes/Slurm) where free memory swings wildly.
- Watch out: Can break real-time streaming SLAs by altering step duration.
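One common way to implement the probe is a binary search over batch sizes. The `fits` callback here is a stand-in for running a single forward/backward pass at that batch size and catching the out-of-memory error:

```python
def max_safe_batch(fits, lo=1, hi=1024):
    """Binary-search the largest batch size for which `fits(bs)` succeeds."""
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best, lo = mid, mid + 1   # Fits: try larger
        else:
            hi = mid - 1              # OOM: try smaller
    return best

# Stand-in probe; a real one would run a training step and catch
# torch.cuda.OutOfMemoryError instead of comparing against a constant.
fits = lambda bs: bs <= 48
print(max_safe_batch(fits))  # 48
```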
2. Continuous profiling
- The tactic: Run lightweight profilers (PyTorch Profiler, NVIDIA Nsight) for a few seconds per epoch.
- Best for: Long jobs (>30 mins). Finding even a 5% hotspot pays back the profiler overhead in a day.
- Watch out: I/O-bound jobs. If GPU utilization is already low, the profiler will mostly confirm that the bottleneck is the data pipeline, not the kernels.
3. Store tensors in half-precision
- The tactic: Save checkpoints and activations in FP16 (instead of default FP32).
- Best for: Large static embeddings (vision, text). It halves I/O volume and storage costs.
- Watch out: Compliance workloads requiring bit-exact auditing.
4. Early-phase CPU training
- The tactic: Run the first epoch on cheaper CPUs to catch gross bugs before renting GPUs.
- Best for: Complex pipelines with heavy text parsing or JSON decoding.
- Watch out: Tiny datasets where the data transfer time exceeds the compute time.
5. Offline augmentation
- The tactic: Pre-compute heavy transforms (Mosaic, Style Transfer) and store them, rather than computing on-the-fly.
- Best for: Heavy transforms that take >20ms per sample.
- Watch out: Research that studies augmentation randomness; baking it removes variability.
6. Budget alerts & dashboards
- The tactic: Stream cost metrics per run and alert when burn-rate exceeds a threshold.
- Best for: Multi-team organizations to prevent “runaway” billing.
- Watch out: Alert Fatigue. If you ping researchers too often, they will ignore the notifications.
7. Archive stale artifacts
- The tactic: Automatically move checkpoints >90 days old to cold storage (Glacier/Archive tier).
- Best for: Mature projects with hundreds of experimental runs.
- Watch out: Ensure you keep the “Gold Standard” weights on hot storage for inference.
8. Data deduplication
- The tactic: Remove near-duplicate samples before training.
- Best for: Web scrapes and raw sensor logs.
- Watch out: Curated medical/legal datasets where “duplicates” might actually be critical edge cases.
9. Cluster-wide mixed-precision defaults
- The tactic: Enforce FP16 globally via environment variables so no one “forgets” the cheapest knob.
- Best for: MLOps teams managing multi-tenant fleets.
- Watch out: Legacy models that may diverge without specific tuning.
10. Neural architecture search (NAS)
- The tactic: Automate the search for efficient architectures rather than hand-tuning.
- Best for: Long-term production models where efficiency pays dividends over years.
- Watch out: Extremely high upfront compute cost; only worth it if the model will be deployed at massive scale.
Better habits, not just better hardware
You don’t need to wait for an H100 allocation to make your AI stack efficient. By implementing mixed precision, optimizing your data feed and adding operational safety nets, you can drastically reduce both your carbon footprint and your cloud bill.
The most sustainable AI strategy isn’t buying more power, it’s wasting less of what you already have.
This article is published as part of the Foundry Expert Contributor Network.
Want to join?
Google adds vibe design to Stitch UI design tool 20 Mar 2026, 9:00 am
A key move in Google’s effort is a complete redesign of the Stitch UI. New plans for Stitch were announced March 18. With vibe designing, developers can explore ideas quickly, leading to higher-quality outcomes. Instead of starting with a wireframe, developers can start by explaining the business objective they hope to achieve, what they want users to feel, or even examples of what currently inspires them.
The Stitch UI now features a new AI-native infinite canvas that lets developers grow their ideas from early ideations to working prototypes. The new Stitch canvas is also built to amplify creativity throughout the design process. Developers can bring forth ideas regardless of the shape they take—images, text, or even code—directly to the canvas as context. The canvas context, meanwhile, is paired with a new design agent that can reason across a project’s evolution. Additionally, a new Agent manager is being introduced that tracks progress and helps developers work on multiple ideas in parallel, all while staying organized.
Also being introduced are voice capabilities where developers can speak directly to their canvas. The agent can give developers real-time design critiques and design a new landing page by interviewing the developer. Real-time updates can be made. By acting as a sounding board, AI helps uncover top ideas through dynamic critique and dialogue.
Stitch can act as a bridge to other tools in a team’s workflow. Using the recently released Stitch Model Context Protocol server and SDK, developers can leverage Stitch’s capabilities via skills and tools. Developers can also export designs to developer tools such as AI Studio and Antigravity.
Cloud at 20: Cost, complexity, and control 20 Mar 2026, 9:00 am
When Amazon Web Services launched its Simple Storage Service (S3) in March 2006, it sparked the imagination of IT leaders worldwide. I remember that era well. It was a time when the enterprise was feverishly searching for a way out of restrictive, on-premises silos. S3 and the emerging concept of public cloud promised almost unlimited scalability, pay-as-you-go economics, and the freeing of IT departments from the shackles of hardware, data centers, and routine maintenance. The vision was that enterprises could offload their IT headaches and focus on business outcomes, letting cloud providers do the heavy lifting.
Twenty years later, S3 has ballooned into a monster-scale system of more than 500 trillion objects stored, hundreds of exabytes under management, and operations spanning every corner of the planet. Netflix, Spotify, and other companies have used S3 and the wider AWS ecosystem to reinvent whole industries, scaling to dimensions that would have seemed impossible in the pre-cloud era.
But as we light the candles on S3’s birthday cake, the reality for most enterprises does not match the sleek simplicity once promised. Far from outsourcing all their IT to the cloud, most organizations today find themselves facing greater complexity, reduced transparency, and spiraling costs. Cloud hasn’t eliminated traditional IT problems; it has merely shifted and, in many cases, compounded them.
The failed promise of savings
It turns out, in the vast majority of cases, that moving workloads to the cloud does not inherently reduce costs or complexity. Many enterprises naively expected a simple equation: Moving infrastructure to the cloud equals lower costs plus greater agility. Instead, organizations gradually discovered that the pay-as-you-go model could quickly balloon budgets when left unchecked.
Cloud spending often began as a tiny line item within IT budgets, but two decades on, it’s frequently one of the highest and fastest-growing costs, sometimes eclipsing the very expenses it was meant to replace. Enterprises face a jungle of service tiers, intricate pricing metrics, unexpected data egress fees, licensing quirks, and rapid consumption of compute and storage. The quest for scale and innovation, fueled by cloud computing, often leads to sprawl: hundreds or thousands of workloads distributed across multiple cloud accounts and regions, each adding yet another layer of expense.
The real concern for CIOs isn’t just surprise costs but the struggle to implement cloud discipline. Even for organizations that embrace finops (the emerging practice of cloud financial operations), the speed of innovation outpaces their ability to govern usage. Business units spin up workloads on a whim only to forget or abandon them. Cloud cost visibility is confounded by opaque billing dashboards, and the containerization trend has made it even harder to pin down exactly where money is going.
This runaway spending is not the result of malfeasance, but rather the product of the cloud’s core attribute: enabling rapid, frictionless innovation in exchange for complex, often unpredictable billing. In recent years, the pendulum has begun to swing back, with many organizations repatriating some workloads to on-premises infrastructure just to regain cost control and predictability.
The cloud giveth and taketh away
Early cloud pioneers believed that entrusting security to hyperscale providers would mean fewer headaches. After all, who better to patch, monitor, and defend IT infrastructure than the companies running the largest, most sophisticated data centers on earth? To an extent, this is true. Cloud providers have invested untold billions in every flavor of physical and software security.
But the paradox is that by its very flexibility and openness, cloud introduces enormous security and operational complexity. The infamous rash of open S3 buckets left exposed to the world due to default configurations or simple oversight testified to the challenges of managing distributed, cloud-native architectures. Enterprises must now invest in new skills, new tools, and new processes just to manage the endless stream of updates, permissions, encryption keys, and access policies. Instead of outsourcing pain points, many organizations just replaced one set of hassles with another.
Moreover, the notion of “set it and forget it” in the cloud has proven dangerously outdated. The constant drumbeat of threats, from ransomware to nation-state actors, combined with the proliferation of APIs and services, makes the cloud a shifting, ever-expanding attack surface. Enterprises are forced not only to upskill but also to adopt whole new mindsets around zero trust, observability, and resilience engineering.
The future: more of the same
The original fantasy of cloud was that it would be a single pane of glass: one provider (often AWS), powering an enterprise’s every workload, integrated from edge to core to SaaS. In reality, as we reach this 20-year milestone, we’re in a multicloud reality whether by design, accident, or necessity. Enterprises are now managing portfolios that span AWS, Microsoft Azure, Google Cloud, and sometimes dozens of SaaS or niche providers and their own private clouds.
This shift actually magnifies all previous challenges. Not only do organizations have to master the idiosyncrasies of each provider’s architectures, costs, and security models, but they must also contend with interoperability, data movement, compliance, and the talent gap across every platform in use. The modern IT estate is a patchwork, not a seamless fabric.
As the next chapter of the cloud era unfolds, three trends are set to dominate. First, finops will become inseparable from devops and IT operations as enterprises seek to assert financial discipline over their cloud estates. Second, continuous investment in security will be essential: not just tools, but human capital and process change, with automation serving as a key enabler. Third, complexity won’t disappear; the goal will be to manage and mitigate it, not magically eliminate it.
The 20th anniversary of S3 isn’t just a celebration of a technological achievement; it’s a sobering reminder that the journey to the cloud, for most enterprises, is more marathon than sprint. The dream of outsourced, seamless IT has, for most, become managed, governed, orchestrated complexity. The next 20 years will see continued investments in cost control, security, and managing the multicloud labyrinth. Cloud may have transformed technology forever, but for the enterprise, the work—and the surprises—are just beginning.
AI optimization: How we cut energy costs in social media recommendation systems 20 Mar 2026, 9:00 am
When you scroll through Instagram Reels or browse YouTube, the seamless flow of content feels like magic. But behind that curtain lies a massive, energy-hungry machine. As a software engineer working on recommendation systems at Meta and now Google, I’ve seen firsthand how the quest for better AI models often collides with the physical limits of computing power and energy consumption.
We often talk about “accuracy” and “engagement” as the north stars of AI. But recently, a new metric has become just as critical: efficiency.
At Meta, I worked on the infrastructure powering Instagram Reels recommendations. We were dealing with a platform serving over a billion daily active users. At that scale, even a minor inefficiency in how data is processed or stored snowballs into megawatts of wasted energy and millions of dollars in unnecessary costs. We faced a challenge that is becoming increasingly common in the age of generative AI: how do we make our models smarter without making our data centers hotter?
The answer wasn’t in building a smaller model. It was in rethinking the plumbing — specifically, how we computed, fetched and stored the training data that fueled those models. By optimizing this “invisible” layer of the stack, we achieved megawatt-scale energy savings and reduced annual operating expenses by eight figures. Here is how we did it.
The hidden cost of the recommendation funnel
To understand the optimization, you have to understand the architecture. Modern recommendation systems generally function like a funnel.
At the top, you have retrieval, where we select thousands of potential candidates from a pool of billions of media items. Next comes early-stage ranking, a high-efficiency phase that filters this large pool down to a smaller set. Finally, we reach late-stage ranking. This is where the heavy lifting happens. We use complex deep learning models — often two-tower architectures that combine user and item embeddings — to precisely order a curated set of 50 to 100 items to maximize user engagement.
This final stage is incredibly feature-dense. To rank a single Reel, the model might look at hundreds of “features.” Some are dense features (like the time a user has spent on the app today) and others are sparse features (like the specific IDs of the last 20 videos watched).
The system doesn’t just use these features to rank content; it also has to log them. Why? Because today’s inference is tomorrow’s training data. If we serve you a video and you “like” it, we need to join that positive label with the exact features the model saw at that moment to retrain and improve the system.
This logging process — writing feature values to a transient key-value (KV) store to wait for user interaction — was our bottleneck.
The challenge of transitive feature logging
To understand why this bottleneck existed, we have to look at the microscopic lifecycle of a single training example.
In a typical serving path, the inference service fetches features from a low-latency feature store to rank a candidate set. However, for a recommendation system to learn, it needs a feedback loop. We must capture the exact state of the world (the features) at the moment of inference and later join them with the user’s future action (the label), such as a “like” or a “click.”
This creates a massive distributed systems challenge: Stateful label joining.
We cannot simply query the feature store again when the user clicks, because features are mutable — a user’s follower count or a video’s popularity changes by the second. Using fresh features with stale labels introduces “online-offline skew,” effectively poisoning the training data.
To solve this, we use a transitive key-value (KV) store. Immediately after ranking, we serialize the feature vector used for inference and write it to a high-throughput KV store with a short time-to-live (TTL). This data sits there, “in transit,” waiting for a client-side signal.
- If the user interacts: The client fires an event, which acts as a key lookup. We retrieve the frozen feature vector from the KV store, join it with the interaction label and flush it to our offline training warehouse (e.g., Hive/Data Lake) as a “source-of-truth” training example.
- If the user does not interact: The TTL expires, and the data is dropped to save costs.
This architecture, while robust for data consistency, is incredibly expensive. We were essentially continuously writing petabytes of high-dimensional feature vectors to a distributed KV store, consuming massive network bandwidth and serialization CPU cycles.
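A toy in-memory sketch of the transitive KV pattern described above. Real systems use a distributed store with server-side TTLs and binary serialization; the class name, impression IDs and field names here are illustrative:

```python
import time

class TransientFeatureStore:
    """Toy stand-in for a transitive KV store with a TTL."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # impression_id -> (write_time, frozen_feature_vector)

    def log_inference(self, impression_id, features):
        # Freeze the exact features the model saw at ranking time,
        # so later training never mixes fresh features with stale labels.
        self.store[impression_id] = (time.time(), list(features))

    def join_label(self, impression_id, label):
        # Client signal arrived: retrieve the frozen vector, if still alive
        entry = self.store.pop(impression_id, None)
        if entry is None or time.time() - entry[0] > self.ttl:
            return None  # TTL expired or already consumed: no training example
        return {"features": entry[1], "label": label}  # Flush to the warehouse

kv = TransientFeatureStore(ttl_seconds=3600)
kv.log_inference("imp_1", [0.2, 0.7])
example = kv.join_label("imp_1", label="like")  # Joined training example
```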
Optimizing the “head load”
We realized that our “write amplification” was out of control. In the late-stage ranking phase, we typically rank a deep buffer of items — say, the top 100 candidates — to ensure the client has enough content cached for a smooth scroll.
The default behavior was eager logging: We would serialize and write the feature vectors for all 100 ranked items into the transitive KV store immediately.
However, user behavior follows a steep decay curve. A user might only view the first 5–6 items (the “head load”) before closing the app or refreshing the feed. This meant we were paying the serialization and I/O cost to store features for items 7 through 100, which had a near-zero probability of generating a positive label. We were effectively DDoS-ing our own infrastructure with “ghost data.”
We shifted to a “lazy logging” architecture.
- Selective persistence: We reconfigured the serving pipeline to only persist features for the Head Load (e.g., top 6 items) into the KV store initially.
- Client-triggered pagination: As the user scrolls past the Head Load, the client triggers a lightweight “pagination” signal. Only then do we asynchronously serialize and log the features for the next batch (items 7–15).
This change decoupled our ranking depth from our storage costs. We could still rank 100 items to find the absolute best content, but we only paid the “storage tax” for the content that actually had a chance of being seen. This reduced our write throughput (QPS) to the KV store significantly, saving megawatts of power previously wasted on serializing data that was destined to expire untouched.
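In outline, the lazy-logging change looks like the following. The head-load size of 6 follows the example in the text; the pagination window and the write callback are illustrative:

```python
HEAD_LOAD = 6  # Items likely to be seen before a refresh (illustrative)

def log_ranked_items(kv_write, ranked_items, start=0, end=HEAD_LOAD):
    """Persist features for one window of the ranked list.

    Called once at serving time for the head load, then again
    for each client-triggered pagination signal.
    """
    for item in ranked_items[start:end]:
        kv_write(item["id"], item["features"])

writes = []
ranked = [{"id": i, "features": [i * 0.1]} for i in range(100)]
log_ranked_items(lambda k, v: writes.append(k), ranked)          # Initial head load
log_ranked_items(lambda k, v: writes.append(k), ranked, 6, 15)   # Pagination signal
print(len(writes))  # 15 writes instead of 100
```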
Rethinking storage schemas
Once we reduced what we stored, we looked at how we stored it.
In a standard feature store architecture, data is often stored in a tabular format where every row represents an impression (a specific user seeing a specific item). If we served a batch of 15 items to one user, the logging system would write 15 rows.
Each row contained the item features (which are unique to the video) and the user features (which are identical for all 15 rows). We were effectively writing the user’s age, location and follower count 15 separate times for a single request.
We moved to a batched storage schema. Instead of treating every impression as an isolated event, we separated the data structures. We stored the user features once for the request and stored a list of item features associated with that request.
This simple de-duplication reduced our storage requirement by more than 40%. In distributed systems like the ones powering Instagram or YouTube, storage isn’t passive; it requires CPU to manage, compress and replicate. By slashing the storage footprint, we improved bandwidth availability for the distributed workers fetching data for training, creating a virtuous cycle of efficiency throughout the stack.
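The de-duplication can be illustrated with a toy request. The field names are invented, and JSON stands in for whatever serialization the real store uses; the point is that the user block is written once per request instead of once per impression:

```python
import json

user = {"user_id": 42, "age_bucket": "25-34", "follower_count": 310}
items = [{"item_id": i, "score": 0.9 - 0.01 * i} for i in range(15)]

# Row-per-impression schema: user features repeated in all 15 rows
flat_rows = [dict(user, **item) for item in items]

# Batched schema: user features stored once, item features as a list
batched = {"user": user, "items": items}

flat_bytes = len(json.dumps(flat_rows))
batched_bytes = len(json.dumps(batched))
print(f"batched payload is {1 - batched_bytes / flat_bytes:.0%} smaller")
```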
Auditing the feature usage
The final piece of the puzzle was spring cleaning. In a system as old and complex as a major social network’s recommendation engine, digital hoarding is a real problem. We had over 100,000 distinct features registered in our system.
However, not all features are created equal. A user’s “age” might carry very little weight in the model compared to “recently liked content.” Yet, both cost resources to compute, fetch and log.
We initiated a large-scale feature auditing program. We analyzed the weights assigned to features by the model and identified thousands that were adding statistically insignificant value to our predictions. Removing these features didn’t just save storage; it reduced the latency of the inference request itself because the model had fewer inputs to process.
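As a deliberately simplified sketch of the audit (a real program would also use ablation studies and feature-importance metrics, not raw weights alone), pruning by weight magnitude might look like this; the feature names and threshold are illustrative:

```python
def prune_features(feature_weights, threshold=1e-3):
    """Keep features whose learned |weight| clears a significance threshold."""
    keep = {f for f, w in feature_weights.items() if abs(w) >= threshold}
    drop = set(feature_weights) - keep  # Candidates to stop computing and logging
    return keep, drop

weights = {
    "recently_liked": 0.42,   # Strong signal: keep
    "session_time": 0.11,     # Useful: keep
    "age": 0.0004,            # Statistically insignificant: drop
    "legacy_flag": 0.0,       # Dead feature: drop
}
keep, drop = prune_features(weights)
```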
The energy imperative
As the industry races toward larger generative AI models, the conversation often focuses on the massive energy cost of training GPUs. Reports indicate that AI energy demand is poised to skyrocket in the coming years.
But for engineers on the ground, the lesson from my time at Meta is that efficiency often comes from the unsexy work of plumbing. It comes from questioning why we move data, how we store it and whether we need it at all.
By optimizing our data flow — lazy logging, schema de-duplication and feature auditing — we proved that you can cut costs and carbon footprints without compromising the user experience. In fact, by freeing up system resources, we often made the application faster and more responsive. Sustainable AI isn’t just about better hardware; it’s about smarter engineering.
This article is published as part of the Foundry Expert Contributor Network.
Want to join?
OpenAI buys Python tools builder Astral 19 Mar 2026, 11:05 pm
OpenAI is acquiring Python developer toolmaker Astral, thus bringing open source developer tools into OpenAI’s Codex AI coding system. The acquisition was announced on March 19. Elaborating on the deal, OpenAI said Astral has built widely used open source Python tools, helping developers move faster with modern tools such as uv, Ruff, and ty. These tools power millions of developer workflows and have become part of the foundation of modern Python development, OpenAI said.
By bringing in Astral’s tools and engineering expertise, OpenAI said it will accelerate work on Codex and expand what AI can do across the software development life cycle. OpenAI’s goal with the Codex ecosystem is to move beyond AI that simply generates code and toward systems that can participate in the entire development workflow, helping plan changes, modify codebases, run tools, verify results, and maintain software over time. Astral’s developer tools sit in that workflow. With the integration of these systems with Codex, OpenAI said it will enable AI agents to work more directly with the tools developers already rely on every day.
Python, OpenAI said, has become one of the most important languages in modern software development, powering everything from AI and data science to back-end systems and developer infrastructure. Astral’s open source tools play a key role in that ecosystem, OpenAI said. These tools and their capabilities are cited in OpenAI’s announcement:
- uv simplifies dependency and environment management.
- Ruff provides fast linting and formatting.
- ty helps enforce type safety across codebases.
Astral also offers pyx, a Python-native package registry now in beta. OpenAI said that, with Astral, it will continue to support these open source projects while exploring ways they can work more seamlessly with Codex to enable AI systems to operate across the full Python development workflow.
The closing of the acquisition is subject to customary closing conditions, including regulatory approval, OpenAI said. Until the closing, OpenAI and Astral will remain separate, independent companies. Deeper integrations will be explored that allow Codex to interact more directly with the tools developers already use, helping develop Codex into a true collaborator across the development life cycle, according to OpenAI.
OpenAI buys non-AI coding startup to help its AI to program 19 Mar 2026, 10:51 pm
OpenAI on Thursday announced the acquisition of Astral, the developer of open source Python tools that include uv, Ruff and ty. It says that it plans to integrate them with Codex, its AI coding agent first released last year, as well as continuing to support the open source products.
OpenAI stated in its announcement that its goal with Codex is “to move beyond AI that simply generates code and towards systems that can participate in the entire development workflow — helping plan changes, modify codebases, run tools, verify results, and maintain software over time. Astral’s developer tools sit directly in that workflow. By integrating these systems with Codex after closing, we will enable AI agents to work more directly with the tools developers rely on every day.”
In a blog, Astral founder Charlie Marsh said that since the company was formed in 2023, the “goal has been to build tools that radically change what it feels like to work with Python — tools that feel fast, robust, intuitive and integrated. Today, we are taking a step forward in that mission.”
He added, “In line with our philosophy and OpenAI’s own announcement, OpenAI will continue supporting our open source tools after the deal closes. We’ll keep building in the open, alongside our community – and for the broader Python ecosystem – just as we have from the start.”
Shashi Bellamkonda, principal research director at Info-Tech Research Group, said that many people think that “AI” is just the chat they have with an LLM, not realizing that there is a huge unseen ecosystem of layers that have to work together to help achieve results.
Most of the focus in AI, he said, goes to the model layer: who has the best reasoning, the fastest inference, the biggest context window. But the model is useless if the environment it operates in is broken, slow, or unreliable.
With its acquisition of Astral, OpenAI “is hoping to be more efficient with its coding, since the code has to run somewhere and be efficient and free of errors,” said Bellamkonda. “I hope that OpenAI will keep its promise to continue to develop open-source Python tools, as this is used by a lot of large companies using Python.”
One possible strategy for the purchase, he explained, “could be that OpenAI, having acquired the team that built these open source tools, optimizes these tools to work better inside OpenAI’s stack than anywhere else, giving them an advantage.”
A ‘corrective move’
Describing it as a reality check for AI-led software development, Sanchit Vir Gogia, chief analyst at Greyhound Research, said the acquisition is being framed as a natural next step for Codex. “It is not. It is a corrective move. And if you read between the lines, it tells you exactly where AI coding is struggling when it leaves the demo environment and enters real software engineering systems.”
For the past couple of years, he said, “the conversation around AI in development has been dominated by one idea: speed. How fast code can be generated. How quickly a developer can go from prompt to output. That framing has been convenient, but it has also been incomplete to the point of being misleading.”
Software development is not, and has never been, just about writing code, he pointed out, adding that the actual work sits in everything that happens around it, such as managing dependencies, enforcing consistency, validating outputs, ensuring type safety, integrating with existing systems, and maintaining stability over time. “These are not creative tasks,” he said. “They are structured, repeatable, and often unforgiving. That is what keeps systems from breaking.”
Astral tools ‘constrain, validate, and correct’
According to Gogia, “this is where the tension begins. AI systems generate probabilistic outputs. Engineering systems demand deterministic behavior. That gap is no longer theoretical, it is now showing up in day-to-day development workflows.”
Across enterprises, he said, “what we are seeing is not a clean productivity story. It is far messier. Developers often say they feel faster. And to be fair, in the moment, they are. Code appears quicker, boilerplate disappears, certain tasks collapse from hours to minutes. But when you step back and look at the full lifecycle, the gains start to blur.”
The effort, he explained, “does not disappear, it moves. Time saved at the point of creation starts to reappear downstream. Teams spend more time reviewing what was generated. They spend more time fixing inconsistencies. They deal with dependency mismatches that were not obvious at generation time. They enforce internal standards that the model does not fully understand. Integration takes longer than expected. Testing cycles stretch. In some cases, defects increase because the system looks correct on the surface but breaks under real conditions.”
Astral did not set out to build AI, Gogia said. Instead, “it focused on something far less glamorous and far more important: Making the Python ecosystem faster, stricter, and more predictable. Ruff enforces code quality and formatting at speed, uv simplifies and stabilizes dependency and environment management, ty brings type safety into the workflow with minimal overhead.”
He added, “[these tools] do not generate anything. They constrain, validate, and correct. They operate in a world where outputs must be consistent and reproducible. That is precisely what AI lacks on its own.”
By bringing Astral into the Codex environment, said Gogia, “OpenAI is not just adding features. It is adding discipline. It is effectively saying that if AI is going to participate across the development lifecycle, it needs to operate within systems that can continuously check and correct its behavior. Without that, scale becomes risk.”
Why AI evals are the new necessity for building effective AI agents 19 Mar 2026, 9:00 am
The AI agent market is projected to grow from $5.1 billion in 2024 to over $47 billion by 2030, yet Gartner predicts that more than 40% of agentic AI projects will be canceled by the end of 2027. The reason is not model capability. It is trust.
Traditional AI evaluation tells you whether a model performs well in isolation. Accuracy benchmarks, latency metrics and token efficiency measure what models can do. They do not measure whether users will trust an agent to act on their behalf. As InfoWorld has noted, reliability and predictability remain top enterprise challenges for agentic AI. These are interaction-layer problems, not model-layer problems, and they require a different approach to evaluation.
In my experience leading user research for AI-powered collaboration experiences at Microsoft and Cisco, I have observed a consistent pattern: the teams that succeed with agentic AI evaluate agent behavior from the user’s perspective, not just model performance. What follows is a framework for doing exactly that.
The evaluation gap
A 2024 meta-analysis published in Nature Human Behaviour analyzed 106 studies and found something counterintuitive: human-AI combinations often performed worse than either humans or AI alone. Performance degradation occurred in decision-making tasks, while content creation showed gains. The difference was not model quality. It was how humans and AI systems interacted.
This has direct implications for agent builders. Standard benchmarks miss the interaction layer entirely. An agent can score perfectly on retrieval benchmarks and still fail users because it cannot signal uncertainty or interpret requests in ways that diverge from user intent.
Research from GitHub and Accenture reinforces this complexity. While developers using AI assistants completed tasks 55% faster, a GitClear analysis found AI-generated code has 41% higher churn, indicating more frequent revisions. The productivity gains are real, but so is the gap between technically valid outputs and pragmatically correct ones.
Rethinking what AI evaluation should measure
The gap between benchmark performance and user trust points to a fundamental question: what should we actually be evaluating? Traditional metrics tell us whether an agent produced a correct output. They do not tell us whether users understood what the agent was doing, trusted the result or could recover when something went wrong.
This is where user experience methodology becomes essential. UX research has always focused on the gap between what systems do and what users experience. The same methods that reveal usability failures in traditional software reveal trust failures in AI agents. Interaction-layer evaluation applies this lens to agentic AI, shifting focus from “did the model perform well?” to “did the user experience work?”
This reframing reveals three dimensions that determine whether agents succeed in practice.
Does the agent understand what users actually want?
The most common interaction failure is invisible to traditional evaluation. An agent interprets a request differently than the user intended, produces a reasonable response to that interpretation, and passes every accuracy metric. The user, meanwhile, receives something they did not ask for.
This is the intent alignment problem. Standard evaluation cannot detect it because the agent’s interpretation was technically valid. The failure exists only in the gap between what the user meant and what the agent understood.
Effective evaluation measures this gap directly: How often do users correct agent interpretations? How frequently do they abandon tasks after the first response? How many times do they reformulate requests to clarify their original intent? These metrics reveal misalignment that accuracy scores hide.
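These interaction-layer signals are cheap to compute once sessions are instrumented. Below is a minimal sketch; the `Interaction` fields and metric names are hypothetical illustrations, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    reformulated: bool  # user rephrased the request after the first response
    corrected: bool     # user edited the agent's output or interpretation
    abandoned: bool     # user gave up on the task after the first response

def intent_alignment_metrics(log):
    """Aggregate the interaction-layer signals that accuracy scores hide."""
    n = len(log)
    return {
        "reformulation_rate": sum(i.reformulated for i in log) / n,
        "correction_rate": sum(i.corrected for i in log) / n,
        "abandonment_rate": sum(i.abandoned for i in log) / n,
    }
```

Tracking these rates per capability area, rather than as one global number, is what makes the misalignment visible.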
The major platforms have recognized this challenge. OpenAI’s Operator agent implements explicit confirmation workflows, requiring user approval before consequential actions. Anthropic’s computer use documentation recommends human verification for sensitive tasks, assuming misalignment will occur and building recovery mechanisms accordingly. Microsoft’s HAX Toolkit codifies intent alignment as a design principle with 18 guidelines emphasizing accurate expectation-setting before agents act. Google’s Gemini provides API-level safety controls, leaving interaction-layer confirmation to implementation.
Does the agent know what it does not know?
Agents that express appropriate uncertainty earn trust. Agents that sound confident regardless of their actual reliability erode it. Yet standard evaluation treats all outputs the same: correct or incorrect, with no gradient in between.
This is the confidence calibration problem. Users need to know when to trust agent outputs and when to verify them. Without calibrated uncertainty signals, they either over-rely on unreliable outputs or waste time double-checking everything.
Effective evaluation tracks whether stated confidence levels predict actual reliability. If users override high-confidence outputs at the same rate as low-confidence ones, calibration is broken. If users rubber-stamp approvals regardless of uncertainty indicators, the signals are not reaching them.
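One way to test this from telemetry is to compare user override rates across stated-confidence buckets: if the gap between low- and high-confidence outputs is near zero, the confidence signal carries no information for users. A rough sketch, with a hypothetical event format:

```python
def calibration_gap(events):
    """events: (stated_confidence, was_overridden) pairs, confidence in {'high', 'low'}.

    A well-calibrated agent should see far more user overrides on its
    low-confidence outputs; a gap near zero means the signal is broken
    or is not reaching users."""
    def override_rate(bucket):
        flags = [overridden for conf, overridden in events if conf == bucket]
        return sum(flags) / len(flags) if flags else 0.0
    return override_rate("low") - override_rate("high")
```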
Platform approaches to confidence vary significantly. Anthropic explicitly trains Claude to express epistemic uncertainty, with documentation noting that Claude refuses to answer approximately 70% of the time when genuinely uncertain. OpenAI’s models prioritize assertive responses, trading faster task completion against higher hallucination risk. Google provides technical logprobs for developers to assess token-level confidence, though surfacing this to users depends on implementation. Microsoft’s Copilot research found that users who verify AI recommendations make better decisions than those who accept them uncritically.
What do user corrections reveal about agent behavior?
Every time a user modifies agent output, they generate a signal about where the interaction layer is failing. Standard evaluation treats corrections as errors to minimize. Interaction-layer evaluation treats them as diagnostic data.
This is the correction pattern problem. The question is not just how often users correct agents, but what those corrections reveal. Did the agent misunderstand context? Apply wrong assumptions? Produce something technically correct but pragmatically useless?
Effective evaluation categorizes corrections by type and tracks trends over time. Rising rates in specific capability areas signal systematic problems. Consistent patterns across users reveal gaps that no benchmark would detect.
LinkedIn’s agentic AI platform, built on Microsoft’s infrastructure, captures this systematically: all generated emails must be editable and explicitly sent by the user, logging not just whether users edited but what they changed. Google’s PAIR Guidebook, used by over 250,000 practitioners, treats user corrections as training signal for understanding where models diverge from user mental models. Anthropic’s Constitutional AI uses structured feedback to identify systematic gaps between model behavior and user expectations, informing model updates rather than just flagging failures.
How UX research methods strengthen agent evaluation
Traditional AI evaluation relies on automated metrics. Interaction-layer evaluation requires understanding user behavior in context. This is where UX research methodology offers tools that engineering teams often lack.
- Task analysis identifies where agents need evaluation checkpoints. By mapping user workflows before building, teams discover high-stakes moments where intent misalignment causes cascading failures. An agent that misinterprets a request early in a complex workflow creates errors that compound with each subsequent step.
- Think-aloud protocols surface confidence calibration failures invisible to telemetry. When users verbalize their reasoning while interacting with agents, they reveal whether uncertainty signals are registering. A user who says “I guess this looks right” while approving a high-confidence output is exhibiting automation bias. No log file captures this; observation does.
- Correction taxonomies transform user modifications into actionable product signals. Rather than counting corrections as a single metric, categorize them: Did the agent misunderstand the request? Apply incorrect assumptions? Generate something technically valid but contextually wrong? Each category points to a different intervention.
- Diary studies for trust evolution over time. Initial agent interactions look nothing like established usage patterns. A user might over-rely on an agent in week one, swing to excessive skepticism after a failure in week two, then settle into calibrated trust by week four. Cross-sectional usability tests miss this arc entirely. Longitudinal diary studies capture how trust calibrates, or miscalibrates, as users build mental models of what the agent can actually do.
- Contextual inquiry for environmental interference. Lab conditions sanitize the chaos where agents actually operate. Watching users in their real environment reveals how interruptions, multitasking and time pressure shape how they interpret agent outputs. A response that seems clear in a quiet testing room gets confusing when someone is also checking Slack.
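The correction-taxonomy idea above takes very little code to operationalize once a researcher has hand-coded each correction. A minimal sketch, with made-up category labels (your own taxonomy should come from your research, not this list):

```python
from collections import Counter

# Hypothetical category labels for illustration only.
TAXONOMY = {"misread_intent", "wrong_assumption", "context_mismatch", "style_only"}

def categorize_corrections(coded_corrections):
    """coded_corrections: (edit_text, label) pairs already coded by a researcher.

    Returns per-category counts so trends can be tracked release over release;
    a rising count in one category points to a specific intervention."""
    return Counter(label for _, label in coded_corrections if label in TAXONOMY)
```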
Just as important is collecting feedback in the moment. Ask users how they felt about an interaction three days later and you get rationalized summaries, not ground truth. For example, in a research study evaluating a voice AI agent, I had users interact with it across four different tasks and collected feedback immediately after each one, covering conversation quality, turn-taking and tone changes, and how each affected users' trust in the AI.
This sequential structure catches what single-task evaluations miss. Did turn-taking feel natural? Did a flat response in task two make them speak more slowly in task three? By task four, you’re seeing accumulated trust or erosion from everything that came before.
These methods complement automated evaluation by revealing failure modes that metrics miss. Teams that integrate UX research into their evaluation cycle catch trust failures before they reach production.
Building AI evals into product development
Databricks’ approach to agent evaluation, using LLM judges alongside synthetic data generation, points toward scalable methods. But automated evaluation cannot replace understanding how users experience agent behavior in production.
Effective AI product development integrates interaction-layer evaluation throughout the cycle. This means defining evaluation criteria before building, not after. It means instrumenting for user behavior, not just model performance. Traditional observability captures latency and error rates; interaction-layer observability captures task abandonment, reformulation frequency and the nature of user corrections.
For teams building on foundation models from OpenAI, Anthropic, Google or Microsoft, evaluation cannot stop at API-level metrics. The same model succeeds or fails depending on how the interaction layer surfaces capabilities and limitations to users.
The trust imperative
The research is clear: Human-AI collaboration improves outcomes when agents behave in ways users can understand and predict. It degrades outcomes when agent behavior is technically correct but pragmatically opaque.
Model capability is no longer the bottleneck. The bottleneck is the interaction layer. Trust is not built by better benchmarks. It is built by evaluating the dimensions that benchmarks miss.
The teams that build effective AI agents will evaluate what matters to users, not just what matters to model developers. That trust will determine which agentic AI projects succeed and which join the 40% that get canceled.
Note: All views expressed are my own.
This article is published as part of the Foundry Expert Contributor Network.
How to create AI agents with Neo4j Aura Agent 19 Mar 2026, 9:00 am
You may be hearing a lot of buzz about knowledge graphs, GraphRAG, and ontologies in the AI space right now, especially around improving agent accuracy, explainability, and governance. But actually creating and deploying your own agents that leverage these concepts can be challenging and ambiguous. At Neo4j, we’re trying to make building and deploying agents more straightforward.
Neo4j Aura Agent is an end-to-end platform for creating agents, connecting them to knowledge graphs, and deploying to production in minutes. In this post, we’ll explore the features of Neo4j Aura Agent that make this all possible, along with links to coded examples to get hands-on with the platform.
Knowledge graphs, GraphRAG, and ontology-driven AI
Let’s define some key terms before we begin.
Knowledge graphs
Knowledge graphs are a design pattern for organizing and accessing interrelated data. There are many ways to implement them. At Neo4j, we use a labeled property graph. Knowledge graphs provide context, standardization, and flexibility in the data layer, making them well-suited for semantic layers, long-term memory stores, and retrieval-augmented generation (RAG) stores.

GraphRAG
GraphRAG is retrieval-augmented generation where a knowledge graph is included somewhere on the retrieval path. GraphRAG improves accuracy and explainability over vector/document search and other SQL queries by leveraging the knowledge graph structure, which symbolically represents context in an expressive and compact manner, allowing you to retrieve more relevant data and, critically, more efficiently fit the relevant context in the context window of the large language model (LLM).

There are lots of GraphRAG retrieval techniques, but the three primary ones are:
- Graph-augmented vector search: Vector search is used to match relevant entities (as nodes or relationships), followed by graph traversal operations to identify and aggregate related context.
- Text-to-query: Text2Cypher generation (Cypher is the most popular graph query language), which enables agents to dynamically compose queries against the graph based on its schema.
- Query templates: Parameterized, premade graph queries that give the agent precise, expert-reviewed query logic to execute when it chooses.
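As an illustration of the first technique, a graph-augmented vector search in Neo4j can be expressed as a single Cypher query: a vector index lookup followed by a traversal to pull in related context. The sketch below assumes the official Neo4j Python driver (a `session` object passed in); the index name, labels, and relationship type are hypothetical and must match your own schema:

```python
# Hypothetical index and schema names; adapt to your own graph.
GRAPH_AUGMENTED_QUERY = """
CALL db.index.vector.queryNodes('entity_embeddings', $k, $embedding)
YIELD node, score
MATCH (node)-[:MENTIONED_IN]->(doc:Document)
RETURN node.name AS entity, doc.title AS source, score
ORDER BY score DESC
"""

def graph_augmented_search(session, embedding, k=5):
    """session: an open neo4j.Session from the official Python driver.
    embedding: a query vector from the same model that populated the index."""
    return [record.data()
            for record in session.run(GRAPH_AUGMENTED_QUERY, embedding=embedding, k=k)]
```

The vector lookup finds entry points; the `MATCH` traversal is what distinguishes this from plain vector search, aggregating the surrounding context that makes the retrieved results more relevant.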
Ontology
An ontology is a formal representation of knowledge that defines the concepts, categories, properties, and relationships within a particular domain or area of study. You may have heard about ontologies in the AI space lately. In practice, an ontology is just a data model in the form of a graph schema, plus additional metadata about the domain(s) and use case(s) involved. Ontologies enable AI to reason and make inferences over your data easily. While often associated with the Resource Description Framework (RDF) and triple stores, a property graph database such as Neo4j's provides the equivalent functionality with a graph schema when paired with an agent's system and tool prompts.
Neo4j Aura features
Neo4j Aura is a fully managed graph intelligence platform that includes a graph database and data services for importing, dashboarding, exploring, and deploying AI agents on top of data. You can create knowledge graphs to use with agents from structured or unstructured data, or a mix of both.
You can import structured data with Data Importer from a variety of data warehouses including RDBMS stores such as Snowflake, Databricks, and Postgres.

You can also import documents and unstructured data, performing entity extraction and merging graph data according to your schema, using the GraphRAG Python package by Neo4j, or by using other ecosystem tools with supported integrations such as Unstructured, LangChain, and LlamaIndex.
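Under the hood, unstructured-document ingestion boils down to extracting entities with an LLM and idempotently merging them into the graph. The sketch below is not the GraphRAG Python package's actual API, just an illustration of the merge step with parameterized Cypher; the labels and relationship name are assumptions, and the LLM extraction step itself is not shown:

```python
def normalize_entities(raw_names):
    """Dedupe and canonicalize entity names emitted by an upstream
    LLM extraction step before merging them into the graph."""
    return sorted({name.strip().lower() for name in raw_names if name.strip()})

# MERGE (rather than CREATE) keeps repeated imports from producing duplicates.
UPSERT_ENTITY = """
MERGE (e:Entity {name: $name})
MERGE (d:Document {id: $doc_id})
MERGE (e)-[:MENTIONED_IN]->(d)
"""

def ingest(session, doc_id, raw_names):
    """session: an open neo4j.Session from the official Python driver."""
    for name in normalize_entities(raw_names):
        session.run(UPSERT_ENTITY, name=name, doc_id=doc_id)
```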
Once the data is imported into Neo4j, you can build an Aura agent on top of it. There are three basic steps:
- Creating your agent
- Adding tools
- Saving, testing, and deploying
You can find step-by-step details on the entire process, including all the necessary query and code snippets here. Below is a summary of the process.
First, creating an agent is easy. Simply provide some basic information: title, description, system prompt, and the database to serve as the agent’s knowledge graph.

Users can autogenerate an initial agent out of the box that they can further edit and tailor by providing a system prompt. The Aura agent will then use the graph schema and other metadata to configure the agent and its retrieval tools.

Neo4j Aura Agent provides three basic tool types, aligning with the GraphRAG categories discussed above:
- Vector similarity search
- Text2Cypher
- Cypher templates
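A Cypher template tool is essentially a parameterized, pre-reviewed query the agent can invoke by name. A hypothetical sketch follows; the schema loosely mirrors the contracts example shown later in this article, but it is not Aura Agent's actual template definition:

```python
# Hypothetical template: node labels, relationship type, and properties
# are illustrative assumptions, not Aura Agent's real schema.
CONTRACTS_FOR_ORGANIZATION = """
MATCH (o:Organization)-[:PARTY_TO]->(c:Contract)
WHERE toLower(o.name) CONTAINS toLower($organization_name)
RETURN c.contract_id AS contract_id,
       c.contract_name AS contract_name,
       c.agreement_type AS agreement_type
"""

def identify_contracts_for_organization(session, organization_name):
    """session: an open neo4j.Session from the official Python driver.
    The agent supplies only the parameter; the query logic is fixed."""
    return [record.data()
            for record in session.run(CONTRACTS_FOR_ORGANIZATION,
                                      organization_name=organization_name)]
```

Because the query text is fixed and only the parameter varies, a template gives you determinism and reviewability that generated Text2Cypher cannot.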

These three tool types can be used in combination by the agent to chain responses together, improving overall accuracy compared to vector search alone.
The knowledge graph provides essential structure, allowing Text2Cypher to infer the right query patterns from the graph schema and the user prompt. Templatized queries allow for even greater precision by executing pre-specified, expert-reviewed query logic.
When responding to users, Neo4j Aura Agent includes its reasoning. During testing, this can be opened in the response tab, which explains the agent's reasoning and the tool query logic used. Because the Cypher query language expresses relationship patterns in a human-readable format, it is easy to convey to both the user and downstream AI systems, allowing for improved explainability across the entire AI system.

Neo4j Aura Agent can deploy into a production setting automatically, and this is perhaps one of its most significant benefits. Once you are satisfied with the agent’s performance in the UI testing playground, you can select a publicly available endpoint. Doing this will automatically deploy the agent to a secure endpoint that is authenticated via an API key/secret pair.

Neo4j Aura Agent provides managed LLM inference and embeddings for the agent runtime, making it easy for users and removing the need to maintain separate accounts and provide API credentials for other model providers.
Once deployed, the Aura agent can be called directly or wrapped in a Model Context Protocol (MCP) server to invoke other downstream AI systems and applications.
Example: Calling through API
curl --request POST "$ENDPOINT_URL" \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H "Authorization: Bearer $BEARER_TOKEN" \
-d '{"input": "find Motorola contracts"}' --max-time 60 \
| jq .
{
"content": [
{
"thinking": "The user is asking to find contracts related to a specific organization, \"Motorola\". The `identify_contracts_for_organization` tool is designed for this purpose, taking the organization name as input.",
"type": "thinking"
},
{
"id": "fdfa4d22-334d-44d1-861d-749f436097de",
"input": {
"organization_name": "Motorola"
},
"name": "identify_contracts_for_organization",
"type": "cypher_template_tool_use"
},
{
"output": {
"keys": [
"contract_id",
"agreement_type",
"contract_name",
"effective_date",
"renewal_term",
"expiration_date",
"party_names",
"party_roles",
"party_incorporated_countries"
],
"records": [
{
"agreement_type": "Intellectual Property Agreement",
"contract_id": 18,
"contract_name": "Intellectual Property Agreement",
"effective_date": "2014-04-14",
"expiration_date": "unable to determine",
"party_incorporated_countries": [
"United States",
"United States"
],
"party_names": [
"Motorola Solutions, Inc.",
"Zebra Technologies Corporation"
],
"party_roles": [
"Seller",
"Purchaser"
],
"renewal_term": "unable to determine"
}
],
"summary": {}
},
"tool_use_id": "fdfa4d22-334d-44d1-861d-749f436097de",
"type": "cypher_template_tool_result"
},
{
"thinking": "I have already identified the contracts associated with Motorola using the `identify_contracts_for_organization` tool. The output provides a list of contracts with relevant details, directly answering the original query.",
"type": "thinking"
},
{
"text": "Here are the contracts associated with Motorola:\n\n| contract_id | contract_name | agreement_type | effective_date | expiration_date | party_names | party_roles | party_incorporated_countries |\n|---|---|---|---|---|---|---|---|\n| 18 | Intellectual Property Agreement | Intellectual Property Agreement | 2014-04-14 | unable to determine | Motorola Solutions, Inc., Zebra Technologies Corporation | Seller, Purchaser | United States, United States |",
"type": "text"
}
],
"end_reason": "FINAL_ANSWER_PROVIDED",
"role": "assistant",
"status": "SUCCESS",
"type": "message",
"usage": {
"candidates_token_count": 226,
"prompt_token_count": 7148,
"thoughts_token_count": 301,
"total_token_count": 7675
}
}
Example: Wrapping in an MCP server and calling through Claude Desktop

(Screenshot: the deployed Aura agent wrapped in an MCP server and invoked from Claude Desktop. Image: Neo4j)
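Whether called directly or from inside an MCP tool handler, the wrapper ultimately issues the same HTTP request as the curl example above. Here is a minimal standard-library sketch of that call; the endpoint URL and bearer token are placeholders, and this is not a full MCP server:

```python
import json
import urllib.request

def build_request(endpoint_url, bearer_token, prompt):
    """Construct the POST request used by the curl example above."""
    payload = json.dumps({"input": prompt}).encode()
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {bearer_token}",
        },
        method="POST",
    )

def ask_agent(endpoint_url, bearer_token, prompt, timeout=60):
    """Send the prompt and return the agent's JSON response as a dict."""
    request = build_request(endpoint_url, bearer_token, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

An MCP tool handler would simply call `ask_agent(...)` and relay the `text`-typed content items back to the client.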
Connecting agents to knowledge graphs
The promise of knowledge graphs for AI agents has been clear for some time—better accuracy, transparency in reasoning, and more reliable outputs. But turning that promise into reality has been another story entirely. The complexity of building knowledge graphs, configuring GraphRAG retrieval, and deploying production-ready agents has kept these benefits out of reach for many teams.
Neo4j Aura Agent represents an important first step in changing that. By providing a unified platform that connects agents to knowledge graphs in minutes rather than months, it removes much of the ambiguity that has held teams back. The low-code tool creation simplifies how agents achieve accuracy through vector search, Text2Cypher, and query templates working in concert. The built-in reasoning response and human-readable Cypher queries make explainability straightforward rather than aspirational. And the progression from playground testing to secure API deployment with managed inference eliminates the operational friction that often derails AI projects before they reach production.
This is not the final word on knowledge graph-powered agents, but it is a critical step forward. As organizations continue exploring how to make their AI systems more accurate, explainable, and governable, platforms that reduce complexity while preserving power will be essential. Neo4j Aura Agent points the way toward that future, making sophisticated GraphRAG capabilities accessible to teams ready to move beyond vector search and rigid knowledge management systems.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
9 reasons Java is still great 19 Mar 2026, 9:00 am
In a world obsessed with disruption, Java threads the needle between stability and innovation. It’s the ultimate syncretic platform, synthesizing the best ideas from functional programming, concurrency, cloud computing, and AI under a reliable, battle-tested umbrella.
Java unites meticulous planning with chaotic evolution, enterprise reality with open source ideals, along with a healthy dose of benevolent fortune. Let’s look at the key factors that make Java as much a champion today as it was in 1996.
1. The Java Community Process
At the heart of Java’s success are the developers and architects who love it. The Java community is vital and boisterous, and very much engaged in transforming the language. But what makes Java special is its governance architecture.
Far from a smoothly operating machine, Java’s governance is a riotous amalgam of competing interests and organizations, all finding their voice in the Java Community Process (JCP). The fractious nature of the JCP has been criticized, but over time it has given Java a massive advantage. The JCP is Java’s version of a functional democracy: A venue for contribution and conflict resolution among people who care deeply about the technology. The JCP is a vital forum where the will and chaos of the worldwide developer community negotiate with Java’s formal managing body.
2. OpenJDK
I still remember my astonishment when the Java language successfully incorporated lambdas and closures. Adding functional constructs to an object-oriented programming language was a highly controversial and impressive feat in its day. But that achievement pales in comparison to the more recent introduction of virtual threads (Project Loom), and the ongoing effort to unify primitives and objects (Project Valhalla).
The people working on Java are only half of the story. The people who work with it are the other half, and they reflect the diversity of Java’s many uses. Social coding and open source are not unique to Java, but they are key components in the health of the Java ecosystem. Like JavaScript, Java evolved in tandem with the coding community as the web gained traction. That origin story is a large part of its character. Java’s community responsiveness, including the existence of OpenJDK, assures developers that we are working within a living system—one that is being continuously nurtured and cultivated for success in a changing world.
3. Open source frameworks and tools
Another major source of Java’s success is its wealth of open source frameworks and the tools built up around it. Java has one or more high-quality libraries for just about any task you can imagine. If you like a project, there’s a good chance it’s open source and you can contribute to it, which is great for both learning and building community.
The wealth of projects in the Java ecosystem extends from modest examples to foundational components of the Internet. Classic workhorses like Hibernate and Jetty are still vital, while the landscape has broadened to include tools that define the cloud era. We now have Testcontainers, which revolutionized integration testing by bringing Docker directly into the test lifecycle. And we have Netty, the asynchronous networking engine that quietly powers everything from high-frequency trading platforms to video games.
Perhaps most exciting, we have the new wave of AI integration tools like LangChain4j, which bridge the gap between stable enterprise systems and the wild west of LLMs. These are all open source projects that invite contributors, creating a set of capabilities that is unrivaled in its depth.
4. The Spring framework
No appreciation for Java’s ecosystem would be complete without tipping the hat to Spring. For years, Spring was the undisputed standard to which all other Java-based frameworks aspired. Today, it remains a dominant force in the enterprise.
With Spring, developers use the same facility to compose custom code as they do to incorporate third-party code. With dependency injection and inversion of control (IoC), Spring both supports standardizing your own internal components and ensures third-party projects and vendor components meet the same standard. All of this allows for greater consistency in your programs.
Of course, there are valid critiques of Spring, and it’s not always the right tool for the job. Google Guice is another tool that works similarly. But Spring is the framework that first offered a clean and consistent way to provision and compose application components. That was a game changer in its time and continues to be vital today. And, of course, the addition of Spring Boot has made Spring even easier to adopt and learn.
5. Java microframeworks
Next up are the cloud-native Java microframeworks: Quarkus, Micronaut, and Helidon. These frameworks launched Java into the serverless era, focusing on sub-second startup times and low-memory footprints. Fierce competition in this space forced the entire ecosystem to evolve faster.
Today, Java developers don’t just inherit a stack: They choose between robust options that all play nicely with the modern cloud. This social coding environment allows Java to absorb the best ideas from newer languages while retaining its massive library of existing solutions.
6. The miracle of virtual threads
Threads have been the core concurrency abstraction since time immemorial, not only for Java but for most languages. Threads of old mapped directly to operating system threads, but Java conclusively evolved beyond that model with the coming of virtual threads.
Similar to how Java once moved memory management into the JVM, it has now moved threading there. When you opt into virtual threads (a standard feature since JDK 21), you get a lightweight object that is orchestrated by a highly optimized scheduler. Virtual threads are intelligently farmed out to actual carrier threads, invisible to you as the developer.
The efficiency boost can be mind-blowing. Without any extra work on your end, virtual threads can take a server from thousands to millions of concurrent requests. Successfully patching such a widely deployed platform in such a fundamental way—in full view of the entire industry—stands as one of the truly great achievements in the history of software.
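A minimal sketch of what this looks like in practice, assuming JDK 21 or later: submitting many small tasks to a virtual-thread-per-task executor. The task count and method names are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class VirtualThreadsDemo {
    // Runs n trivial tasks, each on its own virtual thread (JDK 21+).
    // With platform threads, tens of thousands of tasks like these
    // would exhaust OS resources; virtual threads make this cheap.
    static int runTasks(int n) {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            var futures = IntStream.range(0, n)
                    .mapToObj(i -> exec.submit(() -> 1))  // each task yields 1
                    .toList();
            int sum = 0;
            for (Future<Integer> f : futures) {
                try { sum += f.get(); } catch (Exception e) { throw new RuntimeException(e); }
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000));
    }
}
```

The code is identical in shape to what you would write for a fixed platform-thread pool; only the executor changes, which is what makes the migration story so compelling.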
7. Data-oriented programming
In a development landscape enamored of functional programming, it has become fashionable to trash-talk Java’s object orientation. And while Java’s stewards have responded by incorporating functional programming idioms into the language, they’ve also steadfastly insisted that Java remains a strongly object-oriented language, where everything is, indeed, an object.
You can write awesome code in any paradigm, and the same is true for awful code. In a Java system, you know up front that the language is strongly typed and that everything will be contained in classes. (For the one exception, see the last item below.) The absoluteness of this design decision cuts away complexity and lends a cleanness to the Java language that stands the test of time. Well-written Java programs have the mechanical elegance of interacting objects; components in a Java-based system interact like gears in a machine.
Now, rather than abandoning its object-oriented programming roots, Java has evolved by embracing data-oriented programming as another layer. With the arrival of records, pattern matching, and switch expressions, Java solved its historical verbosity problem. We can now model data as immutable carriers and process it with the conciseness of a functional language. Data-oriented constructs offer the elegance of object models without the boilerplate that once drove developers crazy.
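As a rough illustration of this data-oriented style (runnable on JDK 21+), the sketch below models data as records in a sealed hierarchy and processes it with an exhaustive switch expression. The types are invented for the example.

```java
// Records are immutable data carriers; the sealed interface tells the
// compiler the hierarchy is closed, so the switch below needs no
// default branch and is checked for exhaustiveness.
sealed interface Shape permits Circle, Rect {}
record Circle(double radius) implements Shape {}
record Rect(double w, double h) implements Shape {}

public class ShapesDemo {
    static double area(Shape s) {
        return switch (s) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Rect r   -> r.w() * r.h();
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Rect(3, 4)));
    }
}
```

Adding a third Shape record would turn every non-exhaustive switch into a compile error, which is exactly the kind of structural guarantee the data-oriented style trades boilerplate for.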
8. The JVM (better than ever)
Once viewed as a heavy abstraction layer, the Java virtual machine is recognized as a masterpiece of engineering today. In devops containers and serverless architectures, the JVM offers a well-defined deployment target. Modern Java virtual machines are a wonder to behold. They deliver sophisticated automatic memory management with out-of-the-box performance approaching C.
Now, the JVM is undergoing its most significant transformation yet. I wrote about Project Valhalla a while ago, describing it as Java’s epic refactor. Today, that prediction is a reality. For decades, Java objects paid a memory tax in the form of headers and pointers. Valhalla removes this by introducing value classes, allowing developers to create data structures that code like a class, but work like an int.
Value classes flatten memory layouts and eliminate the cache misses that modern CPUs hate. Moreover, they bring all Java types into a single mental model (no more “boxing”). Project Valhalla proves the JVM isn’t just a static runtime but a living system, capable of replacing its own engine, even while flying the plane.
9. AI integration and orchestration
When the AI boom first hit, Python wrote all the models but Java still ran the business back end. Now, Java is fast becoming the universal AI layer, merging business and AI technology.
Tools like LangChain4j and Spring AI are transforming Java into an AI orchestration engine for the enterprise. These frameworks allow developers to integrate LLMs with the proven safety, security, and type-checking of the JVM. While Python is great for experimentation, Java is the platform you use when you need to connect an AI agent to your banking system, your customer database, or your secure cloud infrastructure.
Conclusion
Software development is made up of two powerful currents: the enterprise and the creative. There’s a spirit of creative joy to coding that is the only possible explanation for, say, working on a dungeon simulator for 25 years. That fusion of creativity with a solid business use case is the alchemy that keeps Java alive and well. So far, and into the foreseeable future, Java remains the syncretic champion. It’s boring enough to trust—yet daring enough to win.
OpenAI’s $50B AWS deal puts its Microsoft alliance to the test 19 Mar 2026, 1:05 am
Despite OpenAI’s repeated reaffirmations that its relationship with Microsoft remains strong and central, recent developments suggest that Redmond isn’t convinced.
According to reports, the tech giant is considering legal action against OpenAI and Amazon over the $50 billion cloud deal the two recently struck to make Amazon Web Services (AWS) the exclusive third-party cloud distribution provider for OpenAI Frontier.
This third-party exclusivity agreement could conflict with OpenAI’s existing Azure partnership. Unnamed Microsoft execs purportedly consider the AWS arrangement unworkable, saying it breaches their agreement with the AI darling, if not explicitly then in principle.
The three companies are said to be in discussions to resolve the issue before Frontier goes live following a limited preview, without having to resort to litigation.
“This is a tricky issue, and prospective early adopters of the OpenAI-AWS Frontier capabilities will need to proceed with caution,” said Scott Bickley, advisory fellow at Info-Tech Research Group. The OpenAI-Microsoft agreement is “quite convoluted, and contains several provisions that lack absolute clarity in terms of where boundaries reside for IP use and IP sharing, likely by design.”
Is OpenAI double-dipping with Microsoft and AWS?
In late February, AWS and OpenAI announced their intentions to “co-create” a stateful runtime environment, powered by OpenAI models, that would be made available on Amazon Bedrock for AWS customers. “Stateful AI” is meant to overcome the challenges of so-called “stateless AI,” where models offer one-off answers without factoring in context from previous sessions.
According to the agreement, AWS would not only invest another $50 billion in OpenAI, but would be the exclusive third-party cloud provider for Frontier, which is currently in limited preview with a small group of AI-native companies including Abridge, Ambience, Clay, Decagon, Harvey, and Sierra. OpenAI says it will soon expand the program to other AI builders.
AWS has also agreed to give OpenAI 2GW of Trainium capacity to support demand for the stateful environment, Frontier, and “other advanced workloads.” Further, the two companies would develop models specifically for Amazon applications, and expand their existing $38 billion multi-year agreement to $100 billion over eight years.
However, at the time of that announcement, OpenAI also felt the need to concurrently announce that nothing about its collaborations with other tech companies “in any way” changed the terms of its partnership with Microsoft. Azure would remain the exclusive cloud provider of stateless OpenAI APIs.
The two companies stressed that, as in their original agreement:
- OpenAI has the flexibility to commit to compute elsewhere, including through infrastructure initiatives like the Stargate project.
- Both companies can independently pursue new opportunities.
- The ongoing revenue-share arrangement will stay the same; however, that agreement has “always” included revenue-sharing from partnerships between OpenAI and other cloud providers.
OpenAI and Microsoft also underscored the fact that the tech giant will maintain an exclusive license and access to intellectual property (IP) across OpenAI models and products, and that OpenAI’s Frontier and other first-party products would continue to be hosted on Azure.
They stated that their ongoing partnership “was designed to give Microsoft and OpenAI room to pursue new opportunities independently, while continuing to collaborate, which each company is doing, together and independently.”
This re-affirmation followed yet another affirmation of the “next chapter” of the companies’ collaboration in October 2025. Microsoft was one of OpenAI’s earliest financial backers, investing $1 billion in 2019 and $10 billion in 2023.
Concerns about potential future lock-in
Clearly, OpenAI has for some time sought to maintain its independence, while seeking out strategic partnerships with the biggest names in tech. The ChatGPT builder seems to have struck (or is in the midst of striking) deals with nearly every big company out there, including Nvidia, Cerebras, Cisco, Accenture, Snowflake, Oracle, and many others.
“OpenAI is seeking to exploit a loophole between what rights Microsoft has to ‘stateless’ versus ‘stateful’ implementations of LLM models,” Info-Tech’s Bickley observed. Stateful is essential to multi-step agentic workflows, he noted, as it allows AI agents to retain memory and context over time.
But, as with many things, “the devil may reside in the details,” he said, as the AWS announcement calls for the creation of a stateful runtime environment. So, for instance, if Frontier is simply an orchestration layer designed to ensure that API calls are made to an Azure-hosted LLM, Microsoft would get paid for that usage.
The reality is that OpenAI has little choice but to “push the boundaries” of its agreement with Microsoft, and to develop products hosted and used on other hyperscaler clouds, Bickley said. The market is “too big to ignore the AWS and [Google Cloud Platforms] of the world.” Additionally, OpenAI’s massive forecasts of its requirements for capacity (250 GW of data center demand), revenue, and expense/cash burn necessitate a global use case.
“OpenAI is dependent on raising massive amounts of capital to fund this growth trajectory,” said Bickley, and the $50 billion Amazon investment is predicated on the delivery of Frontier.
However, the recent reaffirmation of the Microsoft relationship “muddies the waters,” because it grants OpenAI the right to strike deals with cloud rivals, as long as Microsoft retains its rich revenue-sharing agreement and exclusive hold over stateless models, he noted. This seems to imply that stateful models “may be out of this exclusive IP scope.”
Ultimately, “Microsoft’s aggressive legal response is standard fare for IP disputes among large tech firms, and should not scare away would-be customers,” Bickley emphasized, adding that it will likely be resolved via negotiations.
However, an additional looming issue is the potential for vendor lock-in, he noted. Frontier is tied to OpenAI’s architecture, and now adds “additional lock-in layers” for customer data stored in AWS, along with proprietary orchestration layers through which AI agents will flow. Therefore, as these agentic workflows begin to manage critical enterprise processes, customers’ business workflows could be “distinctly tied” to AWS.
“This will be quite sticky and difficult to migrate off of in the future, assuming there is an alternative to migrate to,” said Bickley.
This article originally appeared on NetworkWorld.
Java future calls for boosts with records, primitives, classes 18 Mar 2026, 11:46 pm
Oracle’s latest Java language ambitions are expected to offer improvements in records, classes, primitives, and arrays. As part of these plans, pending features not yet targeted to a specific release are under consideration for official inclusion in Java.
In a March 17 presentation at the JavaOne conference in Redwood City, Calif., Oracle’s Dan Smith, senior developer in the company’s Java platform group, cited features planned for inclusion, but added that these features may change or go away instead. Stated goals for new Java language features include preserving the feel of Java and minimizing disruption, making it easier to work with immutable data, being more declarative and less imperative, and minimizing the seams between different features. Reducing the “activation energy” for Java also was cited as a theme.
Among the features under consideration is the value classes and objects Java Enhancement Proposal (JEP), which calls for enhancing the Java platform with value objects: class instances that have only final fields and lack object identity. Created in August 2020 and updated this month, this proposal is intended to allow developers to opt into a programming model for domain values in which objects are distinguished solely by the values of their fields, much as the int value 3 is distinguished from the int value 4. Other goals of this proposal include supporting compatible migration of existing classes that represent domain values to this programming model, and maximizing the freedom of the JVM to store domain values in ways that improve memory footprint, locality, and garbage collection efficiency.
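A small runnable illustration of the identity problem the proposal targets, using today's boxed Integer (value classes themselves are not yet available in a released JDK): with identity-carrying boxes, `==` compares references, so results depend on the small-value cache. An identity-free value class would make `==` a comparison of field values, eliminating this trap.

```java
public class IdentityDemo {
    // Returns whether two independently boxed copies of n are the
    // same object. Integer.valueOf is specified to cache -128..127,
    // so small values share a box while larger ones typically do not.
    static boolean sameRef(int n) {
        Integer a = Integer.valueOf(n);
        Integer b = Integer.valueOf(n);
        return a == b; // identity comparison, not value comparison
    }

    public static void main(String[] args) {
        System.out.println(sameRef(100));  // true: inside the cached range
        System.out.println(sameRef(1000)); // typically false: two distinct boxes
    }
}
```

Under the value-objects model, a type like this would have no identity to compare in the first place, which is also what frees the JVM to flatten such values in memory.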
The derived record creation JEP in preview, meanwhile, would provide a concise means to create new record values derived from existing record values. The proposal also is intended to streamline the declaration of record classes by eliminating the need to provide explicit wither methods, which are the immutable analog of setter methods. Records are immutable objects, with developers frequently creating new records from old records to model new data. Derived creation streamlines code by deriving a new record from an existing record, specifying only the components that are different, according to the proposal, created in November 2023 and marked as updated in April 2024.
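To make the boilerplate concrete, here is a sketch of the status quo: a record with a hand-written wither method, runnable on current JDKs. The JEP's proposed derived-creation syntax appears only in a comment, since it requires a preview build; the record and method names are illustrative.

```java
// A record with a manual "wither": the immutable analog of a setter.
// Each component you want to vary needs its own such method today.
record Point(int x, int y) {
    Point withX(int newX) { return new Point(newX, y); }
}

public class DerivedDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = p.withX(5);
        // Proposed derived record creation (preview syntax, roughly):
        //   Point q = p with { x = 5; };
        System.out.println(q);
    }
}
```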
Also cited by Smith were the enhanced primitive boxing JEP, which is a feature in preview, and the primitive types in patterns, instanceof, and switch JEP, a feature actually undergoing its fourth preview in JDK 26. Enhanced primitive boxing, created in January 2021 and marked as updated in November 2025, uses boxing to support language enhancements that treat primitive types more like reference types. Among goals is allowing boxing of primitive values when they are used as the “receiver” of a field access, method invocation, or method reference. Also on the agenda for this JEP is supporting primitive types as type arguments, implemented via boxing at the boundaries with generic code. Unboxed return types would be allowed when overriding a method with a reference-typed return. The primitive types feature, meanwhile, calls for enhancing pattern matching by allowing primitive types in all pattern contexts and by extending instanceof and switch to work with all primitive types. This feature was created in June 2025 and last updated in December 2025.
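The sketch below, runnable on JDK 21+, uses today's reference-type patterns with guards; the primitive forms the JEP would enable appear only as a comment, since they require a preview JDK. The method and cases are invented for the example.

```java
public class PatternsDemo {
    // Pattern matching over reference types, with a guard clause.
    static String describe(Object o) {
        return switch (o) {
            case Integer i when i > 0 -> "positive int";
            case Integer i            -> "non-positive int";
            case String s             -> "string:" + s;
            default                   -> "other";
        };
    }
    // Under the primitive-patterns JEP (preview), constructs like
    //   if (someInt instanceof byte b) { ... }
    // would test whether a value fits the narrower primitive type,
    // replacing hand-written range checks and casts.

    public static void main(String[] args) {
        System.out.println(describe(42));
    }
}
```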
For arrays, plans under consideration involve declarative array creation expressions, final arrays, non-null arrays, and covariant primitive arrays. Declarative array creation covers capabilities such as using a lambda to compute initial values. With final arrays, components cannot be mutated and must be declaratively initialized. Covariant primitive arrays could treat an int[] as a non-null Integer[], with boxes accessed as needed.
Edge.js launched to run Node.js for AI 18 Mar 2026, 11:39 pm
Wasmer has introduced Edge.js as a JavaScript runtime that leverages WebAssembly and is designed to safely run Node.js workloads for AI and edge computing. Node apps can run inside a WebAssembly sandbox.
Accessible from edgejs.org and introduced March 16, Edge.js is intended to enable existing Node.js applications to run safely and with startup times impossible to get with containers, according to Wasmer. Instead of introducing new APIs, Edge.js preserves Node compatibility and isolates the unsafe parts of execution using WebAssembly. Existing Node.js applications and native modules can run unmodified while system calls and native modules are sandboxed through WASIX, an extension to the WebAssembly System Interface (WASI). WASIX was designed to make WebAssembly more compatible with POSIX programs, enabling seamless execution of more complex applications in both server and browser environments.
Edge.js reimagines Node.js with sandboxing enabled via a --safe mode. It is built for AI and serverless workloads, Wasmer said. Edge.js currently supports the V8 and JavaScriptCore JavaScript engines, and the architecture is engine-agnostic by design. Plans call for adding support for the QuickJS and SpiderMonkey engines, and additional engines are welcome.
Edge.js is currently about 5% to 20% slower than current Node.js when run natively, and 30% slower when run fully sandboxed with Wasmer. In some cases, such as HTTP benchmarks where native-to-Wasm work is intense, the gap can be bigger. Wasmer intends to focus on closing that gap for Edge.js 1.0 and for the next releases of Wasmer.
Snowflake’s new ‘autonomous’ AI layer aims to do the work, not just answer questions 18 Mar 2026, 1:00 pm
Snowflake has taken the covers off a product, currently under development, that it describes as an “autonomous” AI layer. The product promises to turn Snowflake’s data cloud from a place that answers questions about data into one that actually does the work: stitching together analysis, reports, and even slide decks on behalf of business users.
Named Project SnowWork, the new conversational AI interface, which combines Snowflake’s existing technologies such as its AI Data Cloud, Snowflake Intelligence, and Cortex Code, is Snowflake’s attempt to implant itself into enterprise workflows, Bala Kasiviswanathan, VP of developer and AI experiences at Snowflake, told InfoWorld.
“Project SnowWork comes from a pretty simple belief: if AI is going to really matter in the enterprise, it has to first work for everyday workflows, and it has to be deeply connected to the data and systems that actually run the business,” Kasiviswanathan said.
“The idea is to have AI act more like a proactive collaborator. So instead of just asking questions, business users across functions like finance, marketing, or sales can ask for outcomes. Things like putting together a board-ready forecast, identifying churn risks, generating a report with recommended actions, or digging into supply chain issues,” Kasiviswanathan added.
What’s in it for enterprises?
Analysts say SnowWork could be valuable to enterprises, especially in accelerating operational business decisions and reducing the workload burden on data practitioners, which is often the real cause of delay.
“Every Fortune 500 company we talk to has the same bottleneck. A head of sales wants to understand regional churn patterns, so they file a ticket with the data team. Three weeks later, they get a CSV and a shrug. By then, the decision window has closed, and they’ve already gone with gut instinct. That cycle is broken, and everyone knows it,” said Ashish Chaturvedi, leader of executive research at HFS Research.
SnowWork, according to Chaturvedi, promises to cut that queue to zero as users can get a finished analysis directly in minutes without having to engage a data practitioner.
“If it works as advertised, the productivity unlock is substantial. Not just the time saved, but timely decisions can be made while the information is still warm,” the analyst added.
In fact, removing the need to engage a data practitioner for analysis, according to Moor Insights and Strategy principal analyst Robert Kramer, will allow data teams to spend more time on governance, modeling, and oversight instead of handling repetitive requests.
Play for enterprise AI land grab
Snowflake’s Kasiviswanathan also pitched SnowWork against other chatbots and AI assistants, claiming that it was more accurate and far less reliant on manual coordination because it runs on secured and governed enterprise data.
However, analysts say this is a clever strategy to increase stickiness of its platform, as nearly all technology vendors, including Microsoft, Google, AWS, Salesforce, ServiceNow, Workday, OpenAI, and Anthropic, are moving in aggressively to try and own the majority share of AI in enterprises with their own offerings.
“This is about platform stickiness through surface area expansion. Snowflake’s core data cloud business is in a knife fight — Databricks is breathing down its neck, open-source alternatives are chipping away at the margins, and enterprise CFOs are getting louder about consumption costs,” said HFS’ Chaturvedi.
“Today, the average business user has never logged into Snowflake. Their experience of the platform is indirect, which is filtered through a BI tool. SnowWork puts Snowflake directly on the business user’s desktop, and that changes the commercial gravity entirely. You go from being a back-end utility that procurement reviews once a year to a front-office productivity layer that hundreds of people touch every day,” Chaturvedi pointed out.
That strategy also pitches Snowflake directly against Microsoft, Google, and Salesforce, Chaturvedi said. If those vendors succeed in making their AI layers the default workspace for enterprise employees, Snowflake could find itself reduced to the pipes underneath the stack: essential but interchangeable, and far removed from the everyday users it now hopes to reach.
Compression of the enterprise technology stack
More broadly, though, the analyst says this could be a part of a broader industry shift where the traditional enterprise technology stack is getting compressed like an accordion.
“The old model had five distinct layers, including data warehouse, BI tool, analyst, deliverable, and decision-maker. Each handoff added latency, cost, and the telephone-game risk of lost context. SnowWork collapses that into three layers comprising data platform, autonomous agent, decision-maker,” Chaturvedi said.
“Every major platform player is making a version of this move. Databricks is building lakehouse apps. Salesforce has Agentforce. Microsoft has Copilot wired into everything. ServiceNow is embedding agentic workflows,” Chaturvedi added.
Still, for all the ambition, there are some obvious caveats; SnowWork’s vision comes with its share of unanswered questions, analysts say.
HFS’s Chaturvedi was skeptical because the product is still under development and Snowflake didn’t reveal the pricing model: “If SnowWork compresses your decision cycle from three weeks to three minutes but triples your Snowflake bill, the CFO math gets complicated fast.”
Similarly, Stephanie Walter, AI stack practice lead at HyperFRAME Research, pointed to vendors’ general credibility gap around AI execution in enterprise and production settings.
“In practice, enterprise AI has shown mixed results when it comes to producing fully usable, end-to-end deliverables without significant human oversight. Moving from assisted analysis to autonomous output is a non-trivial leap, and SnowWork will need to prove that its agents can consistently deliver accurate, contextually correct outcomes before enterprises fully trust it as a system of action,” Walter said.
Snowflake has yet to announce a launch date or timeline, as SnowWork is being tested by select Snowflake customers.
Markdown is now a first-class coding language: Deal with it 18 Mar 2026, 9:00 am
Folks are all in a tizzy because a guy posted some Markdown files on GitHub.
Mind you, it’s not just any guy, and they aren’t just any Markdown files.
The guy is Garry Tan, president and CEO of Y Combinator, which is among the most widely known startup incubators and venture capital firms in tech. Tan is a long-time builder, having founded the blogging platform Posterous.
The Markdown files? Tan created what he calls gstack — a collection of Claude Code skills that help focus Claude on the specific steps of developing a software product. And, yes, they are done in Markdown. Just a bunch of text files. Some love it, and some are … not so impressed.
A little Markdown backstory
Tan seems to have had a typical career arc — started out coding, built something, had success, and went into management, thus abandoning code. And like many people who leave coding behind for leadership positions, he missed coding:

[Embedded tweet from Garry Tan. Image credit: Foundry]
What Tan found 45 days prior to the above tweet was Claude Code. He discovered what many of us have — that agentic coding can be an almost unbelievable experience. He, like many others, found he could do in days what normally would take teams of people months to do.
Tan found, as we all do, that Claude can be a bit unfocused in its work. It will do exactly what it is asked to do and often can’t see the big picture. Tan calls this “mushy mode.” Because Claude can be trained to behave better and, given the right input, can be more focused in specific areas, Tan created gstack to give Claude the capability to play certain roles: product manager, QA, engineering, devops, and so on.
People seem to be losing their minds over this.
Some folks are calling it “God mode for development,” but others are saying, “This is just a bunch of prompts.”
Tan posted the repo on Product Hunt, and some folks were less than enthused.
Mo Bitar posted a video calling Tan “delusional” and implying that he’s succumbed to the sycophancy of AI. The comments indicate he is not alone. Bitar points out, too, that it is “just a bunch of text files.”
It is this point that I want to home in on. Yep, it is just “a bunch of text files posted on GitHub” (though that is not strictly true: the repo also includes code to build binaries that help Claude browse web apps better). But guess what? All the code you carefully craft by hand, without the aid of AI, is also “just a bunch of text files posted on GitHub.” Your Dockerfiles, JSON, and YAML are all “just a bunch of text files.” Huh.
Here is something for folks to consider: Markdown is the new hot coding language. Some folks write Python, some folks write TypeScript, and now, some folks write Markdown. Humans use compilers to convert C++ code into working apps. Now, humans can use Claude to convert Markdown into working apps.
Everything computer science has done over the last hundred years has been to improve our abstraction layers. We used to code by literally flipping mechanical switches. Then we built automatic switches and figured out how to flip them electrically. Then we figured out how to flip them automatically using binary code, and then assembler code, and then higher-level languages.
Markdown is the new hot coding language. Deal with it.
We mistook event handling for architecture 18 Mar 2026, 9:00 am
Events are essential inputs to modern front-end systems. But when we mistake reactions for architecture, complexity quietly multiplies. Over time, many front-end architectures have come to resemble chains of reactions rather than models of structure. The result is systems that are expressive, but increasingly difficult to reason about.
A different architectural perspective is beginning to emerge. Instead of organizing systems around chains of reactions, some teams are starting to treat application state as the primary structure of the system. In this model, events still occur, but they no longer define the architecture; they simply modify state. The UI and derived behavior follow from those relationships.
This shift toward state-first front-end architecture offers a clearer way to reason about increasingly complex applications.
When reaction became the default
Front-end engineering runs on events. User interactions, network responses, timers, and streaming data constantly enter our applications, and our systems are designed to respond to them. Events are unavoidable; they represent moments when something changes in the outside world. It’s no wonder that we have become remarkably sophisticated in how we process events. We compose streams, coordinate side effects, dispatch updates, and build increasingly expressive reactive pipelines. Entire ecosystems have evolved around structuring these flows in disciplined and predictable ways.
As applications grew more dynamic and stateful, that sophistication felt not only justified but necessary. Yet somewhere along the way, we began treating event handling not merely as a mechanism, but as architecture itself. We started to think about systems primarily in terms of events and reactions. That subtle shift changed how we reason about systems. Instead of modeling what is true, we increasingly modeled how the system reacts.
And that is where complexity quietly began to accumulate.
Redux and the era of structured change
One of the most influential milestones in modern front-end architecture was the rise of Redux. Redux introduced a compelling discipline: State should be centralized, updates should be predictable, and changes should flow in a unidirectional manner. Instead of mutating values implicitly, developers dispatched explicit actions, and reducers computed new state deterministically.
Redux brought structure where there had been chaos. It introduced a discipline that made state transitions explicit and traceable, which in turn made debugging more systematic and application behavior easier to reason about.
More importantly, Redux normalized a particular way of thinking about front-end systems. Centralized stores, action dispatching, side-effect layers, and explicit update flows became architectural defaults. Variations of this model appeared across frameworks and libraries, influencing how teams structured applications regardless of the specific tools they used.
Even when implementations differed, the underlying assumption remained consistent: Architecture was largely about controlling how events move through the system. This was a major step forward in discipline. But it also reinforced a deeper habit — organizing our mental models around reactions.
Events are inputs, but architecture is structure
An event tells us what just happened. A user clicked a button. A request has been completed. A value changed. Architecture answers a different question: What is true right now?
Events are transient. They describe moments in time. Architecture defines relationships that persist beyond those moments. When systems are organized primarily around events, behavior is often modeled as a chain of reactions: This dispatch triggers that update, which causes another recalculation, which notifies a subscriber elsewhere.
In smaller systems, that chain is easy to follow. In larger systems, understanding behavior increasingly requires replaying a timeline of activity. To explain why a value changed, you trace dispatches and effects. To understand dependencies, you search for subscriptions or derived selectors. The structure exists, but it is implicit in the flow. And implicit architecture becomes harder to reason about as scale increases.
The cognitive cost of flow-centric thinking
Event-driven models, especially in their more structured forms, provided front-end engineering with much-needed rigor. They allowed teams to tame asynchronous complexity and formalize change management.
However, expressiveness does not automatically produce clarity. As applications grow, flow-oriented designs can obscure structural relationships. Dependencies between pieces of state are often inferred from dispatch logic rather than expressed directly. Derived values may be layered through transformations that require understanding not just what depends on what, but when updates propagate.
Thus, event-driven models introduce a subtle cognitive burden. Engineers must simulate execution over time instead of inspecting relationships directly. Questions that should be straightforward — What depends on this value? What recalculates when it changes? — often require tracing reactive pathways through the system.
The more sophisticated the orchestration, the more effort it takes to understand the architecture as a whole.
The shift toward state-first thinking
A quieter architectural shift is emerging across modern front-end development. Rather than organizing systems primarily around what just happened, teams are increasingly organizing them around what is currently true.
In a state-first model, change does not propagate because an event is fired. It propagates because relationships exist. Dependencies are declared explicitly. Derived values are expressed as direct functions of the underlying state. When something changes, the system recalculates what depends on it in a deterministic manner — not because we manually choreographed the flow, but because we described the relationships.
Events remain essential. User interactions and network responses continue to drive applications forward. The difference is that events resume their proper role as inputs that modify state, rather than serving as the backbone of architectural reasoning. Instead of replaying timelines, engineers inspect relationships. Instead of coordinating flows, they model structure.
This shift does not eliminate reactivity; it refines it.
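A minimal sketch of the idea (the names `signal` and `computed` are illustrative, not any particular framework's API): derived values are declared as functions of state, so the dependency graph is explicit rather than implied by event flow.

```typescript
type Listener = () => void;

// A tiny piece of observable state.
function signal<T>(value: T) {
  const listeners = new Set<Listener>();
  return {
    get: () => value,
    set: (next: T) => { value = next; listeners.forEach(l => l()); },
    subscribe: (l: Listener) => { listeners.add(l); },
  };
}

// A derived value: declared once as a function of its dependencies.
function computed<T>(deps: { subscribe(l: Listener): void }[], fn: () => T) {
  let value = fn();
  deps.forEach(d => d.subscribe(() => { value = fn(); })); // relationship, not choreography
  return { get: () => value };
}

const price = signal(10);
const quantity = signal(3);
const total = computed([price, quantity], () => price.get() * quantity.get());

price.set(12);            // an event updates state...
console.log(total.get()); // ...and the derivation follows from structure: 36
```

Notice that `total` never listens for a "price changed" event by name; it recomputes because the relationship was declared once, which is what makes the dependency inspectable rather than something you reconstruct from a timeline.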
Redefining front-end architectural skill
For years, front-end mastery often meant orchestrating events with precision: dispatching actions cleanly, managing side effects thoughtfully, and coordinating asynchronous boundaries without introducing chaos. Those skills remain valuable.
But architectural maturity increasingly depends on something deeper: the ability to model state clearly, define dependencies explicitly, and design systems whose behavior can be understood by examining structure rather than replaying history.
Redux was a major step forward. It disciplined change and made the event flow traceable. Yet architecture does not end at disciplined dispatch. Architecture begins when relationships are first-class, when state, derivation, and dependency are visible and intentional rather than consequences of flow.
This shift is already visible across modern frameworks. Systems like Angular Signals, fine-grained reactive models, and state-driven UI architectures are all converging on the same idea: The structure of state should define system behavior, not the choreography of events.
I describe this emerging model as “state-first front-end architecture,” where application state becomes the primary source of truth, and the UI is derived from it rather than driven by chains of events.
The real question for modern front-end teams is no longer “How do we react to this event?” It is “What is the simplest, clearest way to model what is true?”
When we begin with structure instead of reaction, complexity tends to shrink. Systems become easier to explain, easier to test, and easier to evolve. Events still enter the system, but they no longer define it.
That shift may sound philosophical, but it has practical consequences. It changes how we design components. It changes how we organize state. It changes how we reason about scale.
Events are indispensable. They are the inputs that move our applications forward. But architecture is not about what just happened. It is about what remains true.
Events will always enter our systems, but they should not define their architecture. The next generation of front-end systems will be shaped less by how elegantly we orchestrate events and more by how clearly we model state. Frameworks like Angular Signals suggest that this transition has already begun, pointing toward a future where UI is treated primarily as a projection of state rather than a reaction to events.
I ran Qwen3.5 locally instead of Claude Code. Here’s what happened. 18 Mar 2026, 9:00 am
If you’ve been curious about working with services like Claude Code, but balk at the idea of hitching your IDE to a black-box cloud service and shelling out for tokens, we’re steps closer to a solution. But we’re not quite there yet.
With each new generation of large language models, we’re seeing smaller and more efficient LLMs for many use cases—small enough that you can run them on your own hardware. Most recently, we’ve seen a slew of new models designed for tasks like code analysis and code generation. The recently released Qwen3.5 model set is one example.
What’s it like to use these models for local development? I sat down with a few of the more svelte Qwen3.5 models, LM Studio (a local hosting application for inference models), and Visual Studio Code to find out.
Setting up Qwen3.5 on my desktop
To try out Qwen3.5 for development, I used my desktop system, an AMD Ryzen 5 3600 6-core processor running at 3.6GHz, with 32GB of RAM and an RTX 5060 GPU with 8GB of VRAM. I’ve run inference work on this system before using both LM Studio and ComfyUI, so I knew it was no slouch. I also knew from previous experience that LM Studio can be configured to serve models locally.
For the models, I chose a few different iterations of the Qwen3.5 series. Qwen3.5 comes in many variations provided by community contributors, all in a range of sizes. I wasn’t about to try the 397-billion parameter version, for instance: there’s no way I could crowbar a 241GB model into my hardware. Instead, I went with these Qwen3.5 variants:
- qwen3.5-9b@q5_1: A 9.5-billion-parameter version, which weighs in at a mere 6.5GB and uses 5-bit quantization.
- qwen3.5-9b-claude-4.6-opus-reasoning-distilled: A community variant of the 9.5-billion-parameter model, “enhanced with reasoning data distilled from Qwen3.5-27B” and using 4-bit quantization.
- qwen3.5-4b: A third variant, with 4 billion parameters and 6-bit quantization.
In each case, I was curious about the tradeoffs between the model’s parameter size and its quantization. Would smaller versions of the same model have comparable performance?
Running the models on LM Studio did not automatically allow me to use them in an IDE. The blocker here was not LM Studio but VS Code, which doesn’t work out of the box with any LLM provider other than GitHub Copilot. Fortunately, a third-party add-on called Continue lets you hitch VS Code to any provider, local or remote, that uses common APIs—and it supports LM Studio out of the box.

Continue is a VS Code extension that connects to a variety of LLM providers. It comes with built-in connectivity options for LM Studio.
Setting up the test drive
My testbed project was something I’m currently developing: a utility for Python that allows a Python package to be redistributed on systems without the Python runtime. It’s not a big project (one file that’s under 500 lines of code), which made it a good candidate for testing a development model locally.
The Continue extension lets you use attached files or references to an open project to supply context for a prompt. I pointed to the project file and used the following prompt for each model:
The file currently open is a utility for Python that takes a Python package and makes it into a standalone redistributable artifact on Microsoft Windows by bundling a copy of the Python interpreter. Examine the code and make constructive suggestions for how it could be made more modular and easier to integrate into a CI/CD workflow.
When you load a model into memory, you can twiddle a mind-boggling array of knobs to control how predictions are served with it. The two knobs that have the biggest impact are context length and GPU offload:
- Context length is how many tokens the model can work with in a single prompt; the more tokens, the more involved the conversation.
- GPU offload is how many layers of the model are run on the GPU to speed it up; the more layers, the faster the inference.
Turning up either of these consumes memory—system and GPU memory, both—so there are hard ceilings on how high they can go. GPU offload has the biggest impact on performance, so I set it to the maximum for each model, then set the context length as high as I could while still leaving some GPU memory free.
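For scripting against a locally served model rather than chatting through an extension, LM Studio exposes an OpenAI-compatible HTTP endpoint (by default at http://localhost:1234/v1). The sketch below is a hedged example of calling it from Python; the endpoint path and the model identifier are assumptions to adapt to your own setup.

```python
import json
import urllib.request

# Assumed default address of LM Studio's local OpenAI-compatible server.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_request(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completion payload.

    Keep max_tokens well under the context length you configured when
    loading the model, or the response may be truncated.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(model: str, prompt: str) -> str:
    """Send one prompt to the local server and return the reply text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        LMSTUDIO_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example (requires a model actually loaded in LM Studio's local server):
#   print(ask("qwen3.5-9b", "Suggest refactorings for the attached module."))
```

This is the same API surface Continue talks to under the hood, which is why any OpenAI-compatible local server slots in the same way.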

Serving predictions locally with LM Studio, by way of the Continue plugin for VS Code. The Continue interface doesn’t provide many of the low-level details about the conversation that you can see in LM Studio directly (e.g., token usage), but does allow embedding context from the current project or any file.
Configuring the three models
qwen3.5-9b@q5_1 was the largest of the three models I tested, at 6.33GB. I set it to use 8,192 tokens and 28 layers, for 7.94GB total GPU memory use. This proved to be way too slow to use well, so I dialed back the token count enough to use all 32 layers. Predictions came far more snappily after that.
qwen3.5-9b-claude-4.6-opus-reasoning-distilled weighed in at 4.97GB, which allowed for a far bigger token length (16,000 tokens) and all 32 layers. Out of the gate, it delivered much faster inference and tokenization of the input, meaning I didn’t have to wait long for the first reply or for the whole response.
qwen3.5-4b, the littlest brother, is only 3.15GB, meaning I could use an even larger token window if I chose to, but I kept it at 16,000 for now, and also used all 32 layers. Its time-to-first reply was also fast, although overall speed of inference was about the same as the previous model.

A variety of Qwen3.5 models with different sizes and quantizations. Some are far too big to run comfortably on commodity hardware; others can run on even a modest PC.
The good, the bad, and the busted
With each model, my query produced a slew of constructive suggestions: “Refactor the main entry point to use step functions,” or “Add support for environment variables.” Most were accompanied by sample snippets—sometimes full segments of code, sometimes brief conceptual designs. As with any LLM’s output, the results varied a lot between runs, even on the same model with its parameters set to make the output as repeatable as possible.
The biggest variations between the outputs were mainly in how much detail the model provided in its response, but even that varied less than I expected. Even the smallest of the models provided decent advice, although I found the midsized model (the “distilled” 9B model) struck a good balance between compactness and power. Still, having lots of token space didn’t guarantee results. Even with considerable context, some conversations stopped dead in the middle for no apparent reason.
Where things broke down across the board, though, was when I tried to let the models put their advice into direct action. Models can be provided with contextual tools, such as changing code with your permission or looking things up on the web. Unfortunately, almost anything relating to working with the code directly crashed out hard, or only worked after multiple attempts.
For instance, when I tried to let the “distilled” 9B model add recommended type hints to my project, it failed completely. On the first attempt, it crashed in the middle of the operation. On the second try, it got stuck in an internal loop, then backed out of it and decided to add only the most important type hints. This it was able to do, but it mangled several indents in the process, creating a cascading failure for the rest of the job. And on yet another attempt, the agent tried to just erase the entire project file.
Conclusions
The most disappointing part of this whole endeavor was the way the models failed at actually applying any of their recommended changes, or only did so after repeated attempts. I suspect the issue isn’t tool use in the abstract, but tool use that requires lots of context. Cloud-hosted models theoretically have access to enough memory to use their full context token window (262,144 for the models I evaluated). Still, in my experience, even the cloud models can choke and die on their inputs.
Right now, using a compact local model to get insight and feedback about a codebase works best when you have enough GPU memory for the entire model plus the needed context length for your work. It’s also best for obtaining high-level advice you plan to implement yourself, rather than advanced tool operations where the model attempts to autonomously change the code. But I’ve also had that experience with the full-blown versions of these models.
USAT Leverages Times Square Crowd to Demonstrate Instant Digital Dollar Payments 18 Mar 2026, 8:07 am
USAT is transforming Times Square into a live demonstration of instant, internet-native payments. The digital dollar, issued by Anchorage Digital Bank, is taking over Times Square as 2 million spectators flood the streets of New York City for St. Patrick’s Day. The brand activation combines synchronized digital billboards with a street-level campaign designed to introduce digital dollar payments to a mainstream audience, coinciding with the world’s oldest and largest St. Patrick’s Day Parade.
The campaign features coordinated imagery across several of Times Square’s most recognizable digital screens, culminating in a synchronized share-of-voice takeover that transforms multiple screens into a single, unified visual, showing how digital dollars move between people in an instant. At street level, brand ambassadors will distribute 25,000 promotional postcards throughout Times Square and along the parade route, inviting passersby to scan a QR code to download the Rumble Wallet and claim $10 in USAT, free, right from their phone. The activation kicked off at 10 AM ET and ended at 11:59 PM ET.
The activation reflects a growing shift in fintech marketing toward experiential campaigns that translate complex financial technology into tangible consumer experiences, using high-traffic cultural moments and large-scale digital displays to capture public attention. The mechanic is simple by design: Scan. Download. Receive. It is the same technology that already moves money for more than 550 million people worldwide, now available to anyone walking through Times Square with a smartphone in their pocket.
Stablecoins are blockchain-based digital dollars designed to maintain a stable value while enabling instant, internet-native payments between digital wallets. They combine the price stability of traditional currency with the speed and programmability of blockchain networks.
“USAT builds on the principles that made USDT the most widely used stablecoin in the world,” said Paolo Ardoino, CEO of Tether. “Today, USDT is used by more than 550 million people globally, helping move digital dollars across the internet instantly and reliably. USAT brings those same foundations to a new audience, making it easier for people to experience how digital dollars can function in everyday life.”
“Times Square on St. Patrick’s Day is one of the most electric environments in the world,” said Bo Hines, CEO of Tether USAT. “We are not just running ads, we are handing people the future of money and letting them use it on the spot. This activation invites people to experience the next generation of money right on their smartphones. By pairing digital billboards with a dynamic street activation, we are turning a complex technology into something people can see, experience, and use for themselves.”
Digital dollars no longer require a tutorial. They require an opportunity. Large-scale activations like this have become an increasingly common strategy for fintech and technology brands looking to bridge the gap between digital infrastructure and mainstream awareness – and USAT is making that bridge as short as a QR code scan. USAT is a digital dollar designed to maintain a 1:1 value with the U.S. dollar while enabling instant digital payments through blockchain networks. Send it, receive it, spend it – globally, in seconds, using compatible wallets and applications. Moving money should feel as simple as sending a message. With USAT, it does.
Project Detroit, bridging Java, Python, JavaScript, moves forward 17 Mar 2026, 5:43 pm
Java’s revived Detroit project, which aims to enable joint usage of Java with Python or JavaScript, is expected to become an official project within the OpenJDK community soon.
Oracle officials plan to highlight Detroit’s status at JavaOne on March 17. “The main benefit [of Detroit] is it allows you to combine industry-leading Java and JavaScript or Java and Python for places where you want to be able to use both of those technologies together,” said Oracle’s Georges Saab, senior vice president of the Java Platform Group, in a briefing on March 12. The goal of the project is to provide implementations of the javax.script API for JavaScript based on the Chrome V8 JavaScript engine and for Python based on CPython, according to the Detroit project page on openjdk.org.
Initially proposed in the 2018 timeframe as a mechanism for using JavaScript as an extension language for Java, the project later fizzled after losing sponsorship. But interest in it has recently been revived. The plan is to address the Java ecosystem’s requirements for calling other languages, with scripting for business logic and easy access to AI libraries in other languages. While the plan initially calls for JavaScript and Python support, other languages are slated to be added over time. The Java FFM (Foreign Function & Memory) API is expected to be leveraged in the project. Other goals of the project include:
- Improving application security by isolating Java and native heap executions.
- Simplifying access to JS/Python libraries until equivalent Java libraries become available.
- Delivering full JS/Python compatibility by leveraging the V8 and CPython runtimes, which also reduces maintenance cost by harnessing those ecosystems.
- Leveraging existing investments in performance optimizations for the JS and Python languages.
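The javax.script plumbing Detroit builds on has shipped in the JDK for years; what a stock modern JDK lacks is an engine implementation (Nashorn, the bundled JavaScript engine, was removed in JDK 15). A small illustrative sketch of the API shape — the class and method names here are mine, not Detroit's:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ScriptDemo {
    // Returns the eval result if an engine with this name is registered,
    // or null if no such engine is on the classpath.
    static Object evalIfAvailable(String engineName, String script) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        return engine == null ? null : engine.eval(script);
    }

    public static void main(String[] args) throws ScriptException {
        // On a stock modern JDK this prints null; with a Detroit-style
        // V8-backed engine on the classpath, the same call would return 3.
        System.out.println(evalIfAvailable("javascript", "1 + 2"));
    }
}
```

Because the project targets this existing API rather than inventing a new one, code written against `ScriptEngineManager` today should pick up the V8 and CPython engines simply by adding them to the classpath.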
JDK 26: The new features in Java 26 17 Mar 2026, 5:21 pm
Java Development Kit (JDK) 26, the latest standard Java release from Oracle, moves to general production availability on March 17.
The following 10 JDK Enhancement Proposal (JEP) features are officially targeted to JDK 26: A fourth preview of primitive types in patterns, instanceof, and switch; ahead-of-time object caching; an eleventh incubation of the Vector API; second previews of lazy constants and PEM (privacy-enhanced mail) encodings of cryptographic objects; a sixth preview of structured concurrency; warnings about uses of deep reflection to mutate final fields; improving throughput by reducing synchronization in the G1 garbage collector (GC); HTTP/3 for the Client API; and removal of the Java Applet API.
For AI accommodations, Oracle has cited five of these features: the primitive types in patterns capability, Vector API, structured concurrency, lazy constants, and AOT object caching.
JDK 26 is downloadable from Oracle.com. A short-term release of Java backed by six months of Premier-level support, JDK 26 follows the September 16 release of JDK 25, which is a Long-Term Support (LTS) release backed by several years of Premier-level support. General availability of JDK 26 follows two rampdown releases and two release candidates.
The latest JEP feature to be added, primitive types in patterns, instanceof, and switch, is intended to enhance pattern matching by allowing primitive types in all pattern contexts, and to extend instanceof and switch to work with all primitive types. Now in its fourth preview, this feature was previously previewed in JDK 23, JDK 24, and JDK 25. Goals for this feature include enabling uniform data exploration by allowing type patterns for all types, aligning type patterns with instanceof and aligning instanceof with safe casting, and allowing pattern matching to use primitive types in both nested and top-level pattern contexts. Other goals include providing easy-to-use constructs that eliminate the risk of losing information due to unsafe casts, and—following the enhancements to switch in Java 5 (enum switch) and Java 7 (string switch)—allowing switch to process values of any primitive type. Changes in this fourth preview include enhancing the definition of unconditional exactness and applying tighter dominance checks in switch constructs. The changes enable the compiler to identify a wider range of coding errors. For AI, this feature simplifies integration of AI with business logic.
With ahead-of-time object caching, the HotSpot JVM would gain improved startup and warmup times, so it can be used with any garbage collector including the low-latency Z Garbage Collector (ZGC). This would be done by making it possible to load cached Java objects sequentially into memory from a neutral, GC-agnostic format, rather than mapping them directly into memory in a GC-specific format. Goals of this feature include allowing all garbage collectors to work smoothly with the AOT (ahead of time) cache introduced by Project Leyden, separating AOT cache from GC implementation details, and ensuring that use of the AOT cache does not materially impact startup time, relative to previous releases. AI applications also get improved startup via AOT caching with any GC.
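As a rough sketch, the cache workflow established by the earlier Project Leyden JEPs looks like the following. The flag names come from those JEPs, and JDK 26's change concerns the cache's internal, GC-agnostic format rather than this command-line surface, so treat the exact flags as an assumption to verify against the release notes; `app.jar` and `com.example.App` are placeholders.

```shell
# 1. Training run: record which classes and objects the app actually uses.
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar com.example.App

# 2. Create the cache from the recorded configuration.
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar

# 3. Production runs load the cache for faster startup -- and with JDK 26's
#    GC-agnostic format, with any collector, including ZGC.
java -XX:AOTCache=app.aot -XX:+UseZGC -cp app.jar com.example.App
```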
The eleventh incubation of the Vector API introduces an API to express vector computations that reliably compile at runtime to optimal vector instructions on supported CPUs. This achieves performance superior to equivalent scalar computations. The incubating Vector API dates back to JDK 16, which arrived in March 2021. The API is intended to be clear and concise, to be platform-agnostic, to have reliable compilation and performance on x64 and AArch64 CPUs, and to offer graceful degradation. The long-term goal of the Vector API is to leverage Project Valhalla enhancements to the Java object model. The performance of AI computation is also improved with the Vector API.
Also on the docket for JDK 26 is another preview of an API for lazy constants, which had been previewed in JDK 25 via a stable values capability. Lazy constants are objects that hold unmodifiable data and are treated as true constants by the JVM, enabling the same performance optimizations enabled by declaring a field final. Lazy constants offer greater flexibility as to the timing of initialization, as well as efficient data sharing in AI applications.
The second preview of PEM (privacy-enhanced mail) encodings calls for an API for encoding objects that represent cryptographic keys, certificates, and certificate revocation lists into the PEM transport format, and for decoding from that format back into objects. The PEM API was proposed as a preview feature in JDK 25. The second preview features a number of changes, including changing the name of the PEMRecord class to PEM. This class also now includes a decode() method that returns the decoded Base64 content. Also, the encryptKey methods of the EncryptedPrivateKeyInfo class now are named encrypt and accept DEREncodable objects rather than PrivateKey objects, enabling the encryption of KeyPair and PKCS8EncodedKeySpec objects.
The structured concurrency API simplifies concurrent programming by treating groups of related tasks running in different threads as single units of work, thereby streamlining error handling and cancellation, improving reliability, and enhancing observability. Goals include promoting a style of concurrent programming that can eliminate common risks arising from cancellation and shutdown, such as thread leaks and cancellation delays, and improving the observability of concurrent code. This feature in Java 26 also brings enhanced concurrency for AI.
New warnings about uses of deep reflection to mutate final fields are intended to prepare developers for a future release that ensures integrity by default by restricting final field mutation; in other words, making final mean final, which will make Java programs safer and potentially faster. Application developers can avoid both current warnings and future restrictions by selectively enabling the ability to mutate final fields where essential.
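The pattern being warned about looks roughly like this illustrative sketch (the class and field names are mine). It still succeeds on current JDKs, which is exactly why a warning phase comes before any restriction:

```java
import java.lang.reflect.Field;

public class FinalMutation {
    static final class Config {
        final int retries;
        Config(int retries) { this.retries = retries; }
    }

    // Deep reflection overwriting a final instance field -- the pattern
    // JDK 26 starts warning about and a future release will restrict.
    static void overrideRetries(Config c, int value) throws ReflectiveOperationException {
        Field f = Config.class.getDeclaredField("retries");
        f.setAccessible(true); // deep reflection into the class
        f.setInt(c, value);    // mutates a field declared final
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Config c = new Config(3);
        overrideRetries(c, 5);
        System.out.println(c.retries); // 5 on current JDKs, despite 'final'
    }
}
```

Once final means final, the JVM can trust that such fields never change after construction, which is what unlocks the safety and constant-folding benefits the proposal describes.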
The G1 GC proposal is intended to improve application throughput when using the G1 garbage collector by reducing the amount of synchronization required between application threads and GC threads. Goals include reducing the G1 garbage collector’s synchronization overhead, reducing the size of the injected code for G1’s write barriers, and maintaining the overall architecture of G1, with no changes to user interaction. Although G1, which is the default garbage collector of the HotSpot JVM, is designed to balance latency and throughput, achieving this balance sometimes impacts application performance adversely compared to throughput-oriented garbage collectors such as the Parallel and Serial collectors. The G1 GC proposal further notes:
Relative to Parallel, G1 performs more of its work concurrently with the application, reducing the duration of GC pauses and thus improving latency. Unavoidably, this means that application threads must share the CPU with GC threads, and coordinate with them. This synchronization both lowers throughput and increases latency.
The HTTP/3 proposal calls for allowing Java libraries and applications to interact with HTTP/3 servers with minimal code changes. Goals include updating the HTTP Client API to send and receive HTTP/3 requests and responses; requiring only minor changes to the HTTP Client API and Java application code; and allowing developers to opt in to HTTP/3 as opposed to changing the default protocol version from HTTP/2 to HTTP/3.
HTTP/3 is considered a major version of the HTTP data communications protocol for the web. Version 3 was built on the IETF QUIC (Quick UDP Internet Connections) transport protocol, which emphasizes flow-controlled streams, low-latency connection establishment, network path migration, and security among its capabilities.
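For a sense of how minor the opt-in is meant to be, this is how an application already selects a protocol version with today's HTTP Client API; the JDK 26 proposal adds an HTTP/3 option alongside the existing ones (the exact name of that new constant is an assumption, since it is not quoted in the proposal text here).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

public class VersionOptIn {
    public static void main(String[] args) {
        // Today: opt in to HTTP/2. Under the JDK 26 proposal, an HTTP/3
        // choice would be offered the same way, rather than changing the
        // default protocol version.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/"))
                .GET()
                .build();
        System.out.println(client.version() + " " + request.uri());
    }
}
```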
Removal of the Java Applet API, now considered obsolete, is also targeted for JDK 26. The Applet API was deprecated for removal in JDK 17 in 2021. The API is obsolete because neither recent JDK releases nor current web browsers support applets, according to the proposal. There is no reason to keep the unused and unusable API, the proposal states.
In addition to its 10 major JEP features, JDK 26 also has a variety of additional, smaller features that did not warrant an official JEP, according to Oracle. Among these are hybrid public key encryption, stricter version checking when using the jlink tool to cross link, extending the HTTP client request timeout to cover the response body, and virtual threads now unmounting from the carrier when waiting for another thread to execute a class initializer.
Oracle unveils the Java Verified Portfolio 17 Mar 2026, 2:00 pm
Oracle has introduced the Java Verified Portfolio (JVP), which provides developers with a curated set of Oracle-supported tools, libraries, frameworks, and services. Assets included at the JVP launch include the JavaFX Java-based UI framework, the Java Platform extension for Microsoft’s Visual Studio Code editor, and the Helidon Java framework for microservices, Oracle said.
Announced March 17 in conjunction with the release of Java Development Kit (JDK) 26, the JVP offers licensing and support for a developer’s broader application and development stack. More technologies will be added to the portfolio over time, Oracle said. With the JVP initiative, Oracle is acknowledging that Oracle customers and Java developers depend on a wide range of JDK-related tools and other Java technologies that do not belong in the Oracle JDK itself. JVP provides an enterprise-grade set of components that are fully supported and governed by Oracle, with roadmap transparency and life cycle management, the company said.
Oracle’s Java Verified Portfolio offers the following benefits to developers, according to Oracle:
- Streamlined licensing and roadmap management, with the core JDK separated from portfolio offerings to simplify licensing, support, and future roadmap planning from both Oracle JDK and the JVP.
- Centralized and flexible value delivery, with centralized access to a comprehensive set of Oracle-backed Java tools, frameworks, libraries, and services, with support timelines and JDK compatibility mappings.
- Enhanced trust and supply chain integrity, providing governance and ongoing support for all included components, helping organizations trust their Java supply chain. This reduces risk compared to adopting unsupported open source alternatives, Oracle said.
- Alignment with the Oracle Java ecosystem and customer needs. The JVP is backed by Oracle’s Java Platform Group to ensure consistency, alignment with OCI (Oracle Cloud Infrastructure), and other Oracle products.
The portfolio is accessible through preferred download sites and tools, with license and support free for all OCI Java workloads and Java SE subscription customers. Many releases are licensed free to all users, Oracle said.