Node.js previews network inspection support 4 Oct 2024, 10:21 pm
Node.js v20.18.0, a just-introduced update to the Long-Term Support (LTS) version of the popular asynchronous, event-driven JavaScript runtime, features experimental network inspection support.
Introduced October 3, Node.js 20.18.0 enables users to inspect network activities occurring within a JavaScript application. Still in active development, this capability is initially limited to the HTTP and HTTPS modules. To use the feature, Node.js must be started with the command $ node --inspect-wait --experimental-network-inspection index.js.
Other highlights of Node.js 20.18.0 include a new option for the tls.createSecureContext API. Developers can use tls.createSecureContext({allowPartialTrustChain: true}) to treat non-self-signed certificates in the trust CA certificate list as trusted.
Node.js 20.18.0 also implements a new flavor of vm.createContext() that creates a context with a freezable globalThis, meaning it creates a context without contextifying its global object when vm.constants.DONT_CONTEXTIFY is used. This is suitable when developers want to freeze the context or speed up global access if they do not need the interceptor behavior imposed when the global is contextified, according to the release notes.
Google ships Gemini 1.5 Flash-8B AI model 4 Oct 2024, 9:38 pm
Google’s Gemini 1.5 Flash-8B AI model is now production-ready. The company said the stable release of Gemini 1.5 Flash-8B has the lowest cost per intelligence of any Gemini model.
Availability was announced October 3. Developers can access gemini-1.5-flash-8b for free via Google AI Studio and the Gemini API. Gemini 1.5 Flash-8B offers a 50% lower price than 1.5 Flash, twice the rate limits, and lower latency on small prompts.
An experimental version of Gemini 1.5 Flash-8B had been released in September as a smaller, faster variant of 1.5 Flash. Flash-8B nearly matches the performance of the 1.5 Flash model launched in May across multiple benchmarks and performs well on tasks such as chat, transcription, and long context language translation, Google said.
The stable release of Gemini 1.5 Flash-8B is priced at the following rates:
- $0.0375 per 1 million input tokens on prompts
- $0.15 per 1 million output tokens on prompts
- $0.01 per 1 million tokens on cached prompts
Developers on the paid tier will be billed beginning October 14. The new price, along with the work Google has done to drive down developer costs with the 1.5 Flash and 1.5 Pro models, shows the company’s commitment to ensuring that developers have the freedom to build products and services that push the world forward, Google said.
Why cloud security outranks cost and scalability 4 Oct 2024, 9:00 am
According to a study by Akamai Technologies, 87% of digital-native businesses (which seems to be a term specific to Asia/Pacific) now prioritize security over cost and scalability when selecting a cloud provider. While this study focused on Asia, we see similar buying patterns here in the United States. This “security-first” approach reflects a broader shift in how businesses operate amidst accelerated technology adoption.
As businesses integrate cloud computing, they grapple with escalating complexity and cyberthreats. To remain agile and competitive, they embrace cloud-native design principles, an operational model that allows for independence and scalability through microservices and extensive API usage. However, this does not come without its challenges.
How security became king
The shift in prioritizing cloud security over cost and scalability is a significant trend driven by several factors:
Rising cyberthreats are both a perception and a reality. As businesses increasingly rely on cloud services, they face more sophisticated cyberthreats. High-profile data breaches and cyberattacks have heightened awareness and made security a top priority.
Complex cloud environments mean that adopting cloud-native designs introduces layers of complexity. Ensuring security across distributed components (microservices and APIs) becomes crucial, as misconfigurations or vulnerabilities can lead to significant risks. I’ve been screaming about this for years, along with others. Although we accept complexity as a means to an end in terms of IT, it needs to be managed in light of its impact on security.
Compliance and regulatory pressures mean that many industries face strict regulations regarding data protection and privacy (e.g., GDPR, CCPA). Ensuring compliance requires robust security measures to protect sensitive information in the cloud. Many enterprises are moving to sovereign or local clouds that are local to the laws and regulations they adhere to. Companies view this as reducing risk; even if those clouds are more expensive, the risk reduction is worth it.
Business reputation and trust are always vulnerable; companies recognize that a security breach can instantly damage both. Indeed, you’ll get yourself on the morning news and watch your stock drop by 50%. By prioritizing security, businesses aim to safeguard their reputation and customer relationships.
Long-term cost implications mean that focusing initially on cost and scalability might seem feasible, but the long-term financial impact of security incidents can be severe. Most people in the cybersecurity space understand that risk equals money. The more risk, the less your systems are worth, considering the potential for a breach. Prioritizing security can prevent costly breaches and downtime.
Innovation and agility mean that to remain competitive, businesses need to innovate rapidly. A secure cloud infrastructure enables this by providing a reliable foundation for building and deploying new services without compromising data integrity or security.
This landscape is driving businesses to adopt a “security-first” mindset. Although this can be a platitude, we must recognize that other benefits of cloud computing—cost savings and scalability—can be undermined without good security planning and mechanisms. This shift mirrors a broader global movement toward valuing resilience and reliability alongside traditional operational metrics.
How to lower security costs
Balancing cloud costs with security involves strategic approaches to optimize resources while safeguarding systems and data. This comes down to the price of the cloud versus the value of security, and the two are often not easy to connect. Many assume that the more security you need, the higher the cost of the cloud services. The study mentioned at the beginning of this article assumes that more security is always more costly. I have not found that to be the case. Indeed, in many instances, the exact opposite is true.
Here are a few words of advice to help you find value in security and move away from the accepted mentality that more security always means more money.
Build security into the architecture from the start to avoid expensive fixes later. This seems obvious but it’s often not done. Security is an afterthought about half the time, and companies then are forced to toss money at the problem.
Automate compliance and management to reduce manual efforts and costs. Automation means repeating good processes without depending on humans; security is no different.
Use strong access controls to ensure only authorized users access critical data. Identity management is the most used approach here, and for good reason.
Regularly audit cloud usage to eliminate wasteful spending and optimize resource allocation. Also, train teams to efficiently manage cloud resources and security.
This is not that hard when you get down to it. What’s concerning is that enterprises truly believe they have to spend a great deal of money to reach an appropriate security level. Nothing could be further from the truth.
Visual Studio Code 1.94 improves file search 3 Oct 2024, 11:25 pm
The September 2024 release of Microsoft’s Visual Studio Code editor, version 1.94, features improvements for finding files using the File Explorer. The upgrade also introduces the ability to run Python tests with coverage.
Introduced October 3, Visual Studio Code 1.94 can be downloaded for Windows, Mac, or Linux via the project web page.
In the Visual Studio Code 1.94 release, Microsoft has improved the Find feature in the Explorer view to make it easier to search for files in large projects. Developers can open the Find control in the File Explorer by using the Ctrl+Alt+F keyboard shortcut. When searching, users can switch between fuzzy matching and continuous matching for more flexible results.
For Python, developers now can run Python tests with coverage and get rich results in the editor, Microsoft said. To run tests with coverage, users must select the coverage run icon in Test Explorer or “Run with coverage” from any menu that triggers test runs. The Python extension will run coverage by using the pytest-cov plugin if developers are using pytest, or by using coverage.py if using unittest. Once the coverage is complete, lines are highlighted in the editor for line-level coverage. The Python extension also has added a default problem matcher, simplifying issue tracking in Python code and providing more contextual feedback.
The Source Control Graph in Visual Studio Code 1.94 features a new history item reference picker in the view title, allowing developers to use the reference picker to filter the history items shown in the graph to a different branch or to view multiple branches. The Source Control Graph also expands the list of actions available in the context menu for source control history items. Actions have been added to create a new branch/tag from a history item, cherry-pick a history item, and check out an item.
Elsewhere in Visual Studio Code 1.94:
- Visual Studio Code is now fully converted to ESM (ECMAScript modules). All layers of VS Code core (Electron, Node.js, browser, workers) now use the import and export syntax in JavaScript for module loading and exporting. The move to ESM improves startup performance “massively,” Microsoft said.
- The native REPL editor, used by the Python extension, now supports GitHub Copilot Inline Chat and code completions right in the input box. Also, GitHub Copilot Inline Chat has been upgraded to the GPT-4o mini model, for faster, more accurate, and higher-quality code explanations when using chat in the editor. And when using GitHub Copilot Inline Chat to generate code in a notebook, users now can accept and directly run the generated code from Inline Chat.
- Developers can now easily attach additional files as context for a GitHub Copilot Inline Chat prompt by dragging files or editor tabs from the workbench directly into the chat.
- A test failure preview feature adds specialized logic for diagnosing failing unit tests.
- JavaScript and TypeScript support now use TypeScript 5.6, which includes language and tool improvements as well as bug fixes and performance optimizations.
Visual Studio Code 1.94 follows last month’s VS Code 1.93 release, which introduced a new Profiles editor.
SingleStore acquires BryteFlow to boost data ingestion capabilities 3 Oct 2024, 6:00 pm
SingleStore, the company behind the relational database SingleStoreDB, is acquiring data ingestion and integration software provider BryteFlow to boost its ability to connect to disparate data sources for data analytics.
“SingleStore’s acquisition of BryteFlow accelerates its ability to ingest data from a broad set of sources like SAP, Oracle, and Salesforce,” said Dion Hinchcliffe, VP, CIO Practice Lead, The Futurum Group.
The acquisition comes at a time when the demand for real-time analytics and generative AI is surging, making it a timely move to meet shifting customer needs, Hinchcliffe said.
“It follows a series of strategic moves by SingleStore, including partnerships with Snowflake and advancements in data lakehouse integrations, positioning it to capitalize on this market momentum while the window is open,” Hinchcliffe explained.
SingleStore will integrate BryteFlow’s capabilities into its database offering to create a new interface, SingleConnect, the companies said in a joint statement.
What is BryteFlow?
BryteFlow has multiple products, some of which are specific to vendors such as Oracle and SAP.
Its core products are BryteFlow Ingest, Ingest XL, Blend, TruData, and ControlRoom. Ingest and Ingest XL (for larger quantities of data) are data ingestion tools with a no-code interface that can be used by enterprises to replicate data from a source. Blend and TruData are tools used to automate extract, transform, load (ETL) processes and automate data reconciliation and validation, while ControlRoom is an operational dashboard that enterprises can use to monitor the running of their Ingest and Blend instances.
It offers them under two subscriptions: Standard Edition is only available on AWS and encompasses BryteFlow Ingest for Data Replication and BryteFlow ControlRoom for monitoring Ingest instances, while Enterprise Edition is available across AWS and Azure and encompasses all tools, including Ingest XL, TruData, and Blend.
The company also offers its data integration tool for Amazon S3, Amazon Redshift and Snowflake on AWS via the AWS Marketplace.
Additionally, it has a separate tool, SAP Data Lake Builder, that can be used to ingest data from SAP. Other specific data ingestion tools on offer include BryteFlow for Oracle, BryteFlow for SQL, BryteFlow for SQL Server, and BryteFlow for PostgreSQL, along with integrations with Databricks, Teradata, Google BigQuery, and Apache Kafka.
What effect will the acquisition have?
BryteFlow’s existing customers may have to look elsewhere for help with data integration if they use databases other than SingleStoreDB: “Our number one priority is to integrate BryteFlow into SingleStore and bring value to our customers through SingleConnect. We have no plan to sell BryteFlow independently at this time, apart from some special cases,” SingleStore CEO Raj Verma said via email.
But there will be opportunities for SingleStore customers, said Duncan Van Kouteren, research analyst at Nucleus Research. “The acquisition will enable customers to integrate data from various sources while maintaining real-time data analytics functionalities by utilizing BryteFlow’s capabilities such as change data capture (CDC),” he said.
Futurum’s Hinchcliffe pointed out that SingleConnect, which is likely to be a no-code interface akin to what BryteFlow offered, will simplify data ingestion, making it easier for enterprise customers to operationalize their data faster with less technical overhead, in turn speeding up time to market.
For SingleStore, it could also be an opportunity to win new customers by opening up markets in enterprise data integration, Hinchcliffe said, adding that SingleStore’s enablement of real-time data processing from major ERP and CRM providers can help it tap into industries that rely heavily on these platforms, such as finance, manufacturing, and retail.
The acquisition could be bad news for the likes of Databricks, Snowflake, or Google’s BigQuery.
“Rivals like Snowflake and Databricks, which have CDC and real-time replication features but rely on third-party tools or complex configurations to do so, are expected to feel some pressure,” Van Kouteren explained.
OpenAI updates API with model distillation, prompt caching abilities 3 Oct 2024, 10:53 am
In what can only be seen as OpenAI’s efforts to catch up with rivals, the ChatGPT-maker released several updates to its API to help ease the development of generative AI-based applications.
These updates, introduced during its DevDay conference this week, include capabilities such as model distillation and prompt caching, which are already offered by rivals.
Model distillation to help reduce costs of gen AI applications
Model distillation, a derivative of knowledge distillation, is a technique used in large language model training. The technique is used to teach a smaller model desired or required knowledge from a larger model.
Model distillation is preferred by developers as it can maintain the performance of a model underpinning an application while reducing the computation requirements and in turn costs.
The rationale is that smaller models, which use less compute, are able to perform like a larger model in a specified field of knowledge or expertise.
Several experts claim that model distillation can be used effectively in real-time natural language processing tasks or in industry sectors such as finance and healthcare that need the model to have domain expertise.
The model distillation capability introduced in the OpenAI API includes three components — Stored Completions, Evals, and Fine-tuning — all of which can be accessed via the API.
In order to distill a model using the OpenAI API, developers need to create an evaluation, either manually or using the Evals component, which is in beta, to measure the performance of the smaller model.
The idea is to continuously monitor the model after distilling it to ensure that it is performing as desired, OpenAI explained.
After creating the evaluation, developers can use Stored Completions to create a dataset of outputs from the larger model on the desired topic on which the smaller model is to be trained.
Stored Completions, according to OpenAI, is a new free feature inside the API that can be used to automatically capture and store input-output pairs generated by any of the LLMs provided by the company, like GPT-4o or o1-preview.
Once the dataset is created using Stored Completions, it can be reviewed, filtered, and then used to fine-tune the smaller model or can be used as an evaluation dataset.
After this, developers can conduct an evaluation of the smaller model to see if it is performing optimally or is close to the larger model, the company said.
Rivals Google, Anthropic, and AWS already offer model distillation capabilities.
While Google previously offered the capability to create distilled models for PaLM and currently offers the capability to use Gemini to distill smaller models, AWS provides access to Llama 3.1-405B for synthetic data generation and distillation to fine-tune smaller models.
Model distillation as a feature of the OpenAI API is generally available, the company said, adding that any of its larger models can be used to distill smaller models.
Prompt Caching to reduce latency in gen AI applications
Alongside the distillation ability, OpenAI has also made a prompt caching capability available for the latest versions of GPT-4o, GPT-4o mini, o1-preview, and o1-mini, as well as fine-tuned versions of those models.
Prompt caching is a technique used in gen AI-based application development that speeds up responses by storing and reusing contexts that are sent repeatedly across API calls.
“Many developers use the same context repeatedly across multiple API calls when building AI applications, like when making edits to a codebase or having long, multi-turn conversations with a chatbot,” OpenAI explained, adding that the rationale is to reduce token consumption when sending a request to the LLM.
What that means is that when a new request comes in, the LLM checks whether parts of the request have already been cached. If so, it uses the cached version; otherwise it processes the full request.
OpenAI’s new prompt caching capability works on the same fundamental principle, which could help developers save on cost and time.
“By reusing recently seen input tokens, developers can get a 50% discount and faster prompt processing times,” OpenAI said.
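Conceptually, this works like any prefix cache. The C# sketch below is only an illustration of that general idea in application code, not OpenAI’s server-side implementation; the class and method names are invented for the example.
using System.Collections.Generic;
public class PromptPrefixCache
{
    // Maps a long, frequently reused prefix (such as a system prompt or codebase context)
    // to its already-processed form so it is not reprocessed on every request.
    private readonly Dictionary<string, string> cache = new Dictionary<string, string>();
    public string Process(string sharedPrefix, string newSuffix)
    {
        if (!cache.TryGetValue(sharedPrefix, out string processedPrefix))
        {
            processedPrefix = ExpensiveProcessing(sharedPrefix); // full work only on a cache miss
            cache[sharedPrefix] = processedPrefix;
        }
        // Only the new, non-cached portion of the request needs full processing.
        return processedPrefix + ExpensiveProcessing(newSuffix);
    }
    private static string ExpensiveProcessing(string text) => text.Trim();
}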
Additionally, OpenAI has introduced a public beta of the Realtime API, an API that allows developers to build low-latency, multi-modal experiences including text and speech in apps.
Understanding VBS Enclaves, Windows’ new security technology 3 Oct 2024, 9:00 am
Recently Microsoft announced efforts to improve security in its planned Recall AI assistant. Many of the details were not a surprise, as it built on familiar tools and services I’ve written about during the past few years, but the most important feature, VBS Enclaves, will be new to most of us.
VBS Enclaves are the latest piece of Microsoft’s push to use virtualization to secure Windows by isolating critical functions in encrypted virtual machines using its low-level Krypton hypervisor.
Windows’ always-on hypervisor, Krypton
Krypton is an important piece of modern Windows, as it allows virtual machines and the host OS to share a scheduler, allowing features like Windows login to be isolated from the rest of the OS. These features continue to run as part of Windows while protecting passwords and biometric information; they stay isolated so that malware running in the host Windows instance can’t access them.
Krypton is the foundation of much of Windows 11’s hardware security, working with your PC’s trusted platform modules (TPM) to manage encryption keys, digital signatures, and verification hashes. Think of it as a way of significantly reducing the risk of sensitive information leaking from your PC, as well as reducing the risk of malware replacing what would normally be trusted Windows functions.
Virtualization-based security has been on Microsoft’s radar for a long time, with a heritage that dates back to Windows Server 2016 and the introduction of Secure Windows containers. Although Windows’ support for Docker and other container-based application isolation tools allowed some form of process isolation, it wasn’t perfect.
Secure Windows containers combined container technology with Hyper-V isolation to add more protection, hosting secure containers in a virtual machine rather than on the host OS. This did add the overhead that comes with running a separate OS for your application containers, but features like Nano Server and Windows Server Core kept it to a minimum, with successive releases reducing server image sizes significantly.
Virtualization-based security in Windows 11
Microsoft mixed this approach with TPM capabilities to add enhanced security to Windows over time, enforcing it in Windows 11. That’s why Windows 11 requires hardware with TPM 2.0 support.
The TPM holds the keys and certificates to manage digital signatures so that tools like Windows memory integrity service can run in a hardened virtualized environment using Hyper-V-secured VMs running on the Krypton hypervisor. With memory integrity, kernel-mode drivers and binaries are checked for valid signatures before they run; unsigned code is blocked before it can compromise your PC.
Microsoft recently extended its virtualization-based security model to what it calls VBS Enclaves. If you’ve looked at implementing confidential computing on Windows Server or in Azure, you’ll be familiar with the concept of enclaves, using Intel’s SGX instruction set to lock down areas of memory, using them as a trusted execution environment. This approach requires specific processors, with the latest generation of SGX limited to enterprise Xeon hardware.
Using software to run trusted execution environments
VBS Enclaves offer a similar approach to securing memory but without requiring specific hardware. That allows Microsoft to provide secure enclaves on Intel, AMD, and Arm hardware. As a result, Recall will only run in trusted memory under the control of the Krypton hypervisor, with encryption keys managed by your PC’s TPM and with access controlled by Windows Hello to ensure user presence.
Putting a trusted execution environment on a PC is useful for more than securing AI. It protects sensitive data, adding a new level of protection beyond at rest and in motion: in use. While it does require more work to define and use a VBS Enclave, it’s worth it to have more security with only limited performance impact.
With Windows 11’s memory integrity tools, a VBS Enclave uses Windows’ integral hypervisor to create a new, isolated, high-privilege area of system memory: Virtual Trust Level 1. Most of your code, and Windows itself, continues to run at Virtual Trust Level 0. VTL 1 is used by a secure version of the Windows kernel, with its own isolated user mode. This is where your VBS Enclave runs, as part of an application that appears to cross the boundary between the two zones. In reality, you’re separating off the VTL 1 enclave and using secure channels to communicate with it from the rest of your application in VTL 0.
Using VBS Enclaves in your applications
So how do you build and use VBS Enclaves? First, you’ll need Windows 11 or Windows Server 2019 or later, with VBS enabled. You can do this from the Windows security tool, via a Group Policy, or with Intune to control it via MDM. It’s part of the Memory Integrity service, so you should really be enabling it on all supported devices to help reduce security risks, even if you don’t plan to use VBS Enclaves in your code.
The best way to think of it is as a way of using encrypted storage securely. So, for example, if you’re using a database to store sensitive data, you can use code running in an enclave to process and query that data, passing results to the rest of your application. You’re encapsulating data in a secure environment with only essential access allowed. No other parts of your system have access to the decryption keys, so on-disk data stays secure.
Code running in the VTL 1 environment must be signed by Microsoft, with an OS-level handoff between the two trust zones that resets CPU registers to reduce the risk of state transferring between relatively insecure user modes and your VBS Enclave. Using a VBS Enclave is, naturally, more computationally expensive, and operations take longer to run (though still only microseconds).
VBS Enclaves are DLLs and need a host application to run. You’re limited to a subset of the available Windows system-level C++ APIs, with a list of available Universal C Runtime APIs listed in the development documentation. Other APIs supported are in the VBS Enclave Runtime (Vertdll) and the Bcrypt cryptographic library.
Start with a sample enclave
Microsoft provides a useful sample application that illustrates the life cycle of a VBS Enclave application, showing how to call enclave functions from the host application. External functions need to be explicitly exported—and only those functions can be called by the host. Compiling an enclave needs specific configuration in the linker, ensuring that the right libraries are included and that the resulting DLL is instrumented correctly. Other features ensure that the VBS Enclave is protected from attacks via forged platform DLLs.
Once compiled, your code must be signed. A VBS Enclave signature requires three specific EKUs (Extended Key Usage): one for code signing, one for the enclave, and one for the author. In production you can use Microsoft’s own Trusted Signing service, which offers a profile for signing enclaves. This approach lets you automate signing with the Azure CLI.
There are some important points to consider when writing code that uses VBS Enclaves. They can be loaded by any application running on the host PC, so you need to write code in the enclave that explicitly treats anything outside the enclave as untrusted.
Essential security for sensitive data
The same technology is used for Azure SQL and SQL Server’s Always Encrypted feature. This approach ensures that only authorized users have access to sensitive data. T-SQL operations can cross the secure enclave boundary using confidential queries over an internal TLS channel. Operations take place inside the enclave, allowing the underlying data to always remain encrypted.
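As a rough sketch of what this looks like from a .NET client, enabling Always Encrypted is largely a connection-string setting. The example below assumes the Microsoft.Data.SqlClient package and a server that has already been configured with encrypted columns and, where required, enclave attestation; the server and database names are placeholders.
using Microsoft.Data.SqlClient;
class AlwaysEncryptedExample
{
    static void Main()
    {
        // "Column Encryption Setting=Enabled" turns on Always Encrypted for this connection;
        // queries against encrypted columns can then be evaluated inside the server's secure enclave.
        var connectionString =
            "Server=myserver;Database=mydb;Integrated Security=true;" +
            "Column Encryption Setting=Enabled;";
        using var connection = new SqlConnection(connectionString);
        connection.Open();
    }
}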
VBS Enclaves are necessarily complex, with significant restrictions over standard DLLs. Without those constraints, though, they’d only be as secure as a standard DLL. By locking down code at a library and header level to run in a trusted execution environment, you’re significantly reducing the risk of data leaking, either deliberately or accidentally.
Although it takes a fraction longer and requires some extra system resources, if you’re working with sensitive information, it’s a trade-off worth making. Using VBS Enclaves and encrypted storage should be essential wherever you’re using personally identifiable information, whether it’s a user’s vector-indexed Recall history, stored payment information, or health records.
If there’s any risk associated with data, then you really need a compelling reason not to use this technology. When it comes to sensitive data, VBS Enclaves should be the default choice.
How to use extension methods in C# 3 Oct 2024, 9:00 am
In the C# programming language, extension methods enable you to extend the functionality of existing types without modifying them or deriving new types from them using inheritance. You don’t need to create subclasses of existing classes or recompile or modify your existing classes to use extension methods. Extension methods improve the readability of your code while at the same time allowing you to extend the functionality of existing classes or interfaces. Microsoft introduced extension methods in C# 3.0.
Common extension methods in .NET include the LINQ standard query operators that add query capabilities to the System.Collections.IEnumerable and System.Collections.Generic.IEnumerable<T> types.
Extension methods enable you to “add” methods to existing types without creating a new derived type, recompiling, or otherwise modifying the original type. Extension methods are a special kind of static method, but they are called as if they were instance methods on the extended type.
Essentially, an extension method is a special type of static method that allows you to add functionality to an existing type even if you don’t have access to the source code of the type. An extension method is just like another static method but always includes the “this” reference as its first parameter. You can add as many extension methods as you want to any type. Most importantly, you can even add extension methods to a value type.
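For example, the following sketch adds an IsEven method to the value type int; the class and method names here are illustrative.
public static class IntegerExtensions
{
    // Adds an IsEven method to the built-in int value type.
    public static bool IsEven(this int number)
    {
        return number % 2 == 0;
    }
}
With this in place, you can write 10.IsEven(), which returns true.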
Let’s examine how we can make use of extension methods in C#.
Create a console application project in Visual Studio
First off, let’s create a .NET Core console application project in Visual Studio. Assuming Visual Studio 2022 is installed in your system, follow the steps outlined below to create a new .NET Core console application project.
- Launch the Visual Studio IDE.
- Click on “Create new project.”
- In the “Create new project” window, select “Console App (.NET Core)” from the list of templates displayed.
- Click Next.
- In the “Configure your new project” window, specify the name and location for the new project.
- Click Next.
- In the “Additional information” window shown next, choose “.NET 8.0 (Long Term Support)” as the framework version you would like to use.
- Click Create.
We’ll use this .NET 8 console application project to work with the code examples shown in the subsequent sections of this article.
Define an extension method in C#
Extension methods are a feature of the C# programming language that enable you to add functionality to a type without inheriting from the type, recompiling the code, or modifying the original type. To define an extension method in C#, follow these steps:
- Create a class in C# and use the “static” keyword to mark the class as static.
- Define a static method inside the class, i.e., a method that has the “static” keyword specified in the method signature.
- Ensure that the first parameter of this static method is preceded by the “this” keyword and specifies the type being extended.
The following is an example of a typical C# class named MyExampleClass that contains an instance method named MyExampleMethod. This class is followed by a static class called MyExampleClassExtensions that extends the functionality of MyExampleClass. Note that the static class MyExampleClassExtensions contains an extension method named MyExampleExtensionMethod that accepts the “this” reference as the first parameter of the method.
public class MyExampleClass
{
public void MyExampleMethod()
{
Console.WriteLine("This is an instance method.");
}
}
public static class MyExampleClassExtensions
{
public static void MyExampleExtensionMethod(this MyExampleClass obj)
{
Console.WriteLine("This is an extension method.");
}
}
You can use the following piece of code to invoke the extension method.
MyExampleClass myClass = new MyExampleClass();
myClass.MyExampleExtensionMethod();
When you execute the above piece of code, the text “This is an extension method” will be displayed at the console as shown in Figure 1.
Note that if you define an extension method with a signature identical to that of an instance method, the instance method will always take precedence. The extension method will never be called.
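For example, given the MyExampleClass type shown earlier, the extension method in the following sketch is never reached through instance-style calls; the class name here is illustrative.
public static class MyPrecedenceExtensions
{
    // Same signature as the instance method MyExampleMethod defined on MyExampleClass,
    // so instance-style calls never reach this extension method.
    public static void MyExampleMethod(this MyExampleClass obj)
    {
        Console.WriteLine("This extension method is never called.");
    }
}
Calling new MyExampleClass().MyExampleMethod() still prints “This is an instance method.”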
Using extension methods to extend existing types
You can take advantage of extension methods to add functionality to existing types in C#, such as integers, strings, and chars. For example, the code snippet below illustrates how you can extend the string class in C# to add a new method that returns the count of non-space characters in the string.
public static class MyStringExtensions
{
public static int CountNonSpaceCharacters(this string input)
{
return new string(input.ToCharArray()
.Where(c => !Char.IsWhiteSpace(c))
.ToArray()).Length;
}
}
The following code snippet shows how you could use the CountNonSpaceCharacters extension method.
string str = "This is a test string";
Console.WriteLine("The number of non-space characters in the string is: "+str.CountNonSpaceCharacters());
Console.Read();
Figure 2 shows the output when you run the preceding piece of code.
Here’s another example of an extension method. This extension method extends the generic List<T> class in C#.
public static class MyListExtensions
{
public static T GetLastElement<T>(this List<T> list)
{
if(list.Count > 0)
return list[list.Count - 1];
return default(T);
}
}
GetLastElement is an extension method that returns the last element of a list. You can invoke this extension method using the following code snippet.
List<int> integers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
int element = integers.GetLastElement();
Console.WriteLine(element);
Overloading an extension method in C#
Similar to other methods, you can also overload an extension method. The following code snippet shows how you can overload the Substring method of the string class to return a substring of a string. This overloaded Substring method takes the starting and ending index and a Boolean as parameters. The Boolean denotes if the returned string should be converted to upper case. If you pass true in this parameter when calling the extension method, the returned string will be converted to upper case.
public static string Substring(this string str, int startIndex,
int endIndex, bool convertToUpper)
{
string result = str.Substring(startIndex, endIndex - startIndex);
if (convertToUpper)
{
return result.ToUpper();
}
return result;
}
You can call the preceding extension method by using the following code snippet.
string str = "Hello World!";
string result = str.Substring(6,11, true);
Console.WriteLine(result);
When you run the program, the text “World” will be displayed at the console window as shown in Figure 3.
Extending an interface in C#
Extension methods can be used to add additional functionality to interfaces as well. To do this, create an extension method in C# that can accept an interface as its first argument. Consider the following interface called IMyInterface.
public interface IMyInterface
{
public void MyMethod();
}
The MyClass class below implements the IMyInterface.
public class MyClass: IMyInterface
{
public void MyMethod()
{
Console.WriteLine("This is an instance method.");
}
}
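A minimal sketch of an extension method that targets the interface, rather than the concrete class, might look like the following; the MyInterfaceExtensions class and MyInterfaceExtensionMethod names are illustrative.
public static class MyInterfaceExtensions
{
    // Because the first parameter is IMyInterface, every implementation of the
    // interface, including MyClass, gets this method.
    public static void MyInterfaceExtensionMethod(this IMyInterface obj)
    {
        Console.WriteLine("This is an extension method on IMyInterface.");
        obj.MyMethod();
    }
}
Calling new MyClass().MyInterfaceExtensionMethod() prints both messages.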
Interestingly, most LINQ methods are based on extension methods. For example, the standard query operators in LINQ — i.e., Select, Where, etc. — are implemented in the C# class named Enumerable. Each of these methods is implemented as an extension method pertaining to the IEnumerable<T> interface.
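For instance, the Where call below is simply an extension method defined on IEnumerable<T> being invoked on an array (a minimal sketch, assuming a using System.Linq directive).
int[] numbers = { 1, 2, 3, 4, 5, 6 };
// Where is declared in the System.Linq.Enumerable class as an extension method on IEnumerable<T>.
var evens = numbers.Where(n => n % 2 == 0);
Console.WriteLine(string.Join(", ", evens)); // 2, 4, 6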
Working with extension methods in C#
Here are a few important points you should remember regarding extension methods in C#:
- Extension methods of a class must be static.
- The first argument of an extension method must be a reference to the type on which the extension method is to be invoked.
- Extension methods cannot be used to access private members of a class.
- You cannot modify the private state of an object using extension methods.
- You can use an extension method to extend a class or an interface, but you cannot use one to override an existing method.
Extension methods enable you to write code in a more expressive and fluent way. That said, you should not overuse extension methods because they can make your source code more difficult to comprehend. Hence, use extension methods judiciously; never use them only to make your code more concise. Extension methods are not meant for reducing the KLOC — an acronym for thousands of lines of code.
OpenAI previews Realtime API for speech-to-speech apps 2 Oct 2024, 10:00 pm
OpenAI has introduced a public beta of the Realtime API, an API that allows paid developers to build low-latency, multi-modal experiences including text and speech in apps.
Introduced October 1, the Realtime API, similar to the OpenAI ChatGPT Advanced Voice Mode, supports natural speech-to-speech conversations using preset voices that the API already supports. OpenAI also is introducing audio input and output in the Chat Completions API to support use cases that do not need the low-latency benefits of the Realtime API. Developers can pass text or audio inputs into GPT-4o and have the model respond with text, audio, or both.
With the Realtime API and the audio support in the Chat Completions API, developers do not have to link together multiple models to power voice experiences. They can build natural conversational experiences with just one API call, OpenAI said. Previously, creating a similar voice experience required developers to transcribe audio with an automatic speech recognition model such as Whisper, pass the text to a text model for inference or reasoning, and play the model’s output using a text-to-speech model. This approach often resulted in loss of emotion, emphasis, and accents, plus latency.
With the Chat Completions API, developers can deal with the entire process with one API call, though it remains slower than human conversation. The Realtime API improves latency by streaming audio inputs and outputs directly, enabling more natural conversational experiences, OpenAI said. The Realtime API also can handle interruptions automatically, like ChatGPT’s advanced voice mode.
The Realtime API lets developers establish a persistent WebSocket connection to exchange messages with GPT-4o. The API supports function calling, which makes it possible for voice assistants to respond to user requests by pulling in new context or triggering actions. Also, the Realtime API leverages multiple layers of safety protections to mitigate the risk of API abuse, including automated monitoring and human review of flagged model inputs and outputs.
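To give a flavor of what that looks like from a .NET client, here is a minimal connection sketch using System.Net.WebSockets; the endpoint URL, model query parameter, and OpenAI-Beta header shown are assumptions drawn from OpenAI’s announcement and should be checked against the official Realtime API documentation.
using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;
class RealtimeConnectionSketch
{
    static async Task Main()
    {
        using var socket = new ClientWebSocket();
        // Assumed endpoint and headers; verify against OpenAI's Realtime API docs.
        socket.Options.SetRequestHeader("Authorization",
            "Bearer " + Environment.GetEnvironmentVariable("OPENAI_API_KEY"));
        socket.Options.SetRequestHeader("OpenAI-Beta", "realtime=v1");
        await socket.ConnectAsync(
            new Uri("wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"),
            CancellationToken.None);
        Console.WriteLine("WebSocket state: " + socket.State);
        // From here, the application exchanges JSON events (text, audio, function calls)
        // over the open socket.
    }
}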
The Realtime API uses text tokens and audio tokens. Text input costs $5 per 1M tokens and text output costs $20 per 1M tokens. Audio input costs $100 per 1M tokens and audio output costs $200 per 1M tokens.
OpenAI said plans for improving the Realtime API include adding support for vision and video, increasing rate limits, adding support for prompt caching, and expanding model support to GPT-4o mini. The company said it would also integrate support for the Realtime API into the OpenAI Python and Node.js SDKs.
Docker tutorial: Get started with Docker 2 Oct 2024, 9:00 am
Containers are a lightweight way to make application workloads portable, like a virtual machine but without the overhead and bulk typically associated with VMs. With containers, you can package apps and services and move them freely between physical, virtual, or cloud environments.
Docker, a container creation and management system created by Docker Inc., takes the native container functionality found in Linux and makes it available to end-users through a command-line interface and a set of APIs.
Many common application components are now available as prepackaged Docker containers, making it easy to deploy stacks of software as decoupled components—an implementation of the microservices model. It helps to know how the pieces fit together from the inside out, though. In this guide, we’ll investigate how Docker works. We’ll start by looking at how to set up Docker across the Linux, Windows, and macOS platforms. Next, we’ll install an instance of the Apache web server in a Docker container. You’ll also learn how to work with Dockerfiles to automate Docker image builds.
Choose a Docker product
At its core, Docker uses an open source project, Docker Engine. It’s possible to install Docker Engine by itself and work with it directly from the command line, although only on Linux (or through WSL in Windows).
Your second option is Docker Desktop, a convenient GUI-based application for working with containers across multiple platforms. For developers working on Microsoft Windows, Docker Desktop is the most convenient solution.
The main consideration with Docker Desktop is its licensing. It’s free for individual, non-commercial open source, and educational use, but business use generally involves licensing fees, although the costs scale depending on the size of the organization.
You can also obtain binary editions of the standalone Docker Engine for Windows, macOS, and Linux. However, you’ll have to perform the entire setup process manually, as Docker has more to it than just a binary artifact. The standalone binaries also don’t have any self-updating mechanism, and they may lack many of the features found in the full Docker product.
Using Docker with Linux
Docker started with Linux, as container technology relied on features in the Linux kernel. On Linux, you can use Docker’s core open source features directly in the form of Docker Engine. Setting up Docker Engine requires a different procedure for each major Linux distribution, but the same goes for setting up Docker Desktop on Linux. Once installed, Docker Desktop on Linux provides more convenient ways to manage a Docker setup than the command line alone.
Using Docker with Windows
On Windows, Docker Desktop can work in one of two modes: with Windows’s native Hyper-V virtualization system, or through a Linux instance in WSL2. Both back ends offer the same functionality, and both have the same hardware requirements: 64-bit CPU with SLAT support, at least 4GB RAM, and BIOS-enabled hardware virtualization support.
Of the two, WSL2 is the more lightweight and broadly available option. Hyper-V is more demanding and ships only with Windows 10 or 11’s Professional or Enterprise editions. Hyper-V provides more process isolation features as of Windows 11 and Windows Server 2022, but those may not be crucial features if you’re just starting out.
If you want to use another VM or hypervisor system to run Docker containers, like VMware, Docker only supports that in its business or enterprise editions.
Using Docker with macOS
Installing Docker Desktop on macOS works much the same as any other desktop application. Double-click the Docker.dmg file to open it, then drag the Docker icon inside to your Applications folder. It’s also possible to run the setup process from the command line.
Working with the Docker CLI
The docker command-line utility is where you’re likely to do most of your work with Docker. You can run docker from any console once it’s been properly installed, and view all available Docker commands by simply typing docker. For an up-to-date rundown of all commands, their options, and full descriptions, consult the official command-line client documentation.
When you have Docker set up, one of the first commands to run with it is docker info, which returns basic information about your Docker installation. The output shows the number of containers and images, along with other pertinent information. Note that it may be quite lengthy; this example shows only the last of several pages.
The Docker Desktop client isn’t meant to replace the docker CLI, but to augment it. It gives you a convenient GUI to do the most common day-to-day work with containers: running containers, examining installed images, inspecting created volumes, listing container image builds, and controlling Docker Desktop extensions. Docker Desktop also provides its own built-in console host to give you access to the console without having to switch away.
We’ll use the Docker CLI as the default way to interact with Docker.
Working with Docker containers and images
Docker containers are much more efficient than virtual machines. When a container is not running a process, it is completely dormant. You might think of Docker containers as self-contained processes—when they’re not actively running, they consume no resources apart from storage.
Containers require an image to run, and by default, no images are present in a Docker installation. If you want to run an image that isn’t present, it’ll have to be downloaded and added to the local image repository. You can download and add images to the image repository semi-automatically, as you’ll see in the next example.
Launching a container
Let’s say we want to launch a basic Ubuntu Linux Docker image and run the bash shell. We can use the following command:
docker run -i -t ubuntu /bin/bash
The output will look something like this:
PS C:\Users\serda> docker run -i -t ubuntu /bin/bash
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
dafa2b0c44d2: Pull complete
Digest: sha256:dfc10878be8d8fc9c61cbff33166cb1d1fe44391539243703c72766894fa834a
Status: Downloaded newer image for ubuntu:latest
root@16fd4752b26a:/#
This shows Docker fetching the ubuntu image and starting a container based on it. The last line is the prompt for the bash shell running in the container, where you can type commands. Note that any commands typed at that prompt will be run in the Docker image, not in the system at large.
Examining running containers
You can view active and inactive containers using the docker ps command. (Remember to run this from your actual system console, not the above prompt that’s actually running inside a container.) If you use docker ps -a, it will show all containers on the system regardless of their status; docker ps alone will show only the running containers.
The output for docker ps may look something like this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
16fd4752b26a ubuntu "/bin/bash" 25 minutes ago Up 25 minutes stoic_gould
Each running container has an ID associated with it—here, it’s the string beginning with 16fd—plus information about which image was used to create it, and a friendly name for the container (here, stoic_gould). The friendly name can be manually assigned with docker run’s --name switch, or assigned randomly on startup.
Pulling containers
When we ran docker run, it automatically pulled an Ubuntu container image from the Docker Hub registry service. Most of the time, though, you’ll want to pull container images into the local cache ahead of time, rather than do that on demand. To do so, use docker pull, like this:
docker pull ubuntu
A full, searchable list of images and repositories is available on the Docker Hub.
Docker images vs. containers
Something worth spelling out at this point is how images, containers, and the pull/push process all work together.
Docker containers are built from images, which are essentially shells of operating systems that contain the necessary binaries and libraries to run applications in a container.
Images are labeled with tags, essentially metadata, that make it easy to store and pull different versions of an image. Naturally, a single image can be associated with multiple tags: ubuntu:16.04, ubuntu:xenial-20171201, ubuntu:xenial, ubuntu:latest.
The command docker pull ubuntu, which we saw earlier, pulls the default Ubuntu image from the Ubuntu repository, which is the image tagged latest. In other words, the command docker pull ubuntu is equivalent to docker pull ubuntu:latest.
Note that if I had typed docker pull -a ubuntu, I would have pulled all images (the -a flag) in the Ubuntu repository into my local system. This would be convenient if I wanted to work with a variety of Ubuntu images without having to fetch each individually, but it would take up a lot of space locally.
Most of the time, though, you will want either the default image or a specific version. For example, if you want the image for Ubuntu Saucy Salamander, you’d use docker pull ubuntu:saucy to fetch the image with that particular tag from that repo.
The same logic behind repos and tags applies to other image manipulations. If you pulled saucy, as in the above example, you would run it by typing docker run -i -t ubuntu:saucy /bin/bash. If you typed docker image rm ubuntu, to remove the ubuntu image, it would remove only the image tagged latest. To remove images other than the default, such as Ubuntu Saucy, you must include the appropriate tag: docker image rm ubuntu:saucy.
Docker image and container workflow
Once you’ve pulled an image, you start a live container using the image’s contents by executing the docker run command.
Images are immutable. They are not changed when you run a container; the container starts off as essentially a copy of what’s in the image, and any changes that take place are lost when the container is terminated.
If you want to make changes to the image, you can do this in a couple of ways. You can modify the image’s Dockerfile and build a new image using those changes. Or, you can make changes inside the running container, and create a new image incorporating those changes with the docker commit command. In either case, you’re not modifying the original image, but creating a new one with the changes.
It’s important to note that Docker only stores the deltas, or changes, in images built from other images. As you build your own images, only the changes you make to the base image are stored in the new image, which links back to the base image for all its dependencies. Thus, you can create images that have a virtual size of 266MB but take up only a few megabytes on disk.
Fully configured containers can then be pushed up to a central repository to be used elsewhere in the organization or even shared publicly. In this way, an application developer can publish a public container for an app, or you can create private repositories to store all the containers used internally by your organization.
Create a new Docker image from a container
Now that you have some understanding of how images and containers work, let’s set up an Apache web server container and make it permanent.
Build a new Docker container
First, you need to build a new container. There are a few ways to do this, but because you have a few commands to run, start a root shell in a new container:
docker run -i -t --name apache_web ubuntu /bin/bash
This creates a new container with a unique ID and the name apache_web. It also gives you a root shell because you specified /bin/bash as the command to run. Now, update the package list and install the Apache web server using apt-get:
apt-get update
apt-get install apache2
Note that you don’t need to use sudo, because you’re running as root inside the container. Note that you do need to run apt-get update first, because the package list inside the container is not the same as the one outside of it. (The other instructions inside the container do not require sudo unless explicitly stated.)
The normal apt-get output appears, and the Apache2 package is installed in your new container. Once the install has completed, start Apache, install curl, and test the installation, all from within your container:
service apache2 start
apt-get install curl
curl http://localhost
If you were doing this in a production environment, you’d next configure Apache to your requirements and install an application for it to serve. Docker lets directories outside a container be mapped to paths inside it, so one approach is to store your web app in a directory on the host and make it visible to the container through a mapping.
Create a startup script for a Docker container
Remember that a Docker container runs only as long as its process or processes are active. So if the process you launch when you first run a container moves into the background, like a system daemon, Docker will stop the container. Therefore, you need to run Apache in the foreground when the container launches, so that the container doesn’t exit as soon as it fires up.
Create a script, startapache.sh, in /usr/local/sbin:
apt-get install nano
nano /usr/local/sbin/startapache.sh
(You don’t have to use the nano editor to do this, but it’s convenient.)
The contents of startapache.sh:
#!/bin/bash
. /etc/apache2/envvars
/usr/sbin/apache2 -D FOREGROUND
Save the file and make it executable:
chmod +x /usr/local/sbin/startapache.sh
All this small script does is bring in the appropriate environment variables for Apache and start the Apache process in the foreground.
You’re done modifying the contents of the container, so you can leave the container by typing exit. When you exit the container, it will stop.
Commit the container to create a new Docker image
Now you need to commit the container to save the changes you’ve made:
docker commit apache_web local:apache_web
The commit will save your container as a new image and return a unique ID. The argument local:apache_web will cause the commit to be placed in a local repository named local with a tag of apache_web.
You can see this by running the command docker images:
REPOSITORY TAG IMAGE ID CREATED SIZE
local apache_web 540faa63535d 24 seconds ago 233MB
ubuntu latest b1e9cef3f297 4 weeks ago 78.1MB
Note that the exact details of your image—the image ID and the size of the container—will be different from my example.
Docker networking basics
Now that you have your image, you can start your container and begin serving pages. Before you do that, let’s discuss how Docker handles networking.
Docker can create various virtual networks used by Docker containers to talk to each other and the outside world:
- bridge: This is the network that containers connect to by default. The bridge network allows containers to talk to each other directly, but not to the host system.
- host: This network lets containers be seen by the host directly, as if any apps within them were running as local network services.
- none: This is essentially a null or loopback network. A container connected to none can’t see anything but itself.
Other network drivers also exist, but these three are most crucial for starting out.
When you want to launch a container and have it communicate with both other containers and the outside world, you need to manually map ports from that container to the host. For the sake of my example, you can do this on the command line when you launch your newly created container:
docker run -d -p 8080:80 --name apache local:apache_web /usr/local/sbin/startapache.sh
The -p switch is used for port mapping. Here, it maps port 8080 on the host to port 80 inside the container.
Once you run this command, you should be able to point a web browser at the IP address of the host and see the default Apache web server page.
You can see the status of the container and the TCP port mappings by using the docker ps command:
CONTAINER ID   IMAGE              COMMAND                  CREATED          STATUS          PORTS                  NAMES
81d8985d0197   local:apache_web   "/usr/local/sbin/sta…"   13 minutes ago   Up 12 minutes   0.0.0.0:8080->80/tcp   apache
You can also look up the network mappings by using the docker port command, in this case docker port apache:
80/tcp -> 0.0.0.0:8080
Note that you could use the -P option on the docker run command to publish all open ports on the container to the host and map a random high port such as 49153 back to port 80 on the container. This can be used in scripting as necessary, but it’s generally a bad idea to do this in production.
At this point, you have a fully functional Docker container running your Apache process. When you stop the container, it will remain in the system and can be restarted at any time via the docker restart command.
Use Dockerfiles to automate Docker image builds
As educational as it is to build Docker containers manually, it is pure tedium to do this repeatedly. To make the build process easy, consistent, and repeatable, Docker provides a form of automation for creating Docker images called Dockerfiles.
Dockerfiles are text files, stored in a repository alongside Docker images. They describe how a specific container is built, letting Docker perform the build process for you automatically. Here is an example Dockerfile for a minimal container, much like the one I built in the first stages of this demo:
FROM ubuntu:latest
RUN apt-get update
RUN apt-get install -y curl
ENTRYPOINT ["/bin/bash"]
If you save this file as dftest in your local directory, you can build an image named ubuntu:testing from dftest with the following command:
docker build -t ubuntu:testing - < dftest
In PowerShell, you’d use this command:
cat .\dftest | docker build -t ubuntu:testing -
Docker will build a new image based on the ubuntu:latest image. Then, inside the container, it will perform an apt-get update and use apt-get to install curl. Finally, it will set the default command to run at container launch as /bin/bash. You could then run:
docker run -i -t ubuntu:testing
Et voilà! You have a root shell on a new container built to those specifications. Note that you can also launch the container with this command:
docker run -i -t dftest
Numerous operators are available to be used in a Dockerfile, such as mapping host directories to containers, setting environment variables, and even setting triggers to be used in future builds. See the Dockerfile reference page for a full list of Dockerfile operators.
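For example, a slightly richer Dockerfile that exercises a few of these operators might look like the sketch below; the paths, environment variables, and site content are purely illustrative:
FROM ubuntu:latest
# Install Apache at build time rather than in an interactive session
RUN apt-get update && apt-get install -y apache2
# Environment variables Apache expects (normally set by /etc/apache2/envvars)
ENV APACHE_RUN_USER=www-data
ENV APACHE_RUN_GROUP=www-data
ENV APACHE_LOG_DIR=/var/log/apache2
# Copy a local ./site directory into the web root (illustrative path)
COPY ./site /var/www/html
# Declare a mount point for logs and the port the container listens on
VOLUME /var/log/apache2
EXPOSE 80
# Run Apache in the foreground so the container stays up
CMD ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"]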
Next steps with Docker
There’s much more to Docker than we’ve covered in this guide, but you should have a basic understanding of how Docker operates, a grasp of the key Docker concepts, and enough familiarity to build functional containers. You can find more information on the Docker website including an online tutorial that goes into more granular detail about Docker features.
Spring AI: An AI framework for Java developers 2 Oct 2024, 9:00 am
Artificial intelligence has been something of a fiesta for programmers for the last few years, and one language—Python—has been the undeniable belle of the ball. Java and other languages have been a bit sidelined. But now we are entering a new phase where AI models are the key component of machine learning, and the key question is how to integrate their functionality into larger systems. That kind of integration happens to be a Java specialty. Even better for Java developers, the Spring framework has recently introduced Spring AI, which streamlines programming for a wide range of AI projects. With Spring AI, you can apply familiar Spring semantics and everything you already know about enterprise infrastructure to machine learning.
Could Java rival Python for AI development? Only time will tell, but Spring AI is one of several newer projects that raise the possibility. Let’s take a look.
What is Spring AI?
Spring AI aims to encapsulate a wide range of AI tool providers including libraries and frameworks that support natural language processing and generative AI:
- Natural language processing (NLP): OpenAI’s GPT, Google’s Gemini, Hugging Face Transformers
- Computer vision: TensorFlow, PyTorch, OpenAI’s DALL-E
- Speech recognition and synthesis: Google Speech-to-Text, Amazon Transcribe, Azure Speech Services
- Recommendation systems: TensorFlow Recommenders, Amazon Personalize
- Generative AI: Stable Diffusion, OpenAI’s DALL-E, Midjourney
- Extract, transform, load (ETL): Vector store transformations
Spring AI also includes or is planned to include specialized providers for anomaly detection, time series analysis, and reinforcement learning. You can find a full list of planned providers on the Spring AI overview page. Spring is currently focused on the LLM use case and supports ChatGPT directly from OpenAI or as an Azure service. Additional AI providers include Google, Hugging Face, and Amazon.
The idea going forward is to wrap these services in an abstract form to integrate a wide range of AI tooling into a consistent, Spring-style component system. In the Spring AI model, POJOs will be the building blocks of an application to the AI domain.
Currently, getting even a little chatbot to deliver coherent responses grounded in custom enterprise data can be an enormous undertaking. Efforts to simplify the process and make it smoother are welcome.
Set up a Spring AI project
One way to use Spring AI is to set up a new Spring Boot app for it. Enter the following in your CLI:
spring boot new --from ai --name myProject
Or, if you already have an existing project, you can just add the following to it:
spring boot add ai
This command adds the spring-ai-bom dependency to an existing project.
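If you prefer to edit the build yourself rather than use the CLI, the rough equivalent is to import the Spring AI BOM into your pom.xml's dependencyManagement section, as the Spring AI documentation describes; the version shown here is only a placeholder property:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>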
The Spring AI API
Spring AI’s API consists of several branches, with the broadest being the Model interface. Model provides a generic component that developers can use to integrate almost any kind of AI functionality into an application. The interface also acts as a common target for AI providers to make their platforms available within the Spring ecosystem.
In Spring AI, many different types of AI are extended as implementations of the Model interface, including ChatModel, EmbeddingModel, ImageModel, and SpeechModel. There is also a streaming version called StreamingModel, for providers that support such an implementation.
These model implementations encapsulate the work done by the provider, which is consumed by the ChatClient implementation.
Spring AI also supports function calling, enabling custom application code to provide an API the AI can interact with to form its responses (see the sketch after the list below). So far, Spring AI includes support for:
- Anthropic Claude
- Azure OpenAI
- Google VertexAI Gemini
- Groq
- Mistral AI
- Ollama
- OpenAI
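To make the function-calling idea concrete, here is a hedged sketch of the common Spring AI pattern of exposing a plain java.util.function.Function as a bean the model can invoke. The record types, bean name, and @Description text are invented for illustration, and the exact registration details vary by Spring AI version and provider:
import java.util.function.Function;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Description;
@Configuration
public class WeatherToolConfig {
    // Hypothetical request/response shapes the model sees as the function signature
    public record WeatherRequest(String city) {}
    public record WeatherResponse(double temperatureCelsius) {}
    @Bean
    @Description("Get the current temperature, in Celsius, for a city")
    public Function<WeatherRequest, WeatherResponse> currentWeather() {
        // Stubbed lookup; a real implementation would call a weather API
        return request -> new WeatherResponse(21.0);
    }
}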
Spring AI also includes ETL (extract, transform, load) support on vector databases. This is modeled as a document reader, transformer, and writer. All the major vendors are covered.
Spring AI is also rolling out extensive embedding support. The EmbeddingModel interface abstracts the conversion of text into numeric format for a variety of providers.
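As a rough sketch of the idea, assume an EmbeddingModel bean has been auto-configured by one of the provider starters; note that both the package name and the concrete vector type returned by embed() have shifted between Spring AI milestones, so treat the details below as illustrative:
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Component;
@Component
public class EmbeddingDemo {
    private final EmbeddingModel embeddingModel;
    public EmbeddingDemo(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }
    public void demo() {
        // embed() converts text into a numeric vector suitable for a vector store;
        // the element type (List<Double> vs. float[]) depends on the Spring AI version
        var vector = embeddingModel.embed("A quick demonstration sentence");
        System.out.println("Vector produced for the input text: " + vector);
    }
}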
Another area of complexity that Spring AI tackles is multimodality. This allows you to mix text and images. Here’s an example from the Spring AI documentation:
byte[] imageData = new ClassPathResource("/multimodal.test.png").getContentAsByteArray();
var userMessage = new UserMessage(
"Explain what do you see in this picture?", // content
List.of(new Media(MimeTypeUtils.IMAGE_PNG, imageData))); // media
ChatResponse response = chatModel.call(new Prompt(List.of(userMessage)));
Prompts help structure user input with whatever consistent framework your app requires—something like a view with variable interpolation. At first glance, prompts seem simple, but in fact they can entail quite a bit of complexity, including framing content and contextual information like roles.
The StructuredOutput interface aids in structuring the output of models, which is especially important when channeling the output into another system as input.
Another interesting facet of AI development is testing, where using a model (possibly a different one from the primary) to evaluate the responses is an important approach. Spring AI includes support for this need.
Spring AI application example
We can take a look at the workings of a simple example in the Spring AI Azure Workshop repository. The repo includes a few examples, but let’s look at the simplest. This is a project in a Maven layout, and the first thing to note is the application.properties file, which contains the following line:
// src/main/resources/application.properties
spring.ai.azure.openai.chat.options.deployment-name=gpt-35-turbo-16k
This creates a property with the value of gpt-35-turbo-16k. The name spring.ai.azure.openai.chat.options.deployment-name is important because it’s tied by autoconfiguration to a Spring Bean configurator that will produce a ChatClient using that parameter. The following dependency in the pom.xml provides that client:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-openai-spring-boot-starter</artifactId>
</dependency>
In essence, when Spring scans the project looking for a ChatClient, it’ll use the property to make one using naming conventions in the openai starter project. In the simple helloworld example we are looking at, that ChatClient is called for by the controller:
package com.xkcd.ai.helloworld;
import org.springframework.ai.chat.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.util.Map;
@RestController
public class SimpleAiController {
private final ChatClient chatClient;
@Autowired
public SimpleAiController(ChatClient chatClient) {
this.chatClient = chatClient;
}
@GetMapping("/ai/simple")
public Map<String, String> generation(
@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
return Map.of("generation", chatClient.call(message));
}
}
This is a typical Spring REST controller, where the chatClient member is injected through a constructor annotated with @Autowired. That ChatClient is then used to handle the requests at /ai/simple. (Request parameter defaults are provided on the method, so a request with no parameters will be set as “Tell me a joke.”) The endpoint method returns a map with a “generation” key whose value is the return value of chatClient.call(message).
For all this to work, you need an API key for Azure. The key is set as an environment variable:
export SPRING_AI_AZURE_OPENAI_API_KEY=
You also need to tell the engine where the AI endpoint is located:
export SPRING_AI_AZURE_OPENAI_ENDPOINT=
With all those elements in place, you can run the project with $ mvn spring-boot:run. Now, if you visit localhost:8080/ai/simple, you should see an AI-generated joke.
Other examples in the Azure repository demonstrate how to layer on additional features to this basic frame. For example, you can easily add a prompt template to the example app:
// src/main/resources/prompts/joke-prompt.st
Tell me a {adjective} joke about {topic}
The template is then used in the controller like so:
@Value("classpath:/prompts/joke-prompt.st")
private Resource jokeResource;
And then in the endpoint, you might add something like:
PromptTemplate promptTemplate = new PromptTemplate(jokeResource);
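Fleshed out, the endpoint might look something like the following sketch; the request path, parameter names, and default values here are illustrative rather than taken verbatim from the workshop:
@GetMapping("/ai/jokes")
public Map<String, String> jokes(
        @RequestParam(value = "adjective", defaultValue = "silly") String adjective,
        @RequestParam(value = "topic", defaultValue = "cows") String topic) {
    // Fill the {adjective} and {topic} placeholders, then send the rendered prompt
    PromptTemplate promptTemplate = new PromptTemplate(jokeResource);
    String message = promptTemplate.render(Map.of("adjective", adjective, "topic", topic));
    return Map.of("generation", chatClient.call(message));
}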
The case for Spring AI
An essential question to ask before adopting a technology is whether it is justified by the return on investment. In other words, look at the complexity and effort in relation to what the new tech brings to the table. A primary value proposition of a tool like Spring is to simplify and reduce complexity. Spring seeks to make things more consistent, then deliver extra features on top of that framework.
If you are seeking simplicity, you might be tempted to start out by integrating the AI providers you use by making calls from your application code to OpenAI or Google APIs, for instance. This approach has a directness to it. But if your project is already using Spring, adopting Spring AI might make more sense, especially in the longer term. The more complex and ambitious your AI use cases are—and AI definitely tends towards sprawling complexity—the more you will appreciate the structure, consistency, and templating you find in Spring AI.
Microsoft releases official OpenAI library for .NET 2 Oct 2024, 7:30 am
Microsoft has released its official OpenAI library for .NET, with the goal of ensuring a smooth, reliable integration experience for developers working with OpenAI and Azure OpenAI services in .NET applications.
Announced October 1, the now-stable library follows a beta release published in June. Installable via NuGet, the OpenAI library for .NET provides full OpenAI REST API support and full support for OpenAI flagship models including GPT-4o, GPT-4o mini, o1-preview, and o1-mini.
Other capabilities include:
- Extensibility, allowing the community to build additional libraries.
- Sync and async APIs, providing flexibility to use asynchronous or synchronous patterns depending on an application’s needs.
- Streaming completions, for more dynamic interaction models.
- Numerous quality-of-life improvements.
- .NET Standard 2.0 compatibility, via a library written in C#.
Supported on GitHub, the open source .NET library provides supported integration with OpenAI and Azure OpenAI and complements OpenAI’s libraries for Python and TypeScript/JavaScript developers, Microsoft said.
JDK 24: The new features in Java 24 1 Oct 2024, 10:15 pm
Java Development Kit (JDK) 23 having arrived September 17, work already has begun on JDK 24, with three features so far proposed for the release: warnings to prepare developers for future restrictions on the use of JNI (Java Native Interface), late barrier expansion for the G1 garbage collector, and finalization of the class-file API. A multitude of other features, including many already in preview in JDK 23, also are possible for inclusion.
Due March 18, 2025, JDK 24 has been designated a non-long-term support (LTS) release. Like just-released JDK 23, JDK 24 will receive only six months of Premier-level support from Oracle.
The first JDK 24-targeted feature, officially called “Prepare to Restrict the Use of JNI,” calls for issuing warnings about uses of JNI and adjusting the foreign function and memory (FFM) API, featured in JDK 22, to issue warnings in a consistent manner. These warnings are intended to prepare for a future release that ensures integrity by default by uniformly restricting JNI and the FFM API. Goals of the plan include preserving JNI as a standard way to interoperate with native code, preparing the Java ecosystem for future releases that disallow interoperation with native code by default, and aligning the use of JNI and the FFM API so library maintainers can migrate from one to the other without requiring developers to change command-line options.
The second feature, late barrier expansion for the G1 garbage collector, is intended to simplify the implementation of G1’s barriers, which record information about application memory accesses, by shifting their expansion from early in the C2 compilation pipeline to later. Goals include reducing the execution time of C2 compilation when using the G1 collector, making G1 barriers comprehensible to HotSpot developers who lack a deep understanding of C2, and guaranteeing that C2 preserves invariants about the relative ordering of memory accesses, safepoints, and barriers. A fourth goal is preserving the quality of C2-generated JIT (just-in-time) compiled code, in terms of speed and size.
A third feature, the class-file API, previously previewed in JDK 22 and JDK 23, would be finalized in JDK 24, with minor changes. This API provides a standard API for parsing, generating, and transforming Java class files. It aims to provide an API for processing class files that tracks the class file format defined by the Java Virtual Machine specification. A second goal is to enable JDK components to migrate to the standard API, and eventually remove the JDK’s internal copy of the third-party ASM library. Changes since the second preview include a renaming of enum values, removal of some fields, the addition of methods and method overloads, methods renamed, and removal of interfaces and methods deemed unnecessary.
Additional features targeting JDK 24 will be determined during the next several months. Potential Java 24 features include further previews or final releases of features being previewed in JDK 23. These include stream gatherers, to enhance the stream API for custom intermediate operations; module import declarations, for succinctly importing all packages exported by a module and simplifying reuse of modular libraries; structured concurrency, to simplify concurrent programming; scoped values, for sharing immutable data; and flexible constructor bodies, giving developers greater freedom to express behavior of constructors.
Another feature in preview in JDK 23 and a contender for JDK 24 is primitive types in patterns, instanceof, and switch, which aims to enhance pattern matching by allowing primitive type patterns in all pattern contexts, and to extend instanceof and switch to work with all primitive types. Another possible JDK 24 feature is the vector API, now in an eighth incubation stage in JDK 23. The vector API is geared to expressing vector computations that reliably compile at runtime to optimal vector instructions on supported CPU architectures. Ahead-of-time class loading, a feature designed to speed Java startups, and string templates, a feature previewed in JDK 21 and JDK 22 but dropped from JDK 23, could also be targeted to JDK 24.
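For a sense of what the primitive-types-in-patterns feature looks like, here is a sketch based on the JEP; it is a preview feature, so it would need to be compiled and run with --enable-preview on a JDK that ships it, and the final syntax could still change:
// Sketch only: mixes constant labels with a primitive type pattern and a guard
static String classify(int value) {
    return switch (value) {
        case 0 -> "zero";
        case int i when i > 0 -> "positive";
        case int i -> "negative";
    };
}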
The most-recent LTS release, JDK 21, arrived in September 2023 and is due to get at least five years of Premier support from Oracle. The next LTS version, JDK 25, is due in September 2025. LTS releases have dominated Java adoption, which means adoption of JDK 23 and JDK 24 could be on the low end as users await JDK 25.
PyTorch library makes models faster and smaller 1 Oct 2024, 9:29 pm
The PyTorch Foundation, makers of the PyTorch machine learning framework, has launched torchao, a PyTorch native library that makes models faster and smaller by leveraging low-bit dtypes, sparsity, and quantization. It is a toolkit of techniques that span both training and inference, Team PyTorch said.
Unveiled September 26, torchao works with torch.compile() and FSDP2 over most PyTorch models on Hugging Face. A library for custom data types and optimizations, torchao is positioned to make models smaller and faster for training or inference out of the box. Users can quantize and sparsify weights, gradients, optimizers, and activations for inference and training. The torchao library serves as an accessible toolkit of techniques mostly written in easy-to-read PyTorch code spanning inference and training, according to Team PyTorch. Featured is torchao.float8 for accelerating training with float8 in native PyTorch.
Released under a BSD 3 license, torchao makes liberal use of new features in PyTorch and is recommended for use with the current nightly or latest stable release of PyTorch, Team PyTorch advises.
Two good Visual Studio Code alternatives 1 Oct 2024, 9:30 am
The conductor of my choir famously tells us singers, “I only want everything all the time.” Well, as a developer, my mantra for code editors and IDEs (integrated development environments) is exactly that: I only want everything all the time. What’s “everything” in this context in 2024? It has to be fast enough not to get in my way. It has to be able to support my software development life cycle in all the programming languages that I write, with, at a minimum, syntax highlighting, and preferably some in-line auto-completion and syntax checking. And it has to be able to act as my pair programmer at a useful level.
The four alternatives to Visual Studio Code we’ll examine below—Zed, Eclipse Theia IDE, Lite XL, and Cursor—are all available for Linux, macOS, and Windows, with the slight exception of Zed, which is currently available for Linux and macOS only. The Zed project says a Windows version is coming “soon.” All four products are free, except Cursor, which is available in a limited free version, a pro version that costs $20 per month, and a business version that costs $40 per month (see pricing).
All four products can be extended to support AI-assisted coding, but Zed and Cursor also have native AI integrations that set them apart.
AI-powered pair programming is evolving quickly. It used to be controversial; now most programmers accept AI help for code completion and sometimes for code generation.
I have to repeat the obligatory warning, however: AI code generation is only safe if you knowledgeably review, understand, test, and debug the code. It’s not for beginners. It’s not for generating code in a language you don’t know. Generating code for an unfamiliar library or framework may be OK, as long as you also familiarize yourself at least to the point where you can tell safe usage from unsafe usage.
If you use AI code generation without doing your due diligence, you may still boost your productivity as a programmer. You may, however, be producing buggier, less efficient code; in other words, garbage, dreck, crap.
Inside Visual Studio Code
Visual Studio Code is a free, lightweight, and yet powerful source code editor that is available for Windows, macOS, Linux, and Raspberry Pi OS. It comes with built-in support for JavaScript, TypeScript, and Node.js and it has a rich ecosystem of extensions for other programming languages (such as C++, C#, Java, Python, PHP, and Go), and for runtimes (such as .NET and Unity), environments (such as Docker and Kubernetes), and clouds (such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform). As I put it in a previous article about VS Code, “Come for the fast editing. Stay for the debugging, source code management support, and huge ecosystem of extensions.”
The code in Microsoft’s Visual Studio Code (vscode) repository is open source under the MIT License. The Visual Studio Code product itself ships under a standard Microsoft product license, as it has a small percentage of Microsoft-specific customizations. It’s free despite the commercial license. The vscode repository is formally called Visual Studio Code – Open Source (“Code – OSS”).
The core Code-OSS repo isn’t the whole story, however. The Code project consists of the vscode repository plus a number of satellite projects. One of the more important satellite projects is the Language Server Protocol, which defines the protocol used between an editor or IDE and a language server that provides language features like auto complete, go to definition, find all references, etc. The goal of the Language Server Index Format (LSIF, pronounced like ‘else if’) is to support rich code navigation in development tools or a Web UI without needing a local copy of the source code. Another important satellite project is the standalone Monaco Editor, which is the full-featured code editor from Visual Studio Code.
Because Visual Studio Code is a hybrid app built on Chromium and Node.js running inside a native Electron shell, it isn’t as small or as fast as it would be if it were written as a native application. That’s an opportunity for those who want to build smaller, faster code editors and IDEs.
Inspired by Visual Studio Code: An overview
It shouldn’t come as a surprise that others have built products that look and behave like Visual Studio Code, and may even use parts of Code OSS, both for the desktop and the cloud.
Zed is billed as “a next-generation code editor designed for high-performance collaboration with humans and AI.” Zed looks like VS Code and has many of its features, but it was “written from scratch in Rust to efficiently leverage multiple CPU cores and your GPU.” It’s about half the size of VS Code and much faster on my M1 MacBook Pro. It’s even faster than Sublime Text. Zed also has access to many large language models (LLMs), both inline and in a separate window, in the spirit of Tabnine.
Eclipse Theia is a framework for building IDEs and other development tools; the Eclipse Theia IDE is a cloud and desktop IDE built on the Theia framework. Theia IDE actually uses the Monaco editor component, and provides language support via Visual Studio Code’s LSP (Language Server Protocol) and DAP (Debug Adapter Protocol). Further, it can host VS Code extensions (it has 83 extensions built-in, although most are very basic) and provides terminal access. About half-a-dozen other products have been built from the Theia framework. Theia IDE is neither smaller nor faster than VS Code, and has numerous issues.
Lite XL is pitched as a “lightweight, simple, fast, feature-filled, and extremely extensible text editor written in C, and Lua, adapted from lite.” It supports a LSP plug-in for language support, and a terminal plug-in. (I was able to install the terminal plug-in, but not get it to run on my M1 MacBook Pro.) Various other common editing features are also implemented as plug-ins. Lite XL is about a tenth of the size of Visual Studio Code. It installs on macOS using MacPorts. It is also supposed to install using builds on its releases page, but that version wouldn’t start on my M1 MacBook Pro.
Cursor, a fork of Visual Studio Code, is built to make you “extraordinarily productive,” and claims to be “the best way to code with AI.” Cursor has several code completion and chat models of its own built-in, and can also use Claude Opus and the premium models GPT-4, GPT-4o, and Claude 3.5 Sonnet. In the course of this review, more models were released, such as o1-mini. Exactly what models you can call, how often, and at what priority, depend on your Cursor and GitHub Copilot subscription plans and what API keys you supply. Cursor’s own code completion does more than GitHub Copilot. It can edit the code around your cursor and remove text, not just insert text at your cursor position. Cursor is about the same size as VS Code.
Comparing the contenders
Overall, I find Theia IDE too slow and underpowered for my taste. I currently can’t get many plug-ins to run in Lite XL, meaning that I can only use it as a basic programming editor, albeit a small, fast, theoretically extensible programming editor. The other two Visual Studio Code alternatives, Zed and Cursor, are more to my taste. It’s worthwhile comparing the two to VS Code and to each other.
Zed
The Zed editor was developed by the team that wrote Atom, Electron, and Tree-sitter. They didn’t want to use a JavaScript/Electron framed web application approach again. Instead, they decided to go for speed, using Rust and GPUI. Rust, as you probably know, is a fast, memory-efficient, compiled programming language that has features to ensure memory safety and thread safety.
Have I mentioned that Zed is fast? Compared to Visual Studio Code, it’s night and day. I already knew first-hand that VS Code is slower than Sublime Text, but I thought the slowdown was an inevitable consequence of VS Code being an IDE rather than an editor. Zed proves that assumption wrong.
Using Rust rather than JavaScript would only partially explain the speed of Zed. The other major factor is the use of GPU acceleration, which is a little surprising because some other new editors haven’t been able to achieve any noticeable speed boost from trying to use the GPU.
GPUI is a GPU-accelerated UI framework for Rust, developed by the Zed team in parallel with Zed, and currently part of the Zed repository. On macOS, GPUI uses Metal, an Apple-specific integrated graphics and compute API with a shading language for rendering. Metal has functionality that combines that of OpenGL GPU-accelerated vector graphics and OpenCL parallel programming of diverse accelerators. Using the Metal API is something I would expect in a real-time game, but not an editor.
On Linux, Zed requires a Vulkan 1.3 driver; Vulkan is a GPU-only API usually used for real-time game programming. Zed can use either the X11 window manager or Wayland, a newer combined window manager and display server, depending on which is present in the Linux system. If both are present, Zed will prefer Wayland unless you tell it not to.
How does Zed know whether to use Metal or Vulkan? I didn’t find an explicit platform-dependent switch or if statement, as I would expect in C/C++ code. As I discovered by asking the Zed Assistant while browsing the GPUI app.rs file in Zed, GPUI uses a Platform trait to abstract platform-specific functionality, and the actual platform implementation is injected when creating the app.
Traits are a feature of the Rust language; they define the functionality or behavior a particular type has, a bit like interfaces in Java or C#. Using a file search for “platform,” I found an implementation of platform-dependent code injection using #[cfg(target_os = …)] directives in platform.rs. That Rust directive is similar to the C/C++ #ifdef preprocessor directive.
I mentioned using the Zed Assistant to explain code. Zed’s generative AI integration has two parts: the inline assistant, and the Assistant panel. The Assistant panel is more than an AI chat — it’s a context editor.
That bears explanation. Zed calls the Assistant panel a context and documents that: “Contexts are like conversations in most assistant-like tools. A context is a collaborative tool for sharing information between you, your project, and the assistant/model.” This allows you to work back and forth between the code in your project and your conversation with the model. It’s habit-forming once you get used to it. (See the eight-minute introductory video to get an idea how all this works.)
Slash commands augment what you can do in the context window. They range from simple insertions, such as /now (time and date) and /tab (contents of active or open tabs), to /workflow (establishing a context for the assistant to suggest edits to your code). In addition, Zed keeps a history of all your queries and its responses so that you can always look at them again.
The Zed inline assistant is closer to what you might expect from Visual Studio Code’s inline GitHub Copilot assistant, but still has additional twists. Press ctrl-enter or click the assistant icon at the top right of the edit pane, and you’ll either send the current line or the current selection to the model for completion. The inline assistant uses the assistant panel to provide context or instructions that direct the model about how to modify the inline code. You’ll get the hang of that quickly, even though it may be confusing at first.
Zed can use models from Anthropic, GitHub Copilot Chat, Google, Ollama, and OpenAI. You get more access if you supply the corresponding API keys or, in the case of GitHub Copilot, have a subscription that Zed can look up. To use open source (or partially open source) models like Llama, Mistral, and Gemma in Ollama, you need to be running Ollama in your machine, which takes significant memory.
Zed has an extensive collaboration feature that includes topic-specific channels and named contacts. As I’m kind of a lone wolf these days, I rarely use such things. If you’re working on a project with several other people, it sounds like it should be useful, but you have to trust anyone you allow to collaborate with you, since the feature gives them access to your local file system.
Zed has most of the editing features you’ll find in VS Code, Sublime Text, and BBEdit. These include multi-buffer editing, interactive programming with REPLs and notebooks, support for many languages, a terminal and task runner, full Vim bindings, and remote development (still in preview).
Cursor
Since Cursor AI is a fork of VS Code, it’s no faster than VS Code, and not nearly as fast as Zed. Its major improvement over VS Code is its handling of code completion and chat. Some of that couldn’t have been accomplished as an extension.
The distinguishing features of Cursor are mostly focused around AI. When generating code, Cursor sees your recent changes, so it can predict what you want to do next. It can suggest multiple edits at once, something that you’ll also find in Zed. If you type incorrect code that’s close enough to guess your intent, Cursor will fix it. Cursor can also predict your next cursor position.
In the chat tab, Cursor can answer questions about your code base, refer to code you reference, use images as input, and refer to the web if you ask it to. It can apply code suggestions from chat back into your code base, and refer to libraries or documentation that you mention.
In a code context, you can pop up a prompt bar to give Cursor instructions, and optionally supply a highlighted range. You can also ask quick questions from this bar. Finally, you can ask Cursor for help in the terminal window and it will generate commands.
While I’m not going to abandon Visual Studio Code or Sublime Text, or give up on the free level of Cursor, I expect to adopt Zed as my primary code development environment. It’s currently free. I don’t know how I’ll react, however, if they ask for a subscription price that’s over my budget.
The battle cry of 2025: Do cloud local! 1 Oct 2024, 8:45 am
Microclouds can have many names and offer many features. They can be hosted by a regional cloud provider or specialize in a specific use case, such as the rise of GPU cloud providers focusing on the exploding AI space. They could even be clouds that don’t have an international presence and are instead located in a single country.
Microcloud vendors are all moving toward the same objective: Provide alternatives to the big public cloud providers by offering more business value and personalized services. So far, it’s working.
Rise of regional providers
The public cloud computing landscape, dominated by Amazon Web Services, Google Cloud, and Microsoft Azure, is shifting as regional providers emerge. Indeed, it’s possible that the tech giants are facing greater challenges than they are ready to acknowledge.
Concerns about dependency, data sovereignty, and the lack of competition have spurred demand for alternatives. Many businesses are also looking for providers that work in specific geographical regions or that understand the privacy or compliance regulations for a particular region or country.
An excellent example of this is Lidl grocery stores, a German discount retail chain that operates more than 12,000 stores worldwide. Schwarz Digits is Lidl’s former IT division that has emerged as a leading regional cloud provider with digital products and services that comply with strict data protection standards and sovereignty laws. Last year it transitioned into a stand-alone entity, generating €1.9 billion in revenue by 2023 with a workforce of 7,500. That’s not micro.
Its rise centers on offering data sovereignty. This is especially attractive for companies subject to strict European data protection regulations like General Data Protection Regulation (GDPR). By storing data solely in Germany and Austria, Schwarz Digits has attracted clients seeking robust data control that prioritizes privacy and regulatory compliance.
I’m often teased when I use this as an example. “You trust a grocery store to host your data?” Well, countless enterprises trust a bookseller with their cloud computing products and services. This is not much different.
Promising alternatives
Does this mean I am promoting microclouds as the new go-to option for public cloud services? Not at all. For many companies, it’s crucial to have an international presence and access to thousands of cloud services, from AI toolkits to databases to development ecosystems. The new regional providers won’t be able to compete with that level of products and services, but that doesn’t matter.
None of these smaller players talk about wanting to knock the Big Three providers off their pedestals. Instead, they are looking to provide a value-driven alternative that can return more value to certain businesses. With cost being a significant concern for those using the larger providers, this is an alternative offered at the right time in the maturation of the cloud market.
Microclouds offer solutions that address the deficiencies of global giants. Besides data sovereignty, their agility enables them to rapidly adapt solutions to meet local regulatory and cultural demands, unlike larger cloud providers’ one-size-fits-all approach. However, these regional entities can’t scale to match the global infrastructure. They are unlikely to make that investment anytime soon. Also, developing extensive partner networks is a requirement for continuing customer acquisition and service maintenance.
In their defense, the regional entities will quickly recommend that you go to the Big Three if you’re looking for those attributes. They are focused on providing essential services at a good value, which is what many businesses are looking for. Enterprises want storage, compute, and security that is sound and cost-effective. Indeed, if that is all they are utilizing from a more prominent provider, they might as well pay the lower price of a microcloud. The larger providers are sending many smaller enterprises to the poorhouse, given the cost of these cloud services in 2025.
Market adjustments
The cloud market is expected to diversify as regional providers chip away at the giants’ dominance. Although these smaller vendors face obstacles in scalability and credibility, their entrance introduces essential competition and aligns services better with local needs. The future of the cloud ecosystem will likely involve both global and regional players.
So, is this a massive shift? Not really. It’s an adjustment. Alternatives are needed given the cost-to-value problems that public cloud computing seems to be developing. Microcloud providers fill that need to the point that many enterprises may rely entirely on one or two of these smaller players and kick the big providers to the curb. This won’t be a drastic market upheaval that makes us suddenly take notice, but it’s a movement, nonetheless.
Breaking through AI data bottlenecks 1 Oct 2024, 8:30 am
As AI models become increasingly commoditized, the data required to train and fine-tune them has never been more critical. While procuring high-quality data is expensive and raises privacy concerns, there is a powerful alternative that companies like Google and JPMorgan are exploring: synthetic data. As enterprises move beyond experimenting with general-purpose models to fine-tuning their own specialized AI models, synthetic data is emerging as a key solution to break through common bottlenecks and drive the next wave of innovation.
Here‘s how synthetic data is addressing three major bottlenecks in specialized machine learning and AI model development.
The data scarcity bottleneck
One of the most significant bottlenecks in training specialized AI models is the scarcity of high-quality, domain-specific data. Building enterprise-grade AI requires increasing amounts of diverse, highly contextualized data, of which there are limited supplies. This scarcity, sometimes known as the “cold start” problem, is only growing as companies license their data and further segment the internet. For startups and leading AI teams building state-of-the-art generative AI products for specialized use cases, public data sets also offer capped value, due to their lack of specificity and timeliness.
While major players like OpenAI are dredging the internet for any potentially useful data — an approach fraught with consent, copyright, privacy, and quality issues — synthetic data offers a more targeted, secure, and ethical solution. By synthesizing unlimited variations and edge cases based on existing seed data, synthetic data allows organizations to:
- Expand limited proprietary data sets or even expand on seed examples from expert users to form a robust foundation for training specialized models.
- Create data for rare or “what if” scenarios that may not exist in real-world data sets.
- Rapidly iterate and experiment with different data distributions and curations to optimize model performance.
Synthesizing data not only increases the volume of training data but also enhances its diversity and relevance to specific problems. For instance, financial services companies are already using synthetic data to rapidly augment and diversify real-world training sets for more robust fraud detection — an effort that is supported by financial regulators like the UK’s Financial Conduct Authority. By using synthetic data, these companies can generate simulations of never-before-seen scenarios and gain safe access to proprietary data via digital sandboxes.
The data quality and management bottleneck
Even when organizations have substantial data, they often face a bottleneck in terms of data quality and organization. This issue manifests in at least three ways:
- Data drift and model collapse: Existing training data sets may become outdated or irrelevant over time, leading to models progressively losing their ability to accurately represent the full spectrum of real-world scenarios they need to account for.
- Incomplete or unbalanced data: Real-world data sets often have gaps or biases that can skew model training.
- Lack of proper annotation: Effective model training requires well-labeled data, but manual annotation is time-consuming and prone to biases and inconsistencies.
Synthetic data breaks through this bottleneck by:
- Generating high-quality data to fill gaps in existing data and correct for biases.
- Creating fully annotated information tailored to specific industry rules or compliance requirements, eliminating the need for manual labeling.
- Allowing for rapid scaling of the data annotation process, significantly reducing time and resource constraints.
Using synthetic data results in cleaner, more organized data that can dramatically improve model accuracy and efficiency.
The data privacy and security bottleneck
For many organizations, especially those in highly regulated industries, data privacy and security concerns create a significant bottleneck in AI development. Stringent privacy standards and tightening regulations, such as the GDPR and EU AI Act, restrict the amount of valuable data that is usable for AI initiatives.
Synthetic data, when combined with modern privacy-preserving techniques like differential privacy, shatters this bottleneck by serving as a secure interface to access rich data insights without compromising individual privacy. This approach allows organizations to:
- Leverage sensitive data that would otherwise be off-limits for AI training.
- Safely share and collaborate on data-driven projects across departments, between organizations, and in the public, open community.
- Comply with stringent data protection regulations and respect consumer privacy, while still advancing applied science and innovating with AI.
In the healthcare sector, synthetic data is enabling companies to safely anonymize and operationalize data from electronic health records and transcripts, powering use cases from analytics to customized LLM training sets without compromising patient privacy.
The path forward: Synthesized data
By breaking through these critical bottlenecks, synthetic data is democratizing access to AI innovation and enabling the development of highly specialized, sustainable AI models that were previously out of reach for many organizations.
As we move forward, the quality, relevance, and ethical use of training data will increasingly determine the success of AI initiatives. It’s no longer just about how sophisticated your model is, but how good your data is.
Synthetically designed data is cleaner, more customizable, less biased, and faster than traditional real-world data. It opens up new possibilities for safe data collaborations and AI development that will benefit startups, scientists and researchers, global brands, and governments alike.
As AI continues to evolve, the role of synthetic data in breaking through bottlenecks and enabling agile and iterative model training will only grow in importance. Organizations that embrace this technology now will be well-positioned to lead in the AI-driven future.
Alex Watson is co-founder and chief product officer at Gretel.
—
Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
The worst programmer I know 1 Oct 2024, 8:30 am
If all code is legacy code, then we are all maintainers. I carry an image in my head of a vast sea of silent, diligent, dedicated coders who silently maintain much of the code that makes the world go round. I carry this image because I am one of those coders.
For the better part of the last 15 years, I have either directly maintained 30-year-old code or managed developers who did. And I don’t think I’m alone. I figure for every developer discussing cutting-edge topics on Hacker News, there are seven or eight of us out there, quietly grinding away at keeping that inventory system running on the current version of Windows, despite large chunks of the running code having originally been written in Turbo Pascal 3.0 in 1987.
It is often said, “The worst programmer I know is me six months ago.” But I know a programmer worse than that — the developer (maybe it was me) who wrote that code from 37 years ago.
I have spent a serious amount of time trying to decipher what that person did. What in the world were they thinking? Who thought that 13 nested if statements were a good idea? Who decided that five interlocking boolean conditions should go into the fourth nesting of those statements? Hadn’t they heard of an explaining variable? What the heck does the literal value 4 mean here? 9280 lines is a bit much for a single procedure, don’t you think?
So I shake my head, puzzle it out, and fix that bug. Or I wedge a new feature into that thousands-of-lines-of-code procedure. And I marvel that the code still runs, still (mostly) works, and still lives on.
Sure, we all laugh at the crazy stuff developers did back then. But it occurs to me that a little humility is in order. To our modern eye, which has the benefit of the wisdom of the past few decades, that code looks terrible. But maybe we just didn’t know any better back then.
Standing on spaghetti
At the time, the tools casually encouraged you to write code that today makes us cringe. Much of the code we maintain came from the “onClick” era. Back in the 1990s, “rapid application development” (RAD) was all the rage, and it practically begged us to tightly couple our business logic to our user interface. Double-click on the button and start coding! What could be better?
Back then we were all worried that our administrative assistants were going to use these RAD tools to put us developers out of business. Today we worry that it will be AI. I don’t even want to think what the next thing will be that will allegedly put skilled developers out to pasture. (I’m not a believer that it will ever happen, but that doesn’t stop pundits from scaremongering.)
My awkward dance with Turbo Pascal ought to remind us that the code we write today will likely be viewed in exactly the same way by some developer (maybe it will be you) 37 years from now.
Our profession grows and improves, and we find better ways to do things. We stand on the shoulders of giants. Software engineers like Martin Fowler, Robert C. “Uncle Bob” Martin, Kent Beck, Michael Feathers, and Grady Booch all noticed — and then codified — practices we all take for granted today.
For instance, object-oriented programming (OOP) revolutionized the way we write code. Yet today, the way we do OOP is vastly different from the way we did even 10 or 15 years ago.
We wrote a lot of OOP code and eventually figured out principles like “Prefer composition over inheritance” and “Encapsulation is the most important pillar.” It was through hard-won experience that Uncle Bob Martin developed the SOLID principles in 2000.
Beck didn’t start teaching us to write better code through test-driven development until the early 2000s. Fowler didn’t introduce the term “dependency injection” until 2004. Even today people don’t code against abstractions enough. There was a lot of coupled, multiple-responsibility, abstract code written before these folks and others saw better ways, and yet we still argue over whether they are right and whether these practices do indeed make our code better.
What goes around
So that takes me back to the current situation. I have to confess that for many years, I’d whine and complain about the code I work with. I’d shake my head in wonder and ask how anyone could write such crappy code. I’d think that it all needs to be rewritten and how could this possibly work anyway?
But at some point, I’d come to realize that it does work. It actually works out there on customers’ machines in the wild. And I decided that I respected that. Sure, fixing things takes more time than I’d like, and adding features can often reinforce the “bad” coding practices, but working code is the ultimate goal, right? And if it works today…?
There is a strong temptation to say “We need to rewrite this to make it better,” and maybe that is the better long-term solution. It’s often hard to say. But for the “vintage” code out there, maybe “fixing bugs and making it work correctly no matter what” is kind of a form of “better.” Sometimes the right thing to do is just fix the bug and add 10 more lines of code to that 1500-line procedure.
So as I work through some code that makes little sense to the modern eye, I try to remember that when that code was written, it was probably cool and innovative and exciting. The person who wrote it was doing their best with what they had and what they knew.
And I try to remember that there is definitely some eight-year-old kid out there who one day will look at the new, modern, and cool code I write today and shake her head in wonder and mild disgust, asking “How could he have possibly written that?”
So maybe the worst programmer I know is me, right now.
JRuby 10 due to arrive in early-2025 1 Oct 2024, 8:00 am
JRuby, which dates back to 2001 as a Ruby language implementation for the JVM, is set to arrive at version 10 shortly after the new year. JRuby 10 promises to be fully compatible with Ruby 3.4.0, a planned update to Ruby that brings changes for frozen string literals and class updates. JRuby 10 will also support Rails 7.1 and later versions.
This will be the first time a JRuby release ships compatible with the latest version of Ruby, said JRuby project co-leader Charles Oliver Nutter, architect and technologist at Headius Enterprises, which supports JRuby. With the planned JRuby 10 release, a minimum of Java 17 or Java 21 will be required, enabling JRuby to take advantage of more-modern JVM features, Nutter said.
The developers of JRuby are particularly interested in virtual thread support from Java’s Project Loom, to implement fibers, a key Ruby feature, Nutter said. “I think developers should be excited about JRuby because we are constantly pushing the edges of what Ruby and dynamic languages can do on the JVM, and taking advantage of as many new OpenJDK features as we can,” Nutter said on September 28. “We continue to support all compatible JVMs on a broad range of platforms, and are still the best way to scale Ruby and Rails applications to enterprise levels.” Nutter co-leads the development of JRuby with Thomas Enebo.
Ruby itself, meanwhile, has entered a utilitarian phase of life, still powering a large number of new applications in startups but not making big headlines anymore, Nutter said. As a result, more and more companies are turning to JRuby, which benefits from features of the JVM that support the building of desktop applications, mobile apps for Android, and deployment of exotic operating systems, Nutter said.
The current release of JRuby is version 9.4.8.0. Headius Enterprises was launched in July 2024 to provide continuing support for the JRuby project and users. The company has taken over the development of JRuby from Red Hat, which no longer sponsors the project, Nutter said.
Large language models hallucinating non-existent developer packages could fuel supply chain attacks 30 Sep 2024, 9:28 pm
Large Language Models (LLMs) have a serious “package hallucination” problem that could lead to a wave of maliciously coded packages in the supply chain, researchers have discovered in one of the largest and most in-depth studies yet to investigate the problem.
It’s so bad, in fact, that across 30 different tests, the researchers found that 440,445 (19.7%) of the 2.23 million code samples they generated experimentally in two of the most popular programming languages, Python and JavaScript (using 16 different LLMs for Python and 14 for JavaScript), contained references to hallucinated packages.
The multi-university study, first published in June but recently updated, also generated “a staggering 205,474 unique examples of hallucinated package names, further underscoring the severity and pervasiveness of this threat.”
The problem has its roots in the popularity of numerous Python and JavaScript libraries and packages which developers use to help them quickly assemble programs from lots of smaller parts.
Many popular repositories already have a problem with malicious code. The researchers note that one 2023 study discovered 245,000 malicious packages in open-source repositories alone.
Hallucination nightmare
Unfortunately, the current study suggests that the arrival of AI is only going to make things worse through LLM “package hallucination.”
LLMs are already notorious for hallucinations — making up nonsense answers to queries. This also happens in coding: the developer inputs a coding prompt into the LLM and occasionally receives back a nonsense answer.
In the case of package hallucination, the LLM goes one stage further and recommends or generates code including a package in a software repository that doesn’t exist.
Normally, this would cause any code referring to it to fail. However, a second possibility is the “package confusion” attack in which attackers generate hallucinated packages before seeding them with malware to bring them into existence.
The next stage would be to trick developers into downloading them so that they are eventually included inside larger legitimate programs. They could even make the code legitimate to start with, to increase trust before unleashing a payload later on.
“Unsuspecting users, who trust the LLM output, may not scrutinize the validity of these hallucinated packages in the generated code and could inadvertently include these malicious packages in their codebase,” say the researchers.
“This resulting insecure open-source code also has the potential of being included in the dependency chain of other packages and code, leading to a cascading effect where vulnerabilities are propagated across numerous codebases.”
Open sesame
If none of the LLMs on test were immune to the problem, some were noticeably worse than others.
“GPT-series models were found four times less likely to generate hallucinated packages compared to open-source models, with a 5.2% hallucination rate compared to 21.7%,” the study noted.
Python code was also less susceptible to the phenomenon than JavaScript, the study found.
Package confusion attacks on repositories have been around for years, usually involving typosquatting (exploiting name similarity) or brandjacking. On top of that are more conventional attacks where criminals upload malicious packages to repositories, or simply corrupt legitimate packages.
Hallucination could supercharge this, the researchers argue. Earlier in 2024, researcher Bar Lanyado of Lasso Security sent a small shudder through parts of the developer world when he discovered that several large companies, including ecommerce giant Alibaba, were using or recommending a Python software package called “huggingface-cli”.
The package was completely hallucinated. When he uploaded an empty package with the same name to a repository to test its popularity, it was downloaded more than 30,000 times in a three-month period.
In other words, large numbers of developers were downloading an imaginary software package because an LLM had at some point hallucinated its existence to solve a specific programming task.
So far, no live package confusion attacks have been detected, but the fact that the possibility is now widely known suggests that a real-world incident is only a matter of time.
Is there a solution?
The authors discuss a variety of mitigations for the hallucination problem. One solution that they believe would not work is cross referencing generated packages with some kind of master list; that might detect bogus packages, but wouldn’t stop them from becoming active threats in the same way that other software threats operate.
A better solution, they say, would be to first address the underlying issue of why LLMs generate hallucinations in the first place. This might involve better prompt engineering and the use of Retrieval Augmented Generation (RAG) to generate narrower responses from specialized data sets.
In addition, the LLMs themselves could be fine tuned to improve output on tasks more likely to generate hallucinations. For that sort of difficult improvement, the world will need the LLM developers themselves to act.
But nobody should hold their breath.
“We have disclosed our research to model providers including OpenAI, Meta, DeepSeek, and Mistral AI. As of this writing we have received no response or feedback,” the authors noted in a recent update to the study.