Microsoft's Flagship Database Is Now Built for AI: SQL Server 2025
Advancing the 'state of the art' in relational databases with Web-scale vector search
Welcome to the Cloud Database Report. I’m John Foley, a long-time tech journalist who has also worked in strategic comms at Oracle, IBM, and MongoDB. Now I’m an independent tech writer. Connect with me on LinkedIn.
If there’s one thing to know about Microsoft’s forthcoming SQL Server 2025 database, it’s this: AI support is now built into the database.
What does that mean? One big step is an advanced vector index, developed by Microsoft Research, that has been integrated into the database. So customers will be able to run AI workloads — semantic search, RAG, AI agents — with the development, performance, and management advantages of built-in AI. And there’s more, including integration with AI frameworks, improved model management, and the ability to build and deploy AI agents.
Microsoft released SQL Server 2025 for public preview at its annual Build conference in mid-May along with dozens of other announcements, including a new coding agent for GitHub Copilot, new language models in Azure AI Foundry, general availability of Azure AI Foundry Agent Service, and data agents in Microsoft Fabric. You can see the top 25 Azure announcements made at Build here.
To get a better understanding of what’s new and noteworthy in SQL Server 2025, I reached out to Shireesh Thota, Corporate VP of Databases at Microsoft. I first met Shireesh a few years ago when he was at SingleStore. He’s a Microsoft boomerang, having spent 15 years in Redmond prior to SingleStore. Shireesh is deeply entrenched in Microsoft database tech and was an original member of the team that developed Microsoft’s NoSQL database, Cosmos DB.
It’s been three years since SQL Server last had a major upgrade. The current release is SQL Server 2022. SQL Server is an on-premises database, and Azure SQL is the cloud version that’s built using the same database engine. Because Azure SQL is a cloud database service, new features are typically introduced there first.
In fact, Azure SQL is more than just a cloud database. It’s a range of platform- and infrastructure-as-a-service offerings: customers can run SQL Server themselves in an Azure virtual machine or consume it as a managed service. So, while SQL Server is best described as an “on-premises database,” another way to think about it is that it’s part and parcel of a hybrid architecture.
“This is our mechanism to advance the state of the art in terms of the relational engine,” says Thota. “We make sure that developers and customers get the best relational engine no matter where they’re running it.”
Other building blocks of Microsoft’s data architecture include Azure Cosmos DB, Azure Database for PostgreSQL and MySQL, a SQL database in Microsoft Fabric, Azure Data Lake and OneLake, Azure Data Factory (for data transformation), Synapse, Azure Data Studio, Azure AI Foundry, and Azure AI Foundry Agent Service. Microsoft Fabric is the unifying analytics layer for it all.
Vector search at Web scale
Microsoft is touting SQL Server 2025 as its “AI release,” which should be cause for celebration by the SQL Server faithful. Of course, there’s a lot more to it, including advances in security and encryption, resource management, concurrency, and failover and recovery, along with fine-grained performance tweaks. [In separate but related news, Microsoft this week released software patches for SQL Server security vulnerabilities.]
A few of the enterprise-class enhancements that Thota highlighted for me are optimized locking for improved concurrency, TempDB space management, and distributed availability groups for data replication.
There are also a boatload of improvements for developers, including a free standard developer edition of SQL Server 2025, native support for JSON data, and GitHub Copilot for AI-assisted coding. These and other enhancements make SQL Server 2025 the “most significant release for SQL developers in the last decade,” according to Microsoft.
That may be true, but AI steals the spotlight. “SQL Server 2025 is the first release where we’re basically putting AI capabilities right into the heart of the engine,” says Thota. Topping the list is integrated support for vector data types, functions, and indexing.
A note about vectors: Vector data, a specialized data type used by scientists and researchers for years, has entered the mainstream with AI. Vectors are long strings of numbers representing documents, images, and other data. Pinecone and Weaviate offer purpose-built vector DBs, while established players such as AWS, Google Cloud, and Oracle have added vector support within their existing platforms.
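To make the idea of "long strings of numbers" concrete, here is a minimal Python sketch of how similarity between vectors can be measured. It is an illustration only, not SQL Server code: the three-number embeddings, the document names, and the helper functions are all made up for the example, and real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written embeddings (assumption: a real model would generate these).
documents = {
    "invoice policy": [0.9, 0.1, 0.0],
    "refund policy": [0.8, 0.3, 0.1],
    "office party photos": [0.0, 0.2, 0.9],
}

def rank_by_similarity(query, docs):
    """Return document names ordered from most to least similar to the query vector."""
    return sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)

# Stand-in for an embedded question about billing.
query_vector = [1.0, 0.2, 0.0]
ranking = rank_by_similarity(query_vector, documents)
print(ranking)
```

The two billing-related documents rank ahead of the unrelated one because their vectors point in nearly the same direction as the query. That comparison by direction rather than by keyword is what semantic search relies on.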
With SQL Server 2025, Microsoft joins the camp of vendors with integrated databases that support vectors along with other data types (objects, spatial, XML, graph, etc.). “We are very much on the integrated train,” says Thota. “The reason is we believe it’s easier for customers to embrace GenAI applications in conjunction with their operational workloads.” It also results in less complexity for customers, he adds.
With vectors built into the database, developers and IT teams can build and support use cases that employ semantic search, text search, RAG, and LLMs within the same system used for operational data (sales, inventory, etc.). The advantages of this approach include versatility in data types and workloads and the ability to leverage an organization’s existing skills vs. having to learn a special-purpose vector database.
Microsoft’s vector-indexing technology is called Disk Approximate Nearest Neighbor, or DiskANN, which it describes as “vector search for Web scale search and recommendation.” DiskANN is already used in Microsoft’s Azure Database for PostgreSQL and Azure Cosmos DB databases, as well as in Bing and Office. It’s designed to be memory efficient, low latency, high throughput, and precise.
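The "approximate" in the name is the key trade-off: rather than comparing a query against every stored vector, an index examines only a small subset and accepts that it may occasionally miss the true closest match. The Python sketch below shows that general idea with a greedy walk over a neighbor graph. It is a toy under stated assumptions, not DiskANN itself: the graph construction, function names, and parameters are invented for illustration, and the real index is considerably more sophisticated.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two points (math.dist needs Python 3.8+)."""
    return math.dist(a, b)

def build_neighbor_graph(points, k):
    """Link every point to its k closest points (brute force; fine for a toy data set)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: distance(p, points[j]))
        graph[i] = others[:k]
    return graph

def greedy_search(points, graph, query, start):
    """Walk the graph, always stepping to the neighbor closest to the query.
    Stops at a local minimum, which may or may not be the true nearest point.
    Also returns how many candidate examinations were made along the way."""
    current = start
    examined = 1  # the starting point
    while True:
        best = min(graph[current], key=lambda j: distance(points[j], query))
        examined += len(graph[current])
        if distance(points[best], query) >= distance(points[current], query):
            return current, examined
        current = best

def exact_search(points, query):
    """Brute-force baseline: compare the query against every point."""
    return min(range(len(points)), key=lambda i: distance(points[i], query))

random.seed(7)  # fixed seed so the toy data set is reproducible
points = [(random.random(), random.random()) for _ in range(200)]
graph = build_neighbor_graph(points, k=8)
query = (0.5, 0.5)

approx_index, examined = greedy_search(points, graph, query, start=0)
exact_index = exact_search(points, query)
print("greedy result:", approx_index, "after", examined, "candidate examinations")
print("exact result: ", exact_index, "after", len(points), "comparisons")
```

The greedy walk is only guaranteed to end at a point that is closer to the query than any of its own graph neighbors, which is why such indexes are tuned and evaluated on how often they recover the true nearest neighbors.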
Some of the other AI capabilities in SQL Server 2025 are…
Integration with AI frameworks, including LangChain, Semantic Kernel, and Entity Framework Core
Enhanced model management that allows for deploying Azure OpenAI and other AI models on premises or in the cloud
Ability to build and deploy retrieval-augmented generation (RAG) and AI agents using T-SQL (Microsoft’s SQL extension)
The chart below, which I pulled in from a Microsoft blog post, provides a few more details on what’s new in SQL Server 2025.
Petabytes in corporate data centers
Why does Microsoft continue to pour resources into SQL Server, an on-premises database, when so many organizations are migrating their data to the cloud?
The answer is that there are still many petabytes of business data in on-premises systems, for reasons that include data-residency requirements, security, cost, or the fact that companies simply haven’t gotten around to migrating yet. Complex data migration projects can take months or years. Perplexity, citing Gartner, reports that 60% of business data now resides in cloud DBs and 40% on prem.
Which is to say, many Microsoft customers want and need hybrid data solutions. “The discussion is not so much on-premises versus cloud,” Thota says. “It’s an opportunity for us to give the best relational engine and continuing to push on that.”
In a recent podcast interview with Redmond Magazine, Thota said the team working on SQL Server and Azure SQL is “the largest engineering team in Azure Data,” which is the Microsoft group that develops databases, analytics, and related technologies.
[At an industry event this spring, I heard Arun Ulag, Corporate VP of Azure Data, say his org manages 30 exabytes of data. Here’s a good overview by Arun on how the Azure Data pieces fit together with Microsoft Fabric and AI.]
Thota, in a recent webinar on building a data foundation for AI, shared two bullet points that get to the heart of the opportunity — for both Microsoft and its customers.
60% to 70% of enterprise data is unused. If you believe, as I do, that there’s latent value in many kinds of business data, then that high percentage of unused data suggests unrealized potential.
72% say data problems are the most likely factor to jeopardize AI/ML goals. This is consistent with other survey results I’ve seen that show a strong correlation between data inputs and AI outputs. For more on that, see my post, “Good AI vs. Bad Data: The Fight to Get It Right.”
From humble beginnings to a market leader
SQL Server started in 1989 as little more than a gleam in Bill Gates’ eye, as a joint project between Microsoft, Sybase, and Ashton-Tate. That was a big year in tech with the introduction of Intel’s 32-bit 486 chip and the World Wide Web.
A few years later, with the emergence of Windows NT, Microsoft licensed SQL Server from Sybase and took over development of its own relational database. I’ve been covering SQL Server since SQL Server 7.0, code-named Sphinx, in 1998.
At the time, SQL Server 7.0 was spec’d to support 1 terabyte of data, which I noted in an InformationWeek article was “10 times its previously recommended limit.” 1 TB is minuscule by today’s standards. SQL Server 2025 is rated to 524 petabytes, more than 500,000 times the limit of SQL Server 7.0.
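For anyone who wants to check the math on that growth, here is the back-of-the-envelope calculation in Python. The only assumption is the unit convention: the ratio is computed with both decimal units (1 PB = 1,000 TB) and binary units (1,024), since the figures above don't say which applies.

```python
# Capacity figures as quoted in this article.
sql_server_7_limit_tb = 1
sql_server_2025_limit_pb = 524

# Decimal convention: 1 PB = 1,000 TB.
ratio_decimal = sql_server_2025_limit_pb * 1_000 / sql_server_7_limit_tb
# Binary convention: 1 PB = 1,024 TB.
ratio_binary = sql_server_2025_limit_pb * 1_024 / sql_server_7_limit_tb

print(f"{ratio_decimal:,.0f}x (decimal), {ratio_binary:,.0f}x (binary)")
# prints "524,000x (decimal), 536,576x (binary)"
```

Either way, the multiple comfortably clears 500,000.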
Thota doesn’t get distracted by my bigger-is-better line of thought. “The more pressing question really is about the quality of data,” he says. “Especially with generative AI applications, it becomes all the more important to make sure that the quality is good. And we want to make sure your queries are running really, really fast.”
SQL Server’s baby steps into enterprise data centers all those years ago have paid off. Today, Microsoft leads all other database management system providers in total revenue worldwide for relational systems, according to Gartner. And Microsoft is #2 behind AWS in overall DBMS market share, per Gartner.
SQL Server also scores highly in DB-Engines’ popularity contest, ranked #3 behind Oracle and MySQL.
What’s next? There’s more in the pipeline. At Build, Microsoft announced a preview of SQL database and Azure Cosmos DB in Microsoft Fabric and a PostgreSQL Extension for Visual Studio Code.
SQL Server 2025 is due for GA by the end of the year.