Database Roundup: Data Intelligence; Hallucinations; Vector Search Debate
News from Databricks, Microsoft, Neo4j, Oracle, Pinecone, and more
Welcome to the Cloud Database Report. I’m John Foley, a long-time tech journalist who worked in strategic comms at Oracle, IBM, MongoDB, and now Method Communications. If you received this newsletter, you’re a subscriber (thank you!) or someone forwarded it to you. Subscriptions are free.
Hello friends and welcome to our new subscribers! It’s been a busy few weeks, including a half-marathon trail run on Long Island, a long walk through the neighborhoods of Brooklyn, NY, and my conversations with several tech-industry CEOs. There has been a lot of news, including the drama at OpenAI, which seems to have resulted in the OpenAI board effectively firing itself.
Here’s my roundup (in alphabetical order) of what’s been happening in the database industry in recent weeks.
1. Cockroach Labs
As I reported in my recent interview with Cockroach Labs’ CEO Spencer Kimball, CockroachDB is now available on AWS, Azure, and Google Cloud, which puts it solidly in the category of multi-cloud databases. Cockroach Labs has just released its annual cloud report, based on a survey of 300 cloud architects and engineers. Half of the companies surveyed have multi-cloud deployments. As Cockroach Labs points out, multi- and hybrid clouds are now the norm for many organizations.
More interesting is that the report delves into the different types of multi-clouds: two or more clouds; hybrid clouds; multi + hybrid clouds; interclouds (which is when a single app runs across multiple clouds); and single-workload multi-clouds. The survey found that multi-cloud adoption is driven more by business concerns—things like regulatory compliance and avoiding vendor lock-in—than by technical issues such as the need for resilience. Here’s a link to the report including some good charts.
2. Databricks
There’s a lot going on at lakehouse extraordinaire Databricks—including new Series I investors and the acquisition of data-replication specialist Arcion—but it’s a big push around Databricks as an “intelligence platform” that grabbed my attention. For those of us who have been following “business intelligence” for 30+ years (yes, it’s been that long), this sounds familiar, but the data lakehouse model (structured, unstructured, semi-structured, raw, or curated data, etc.) is what makes this different. That and AI.
Databricks CEO Ali Ghodsi told Fortune that the industry is at the beginning of “the intelligence revolution.” On that theme, Databricks has announced an AI-powered Data Intelligence Platform that features natural language queries, based on its acquisition of MosaicML. And separately but related, Databricks says its platform is now available as a turnkey solution in the AWS Marketplace for the U.S. Intelligence Community. For more, see the article by SiliconAngle.
3. Microsoft
Microsoft has made a bunch of database-related announcements, some highlighted at the company’s recent Ignite conference. Shireesh Thota, Microsoft VP of Azure Databases, provided an overview in a blog post. I know Shireesh and find him to be very insightful, so I’m using his blog post as a point of reference. In sum, Microsoft has reduced fees on its Azure SQL Database Hyperscale service by up to 35%; announced GA of its Azure SQL Managed Instance service; introduced a more resource-loaded freebie version of its Azure SQL Database service; unveiled updates to Azure Database for PostgreSQL, including an extension for Azure AI; and boosted the performance of Azure Database for MySQL.
4. MotherDuck
I typed “MotherDuck” into Google search and two types of stories came back. The first was about the rescue of ducklings from a drainage pipe thanks to “a quick-thinking mother duck.” The others were news stories about database startup MotherDuck’s $52.5 million Series B funding round, bringing total investment in the cloud data warehouse startup to $100 million.
MotherDuck is still getting its webbed feet on the ground. The company was founded last year to commercialize the open source DuckDB database. I know CEO Jordan Tigani from his days at SingleStore. Tigani explains that, unlike cloud database providers that are focused on large-scale data warehouses, MotherDuck is geared to those that are less than a petabyte in size, or “a cloud data warehouse for the rest of us.” The founding team are well credentialed, with backgrounds at Snowflake, Google Cloud, Databricks, Neo4j, Facebook, and Microsoft, among others.
5. Neo4j
Graph database specialist Neo4j is collaborating with AWS to reduce so-called AI hallucinations, which, as the term indicates, are nonsensical or inaccurate outputs derived from large language models. The technologies enabling more accurate, also called “grounded,” results are knowledge graphs, vector search, and Amazon Bedrock, an AWS service that provides access to foundation models.
There’s a good explanation of how these technologies work together on the AWS blog. To quote, “Grounding minimizes hallucinations, eliminates bias, and provides explainability and data access controls. This is where Neo4j brings value, as knowledge graphs can be a memory layer for an LLM and enable factual data retrieval in real-time.” Neo4j also announced availability of Neo4j Aura Professional, the company’s fully managed graph database cloud service, in AWS Marketplace. Unfortunately, I was unable to talk to Neo4j’s Chief Product Officer, Sudhir Hasbe, about the company’s latest news, but I plan to catch up with Sudhir sometime soon.
6. Oracle
Oracle and Microsoft continue to advance their surprising partnership. On Nov. 7, the companies announced that Microsoft will use Oracle Cloud Infrastructure for inferencing AI models used by Microsoft’s Bing conversational search. “Conversational” is the key word here. Bing was launched in 2009, and Microsoft is using generative AI to transform Bing from a standard search tool into an AI copilot with natural language, or what Microsoft calls the “new Bing.” You’ll recall that in mid-September Oracle CTO Larry Ellison flew to Microsoft’s HQ to jointly announce with Microsoft CEO Satya Nadella an expanded partnership to provide Oracle database services in Microsoft’s Azure cloud.
7. Pinecone
Following its $100 million Series B funding round in April, Pinecone Systems has progress to report. First, Pinecone’s vector database is now available in Microsoft’s Azure Marketplace. And second, Pinecone has attained HIPAA compliance on AWS, Azure, and Google Cloud. The latter makes it feasible for healthcare orgs to use Pinecone for generative AI application development.
Pinecone has also thrown down the gauntlet in the debate between native vector databases, such as Pinecone’s, and the growing number of non-vector databases that add vector search capabilities using the pgvector extension and other methods. “Bolted-on vector indexes are inherently unable to handle the memory, compute, and scale requirements that real-world AI applications demand,” Pinecone claims in a blog post. You can be sure we haven’t heard the end of this argument.
8. ScyllaDB
MongoDB has its limits. That’s the marketing message of ScyllaDB, a MongoDB alternative that announced $43 million in funding to “take on MongoDB at scale.” ScyllaDB bills itself not just as a NoSQL database, but as a “monstrously fast and scalable” NoSQL database, which runs on AWS and Google Cloud. ScyllaDB offers favorable performance comparisons to MongoDB, Amazon DynamoDB, and the open source Apache Cassandra. ScyllaDB originated nearly 10 years ago as a rewrite of Cassandra.
9. TileDB
In October, TileDB revealed $34 million in Series B funding. I first talked to TileDB CEO Stavros Papadopoulos two years ago when he re-introduced me to a database concept that I first wrote about back in the 1990s: the universal database. The TileDB database uses multi-dimensional arrays to support a variety of data types for scientific use cases, such as genomics and geospatial analysis.
TileDB has an impressive list of investors, including Intel Capital, Lockheed Martin, NTT Docomo, and Verizon. The company has a strong story to tell around the data types it supports (e.g., single-cell data, genomics, LiDAR, and automatic identification system (AIS) data) and industry solutions such as earth observation, maritime tracking, and biomedical imaging. Here’s further reading on where TileDB fits into the conversation around generative AI and LLMs: “Why TileDB Is a Vector Database” by CEO Stavros Papadopoulos.
That’s my news wrap. With AWS re:Invent getting underway this week, more news is sure to follow. As I write this, Couchbase just announced a columnar service for Capella, its database-as-a-service. Stay tuned: the database market continues to evolve at a remarkable pace.