Carnegie Mellon's Awesome 'Database of Databases'
A soup-to-nuts guide to 800 DBMSs developed over the past 50 years
I've followed the database industry for 25+ years, and it sometimes feels like I've seen it all:
Adabas, Aurora, Bigtable, BigQuery, Greenplum, Cloudera, Cockroach, Couchbase, DB2, DataStax, DocumentDB, DynamoDB, EDB, Essbase, Fauna, Firebase, Firestore, HANA, Illustra, Informix, InterSystems, MarkLogic, MongoDB, MySQL, Neo4j, NoSQL, Pervasive, Pinecone, Postgres, Redis, Redshift, SingleStore, Splunk, SQL Server, Sybase, Teradata, TileDB, Timescale, TimesTen, Vertica, Yellowbrick (and Red Brick!), Yugabyte.
I’ve covered them all and more. And yet, I’ve really only scratched the surface.
There are hundreds and hundreds of database management systems in the world, including commercial, open source, and research databases.
My go-to source for these systems is the "Database of Databases," an online compendium of nearly 800 different DBMSs compiled by Carnegie Mellon University's Database Group.
You can view it here: The Database of Databases.
From Mainframes to the Cloud
This past weekend, I had an opportunity to talk to Andy Pavlo, an associate professor of “databaseology” at Carnegie Mellon, who helped spearhead the project to build and maintain the DB of DBs.
It's something of a side project for Pavlo, who's also busy with his research (into transaction processing, analytics at scale, and autonomous databases), teaching, and his recently launched startup, OtterTune, which has developed auto-tuning software for MySQL and Postgres on Amazon RDS.
Pavlo created the Database of Databases because, he says, “it was really hard for me as a researcher to keep track of everything.” It’s also a helpful way to demonstrate to students that the database concepts they learn in class are more than “abstract ideas in a textbook.”
The Database of Databases is a soup-to-nuts guide, covering everything from mainframe-era databases to recently introduced cloud-native systems. At the long-ago end of the spectrum is IBM’s System R relational database, which according to Wikipedia was the very first implementation of SQL, circa 1974. At the more recent end of the spectrum is Firebolt, a high-performance data warehouse platform born in 2020.
You can search by compatibility, languages, licenses, data models, query interfaces, storage formats, system architecture, and more. The website does not, however, provide search by cloud platforms, which would be a nice next step.
Although the Database of Databases was not conceived as a resource for IT teams, it can no doubt be useful in lots of different ways. To illustrate, here are a few fast facts I gleaned from the website.
By the Numbers
The top countries for database development are the US (with 406), Germany (55), and China (49).
The leading database programming languages are C++, Java, and C.
27 databases are compatible with MySQL—putting it at the top of the compatibility chart—followed by Postgres and Redis.
There are 473 open source databases, of which 142 are key-value, 126 relational, 78 document/XML, and 28 graph.
There are 421 commercial databases, of which 163 are relational, 65 key-value, 62 document/XML, and 29 graph.
234 databases run on Linux and 148 on Windows.
There are two dozen databases from the 1960s and '70s, including Adabas, dBase, DMS, IDS, IMS, Ingres, Model 204, System R, and, last but not least, Oracle.
"Hobby" databases include PickleDB, a lightweight key-value DBMS written in Python, and MangoDB, a parody of MongoDB inspired by the goofy "MongoDB is Web scale" cartoon, which has nearly 700,000 views on YouTube.
It’s Academic
Many databases spring forth from academic environments. The DB of DBs identifies 77 academic databases, such as RDBMS from MIT, and Postgres and Ingres from UC Berkeley.
Interesting twist: Computer scientist and database guru Michael Stonebraker, among his many other ventures, co-founded Ingres Corp. in 1980 to build a commercial version of Ingres. Now, Prof. Andy Pavlo is working with Stonebraker on research, including a paper on the relational data model vis-a-vis other data models.
In addition, a tremendous amount of database innovation originated in "hyperscale" web environments and later became available as cloud services. Examples include Google Cloud Spanner and Cassandra, which started at Facebook. (In fact, Meta/Facebook has an entire website devoted to its database work, featuring people, news, datasets, and published research articles.)
Carnegie Mellon’s Database of Databases is chock-full of such facts, figures, and details gathered from 50 years of data management history.
Anyone with an interest in databases should bookmark the Database of Databases. With that in mind, I’ve added a link to it on the Cloud Database Report homepage.