Is ChatGPT Poised to Disrupt the Database Industry?
OpenAI's chatbot could become an intuitive query engine, a DBA assistant, or even a data scientist.
ChatGPT has the world’s attention. From its modest beginnings as an AI chatbot for things like customer service or copywriting, ChatGPT is quickly taking on a wider scope. That includes potential uses in database management, for everything from database queries and administration to infrastructure decisions.
Consider these possibilities:
ChatGPT could have a role within the data software stack, for example as an intuitive query engine.
It might be used to help with database administrator (DBA) tasks, such as writing scripts and generating tables.
It could serve as an expert advisor on products and services, such as determining whether to use MySQL or Postgres for a particular project.
Ideas about how to use ChatGPT are bubbling up as more people give it a try. Just two months after ChatGPT’s debut, 30 million people were using it, according to an article by New York Times writer Kevin Roose. A Time magazine headline called ChatGPT “the fastest-growing Web platform ever.”
Yet, despite this burgeoning interest—or maybe because of it—OpenAI’s text-generating bot has gotten off to a shaky start. At first, the platform couldn’t keep up with demand as new users tried to log in.
And then, some early adopters were shocked to see the dark side of ChatGPT and of Microsoft’s ChatGPT derivative, Bing. People have described their experiences as “unnerving,” “creepy,” and “unhinged.” Roose wrote that a conversation with Bing left him “deeply unsettled.”
ChatGPT as query engine
Let’s just say that a few kinks (maybe some big ones) remain to be worked out. But for the purposes of this blog post, I want to discuss a few of the ways that ChatGPT might be used for database management, assuming OpenAI’s engineers are able to get it right.
Let’s start with the potential for using ChatGPT as a front-end query engine. Web search is the obvious example: Google reportedly declared a “code red” when execs realized the threat that ChatGPT posed to its search and advertising business.
But the possibilities don’t end there. You can well imagine ChatGPT as a front end for analytics, business intelligence, and other kinds of database queries.
Marie Truong, an analytics engineer, wondered “who writes better SQL”—a data scientist or ChatGPT? So she put ChatGPT through a series of challenges that grew progressively harder.
See the post here: “Can ChatGPT Write Better SQL than a Data Analyst?”
Truong was “impressed” and “amazed” by ChatGPT, but ultimately determined that her own skills beat the bot. She reassures data scientists that ChatGPT is not ready to replace them. On that point, the modest chatbot agreed: “I do not have the ability to perform the role of a data analyst,” ChatGPT responded when asked. Not yet, at least.
You can find similar exploratory articles by other data analysts/engineers.
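To make the front-end idea concrete, here is a minimal sketch in Python of a natural-language question being turned into SQL and run against a database. It is an illustration under stated assumptions, not how ChatGPT or any product actually works. The generate_sql function is a hypothetical stand-in for a call to a language model (it just returns a canned query), and the orders table, the is_read_only check, and the other helper names are invented for the example.

```python
import sqlite3


def generate_sql(question: str) -> str:
    """Hypothetical stand-in for a language model call.

    A real front end would send the question (plus the schema) to a model.
    Here we return a canned query so the sketch runs offline.
    """
    canned = {
        "how many orders did each customer place?": (
            "SELECT customer, COUNT(*) AS orders "
            "FROM orders GROUP BY customer ORDER BY customer"
        ),
    }
    return canned[question.strip().lower()]


def is_read_only(sql: str) -> bool:
    """Naive guard: accept only a single statement that starts with SELECT.

    This is an assumption-level safety check, not a real SQL parser.
    """
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def answer(conn: sqlite3.Connection, question: str) -> list:
    """Translate a question to SQL, check it, and run it."""
    sql = generate_sql(question)
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("ada", 10.0), ("ada", 5.5), ("grace", 20.0)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(answer(conn, "How many orders did each customer place?"))
```

The guard matters because generated SQL can be wrong or destructive. Keeping the model's output behind a read-only check (or a read-only database account) is one cautious way to experiment with this kind of front end.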
ChatGPT vs. Oracle DBA
What about ChatGPT as a DBA? Database consultant Jerome Dubar wanted to see if ChatGPT could be useful to an Oracle DBA. So he asked the bot to write a backup script for an Oracle 19c database.
The result? “Not bad,” he determined.
“ChatGPT will not replace (for now) an Oracle DBA with years of experience, but it definitely can help beginner DBAs or DBAs not used to Oracle databases,” Dubar writes. “ChatGPT’s answers are correct, scripts are clean and it always reminds you to test on a non-production database and warns that data may be lost. And this is already quite good! Consider ChatGPT as a helpful assistant that may save your precious time.”
Here’s Dubar’s full blog post: “ChatGPT vs. Oracle DBA”
ChatGPT as database consultant
OK, so ChatGPT may be able to help DBAs with the busy work of database administration, but what about the bigger question: Which database to use?
Here too, OpenAI’s precocious chatbot has something to say. Tianzhou, co-founder and CEO of Bytebase, a database DevOps platform provider, posed a question to ChatGPT: “Which database should I choose—MySQL, PostgreSQL, Mongo, SQLite, or others?”
Tianzhou also asked about cloud providers (AWS, Azure, or Google Cloud) and about which databases could support both OLAP and OLTP workloads.
From there, the Q&A went slightly off track, and ChatGPT failed to provide a definitive answer to the initial query. Nevertheless, you can see the potential to use it for architecture and infrastructure planning.
Tianzhou concluded ChatGPT doesn’t yet pose a threat to the jobs of database industry analysts. But he acknowledged it may only be a matter of time.
“As ChatGPT progresses, I expect it could give database suggestions as good as any database veterans,” Tianzhou writes. “And we may all retire at that point regardless of whether willingly or not.”
See the post on Bytebase.com: “Ask ChatGPT - How to Choose the Right Database for Your Next Project”
Cloud database vendors weigh in
Not surprisingly, cloud database vendors are paying very close attention.
Thomas Been, CMO of DataStax, made the following request of ChatGPT: “Write a business plan for a massively scalable database that can power popular Internet sites.”
See Been’s blog post: “ChatGPT Knows a Thing or Two about Databases”
Likewise, Neo4j put ChatGPT through its paces.
Can ChatGPT scale to meet demand?
It’s fair to say that ChatGPT is not ready for prime time. But you can see ChatGPT’s potential to transform and disrupt the database industry in many areas—for queries, database optimization, platform assessment, and more.
However, all of this model-building, user-prompting, text-generating activity is sure to generate lots of data and big workloads. And so far, the platform has been hard-pressed to keep up. When I first tried to access ChatGPT, like many others, the system was maxed out and inaccessible.
Given the tech industry’s massive investment in hyperscale clouds, with seemingly limitless scalability, it’s hard to understand why ChatGPT was unable to meet demand.
That raises a question about its underlying tech infrastructure. Here’s what we know:
A) OpenAI has partnered with Microsoft: “Microsoft and OpenAI Extend Partnership”
B) OpenAI and Microsoft have a big, new supercomputer: “Microsoft’s New AI Supercomputer Is One of the Top 5 in the World”
Those are, no doubt, some impressive technical underpinnings. Even so, it seems clear to me that ChatGPT’s infrastructure will need improved scalability and elasticity as the number of users, the amount of data, and the volume of queries continue to grow.
ChatGPT talks data processing and analysis
I was curious to know what kind of database management system OpenAI uses to build its AI platforms: the ChatGPT text generator, Codex code generator, Dall-E image generator, and Whisper speech recognizer. So I asked ChatGPT, and here’s how it replied:
“The specific database management systems used by OpenAI are not publicly disclosed, as they may vary depending on the specific needs of the organization and its projects. However, it is known that OpenAI uses a wide range of technologies for data storage and management, including both traditional relational databases and newer NoSQL databases. Additionally, OpenAI also employs a number of techniques for data processing and analysis, such as distributed computing and machine learning frameworks, to extract insights from its large data sets. Overall, OpenAI likely uses a variety of database management systems and related technologies to support its research and development efforts.”
That’s interesting, but it’s also the kind of info that I typically write about here in the Cloud Database Report. It may only be a matter of time before ChatGPT replaces me, so subscribe while you still can!