AWS CEO Selipsky on 'Vast, Unfathomable, Extreme' Data Challenges
At re:Invent, a big-picture message about the possibilities of modern data management at scale in the cloud.
Hello again everyone and a big welcome to all of our new subscribers! The Cloud Database Report has a global audience of business and tech people—cloud architects, software engineers, business development, client leads, sales, strategy, database management, and more. The one thing we all have in common is a deep interest in the fast-changing world of data management. Here’s my latest report.
AWS’s 11th annual re:Invent conference is underway in Las Vegas. CEO Adam Selipsky, in his opening keynote, offered a sweeping view of the complex data challenges facing businesses today—and of the unlimited possibilities. He talked about the incredible scale of today’s data environments—exabytes of data and billions of AI-driven predictions. And he introduced more than a dozen new products and cloud services.
It was a wide-ranging soliloquy—punctuated with customer testimonials—that went on for more than two hours. In fact, it ran to within a few minutes of the kickoff of the World Cup soccer match between the US and Iran national teams, and I was ready to pull the plug on Selipsky’s livestream; had the two overlapped, I would have had little choice but to switch to the big game.
Selipsky was a big-picture storyteller on the re:Invent stage—he talked of a septillion stars in the sky, the deepest reaches of the ocean, and the icy extremes of Antarctica in providing context to the world of data.
“Customers constantly tell us that this is what they need—the ability to gain insights across vast and complex troves of data; the confidence to go to unfathomable depths knowing that you’re secure enough to withstand the pressure and have the ability to see what’s around you; the capabilities to perform in the face of the most extreme conditions; and the imagination to look at the existing and envision unlimited possibilities,” he said. “We’re continuing to hammer on all of these areas, delivering what you need to thrive in any situation.”
That sounds like hyperbole, but I believe it’s actually a fair description of the data challenges that market leaders across industries face. By my calculation, today’s largest enterprise data stores are a million times bigger than those of 25 years ago.
“It’s almost impossible to comprehend just how much data there is,” Selipsky said. “But just as remarkable is how fast it’s growing.”
This data explosion—from terabytes to petabytes to exabytes—requires new platforms, tools, processes, and strategies. AWS announced many new database capabilities and services—here’s the long and growing list. Rather than cover each of them, I will put the “what’s new” into the context of 6 major trends.
1. The cloud is cheaper, sometimes
There have been rumblings that the cloud is not as cheap and cost effective as some business and IT people think it is. To wit, Datanami reports that “the unexpectedly high costs of cloud computing are increasingly becoming a problem.” And Insider describes the cloud pay-as-you-go model as “a risky proposition.”
The other side of the coin, so to speak, is that some businesses are in fact saving money in the cloud. Selipsky pointed to Carrier Global reducing the cost of running its ERP system by 40%, and Gilead expecting to save $60 million over five years. More broadly, he said, some customers are saving 30% or more.
As always, the cloud vs. on-premises cost equation depends on several variables: the fee structure and consumption; data ingress/egress; number of users; server optimization; and so on. But AWS clearly wants to get the message across that cost savings can be realized in many situations.
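That "it depends" can be made concrete with back-of-envelope arithmetic. The sketch below is purely illustrative—every rate, server cost, and capacity figure is a made-up assumption, not actual AWS or hardware pricing:

```python
# Back-of-envelope sketch; every rate below is an illustrative
# assumption, NOT actual AWS or hardware pricing.

def cloud_cost(vcpu_hours, egress_gb, rate=0.50, egress_rate=0.09):
    """Pay-as-you-go: pay for compute hours consumed plus data egress."""
    return vcpu_hours * rate + egress_gb * egress_rate

def onprem_cost(servers, per_server=300.0):
    """Fixed capacity: pay for servers whether they are busy or idle."""
    return servers * per_server

# Assume one server's worth of capacity is 720 vCPU-hours per month.
# A spiky workload averaging 30% utilization consumes far less than that:
spiky = cloud_cost(vcpu_hours=720 * 0.30, egress_gb=100)
fixed = onprem_cost(servers=1)
print(f"cloud at 30% utilization: ${spiky:,.2f}")  # $117.00
print(f"on-prem, fixed:           ${fixed:,.2f}")  # $300.00

# Break-even utilization: the point where the two bills match.
break_even = (fixed - 100 * 0.09) / (720 * 0.50)
print(f"break-even utilization:   {break_even:.0%}")  # 81%
```

Under these invented numbers, pay-as-you-go wins whenever utilization stays below roughly 81 percent; different rates shift the break-even point, which is exactly why the cost question has no single answer.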
2. Scalable to millions, billions, trillions
With 1.5 million customers and more than 600 compute instance types on offer, AWS operates at scale. And some of the workloads that run on AWS are at the high end of the chart. Examples from re:Invent:
Expedia Group processes 600 billion AI predictions per year, powered by 70 petabytes of data
Samsung’s 1.1 billion users make 80,000 requests per second
Pinterest stores an exabyte (1 million terabytes) of data on S3
Riot Games generates a half-million events per second
Nielsen processes hundreds of billions of advertising measurements per day
These are extreme workloads compared to the databases of a generation ago. See my “Scalability Day” article below as a point of comparison. (Earlier this year, I asked Microsoft to talk about the scalability advances of SQL Server, but Microsoft declined.)
3. Transactions plus analytics
One of the key trends we’re seeing is new ways to combine analytics with transactional data. Oracle did it with MySQL HeatWave, Snowflake with Unistore, and Google Cloud with AlloyDB, as well as with federated queries for BigQuery.
AWS is jumping into the action with fast-and-easy integration between its Amazon Redshift data warehouse and Amazon Aurora transactional database. The name of this capability is a mouthful: “Amazon Aurora zero-ETL integration with Amazon Redshift.” And it’s only available in limited preview in AWS’s US East region.
But the ability to make Aurora data available within seconds for analysis in Redshift is sure to appeal to some customers. Selipsky reminded everyone that Aurora is the fastest-growing service in the history of AWS, used by hundreds of thousands of customers.
4. Serverless services do more while you do less
Serverless data platforms have been a hot trend in 2022, and, as the year winds down, AWS put an exclamation point on it with the announcement of OpenSearch Serverless. It was the last of AWS’s analytics services to get this automated capability to scale up and down. “Now we have serverless options for all of our analytics services,” Selipsky said.
OpenSearch is an Apache 2.0 open source program (and a fork of the popular Elasticsearch search engine) that is used to aggregate and analyze things like log files and web search. It’s not unusual for these workloads to have spikes in usage, and they can grow to petabytes.
So auto-scaling will definitely appeal to some customers. Amazon Redshift Serverless, for example, became generally available in July, and thousands of customers are already using it, Selipsky said.
5. ETL: Extinct Tools from Long Ago
Extract, Transform, and Load (ETL) has been a painstaking aspect of data preparation for nearly 50 years. Now, AWS is talking about a “zero ETL future.” However, it’s an open question whether that future is 5, 10, or 20 years from now. Data warehousing is slow to change.
How does AWS plan to make ETL obsolete? Through prebuilt integrations that eliminate the manual effort of building data pipelines. The new capability mentioned above—Amazon Aurora zero-ETL integration with Amazon Redshift—is one step in this direction. Another is Amazon Redshift integration with Apache Spark, also announced at re:Invent. Both “make it easier to generate insights without having to build ETL pipelines, or manually move data around,” said Selipsky.
Few tears would be shed if ETL disappeared. ETL “strikes dread into the hearts of even the sturdiest of engineering teams,” Selipsky quipped. And, as if it weren’t already complicated enough, the ETL process has been flipped on its head in the form of ELT, where data transformation (cleansing, standardization, formatting, etc.) becomes the final step in the process.
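To make the ETL/ELT distinction concrete, here is a minimal sketch in plain Python—no AWS services involved, and the “warehouse” is just a list. Both variants run the same three steps; ELT simply moves the transform to the end, inside the warehouse:

```python
# Illustrative sketch (not any vendor's API): the same three steps,
# ordered ETL-style vs. ELT-style.

def extract(rows):
    """Pull raw records from a hypothetical source system."""
    return list(rows)

def transform(rows):
    """Cleanse and standardize: trim names, normalize country codes."""
    return [
        {"name": r["name"].strip().title(), "country": r["country"].upper()}
        for r in rows
    ]

def load(rows, warehouse):
    """Append records to a stand-in 'warehouse' (a plain list here)."""
    warehouse.extend(rows)
    return warehouse

raw = [{"name": "  ada lovelace ", "country": "uk"}]

# ETL: transform *before* loading -- the warehouse only sees clean data.
etl_wh = load(transform(extract(raw)), [])

# ELT: load raw data first, transform *inside* the warehouse at the end.
elt_wh = transform(load(extract(raw), []))

assert etl_wh == elt_wh  # same end state; only the ordering differs
print(etl_wh)  # [{'name': 'Ada Lovelace', 'country': 'UK'}]
```

The end state is identical; what changes is where the messy, compute-heavy transformation happens—in a pipeline before the warehouse (ETL) or inside the warehouse itself (ELT).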
FYI, here’s an explainer I wrote on ETL/ELT.
6. Governance is goodness
I get on my soapbox about data governance because the latest data platforms and tools don’t matter much if your organization lacks well-conceived governance practices. Even minor omissions in data governance can be costly—financially, legally, and in customer relationships.
The newly introduced Amazon DataZone serves as a kind of middle ground for collaboration between data producers and data consumers. It comprises a portal, projects, and a catalog of data resources, with self-service capabilities and access controls. DataZone can be used to discover, share, and govern data across an enterprise. The idea is to unlock data in a collaborative, systematic, and secure way.
More re:Invent—and World Cup games
AWS faces some big challenges of its own—from the complexity of its myriad infrastructure services to an ongoing hiring freeze to fast-growing competitor Google Cloud to the high costs and headaches associated with database migrations. The solutions laid out by Selipsky will help in some of these areas, but no doubt AWS will continue to be under pressure on all of these fronts and others.
Selipsky spoke on Nov. 29 and VP of Data & ML Swami Sivasubramanian on Nov. 30; the next keynote speaker to take the stage at re:Invent will be AWS CTO Werner Vogels.
I have interviewed Werner several times, and he’s always interesting. I would note that Vogels is originally from The Netherlands and, as luck would have it, the US men’s soccer team will be squaring off against the Netherlands’ national team in the World Cup knockout round on Saturday, Dec. 3. If anyone has an algorithm to predict the outcome, it would be Werner.