Adam Smith

Amazon S3's Pricing Model is Arbitragable, and the Future of Cloud Storage

March, 2010

Here’s what you do.

Amazon S3 charges money for each gigabyte that you store. You go to their top 50 customers, and tell them: we’ll dedup your data with the other top customers, resulting in overall cost savings. We’ll split those cost savings with you 50/50.

You're basically stealing the economies of scale of redundancy elimination from Amazon.
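To make the arithmetic concrete, here is a minimal sketch of cross-customer dedup using content-addressed blocks. Everything in it is an assumption for illustration: the fixed block size, the `dedup_savings` and `split_savings` helpers, and the per-byte price are hypothetical, not anything S3 exposes.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; tiny on purpose so the demo is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Cut a blob into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_savings(customers: dict, size: int = BLOCK_SIZE):
    """Return (raw_bytes, stored_bytes) for a pooled, deduplicated store.

    customers maps a (hypothetical) customer name to that customer's bytes.
    """
    store = {}  # block hash -> block, shared across all customers
    raw = 0
    for data in customers.values():
        for block in split_blocks(data, size):
            raw += len(block)
            # Identical blocks hash to the same key and are stored once.
            store.setdefault(hashlib.sha256(block).digest(), block)
    stored = sum(len(b) for b in store.values())
    return raw, stored


def split_savings(raw: int, stored: int, price_per_byte: float):
    """Split the storage bill saved by dedup 50/50: (customers, operator)."""
    saved = (raw - stored) * price_per_byte
    return saved / 2, saved / 2


customers = {
    "customer_a": b"AAAABBBBCCCC",
    "customer_b": b"BBBBCCCCDDDD",
}
raw, stored = dedup_savings(customers)
print(raw, stored)  # 24 bytes billed without dedup, 16 actually stored
```

A real system would probably want content-defined chunking rather than fixed offsets, since an insertion near the start of a file shifts every later fixed-size block.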

You could also get sophisticated with delta encoding blocks of data that aren’t duplicates but are close. Here’s a paper that goes into detail on redundancy elimination.
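As a rough sketch of the delta-encoding idea: store one block in full, and store a near-duplicate as a list of "copy this range from the base" and "insert these literal bytes" instructions. This uses Python's standard `difflib.SequenceMatcher` purely for illustration; the `make_delta` and `apply_delta` names are made up here, and a production system would more likely use a purpose-built binary delta format.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Encode target as copy/data instructions relative to base."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse bytes from base
        elif tag in ("replace", "insert"):
            ops.append(("data", target[j1:j2]))   # new literal bytes
        # "delete" needs no instruction: those base bytes are just skipped
    return ops


def apply_delta(base: bytes, ops):
    """Rebuild the target block from base plus the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


base = b"the quick brown fox jumps over the lazy dog"
target = b"the quick brown cat jumps over the lazy dog"
delta = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
assert apply_delta(base, delta) == target
```

The saving comes from `literal_bytes` being much smaller than `len(target)` when the two blocks are close; for unrelated blocks the delta degenerates into storing the whole target.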

You’d have to get people to trust you to not lose their data. Maybe you use best practices, hire cperciva to write it, and get an insurance company to provide a $100 M policy against data loss that’s your fault – but it’d still be hard to pull off.

This line of reasoning raises questions about the future of cloud storage. Mostly: does broad-scale redundancy elimination matter?

If it does matter, we will see economies of scale in cloud storage. Does that mean there will be one or only a few major providers, or will redundancy elimination be provided as a layer on top that’s operated by someone else? Will cloud storage providers be forced to change their pricing models?

These are the questions that make our field so exciting!

Just to sweeten the pot, Data Domain, a data deduplication company, got bought by EMC for $2.4B in 2009, though they were certainly approaching a different market.

What do you think: will large-scale redundancy elimination matter?
