The megabyte, gigabyte, and terabyte are well known. They correspond to one million, one billion, and one trillion bytes respectively, the byte being the basic unit of storage in computing. According to a long-standing popular claim, all the words spoken by human beings since the dawn of time would fit into 5 exabytes, or 5 million terabytes.
In 2012, American linguist Mark Liberman recalculated the figures. If all human speech in history were digitized at high quality, it would amount to 42 zettabytes, or 42,000 exabytes. The earlier estimate likely counted only the text of what was said; recorded as audio, speech takes up far more space.
Yet even 42 zettabytes is already outdated. In 2020, at the height of the pandemic, humanity created, copied, and consumed 64.2 zettabytes of digital data. In a single year, we produced more digital information than all the human speech accumulated over millennia!
Since then, the pace has accelerated. As IBM recently observed, global data creation has reportedly tripled in five years. By 2025, it is expected to exceed 180 zettabytes, and it will continue to rise “at meteoric speed,” IBM adds.
By 2030, the annual volume is expected to reach one yottabyte, or one thousand zettabytes. Once that threshold is crossed, we will need even larger units: first the ronnabyte (one thousand yottabytes), then the quettabyte, equivalent to one million yottabytes.
It won’t stop there. Last year, the American tech giant Salesforce estimated that within five years, each connected car will produce some 25 gigabytes of data per hour, perhaps up to 1 terabyte per day. Over the same period, the automotive industry expects approximately 2 billion connected vehicles on the road.
Two billion cars each generating a trillion bytes a day adds up to 2 zettabytes daily, or roughly 730 zettabytes a year. And we’re only talking about cars, not other connected devices…
Tom Soderstrom, a former NASA chief technology officer, now works as a futurist for Amazon, among others. He points to other drivers of this exponential growth. “Think of space exploration,” he says.
Artificial intelligence (AI) will play a central role in this nascent industry. But in space, there are no data centers within Wi-Fi range. A robot aboard a space probe cannot count on the cloud being available at all times, so AI will have to process data locally and synchronize it with the cloud later.
But it goes further, adds Tom Soderstrom. How do you keep track of systems that have no constant internet connection?
This is where digital twins come in. Virtual copies of cities, factories, electrical grids, road networks, the Earth and space “will allow us to simulate how these robots, which we have lost contact with, are managing,” he says.
Companies are already using digital twins to predict all sorts of things, including where natural disasters might strike. Voxelis, a Vancouver-based startup, is doing just that: it is trying to predict where Canada’s next forest fires will break out so they can be fought as quickly as possible.
Its predictions come from cross-analyzing billions upon billions of data points.
And that’s not all. To become autonomous, AI will also need its own means of payment.
Soon, Google will offer you the option of having an AI agent book a table at the best Italian restaurant in Montreal. Google envisions the restaurant owner having their own AI agent to automate reservations. The two agents could even settle the bill in advance.
Google predicts that these agent-to-agent transactions will be made using a stablecoin (a cryptocurrency designed to hold a stable value, usually by being pegged to a conventional currency), like the one the Bank of Canada would like to issue soon.
Google announced at the end of September that it had completed its first transaction between two AI agents. All of this, of course, generates even more data, since a digital receipt will be required for each of these automated transactions.
“Each payment will leave an audit trail that will ensure the authenticity of the transaction,” explains Google. Like the bank receipts piling up on the corner of your counter, these digital receipts will accumulate somewhere.
Still, it may not be the avalanche some fear. It later emerged that only 2% of the data created in 2020 was still retained in 2021. Much of this information is ephemeral, and some of it is superfluous.
But that won’t stop us from producing even more. The era of the quettabyte can begin.