yego.me

Big Data - Tim Smith


Nov 9, 2024

Translator: Andrea McDonough
Reviewer: Jessica Ruby

Big data is an elusive concept. It represents an amount of digital information that is uncomfortable to store, transport, or analyze. Big data is so voluminous that it overwhelms the technologies of the day and challenges us to create the next generation of data storage tools and techniques. By that definition, big data isn't new. In fact, physicists at CERN have been grappling with the challenge of their ever-expanding big data for decades.

Fifty years ago, CERN's data could be stored in a single computer. OK, so it wasn't your usual computer; this was a mainframe computer that filled an entire building. To analyze the data, physicists from around the world traveled to CERN to connect to the enormous machine. In the 1970s, our ever-growing big data was distributed across different sets of computers, which mushroomed across CERN. Each set was joined together in its own dedicated, homegrown network. But physicists collaborated without regard for the boundaries between sets, and hence needed access to data on all of them. So we bridged the independent networks together into our own CERNET.

In the 1980s, islands of similar networks speaking different dialects sprang up all over Europe and the States, making remote access possible but torturous. To make it easy for our physicists across the world to access the ever-expanding big data stored at CERN without traveling, the networks needed to speak the same language. We adopted the fledgling internetworking standard from the States, and the rest of Europe followed. In 1989 we established the principal link between Europe and the States at CERN, and the truly global internet took off!

Physicists could then easily access the terabytes of big data remotely from around the world, generate results, and write papers at their home institutes. Then, they wanted to share their findings with all their colleagues. To make this information sharing easy, we created the web in the early 1990s. Physicists no longer needed to know where the information was stored in order to find it and access it on the web, an idea which caught on across the world and has transformed the way we communicate in our daily lives.

During the early 2000s, the continued growth of our big data outstripped our capability to analyze it at CERN, despite having buildings full of computers. We had to start distributing the petabytes of data to our collaborating partners in order to employ local computing and storage at hundreds of different institutes. In order to orchestrate these interconnected resources with their diverse technologies, we developed a computing grid, enabling the seamless sharing of computing resources around the globe. This relies on trust relationships and mutual exchange.

But this grid model could not easily be transferred beyond our community: elsewhere, not everyone has resources to share, nor can companies be expected to extend the same level of trust. Instead, an alternative, more business-like approach for accessing on-demand resources has been flourishing recently, called cloud computing, which other communities are now exploiting to analyze their big data.

It might seem paradoxical for a place like CERN, a lab focused on the study of the unimaginably small building blocks of matter, to be the source of something as big as big data. But the way we study the fundamental particles, as well as the forces by which they interact, involves creating them fleetingly by colliding protons in our accelerators, and capturing a trace of them as they zoom off at near light speed. To see those traces, our detector, with 150 million sensors, acts like a really massive 3-D camera, taking a picture of each collision event; that's up to 14 million times per second. That makes a lot of data.

But if big data has been around for so long, why do we suddenly keep hearing about it now? Well, as the old saying goes, the whole is greater than the sum of its parts, and it is no longer just science that is exploiting this. The fact that we can derive more knowledge by joining related information together and spotting correlations can inform and enrich numerous aspects of everyday life, whether in real time, such as traffic or financial conditions; in the short term, such as medical or meteorological developments; or in predictive situations, such as business, crime, or disease trends.

Virtually every field is turning to gathering big data, with mobile sensor networks spanning the globe, cameras on the ground and in the air, archives storing information published on the web, and loggers capturing the activities of Internet citizens the world over. The challenge is on to invent new tools and techniques to mine these vast stores, to inform decision-making, to improve medical diagnosis, and otherwise to answer the needs and desires of tomorrow's society in ways that are unimagined today.
