Here's What You Need to Know About Big Data


What the hell is "big data," and why does everybody keep saying it will change the future? If you want to understand the answer, you've got to get beyond the buzzword and back to the term's real meaning.


In a recent post on UC Berkeley's Data Science blog, Jenna Dutcher asked several people who work with big data or analyze it to define the term. All the answers are interesting, but two stood out.


UC Berkeley School of Information lecturer Annette Greiner said:

Big data is data that contains enough observations to demand unusual handling because of its sheer size, though what is unusual changes over time and varies from one discipline to another. Scientific computing is accustomed to pushing the envelope, constantly developing techniques to address relentless growth in dataset size, but many other disciplines are now just discovering the value — and hence the challenges — of working with data at the unwieldy end of the scale.

Joel Gurin, author of Open Data Now, added:

Big data describes datasets that are so large, complex, or rapidly changing that they push the very limits of our analytical capability. It's a subjective term: What seems "big" today may seem modest in a few years when our analytic capacity has improved. While big data can be about anything, the most important kinds of big data — and perhaps the only ones worth the effort — are those that can have a big impact through what they tell us about society, public health, the economy, scientific research, or any number of other large-scale subjects.


Read more at UC Berkeley's Data Science blog


DISCUSSION

So, lowercase "big data" means massive data sets that push the limits of analytics. When people use the proper noun "Big Data," they're talking about a subset of IT tools meant to store, transform, or analyze those big data sets. Big Data technologies look very different from traditional technologies. They are highly scalable. They store data in different ways. They are "open" via APIs in ways that old-school enterprise technology is not.

Interestingly, Big Data technologies often use hardware that old-school enterprise tech has shunned. For example, you'll mostly see Big Data running on tons of small, identical blades with lots of local drives, rather than on big servers with a shared SAN. Highly scalable, highly resilient, easily replicated - that's the kind of infrastructure Big Data technologies want.
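
To make the "lots of small, identical blades" idea concrete, here is a minimal Python sketch of one common way data can be spread across such machines: hash each record's key to pick a starting node, then keep copies on the next few nodes. The node names, the `place_replicas` helper, and the replication factor of 3 are illustrative assumptions, not a description of any particular Big Data product.

```python
import hashlib


def place_replicas(key: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct nodes for a key (hypothetical helper).

    The key is hashed to choose a starting position in the node list,
    and copies go to that node and the ones that follow it.
    """
    if not nodes:
        raise ValueError("need at least one node")
    replicas = min(replicas, len(nodes))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Eight identical "blades", each assumed to have its own local drives.
nodes = [f"blade-{i:02d}" for i in range(8)]
placement = place_replicas("customer-42", nodes)

# If any single blade fails, the remaining copies are still reachable.
failed = placement[0]
survivors = [n for n in placement if n != failed]
```

Because every blade is interchangeable, adding capacity in this sketch just means appending to `nodes`. Real systems typically use more careful schemes (for example, consistent hashing) so that adding a node moves as little data as possible.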

If you are just getting into IT now, focus on these technologies. In the finance industry, they are still fairly niche because the security is not there yet. But we're very swiftly coming up on data sets so large that we must begin using these tools, and their vendors are going to have to come into the real world of regulatory processing. Become an expert on this and make big cash in the next 3-5 years.