Big Data’s Biggest Problem: It’s Too Hard to Get The Data In
April 11, 2016

Most companies are swimming in more data than they know what to do with. Unfortunately, too many of them mistake that sense of drowning for big data itself. Technologically, big data is a very specific thing: the marriage of structured data (your company’s proprietary information) with unstructured data (public sources such as social media streams and government feeds).

When you overlay unstructured data on top of structured data and use analytics software to visualize it, you can get insights that were never possible before: predicting product sales, targeting customers more precisely, discovering new markets, and so on.
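
To make that overlay concrete, here is a minimal sketch in Python with pandas. Everything in it is hypothetical: the files sales.csv and mentions.json, the column names, and the upstream step that scored each social mention for sentiment. The point is only that a public feed can be aggregated to the same grain as proprietary records and joined on for analysis.

```python
import pandas as pd

# Structured data: the company's own sales records (hypothetical file/columns).
sales = pd.read_csv("sales.csv", parse_dates=["week"])   # week, product, units_sold

# Unstructured-derived data: a public social feed, already scored for
# sentiment by some upstream step (hypothetical file/columns).
mentions = pd.read_json("mentions.json")                 # timestamp, product, sentiment

# Bucket mentions into weeks to match the sales table's (assumed weekly) grain.
mentions["week"] = (pd.to_datetime(mentions["timestamp"])
                      .dt.to_period("W").dt.start_time)

# Aggregate the public feed, then overlay it on the proprietary data.
buzz = (mentions.groupby(["week", "product"], as_index=False)
                .agg(mention_count=("sentiment", "size"),
                     avg_sentiment=("sentiment", "mean")))
combined = sales.merge(buzz, on=["week", "product"], how="left")

# A simple look at the kind of insight the article describes:
# does social buzz in a given week track units sold?
print(combined[["units_sold", "mention_count", "avg_sentiment"]].corr())
```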

Big data is no longer suffering from the lack of tools that plagued it just a few years ago, when doing big data meant having data scientists on staff and messing with open source tools like R and Hadoop.

Today, there are tons of companies vying with each other to help you visualize big data–from specialists like Tableau, Qlik, TIBCO, and MicroStrategy to end-to-end players like Microsoft, IBM, SAP, and Oracle.

But, according to the IT executives at the Midmarket CIO Forum / Midmarket CMO Forum last week in Orlando, one of the biggest issues that companies are having with all of these analytics platforms is ingesting data into them.

One CIO said, “Our biggest problem in IT is how do we get data into it. That’s where these things are really a pain.”

Fittingly, this assertion is backed up by data.

According to a study by data integration specialist Xplenty, a third of business intelligence professionals spend 50% to 90% of their time cleaning up raw data and preparing it for input into their companies’ data platforms. That probably has a lot to do with why only 28% of companies think they are generating strategic value from their data.

The data cleansing problem also means that some of the most sought-after professionals in tech right now are spending a big chunk of their time on the mind-numbing work of sorting through and organizing data sets before they ever get analyzed.
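
As a sketch of what that mind-numbing work actually looks like, here is a hypothetical cleanup pass in Python with pandas. The file raw_leads.csv and its columns are invented for illustration; real pipelines repeat steps like these across dozens of differently broken sources.

```python
import pandas as pd

raw = pd.read_csv("raw_leads.csv")   # hypothetical export from one source system

# Normalize column names that vary across source systems.
raw.columns = raw.columns.str.strip().str.lower().str.replace(" ", "_")

# Coerce types: bad values become NaN/NaT instead of crashing downstream.
raw["signup_date"] = pd.to_datetime(raw["signup_date"], errors="coerce")
raw["revenue"] = pd.to_numeric(raw["revenue"], errors="coerce")

# Collapse free-text variants ("US", "usa", " United States ") to one label.
country_map = {"us": "US", "usa": "US", "united states": "US"}
raw["country"] = (raw["country"].str.strip().str.lower()
                                .map(country_map)
                                .fillna(raw["country"]))

# Drop exact duplicates and rows missing the fields analysis depends on.
clean = (raw.drop_duplicates()
            .dropna(subset=["signup_date", "revenue"]))

clean.to_csv("clean_leads.csv", index=False)
```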

That’s obviously not very scalable and it severely limits the potential of big data. And as we get better and better at collecting more data–with the help of the Internet of Things–the problem only gets worse.

There are three potential solutions to the issue:

1. The big data analytics software gets better–Since many of these companies have been investing heavily in big data for the past five years, it’s unlikely that there will be a breakthrough in the tools any time soon that eases the burden of data cleansing, but we should expect incremental improvements.

2. Data preparers become the paralegals of data science–In the same way that paralegals assist lawyers by taking over important, lower-level tasks, data preparers could do much the same for data scientists. We’re already seeing this to a degree. Read the TechRepublic article, Is ‘data labeling’ the new blue-collar job of the AI era?

3. AI will help cleanse data–The other possibility is that software and algorithms will be written to clean up, sort, and categorize the data. That’s most definitely going to happen, but we should also expect that it won’t be a silver bullet. Microsoft, IBM, and Amazon are investing in using humans to do data labeling that software can’t handle–and those are three of the world champions of automation and algorithms.
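
As an illustration of option 3, and of why it won’t be a silver bullet, here is a minimal sketch of rule-based cleansing with a human fallback. The rules and categories are hypothetical; the structural point is the last branch: anything the rules (or a trained model) can’t confidently classify still lands in a queue for a person, which is exactly the labeling work those companies are paying humans to do.

```python
import re

# Hypothetical rules an algorithm can apply cheaply and reliably.
RULES = [
    (re.compile(r"refund|money back", re.I), "billing"),
    (re.compile(r"crash|error|bug", re.I),   "defect"),
    (re.compile(r"how do i|where is", re.I), "how-to"),
]

def categorize(text: str) -> str:
    """Label a record if a rule matches; otherwise flag it for a human."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "NEEDS_HUMAN_REVIEW"

records = [
    "App crashes on startup after the update",
    "How do I export my dashboard?",
    "The vibe of the new release feels off somehow",   # no rule fires
]

for r in records:
    print(f"{categorize(r):>20}  |  {r}")
```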

ZDNet’s Monday Morning Opener is our opening salvo for the week in tech. Because we are a global site, this editorial publishes on Monday at 8am AEST in Sydney, Australia, which is 6pm Eastern Time on Sunday in the US. It is written by a member of ZDNet’s global editorial board, which comprises our lead editors across Asia, Australia, Europe, and the US.

Source | ZDNET