Some people see the term 'big data' as just broad-brush marketing coated with hype. But even those taking the big-data concept at face value need to overcome certain misconceptions.
Gartner thinks the hype can make it harder to choose the right course of action in this area, and that it has done little to dispel some of the myths that still persist.
These fallacies include ideas such as 80 percent of data is unstructured — it isn't — and that advanced analytics is just a more complex form of normal analytics — again, not true, according to the analyst firm.
In an attempt to establish more of the facts relating to big data, Gartner has published two reports, covering myths about big data's impact on analytics and on information infrastructure. Here are the top five mistaken beliefs.
Myth 1: Everyone is ahead of us in big data
So people are wrong to worry that competitors are forging ahead with big data. In fact, only 13 percent of those surveyed had actually deployed any related technology.
[Figure: The stages of big data adoption, 2013 and 2014. Source: Gartner, September 2014]
"The biggest challenges that organisations face are to determine how to obtain value from big data, and how to decide where to start," Gartner said.
"Many organisations get stuck at the pilot stage because they don't tie the technology to business processes or concrete use-cases."
Gartner concludes: You're not too late. Build strategy on real tasks and involve IT and the business.
Myth 2: There's so much data, little flaws don't matter
It's true that each individual flaw may have a much smaller impact on the whole dataset than it did when there was less data, but there are more flaws than before because there is more data.
"Therefore, the overall impact of poor-quality data on the whole dataset remains the same. In addition, much of the data that organisations use in a big-data context comes from outside, or is of unknown structure and origin," Gartner said.
"This means that the likelihood of data quality issues is even higher than before. So data quality is actually more important in the world of big data."
Gartner concludes: Devise new approaches to data quality and choose data quality levels. Follow the core principles of data quality assurance.
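The arithmetic behind this point can be sketched in a few lines of Python. This is an illustration only: the function name and the 2 percent error rate are assumptions chosen for the example, not figures from Gartner. It shows that if the error rate stays constant, each flaw's share of the dataset shrinks as the dataset grows, while the flawed share of the whole stays the same.

```python
def flaw_impact(total_records: int, error_rate: float) -> tuple[int, float, float]:
    """Return (flawed record count, share of one flaw, share of all flaws).

    Hypothetical helper: assumes a constant error rate regardless of volume.
    """
    flawed = round(total_records * error_rate)
    per_flaw_share = 1 / total_records       # shrinks as data grows
    overall_share = flawed / total_records   # stays at the error rate
    return flawed, per_flaw_share, overall_share


# Compare a small and a large dataset at the same assumed 2 percent error rate.
for size in (1_000, 1_000_000):
    flawed, per_flaw, overall = flaw_impact(size, 0.02)
    print(f"{size} records: {flawed} flawed, "
          f"each flaw = {per_flaw:.6%}, all flaws = {overall:.2%}")
```

The sketch holds the error rate fixed. Gartner's point about external data of unknown origin suggests the rate may in practice be higher, which would make the overall share worse, not better.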
Myth 3: Big data will eliminate data integration
However, in reality most users rely on schema-on-write, where data is described and content prescribed, and there is agreement about the integrity of data.
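As a rough illustration of what schema-on-write means in practice, the following Python sketch checks each record against an agreed schema before it is stored. The schema, field names and `write_record` function are hypothetical and are not taken from the Gartner reports or from any particular product.

```python
# Hypothetical agreed schema: the data is described and its content prescribed.
SCHEMA = {"customer_id": int, "country": str, "amount": float}


def write_record(store: list, record: dict) -> None:
    """Append a record only if it matches the schema exactly."""
    if set(record) != set(SCHEMA):
        raise ValueError(f"fields {sorted(record)} do not match the schema")
    for field, expected_type in SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field} must be {expected_type.__name__}")
    store.append(record)


store: list = []
write_record(store, {"customer_id": 1, "country": "UK", "amount": 9.99})

try:
    # Rejected at write time: the amount is a string, not a float.
    write_record(store, {"customer_id": 2, "country": "DE", "amount": "n/a"})
except TypeError as err:
    print("rejected:", err)
```

Because validation happens before storage, every consumer of the store can rely on the same structure, which is the kind of agreement about data integrity described above.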
Myth 4: No point using a data warehouse for advanced analytics
Also, new data types may need to be refined to make them suitable for analysis. Furthermore, decisions have to be made about which data is relevant, how to aggregate it, and the level of data quality necessary.
Gartner concludes: Use data warehouses where possible as a set of curated data for advanced analytics.
Myth 5: Data lakes will replace the data warehouse
The technologies behind data lakes lack the maturity and breadth of features found in established data warehouse technologies: "Data warehouses already have the capabilities to support a broad variety of users." Firms don't have to wait for data lakes to catch up.
Gartner concludes: Use data lake technologies such as Hadoop alongside existing data warehouses. Data lakes won't deliver business value without investments in metadata management skills, tools and training.
The two Gartner reports are called 'Major myths about big data's impact on analytics' and 'Major myths about big data's impact on information infrastructure'.