I recently came across, quite by accident, an Apache technology for centralizing and analyzing big data. Hadoop is an open-source, mix-and-match framework which allows companies to write their own big data analytical tools in a quick and efficient manner.
What puzzled me most was the adoption of this type of open-source solution. You would think that the serious vendors (such as SAS, IBM or Pentaho) would be reluctant to adopt open-source solutions. After all, look at what happened with the Goldman Sachs flash trading code (if you have not read the book “Flash Boys”, maybe it is time to give it a try :)). But no, they are embracing it! I do not know whether this is because they want a continuously evolving solution, the benefit of community pooling, or simply something off the shelf, but there we are. Big data tools are not only very lean, but also built on a global streak of creativity. Isn’t this interesting?
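To make the idea concrete, here is a minimal sketch of the map/shuffle/reduce pattern that Hadoop popularized, written in plain Python rather than on an actual Hadoop cluster – a hypothetical word-count example, not Hadoop’s real API:

```python
# Minimal illustration of the MapReduce pattern popularized by Hadoop,
# in plain Python (no cluster required). Hypothetical word-count example.
from collections import defaultdict

def map_phase(lines):
    # Emit (word, 1) pairs, as a Hadoop mapper would.
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts per word, as a Hadoop reducer would.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big tools", "big data analysis"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```

The point of the pattern is that each phase is embarrassingly parallel: mappers and reducers can run on different machines, which is what lets the same three-step recipe scale from a toy list of strings to petabytes.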
It is quite rare that somebody admits they were wrong about a major IT trend they overlooked in the past. Quite rare. Fortunately, Thomas Davenport is not that kind of person – on the contrary. In the preface of his new book, “Big Data at Work”, published by Harvard Business Review Press, he actually admits that he initially dismissed the concept as just another technology hype. And you can hardly blame him – there are many gurus, specialists and journalists who still think that the “big data” concept is just another way of selling cloud and analytical services. Promoted, of course, by the big IT companies who happen to endorse the concept quite actively.
From this perspective, Harvard Business Review Press has done some justice to the hype surrounding the concept. “Big Data at Work” was, in a sense, a long-awaited book – people were perhaps familiar with the concepts, but wanted to know more about:
– how big data is implemented and used by various companies (the famous “case study” approach pioneered by the Harvard Business Review, one of the biggest publishers of business case studies in the world, by the way);
Judging by the amount of information available on big data, I would say, paradoxically, that the concept is poorly supported – more of a giant with feet of clay. Let us try, for example, a Google keyword analysis: a search for “big data” returns 829,000,000 results in 0.44 seconds (which means Google has spent some time on it). Apparently a lot.
The problem reveals itself once you start browsing the pages. In every case, the first 10 pages of results show either definitions and white papers about big data (very vague and fluffy) or sales links. There is virtually no value-added information on the concept itself (except perhaps the Wikipedia article, which offers some well-structured information at the introductory level).
And there we go. Everything published up to page 100 consists of what I call “meta-information”: information about information, which dilutes the meaning below a value-adding level. In fact, the more you read about big data on the web, the less convinced you are likely to be about the topic. I know this sounds very semiotic, or Foucault-like, but I started to feel a bit like a frustrated librarian who cannot put a finger on the concept.
Here we go with some of my reflections on the fast-emerging concept of big data – aphorisms more than anything else:
Big data is a programming concept re-packaged for the experienced MS Office user.
Big data is a tool, not a concept. It can be applied to small data or to data on billions of users.
Big data means nothing without a clear understanding of the analysis goal. And this is paradoxical.
No data can also reveal big data (to paraphrase Ernest Hemingway).
Bigger data cannot be reduced to big data without diluting its meaning.