So, did I miscalculate two years ago about how big ‘Big Data’ was? From my previous post we saw that Big Data is BIG. And it is only going to get bigger with our growing reliance on the Cloud. This information is becoming complex and very hard to analyze.
Why? Because we’re attempting to make sense of petabytes of data, which may or may not have any significance to what we want. The tools of just a few years ago are incapable of handling this.
But is this needed? The answer is contextual. For large players in the Healthcare or Finance domains, Big Data may have immediate, far-reaching consequences. A smaller company might really want to ask itself – “What is the implication of my financial investment in Big Data?”. Yes, it is a trend, but if you don’t know the return on your initial investment (CAPEX) and operating expenses (OPEX), then think again. There is also a shortage of people with the skill-set to actually make sense of this Big Data. Hiring and keeping them won’t come cheap – even if the IT does.
The next question: once I get this information, how am I going to use it? Is someone else also going to need this kind of data? How competitive can I be when I get some information? I may face some backlash from people saying, “If I knew what to expect, would I tread this unknown path? It’s because I don’t know that I am doing this. Maybe I will get something”.
We don’t know the answers, we probably don’t know the questions, and we don’t know if we have the data for either of them! That’s what makes Big Data so interesting.
In short – it is time-consuming, expensive to set up and manage, and you need to find the right people with the skill-set and procure the necessary hardware. So how do we manage all that? One possibility – system integrators. Companies such as Capgemini and Accenture, for instance, have ramped up investments and capabilities in this area by developing solutions and forming strategic partnerships with the important names in the Big Data space.
Now, let’s look at the ‘Hadoop’ phenomenon. Hadoop, the open-source, Java-based programming framework from the Apache Software Foundation, built on the MapReduce model, is the big threat to Oracle and IBM (claim to fame – the relational database)!
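The MapReduce idea behind Hadoop can be sketched in a few lines of plain Python – a toy, in-memory simulation for illustration only, not actual Hadoop code. A “map” step emits key–value pairs, a “shuffle” groups them by key (which the real framework does across machines), and a “reduce” step aggregates each group. The sample documents below are made up for the example.

```python
from collections import defaultdict

# Hypothetical input: a few lines of text standing in for "big data".
documents = [
    "big data is big",
    "data needs analysis",
    "big analysis",
]

def map_phase(docs):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in docs:
        for word in doc.split():
            yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single result."""
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle_phase(map_phase(documents)))
print(counts)  # {'big': 3, 'data': 2, 'is': 1, 'needs': 1, 'analysis': 2}
```

The point is that each phase is independently parallelizable – which is exactly why the model scales to petabytes when run on a cluster instead of a single process.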
So how are these companies positioning themselves? I see them doing two things:
1) Milking the cash cow – Hadoop is not the answer for everything, nor is it going to replace relational databases completely anytime soon. This is understood and accepted by a large majority of people, so the high-margin license fees will continue.
2) Going ‘proprietary’ on the stack – However, to counter a loss in market share, new hardware systems are being developed to support Hadoop. And a very smart game is being played here. Hadoop is being pushed as the basic underlying infrastructure / framework. But any data that requires analysis needs BI logic (if I may call it that) – a BI tool. And these solutions are being developed and customized to fit the framework of Hadoop (and the logic of MapReduce). This is where, I think, the war on Big Data will be fought.
FYI – Oracle ‘owns’ Java, Google ‘owns’ MapReduce. Perhaps the Sun acquisition may actually prove decisive as hardware starts making itself relevant again?
I think we can look at the Big Data life cycle in three phases – Production/Gathering, Management, and Exploitation. Take, for instance, Facebook, Google and Amazon: three different types of companies with technical knowledge and tools all developed in-house. They are the producers/gatherers of big data, the managers of big data, as well as the exploiters of big data! I think very few companies in the world today can claim such a position, and the rest can only envy them.
Until the next post, when we dive deeper into the world of Big Data, let’s answer two questions – as a company, where would you find yourself among these categories? And what is the compelling need for your Big Data investment?