Here’s the paradox facing the consumption and analysis of #BigData: the cost of data collection, storage and distribution may be decreasing, but the effort to turn data into unique, valuable and actionable insights is actually increasing – despite the expanding availability of data mining and visualisation applications.
One colleague has described the deluge of data that businesses are having to deal with as “the firehose of information”. We are almost drowning in data and most of us are navigating up river without a steering implement. At the risk of stretching the aquatic metaphor, it’s rather like the Sorcerer’s Apprentice: we wanted “easy” data, so the internet, mobile devices and social media granted our wish in abundance. But we got lazy/greedy, forgot how to turn the tap off and now we can’t find enough vessels to hold the stuff, let alone figure out what we are going to do with it. Switching analogies, it’s a case of “can’t see the wood for the trees”.
Perhaps it would be helpful to provide some terms of reference: what exactly is “big data”?
First, size definitely matters, especially when you are thinking of investing in new technologies to process more data more often. For any database less than say, 0.5TB, the economies of scale may dissuade you from doing anything other than deploy more processing power and/or capacity, as opposed to paying for a dedicated, super-fast analytics engine. (Of course, the situation also depends on how fast the data is growing, how many transactions or records need to be processed, and how often those records change.)
Second, processing velocity, volume and data variety are also factors – for example, unless you are a major investment bank with a need for high-frequency, low-latency algorithmic market trading solutions, then you can probably make do with off-the-shelf order routing and processing platforms. Even “near real-time” data processing speeds may be overkill for what you are trying to analyze. Here’s a case in point:
Slick advertorial content, and I agree that the insights (and opportunities) are in the delta – what’s changed, what’s different? But do I really need to know what my customers are doing every 15 seconds? For a start, it might have been helpful to explain what APM is (I had to Google it, and CA did not come up in the Top 10 results). Then explain what it is about the resulting analytics that NAB is now using to drive business results. For instance, what does it really mean if peak mobile banking usage is 8-9am (and did I really need an APM solution to find this out?) Are NAB going to lease more mobile bandwidth to support client access on commuter trains? Has NAB considered push technology to give clients account balances at scheduled times? Is NAB adopting technology to shape transactional and service pricing according to peak demand? (Note: when discussing this example with some colleagues, we found it ironic that a simple inter-bank transfer can still take several days before the money reaches your account…)
Third, there are trade-offs when dealing with structured versus non-structured data. Buying dedicated analytics engines may make sense when you want to do deep mining of structured data (“tell me what I already know about my customers”), but that might only work if the data resides in a single location, or in multiple sites that can easily communicate with each other. Often, highly structured data is also highly siloed, meaning the efficiency gains may be marginal unless the analytics engine can do the data trawling and transformation more effectively than traditional data interrogation (e.g., query and matching tools). On the other hand, the real value may be in unstructured data (“tell me something about my customers I don’t know”), typically captured in a single location but usually monitored only for visitor volume or stickiness (e.g., a customer feedback portal or user bulletin board).
So, to data visualisation.
Put simplistically, if a picture can paint a thousand words, data visualisation should be able to unearth the nuggets of gold sitting in your data warehouse. Our “visual language” is capable of identifying patterns as well as discerning abstract forms, of describing subtle nuances of shade as well as defining stark tonal contrasts. But I think we are still working towards a visual taxonomy that can turn data into meaningful and actionable insights. A good example of this might be so-called sentiment analysis (e.g., derived from social media commentary), where content can be weighted and scored (positive/negative, frequency, number of followers, level of sharing, influence ranking) to show what your customers might be saying about your brand on Twitter or Facebook. The resulting heat map may reveal what topics are hot, but unless you can establish some benchmarks, or distinguish between genuine customers and “followers for hire”, or can identify other connections with this data (e.g., links with your CRM system), it’s an interesting abstract image but can you really understand what it is saying?
Another area where data visualisation is being used is in targeted marketing based on customer profiles and sales history (e.g., location-based promotion using NFC solutions powered by data analytics). For example, with more self-serve check-outs, supermarkets have to re-think where they place the impulse-buy confectionary displays (and those magazine racks that were great for killing time while queuing up to pay…). What if they could scan your shopping items as you place them in your basket, and combined with what they already know about your shopping habits, they could map your journey around the store to predict what’s on your shopping list, thereby prompting you via your smart phone (or the basket itself?) towards your regular items, even saving you time in the process. And then they reward you with a special “in-store only” offer on your favourite chocolate. Sounds a bit spooky, but we know retailers already do something similar with their existing loyalty cards and reward programs.
Finally, what are some of the tools that businesses are using? Here are just a few that I have heard mentioned recently (please note I have not used any of these myself, although I have seen sales demos of some applications – these are definitely not personal recommendations, and you should obviously do your own research and due diligence):
For managing and distributing big data, Apache Hadoop was name-checked at a financial data conference I attended last month, along with kdb+ to process large time-series data, and GetGo to power faster download speeds. Python was cited for developing machine learning and even predictive tools, while DataWatch is taking its data transformation platform into real-time social media sentiment analysis (including heat and field map visualisation). YellowFin is an established dashboard reporting tool for BI analytics and monitoring, and of course Tableau is a popular visualisation solution for multiple data types. Lastly, ThoughtWeb combines deep data mining (e.g., finding hitherto unknown connections between people, businesses and projects via media coverage, social networks and company filings) with innovative visualisation and data display.
Next week: a few profundities (and many expletives) from Dave McClure of 500 Startups