The typical data project starts with the BA or systems architect asking: “fast, cheap or good – which one do you want?” But in my experience, no matter how much time you have, or how much money you are willing to throw at it, or what features you are willing to sacrifice, many initiatives are doomed to fail before you even start because of inherent obstacles – what I like to refer to as the 3L’s of data projects.
Reflecting on work I have been doing with various clients over the past few years, it seems to me that despite their commitment to invest in system upgrades, migrate their content to new delivery platforms and automate their data processing, they often come unstuck due to fundamental flaws in their existing operations:
This is the most common challenge – overhauling legacy IT systems or outmoded data sets. Often, the incumbent system is still working fine (provided someone remembers how it was built, configured or programmed), and the data in and of itself is perfectly good (as long as it can be kept up-to-date). But the old applications won’t talk to the new ones (or even each other), or the data format is not suited to new business needs or customer requirements.
Legacy systems require the most time and money to replace or upgrade. A colleague who works in financial services was recently bemoaning the costs being quoted to rewrite part of a legacy application – it seemed an astronomical amount of money to write a single line of code…
As painful as it seems, there may be little alternative but to salvage what data you can, decommission the software and throw it out along with the old mainframe it was running on!
Many data projects (especially in financial services) focus on reducing systems latency to enhance high-frequency and algorithmic securities trading, data streaming, real-time content delivery, complex search and retrieval, and multiple simultaneous user logins. From a machine-to-machine data handover and transaction perspective, such projects can deliver spectacular results – with the goal being end-to-end straight through processing in real-time.
However, what often gets overlooked is the level of human intervention – from collecting, normalizing and entering the data, to the double- and triple-handling to transform, convert and manipulate individual records before the content goes into production. For example, when you contact a telco, utility or other service provider to update your account details, have you ever wondered why they tell you it will take several working days for these changes to take effect? Invariably, the system that captures your information in “real-time” needs to wait for someone to run an overnight batch upload or someone else to convert the data to the appropriate format or yet another person to run a verification check BEFORE the new information can be entered into the central database or repository.
Latency caused by inefficient data processing not only costs time, it can also introduce data errors caused by multiple handling. Better to reduce the number of hand-off stages, and focus on improving data quality via batch sampling, error rate reduction and “capture once, use many” workflows.
Which leads me the third element of the troika – data governance (or the lack thereof).
In an ideal world, organisations would have an overarching data governance model, which embraces formal management and operational functions including: data acquisition, capture, processing, maintenance and stewardship.
However, we often see that the lack of a common data governance model (or worse, a laissez-faire attitude that allows individual departments to do their own thing) means there is little co-operation between functions, additional costs arising from multiple handling and higher error rates, plus inefficiencies in getting the data to where it needs to be within the shortest time possible and within acceptable transaction costs.
Some examples of where even a simple data capture model would help include:
- standardising data entry rules for basic information like names and addresses, telephone numbers and postal codes
- consistent formatting for dates, prices, measurements and product codes
- clear data structures for parent/child/sibling relationships and related parties
- coherent tagging and taxonomies for field types, values and other attributes
- streamlining processes for new record verification and de-duplication
From experience, autonomous business units often work against the idea of a common data model because of the way departmental IT budgets are handled (including the P&L treatment of and ROI assumptions used for managing data costs), or because every team thinks they have unique and special data needs which only they can address, or because of a misplaced sense of “ownership” over enterprise data (notwithstanding compliance firewalls and other regulatory requirements necessitating some data separation).
One way to think about major data projects (systems upgrades, database migration, data automation) is to approach it rather like a house renovation or extension: if the existing foundations are inadequate, or if the old infrastructure (pipes, wiring, drains, etc.) is antiquated, what would your architect or builder recommend (and how much would they quote) if you said you simply wanted to incorporate what was already there into the new project? Would your budget accommodate a major retrofit or complex re-build? And would you expect to live in the property while the work is being carried out?
Next week: AngelCube15 – has your #startup got what it takes?