Deconstructing #Digital Obsolescence

Remember the video format wars of the 1980s? At one point, VHS and Betamax were running neck and neck in the consumer market, but VHS eventually won out (although the also-ran V2000 was technically superior to both). Since then, we’ve had similar format battles for games consoles, video discs, computer storage, CDs and e-books. It’s the inevitable consequence of operating platforms trying to dominate content – a continuing trend which has probably reached its apotheosis with the launch of Apple’s Beats 1 streaming service. This convergence of hardware and software is prompting some contrary trends and, if nothing else, confirms our suspicion of hermetically sealed systems…

Trevor Jackson embarks on a format frenzy…

1. Digital Divergence

Earlier this year, UK music producer Trevor Jackson released a collection of 12 songs, each one pressed on a different media format: 12″, 10″ and 7″ vinyl; CD and mini-CD; cassette; USB; VHS; minidisc; DAT; 8-track cartridge; and reel-to-reel tape. Of course, he could have also used 78 rpm shellac records, digital compact cassettes, Digital8 tapes, 3.5 and 5.25 inch floppy disks (still available, I kid you not) or any of the multitude of memory cards that proliferate even today.

While Jackson’s “Format” project might seem gimmicky, it does demonstrate that many digital formats are already obsolete while their analogue counterparts live on (and until very recently, I could have played 8 of the 12 formats myself – but I’ve just donated my VHS player to our local DVD store).

As I have blogged previously, there is an established body of digital/analogue hybrids, especially in data storage, and I can only see this continuing as part of the creative tension between operating systems and content formats.

2. Digital Archaeology

Each new hardware/software upgrade brings a trail of digital obsolescence (and a corresponding amount of e-waste). It’s also giving rise to a new discipline of digital archaeology, combining forensics, anthropology and hacking.

Back in 2002, it was discovered that a 15-year-old multimedia version of the Domesday Book was unreadable* – yet the hand-written version is still legible, and available to anyone who can read (provided they can decipher 1,000-year-old medieval Latin). Apparently, it has taken longer to decode the 1986 video disc than it took to create it in the first place.

More digital archaeologists will be needed to mine the volumes of data that reside in archival formats, if we are to avoid losing much of the knowledge we have created since the advent of the personal computer and the public internet.

3. Digital Provenance

We’re used to managing our data privacy and computer security via password protection, network protocols and user authentication. We also know to question the veracity of certain e-mails and websites (phishing, scams, malware, trojans, etc.).

A while ago I blogged about the topic of digital forgeries, and the associated phenomenon of digital decay. Just as in the art world, there is a need to establish a method of digital provenance to verify the attributes and authenticity of content we consume.

We are already seeing this happen in the use of blockchains for managing cryptocurrencies, but I believe there is a need to extend these concepts to a broader set of transactions, while also facilitating the future-proofing and retrofitting of content and operating systems.
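
By way of illustration, here’s a minimal sketch of how a hash-chained provenance ledger might work – the same basic mechanism that underpins blockchains, applied to content rather than currency. The record layout and helper names (append_entry, verify) are assumptions of mine for this example, not an existing library or standard:

```python
import hashlib
import json
import time

def record_hash(record: dict) -> str:
    """Return a SHA-256 digest of a provenance record (fields serialised in a stable order)."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode("utf-8")).hexdigest()

def append_entry(chain: list, content: bytes, author: str) -> dict:
    """Append a tamper-evident entry that references the hash of the previous entry."""
    entry = {
        "content_hash": hashlib.sha256(content).hexdigest(),
        "author": author,
        "timestamp": time.time(),
        "prev_hash": chain[-1]["entry_hash"] if chain else None,
    }
    entry["entry_hash"] = record_hash(entry)
    chain.append(entry)
    return entry

def verify(chain: list) -> bool:
    """Re-compute every hash; any edit to an entry or its ordering breaks the chain."""
    prev = None
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev or record_hash(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True

# Example: register two versions of a document, then check the ledger
ledger = []
append_entry(ledger, b"draft v1", author="alice")
append_entry(ledger, b"draft v2", author="bob")
assert verify(ledger)
```

Any attempt to alter an earlier entry (or the content it describes) changes its hash and breaks every link after it, which is what makes the history tamper-evident – and why the same idea lends itself to provenance as much as to payments.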

4. Digital Diversity

In response to closed operating systems, sealed hardware units and redundant formats, there are several interesting and divergent threads emerging. These are both an extension of the open source culture, and a realisation that we need to have transferable and flexible programming abilities, rather than hardwired coding skills for specific operating systems or software platforms.

First, the Raspberry Pi movement is enabling richer interaction between programming and hardware, especially in the context of the Internet of Things. (For a related example, witness the Bigshot camera.)

Second, Circuit Bending is finding ways to repurpose otherwise antiquated hardware that still contains reusable components, processors and circuit boards.

Third, some inventive musicians and programmers are resuscitating recent and premature digital antiques, such as Rex The Dog’s re-use of the Casio CZ-230S synthesizer and its Memory Tapes to remix their first single, and humbleTUNE’s creation of an app that can be retrofitted to the original Nintendo Game Boy.

These trends remind me of those Radio Shack and Tandy electronics kits I had as a child, which taught me how to assemble simple circuits and connect them to hardware. (And let’s not forget that toys like LEGO and Meccano started incorporating motors, electronics, processors and robotics into their kits many years ago.)

5. Salvaging the Future

Finally, as mentioned above, built-in digital obsolescence creates e-waste of the future. A few recycling schemes do exist, but we need to do a better job of reclaiming not just the data archives contained in those old disks, drives and displays, but also the materials from which they are made.

* My thanks to Donald Farmer of Qlik for including this in his recent presentation in Melbourne.

Next week: #FinTech – what’s next?

Personal vs Public: Rethinking Privacy

An incident I recently witnessed in my neighbourhood has caused me to rethink how we should be defining “privacy”. Data protection is one thing, but when our privacy can be compromised via the direct connection between the digital and analogue worlds, all the cyber security in the world doesn’t protect us against unwanted nuisance, intrusion or even invasion of our personal space.

Scenario

As I was walking along the street, I saw another pedestrian stop outside a house, and from the pavement, use her smartphone to take a photograph through the open bedroom window. Regardless of who was inside, and irrespective of what they were doing (assuming nothing illegal was occurring), I would consider this to be an invasion of privacy.

For example, it would be very easy to share the picture via social media, along with date and location data. From there, it could be possible to search land registries and other public records to ascertain the identity of the owners and/or occupants. And with a little more effort, you might have enough information to stalk or even cyber-bully them.

Privacy Law

Photographing people on private property (e.g., in their home) from public property (e.g., on the street outside) is not an offence, although photographers must not cause a nuisance or interfere with the occupants’ right of quiet enjoyment. Our current privacy laws largely exclude this kind of breach (unless it relates to disclosure of personal data by a regulated entity). Even the rules about the use of drones are driven by safety rather than privacy concerns.

Since the late 1990s, and the advent of spam and internet hacking, there have been court decisions that update the law of trespass to include what could be defined as “digital trespass”, although some judges have since tried to limit such actions to instances where actual harm or damage has been inflicted on the plaintiff. (Interestingly, in Australia, an act of trespass does not have to be “intentional”, merely “negligent”.)

Apart from the economic and financial loss that can arise from internet fraud and identity theft, invasion of privacy via public disclosure of personal data can lead to personal embarrassment, damage to reputation or even ostracism. (In legal terms, emotional distress falls within “pain and suffering”.)

Data Protection Law

The Australian Privacy Principles contained within the Privacy Act 1988 apply to government agencies, private companies with annual turnover of $3m or more, and any organisations trading in personal data, dealing with credit information or providing health services. There are specific provisions relating to the use and misuse of government-derived identifiers such as Medicare numbers and tax file numbers.

The main purpose of the privacy legislation is to protect “sensitive” information, and to prevent such data being used unlawfully to identify specific individuals. At a minimum, this means keeping personal data such as dates of birth, financial records or hospital files in a secure format.
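
As a rough illustration of what “a secure format” can mean in practice, here is a small sketch of field-level encryption at rest. It assumes the third-party Python cryptography package; in a real system the key would live in a key management service and access would be authorised and logged:

```python
# Requires the third-party `cryptography` package (pip install cryptography).
from cryptography.fernet import Fernet

# Illustration only: in practice the key comes from a key management service, not the code.
key = Fernet.generate_key()
vault = Fernet(key)

# Encrypt a sensitive field before it is written to the database...
date_of_birth = b"1983-02-01"
stored_value = vault.encrypt(date_of_birth)

# ...and only decrypt it for authorised access.
assert vault.decrypt(stored_value) == date_of_birth
```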

Some Practical Definitions

The following are not legal definitions, but hopefully offer a practical framework to understand how we might categorise such data, and manage our obligations towards it:

“Confidential”

Secret information that must not be disclosed to anyone unless there is a legal obligation or permission to do so. (There are also specific issues and exceptions relating to “classified information”, public interest matters, whistleblower protection and Freedom of Information requests.)

“Private”

Information which is not for public or general consumption, although the data itself may not be “confidential”. May still be subject to legal protection or rights, such as the right of adopted children to discover the identity of their birth parents, or the right of someone not to be identified as a lottery winner.

“Personal”

Data that relates to, or can specifically identify, a particular individual. An increasing issue for Big Data, because data that otherwise resides in separate locations can now be re-connected using triangulation techniques – scrape enough websites and drill down into enough databases, and you could probably find my shoe size.
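
To show how easily triangulation works, here is a hypothetical sketch joining two made-up datasets that each look harmless on their own; every name, postcode and value below is invented purely for illustration:

```python
# Two "harmless" datasets, neither of which links a name to a shoe size on its own.
forum_profiles = [
    {"username": "runner_83", "postcode": "3065", "birth_year": 1983, "shoe_size": 44},
    {"username": "vinyl_fan", "postcode": "3000", "birth_year": 1990, "shoe_size": 41},
]
electoral_extract = [
    {"full_name": "J. Citizen", "postcode": "3065", "birth_year": 1983},
    {"full_name": "A. Nother", "postcode": "3182", "birth_year": 1975},
]

# Triangulation: join on quasi-identifiers (postcode + birth year) to re-attach
# a real name to data that looked anonymous in isolation.
index = {(p["postcode"], p["birth_year"]): p for p in electoral_extract}
for profile in forum_profiles:
    match = index.get((profile["postcode"], profile["birth_year"]))
    if match:
        print(match["full_name"], "wears shoe size", profile["shoe_size"])
```

Neither dataset contains both a name and a shoe size, but a simple join on shared quasi-identifiers re-attaches them – which is why “anonymised” data is rarely as anonymous as it looks.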

“Public”

Anything that has been published, or easily discoverable through open search or public database retrieval (but, for example, does not include my past transactions on eBay unless I have chosen to disclose them to other users). My date of birth may be a matter of record, but unless you have authorised access to the relevant database or registry, you won’t be able to discover it and you certainly shouldn’t disclose it without my permission.

Copyright Law

One further dimension to the debate is copyright law – the ownership and related rights associated with any creative work, including photographs. All original content is protected by copyright (except those works deemed to be in the “public domain”), and nearly all copyright vests in the person who created the work (unless they have legally assigned their copyright, or the material was created in the course of their employment).

In the scenario described above, the photographer would hold copyright in the picture they took. However, if the photograph included the image of an artwork or even a framed letter hanging on the wall, they could not reproduce the photograph without the permission of the person who owned the copyright in those original works. In some (limited) situations, a photograph of a building may be subject to the architect’s copyright in the design.

Curiosity is not enough justification to share

My personal view on all this is that unless there is a compelling reason to make something public, protecting our personal privacy takes precedence over the need to post, share or upload pictures of other people in their private residence, especially any images taken without the occupants’ knowledge or permission.

Just to clarify, I’m not referring to surveillance and monitoring by the security services and law enforcement agencies, for which there are understandable motives (and appropriate safeguards).

I’m saying that if we showed a little more respect for each other’s personal space and privacy (particularly within our homes, not just in cyberspace), then we might show a little more consideration to our neighbours and fellow citizens.

Next week: It’s OK to say “I don’t know”

The 3L’s that kill #data projects

The typical data project starts with the BA or systems architect asking: “fast, cheap or good – which one do you want?” But in my experience, no matter how much time you have, or how much money you are willing to throw at it, or what features you are willing to sacrifice, many initiatives are doomed to fail before you even start because of inherent obstacles – what I like to refer to as the 3L’s of data projects.

Image taken from “Computers at Work” © 1969 The Hamlyn Publishing Group

Reflecting on work I have been doing with various clients over the past few years, it seems to me that despite their commitment to invest in system upgrades, migrate their content to new delivery platforms and automate their data processing, they often come unstuck due to fundamental flaws in their existing operations:

Legacy

This is the most common challenge – overhauling legacy IT systems or outmoded data sets. Often, the incumbent system is still working fine (provided someone remembers how it was built, configured or programmed), and the data in and of itself is perfectly good (as long as it can be kept up-to-date). But the old applications won’t talk to the new ones (or even each other), or the data format is not suited to new business needs or customer requirements.

Legacy systems require the most time and money to replace or upgrade. A colleague who works in financial services was recently bemoaning the costs being quoted to rewrite part of a legacy application – it seemed an astronomical amount of money to write a single line of code…

As painful as it seems, there may be little alternative but to salvage what data you can, decommission the software and throw it out along with the old mainframe it was running on!

Latency

Many data projects (especially in financial services) focus on reducing systems latency to enhance high-frequency and algorithmic securities trading, data streaming, real-time content delivery, complex search and retrieval, and multiple simultaneous user logins. From a machine-to-machine data handover and transaction perspective, such projects can deliver spectacular results – with the goal being end-to-end straight-through processing in real time.

However, what often gets overlooked is the level of human intervention – from collecting, normalizing and entering the data, to the double- and triple-handling to transform, convert and manipulate individual records before the content goes into production. For example, when you contact a telco, utility or other service provider to update your account details, have you ever wondered why they tell you it will take several working days for these changes to take effect? Invariably, the system that captures your information in “real-time” needs to wait for someone to run an overnight batch upload or someone else to convert the data to the appropriate format or yet another person to run a verification check BEFORE the new information can be entered into the central database or repository.

Latency caused by inefficient data processing not only costs time but can also introduce data errors through multiple handling. Better to reduce the number of hand-off stages, and focus on improving data quality via batch sampling, error rate reduction and “capture once, use many” workflows.
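
Here is a rough sketch of what “capture once, use many” could look like at the point of entry – validation and normalisation happen once, so downstream systems reuse the clean record instead of re-keying or re-converting it. The record shape and field formats are assumptions for illustration only:

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass(frozen=True)
class CustomerRecord:
    """A record validated once at the point of capture, then reused downstream."""
    name: str
    email: str
    date_of_birth: date

def capture(raw: dict) -> CustomerRecord:
    """Single hand-off: validate and normalise at entry, so billing, CRM and
    reporting all consume the same clean record instead of re-keying it."""
    email = raw["email"].strip().lower()
    if "@" not in email:
        raise ValueError(f"invalid email: {email!r}")
    dob = datetime.strptime(raw["date_of_birth"].strip(), "%d/%m/%Y").date()
    return CustomerRecord(name=raw["name"].strip().title(), email=email, date_of_birth=dob)

record = capture({"name": " jane citizen ", "email": "Jane@Example.COM ", "date_of_birth": "01/02/1983"})
# Downstream systems (billing, CRM, reporting) reuse `record` as-is: capture once, use many.
```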

Which leads me to the third element of the troika – data governance (or the lack thereof).

Laissez-faire

In an ideal world, organisations would have an overarching data governance model, which embraces formal management and operational functions including data acquisition, capture, processing, maintenance and stewardship.

However, we often see that the lack of a common data governance model (or worse, a laissez-faire attitude that allows individual departments to do their own thing) means there is little co-operation between functions, additional cost and higher error rates from multiple handling, and inefficiencies in getting the data to where it needs to be quickly and at an acceptable transaction cost.

Some examples of where even a simple data capture model would help include the following (see the sketch after this list):

  • standardising data entry rules for basic information like names and addresses, telephone numbers and postal codes
  • consistent formatting for dates, prices, measurements and product codes
  • clear data structures for parent/child/sibling relationships and related parties
  • coherent tagging and taxonomies for field types, values and other attributes
  • streamlining processes for new record verification and de-duplication
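
A hypothetical sketch of the kind of entry rules and de-duplication checks listed above – the specific normalisation rules (Australian phone formats, title-cased names) are assumptions for illustration, not a prescription:

```python
import re

def normalise_phone(raw: str) -> str:
    """Standardise a phone number to one storage format (assumes Australian numbers)."""
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("61"):
        digits = "0" + digits[2:]
    return digits

def normalise_name(raw: str) -> str:
    """One entry rule for names: trimmed, single-spaced, title case."""
    return " ".join(raw.split()).title()

def deduplicate(records: list) -> list:
    """New-record verification: treat records with the same normalised name + phone as duplicates."""
    seen, unique = set(), []
    for r in records:
        key = (normalise_name(r["name"]), normalise_phone(r["phone"]))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

incoming = [
    {"name": "jane  citizen", "phone": "+61 3 9555 0100"},
    {"name": "Jane Citizen", "phone": "(03) 9555 0100"},
]
print(deduplicate(incoming))  # both rows collapse to one customer
```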

From experience, autonomous business units often work against the idea of a common data model because of the way departmental IT budgets are handled (including the P&L treatment of data costs and the ROI assumptions used to manage them), or because every team thinks they have unique and special data needs which only they can address, or because of a misplaced sense of “ownership” over enterprise data (notwithstanding compliance firewalls and other regulatory requirements necessitating some data separation).

Conclusion

One way to think about major data projects (systems upgrades, database migration, data automation) is to approach them rather like a house renovation or extension: if the existing foundations are inadequate, or if the old infrastructure (pipes, wiring, drains, etc.) is antiquated, what would your architect or builder recommend (and how much would they quote) if you said you simply wanted to incorporate what was already there into the new project? Would your budget accommodate a major retrofit or complex re-build? And would you expect to live in the property while the work is being carried out?

Next week: AngelCube15 – has your #startup got what it takes?