The four V’s of big data – volume, velocity, variety and veracity – are well known. But just as important, according to Richard Self, Senior Lecturer at Derby University, is a lesser-known V: value.
What is the value of the data for the customers, the suppliers, and the business partners?
In this interview, Self discusses how big data is changing businesses and the challenge of finding the right data skills, and asks whether Chief Information Officers are still relevant in the big data era.
This interview was featured in PEX Network’s report “Deriving Value from the Data Deluge” on how companies are adopting and implementing big data strategies. To download a full copy of the report, for free, please click here.
PEX Network: According to our survey, the seniority of staff under which Big Data programs are managed varies wildly from one organization to the next. Is there a growing need for dedicated data analysts? Is there a risk that the C-suite can mishandle or misinterpret large-scale data?
Richard Self: It seems to depend very much on the size of the organization. What’s becoming quite clear is that a great debate is raging about whether there is still a role for a chief information officer or a chief technology officer. Should we now have chief data officers instead? The real debate comes down to technology versus data.
The most enlightened companies seem to be realizing that they’re going back to the old way – seeing the data as the value of the company, essentially its intellectual property. It goes back to what Nicholas Carr was writing about in 2003-2004 in Does IT Matter?, which asked whether sustainable value comes from the technology itself or from what we do with the technology. Carr was saying, very clearly, that technology is just a utility now; everybody has access to it, and we can’t gain sustainable competitive advantage simply through the use of technology because we’ve all got it.
When you start looking at the ‘data side’ of big data analytics, you have to break down all of the silos, crosslink them, and then start thinking about how you do that. That brings us to the next round of debate: do we need a central business intelligence or data analytics group?
I’ve come across companies with both, but they are often working for the chief data officer or someone of that ilk, who will be charged with developing the mechanisms – be it through technology or practice – that allow the connection of all the internal operational and tactical data together with anything that’s of value from outside. That could mean social media, weather data, or various other interesting sets of data, which combined creates an ocean of information to be investigated. This can provide smart insights that certainly can translate to distinctive, competitive advantage.
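To make that crosslinking of internal silos with external data a little more concrete, here is a minimal sketch in Python – my illustration, not anything described in the interview – of joining hypothetical internal sales records to an external weather feed. The file names and column names are assumptions purely for the example.

```python
import pandas as pd

# Hypothetical internal, siloed operational data: daily sales per store.
sales = pd.read_csv("daily_sales.csv", parse_dates=["date"])      # columns: date, store_id, revenue

# Hypothetical external data set: daily weather observations per region.
weather = pd.read_csv("daily_weather.csv", parse_dates=["date"])  # columns: date, region, temp_c, rainfall_mm

# Internal reference data linking stores to regions (another silo).
stores = pd.read_csv("stores.csv")                                # columns: store_id, region

# Crosslink the silos: sales -> store/region -> weather.
combined = (
    sales
    .merge(stores, on="store_id", how="left")
    .merge(weather, on=["date", "region"], how="left")
)

# A first, simple question to ask of the combined data:
# does rainfall move revenue in any region?
print(combined.groupby("region")[["rainfall_mm", "revenue"]].corr())
```

The point of the sketch is only that the analytical value appears once previously separate data sets share common keys; the hard organizational work is agreeing on those keys and who is allowed to join on them.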
So, is there a risk that the C-suite could mishandle large-scale data? Yes! It’s often the magic pixie dust, as far as they’re concerned. If you go back to around 2004, Professor Angell from the London School of Economics pointed out that one of the problems with IT is that the C-suite considers IT to be something you can sprinkle liberally over a problem and cause it to vanish. There is a problem there and it’s exactly why we need discussion on helping all levels of management to understand the capabilities, the limitations, the ethics and the governance of big data.
PEX Network: Many experts we have heard from recently have been underscoring the idea that it is the people, rather than the technology, that matter. How can organizations know whether they have the right people involved if they’re not relying on the C-suite? And how should they be sourcing those people?
Richard Self: If we look at last year’s reports on future projections, they suggested that by 2020, in the UK, something like 56,000 jobs will require highly specialised data scientists. By the same time, 500,000-550,000 of today’s existing employees will need to be upskilled to understand what can and can’t be done, so that they can intelligently guide the analysts and scientists into creating the right models or approaches. Smart decisions need to be made to add value to the organization and to avoid the vulnerabilities and risks that can emerge if you go about it in the wrong way, as happened with the recent Target data breach.
PEX Network: In that sense, there does seem to at least be a swing in the direction of affirming that the role of the Chief Data Officer is becoming necessary. There are certainly more this year than last and there seems to be an understanding that responsibility and accountability are critical. However, according to our survey, there remains an issue of convincing not just the upper tier to invest monetarily, but also getting the entire organization on board and moving in the same direction…
Richard Self: I think that reflects a couple of things. One is the fact that big data analytics is still riding the hype curve that Gartner are famous for, and although the industry in general is pouring out quite a lot of statistics suggesting that big data analytics will improve your bottom line, these are looked on very sceptically by many, many organizations. That is because we have seen such claims time and time again about different types of technology.
There’s also a recognition that while we hear many examples of success, we don’t hear many of failure – or, as Standish Group call them, ‘challenged projects’. The evidence seems to be that these big data analytics projects are no different at all from any other IT-related project, which means 35 percent will be successful on time, budget and quality. There’ll be something like 35-40 percent that are challenged, or in other words, late, over budget, and delivering only a small proportion of the expected benefits. Then there’s the usual 30-35 percent, which are outright failures and never deliver.
So we’re hearing about a 30-35 percent success rate. I think the only failure we’ve heard much about is the one with Target, which, at a technical level, was a stunning success: it could identify women very early in their pregnancy. That’s great as a technical capability, but it ended up being catastrophic in terms of public image. Big data analytics is very much one of the latest fads, but it is also something that, in principle, can deliver very strong results in terms of sustainable competitive advantage. That means those who are successful will potentially do very well, but it doesn’t necessarily mean that everybody has to do it, because people’s intuitions, particularly in small organizations, are often pretty good already. That’s why they’re successful at starting up a new business.
PEX Network: The allure is perhaps understandable, but there is also that fear of knowing where to start, from a technological perspective. It could mean overhauling everything that’s been in place to begin with, or trying to integrate legacy systems into the process. How do you even know where to start?
Richard Self: You really need to start by asking yourself a whole range of questions about the governance of the project. By governance, I’m talking about the broadest level – whether you are going to take the right action in the right way at the right time.
Most in this field are familiar with the four Vs of big data: volume, variety, velocity, and veracity. My students have taken that a little further – up to 13 or 14 terms beginning with V that are all really important considerations for big data analysis. Things like ‘value’: what’s the value in it for me and for the various stakeholders – the customers, the suppliers, the partners, and so on? And what sort of value does that entail – is it always monetary, or is it better service? By prompting that kind of creative questioning, it’s not so difficult for organizations to work out whether they should do it, and then, once they’ve identified a pilot project, they can start thinking about the roles and skills needed for their techies, and about upskilling their own internal management.
Where will they source the right people? Well, that’s going to be really fun, because finding those 56,000 skilled big data analytics and data science graduates is a real problem. Universities can’t tool up fast enough, and there aren’t enough kids coming out of school who actually recognise the wondrous nature of analytics and data science to get us to the point we would need to reach by 2020.
PEX Network: Looking at 2020 and beyond, no doubt there will continue to be an adoption of Big Data, regardless of whether there is enough of a workforce to take it to its limit…but how do you see this actually impacting the markets? Do you have an idea as to what the landscape will look like?
Richard Self: We’re already seeing various experiments in the use of things like social media, monitoring live discussions. Some of these experiments are going overboard in trying to get more than is probably realistic, given that the machine understanding of language is not particularly good. Computers struggle to understand irony, for instance. It’s similar to the problems we, as humans, sometimes have in reading meaning in emails if they’re not carefully crafted.
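As a toy illustration of why machine reading of social media is fragile – my example, not one from the interview – a naive keyword-based sentiment scorer in Python simply counts positive and negative words, and irony sails straight past it:

```python
# A deliberately naive, keyword-based sentiment scorer (illustration only).
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"broken", "slow", "awful", "useless", "waiting"}

def naive_sentiment(text: str) -> int:
    """Return a crude score: +1 per positive word, -1 per negative word."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(naive_sentiment("The app is fast and the staff were helpful"))  #  2: genuinely positive
print(naive_sentiment("Great, another excellent hour of waiting"))    #  1: irony scored as positive
```

Real sentiment models are far more sophisticated than this, but the underlying difficulty Self describes – that tone and irony are not visible in the surface words – remains.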
Santander provides a rather lovely example. They use social media not simply to find out about problems and react, but to receive early warning of potential issues, which can then be investigated and fixed before they go badly wrong. That relies on the understanding that people who tweet or post on Facebook about something going wrong will usually tell ten other people, whereas, on average, people who have had a good experience will possibly tell as many as three. So there’s a huge imbalance in social media when it comes to good versus bad experiences. Santander were therefore, in my judgement, extremely wise to use social media as a source of customer intelligence.
Within social media, there are many other applications. For example, companies are using it to design or redesign products, because people tweet about the products they’d like to have.
Aside from social media, there is the extremely fast rise of the connected world and the Internet of Things (IoT). That’s going to generate staggering amounts of data, streaming at huge velocities. It will impact industries like the automotive industry, where we’ve heard that the likes of GM, Ford and BMW are all busy developing mechanisms on top-end cars to report back to base how the vehicle is being driven. It’s effectively what the aerospace industry has been doing since the mid-80s, generally so that mechanics on the ground can be notified of problems. This raises some interesting questions about data governance and personally identifiable information because, of course, a car is attached to an owner by name, so data protection requirements apply.
All this leads to questions over who owns the data. Is it the driver? Is it the owner of the car? Or the manufacturer? Or, for that matter, the insurance company that is collecting the data? The move into the world of autonomous, self-driving cars over the next five to ten years will present even bigger issues about liability and so on.
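One common governance response to personally identifiable telemetry – sketched here as an assumption for illustration, not as anything the manufacturers named above actually do – is to pseudonymise the vehicle identifier before the data reaches the analytics layer. The record fields and salt below are hypothetical.

```python
import hashlib
import json

# Hypothetical raw telemetry record as it might leave a connected car.
raw_record = {
    "vin": "WBA1234567890XYZ",        # identifies the car, and hence its owner
    "timestamp": "2020-03-01T08:15:00Z",
    "speed_kmh": 112,
    "harsh_braking": True,
}

def pseudonymise(record: dict, secret_salt: bytes) -> dict:
    """Replace the direct identifier with a salted hash before analysis.

    This still supports aggregate analytics while making it harder to tie
    a record back to a named owner without access to the salt.
    """
    token = hashlib.sha256(secret_salt + record["vin"].encode()).hexdigest()
    safe = dict(record)
    safe.pop("vin")
    safe["vehicle_token"] = token
    return safe

print(json.dumps(pseudonymise(raw_record, secret_salt=b"keep-me-out-of-the-data-lake"), indent=2))
```

Note that pseudonymised data of this kind is still generally treated as personal data under data protection law; it reduces, rather than removes, the governance obligations the interview raises.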
We’ve seen this already with things like cognitive computing and advisory systems such as IBM Watson and Watson Oncology. IBM are sensibly telling people who are going to use it that you, the human, have to make the decision. The worry, however, is that, as with many other forms of technology, the early adopters may understand the issues, but the junior staff coming through the training system will simply see the seniors using it and accept the recommendations almost all of the time, without understanding how the senior person is connecting or comparing the advice with their own field of expertise.
Ultimately – and I’ve seen this happen several times in different industries and different areas of technology – the juniors don’t go on to build their own internal model; they instead become totally reliant on the technology, with no overriding knowledge of how things work. Whether it’s in the field of oncology or insurance policy, this problem poses a great danger as we move into the future.