In this blog I am going to describe a scenario where a Big Data stack is introduced to provide a business advantage to an established information ecosystem. The company in this scenario is in the biopharmaceutical sector, and the scenario involves dark data.

For years this company has invested in and maintained an enterprise content management system to store all research-related and operational documentation.

It’s been working well for many years. You know when a system is well received: the user community uses it as simply as a kitchen appliance.

The only time it’s noticed is when there is a failure.

This particular system has reached an enviable operational record. In the last year, unscheduled downtime totalled 35 minutes. That’s pretty good going for a system with 800 concurrent users and a total global community of 5000 users.

This system supports the development of new drugs, speculative research, marketing, finance and building operations; in fact, almost everything.

As it’s a single large repository, the security mechanism is incredibly granular, ensuring information is dished out on a need-to-know basis.

All looks well, but beneath the surface lie some serious issues.

The escalating costs of running this system are becoming difficult to justify.

It’s expensive to maintain. There are ongoing license, support, hardware and staff costs.

The knee-jerk reaction is to switch to a system that’s cheaper to run.

Well, let’s look at that for a moment.

To shift to an entirely new ecosystem is a massive cost in itself. It also carries a great deal of risk.

What if there is data loss? What if what is delivered has a poorer operational record?

IT ain’t stupid here. They know if they screw up, the scientists who create the value in this company will be out for their blood.

If upper management are prepared to deal with a couple of thousand scientists, that’s fine. Like, who listens to geeks anyway?

However, when outages affect the pipeline of new drugs coming onto the market, that will affect the share price.

That will get senior management closer to the executioner’s block! Which bit shall I chop off first?

So what are the alternatives to migrating to a new ecosystem?

Well, augment what you already have.

This company is at least lucky that their current stack is extensible.

They are able to bolt on other technologies that can leverage the existing repository.

So let’s ask the question: “What additional features would your users like that you aren’t offering?”

The quick answer is collaboration. They don’t have spaces where they can collaborate across continents.

I mean the ability to facilitate knowledge creation through a synthesis of joint document authoring, review, publishing and audio/video conferencing. Okay, now we are going off track!

This isn’t the analytics problem we are looking for.

However, this is exactly what this company is investing in. They are doing it because it’s going to bring back some added value, and also because it’s something they can understand.

What I am proposing is something akin to sorcery! And it sends a shiver down my spine. I am not talking about the feeling you get reading about You-Know-Who in the Harry Potter world created by the amazing J.K. Rowling.

I am talking about the creepy feeling you get when reading Lovecraft or Crowley.

The bump in the night that freaks you out when reading “The Tibetan Book of the Dead”.

I am talking about going after dark data!

The information that exists in large repositories but is inaccessible due to non-existent metadata. I am talking about metrics on fluctuations in dark data, as close to real time as we can get.

The term dark data is not new. Here is the Gartner definition:

“Dark data is the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”

In the context of biopharma, dark data is content whose value goes unrealised.

For example, a graduate student’s promising research that goes unnoticed.

If even a few of these ideas are realised, it could make or break a new drug for a pharma company. It could literally be worth billions.

Dark data is often untagged or, at best, the metadata applied to it gives no clue as to what the content relates to. So how do we get value?

We have to go in and retrieve the semantic meaning from the text. We need to retrieve the concepts and create a social graph.

Once we have that, we can bring the dark into the light and see the kinds of information assets we have, who created them, when, and how dark data is distributed across our information repository.
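To make the idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample documents, the author names, the stopword list and the function names are all made up, and simple term frequency stands in for real semantic concept extraction, which would need a proper NLP toolkit. It only shows the shape of the idea: pull candidate concepts out of untagged text, then link them to the people who wrote about them.

```python
import re
from collections import Counter, defaultdict

# Hypothetical untagged documents; in practice these would be read from the repository.
DOCS = [
    {"author": "alice", "text": "Kinase inhibitor assay results suggest the inhibitor binds the kinase."},
    {"author": "bob", "text": "Notes on kinase binding and a new assay protocol."},
    {"author": "carol", "text": "Budget forecast for building maintenance."},
]

# A tiny, made-up stopword list; a real one would be much longer.
STOPWORDS = {"results", "suggest", "notes"}

def extract_concepts(text, top_n=3):
    """Crude stand-in for concept extraction: the most frequent longer, non-stopword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]

def build_graph(docs):
    """Map each extracted concept to the set of authors who wrote about it."""
    graph = defaultdict(set)
    for doc in docs:
        for concept in extract_concepts(doc["text"]):
            graph[concept].add(doc["author"])
    return graph

def shared_concepts(graph):
    """Keep only the concepts that connect two or more authors."""
    return {concept: sorted(authors) for concept, authors in graph.items() if len(authors) > 1}

if __name__ == "__main__":
    for concept, authors in shared_concepts(build_graph(DOCS)).items():
        print(concept, "links", ", ".join(authors))
```

With this toy data, the concepts that link authors are the ones two people happened to write about independently, which is exactly the kind of hidden connection we are hoping to surface.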

Now the question is how? How can we do this?

This is where the tools we have been applying to Big Data analytics can help. We can trawl through vast quantities of information using cluster processing to power semantic meaning and concept extraction. Then we can visualise what we have found and help the data scientist uncover new value.

That’s the dream, and it’s not far off…
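As a rough illustration of the processing pattern, here is a small map and reduce sketch in plain Python. It is not tied to any particular cluster framework, and the records, dates and field names are hypothetical. A document is treated as dark simply if it has no tags, which is an assumption made for the sake of the example. On a real cluster the map step would run in parallel across many machines, but the logic would look much the same.

```python
from collections import Counter
from functools import reduce

# Hypothetical repository records; "tags" stands in for whatever metadata exists.
RECORDS = [
    {"created": "2014-01-12", "tags": ["oncology"]},
    {"created": "2014-01-20", "tags": []},
    {"created": "2014-02-03", "tags": []},
    {"created": "2014-02-17", "tags": []},
    {"created": "2014-02-25", "tags": ["finance"]},
]

def map_record(record):
    """Map step: emit per-month counts for a single record."""
    month = record["created"][:7]  # "YYYY-MM"
    counts = Counter({(month, "total"): 1})
    if not record["tags"]:  # assumption: no tags means the document is dark
        counts[(month, "dark")] += 1
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right

def dark_share_by_month(records):
    """Fraction of documents per month that have no metadata."""
    totals = reduce(reduce_counts, map(map_record, records), Counter())
    months = sorted({month for month, _ in totals})
    return {m: totals[(m, "dark")] / totals[(m, "total")] for m in months}

if __name__ == "__main__":
    for month, share in dark_share_by_month(RECORDS).items():
        print(month, f"{share:.0%} dark")
```

Tracking a figure like this month by month is one simple way to get a metric on fluctuations in dark data.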