It is no coincidence that data is the bedrock of the Data, Information, Knowledge and Wisdom (DIKW) pyramid made famous by Russell Ackoff in 1989. The pyramid expresses the common conviction that data is the irreducible foundation upon which all knowledge is built.
However, this is not how data and knowledge work. It’s not even that there is a more basic layer under the data. Rather, the shape of the pyramid itself reinforces a model that has outlived its usefulness. The pyramid is very stable, very linear, and strictly one-way in its direction. It is one of those industrial-age paradigms in which raw materials – data – are refined and transformed into a usable product.
In an age where the volume of data is growing exponentially, and our new technology, especially machine learning, can’t get enough of it, we need to flip the pyramid. Then we need to carefully rethink what data is, the context in which we want to use it, and how to get the most value from it.
In the old DIKW pyramid, the line between data and information is actually a repository – yet another relic of the industrial age. Data warehouses have served the predictable needs of companies since the 1950s when the modern idea of data became prevalent in business.
Under this older definition, data consists of atomic items of knowledge, siloed by the department and application that produced them, and tagged with metadata reflecting how the data was expected to be used. Any value extracted from the data after its delivery is likely to be lost forever.
This was good enough while companies moved slowly and steadily in a stable and predictable ecosystem – a pace determined in part by slow data operations.
Now, of course, data has been transformed in just about every dimension. Its volume has grown exponentially. Thanks to the Internet, its connectedness has grown even faster. Machine learning systems have raised the value of data, including data that once seemed too insignificant to be recorded.
Companies now want data to help them gain deep, dynamic, and even real-time insights into their customers, markets, and supply chains, often via machine learning models. They want to provide better predictions for both daily realities and long-term trends. They want data to help them create better unique experiences for their customers. They need rapid access to data to support the rapid innovation that will allow them to thrive in the new, highly competitive environment. Companies can no longer afford to wait for the wheels of their old data mills to spin.
Fortunately, they can now move quickly. But this requires more than installing new programs and procedures. It requires a fundamentally new understanding of data.
The new data model
Insight, forecasting, and support for rapid innovation aren’t new goals for companies, but what they’re now asking of data was out of the question even a few years ago. To meet today’s needs:
- The time to create and deliver data sets should shrink from weeks or months to minutes or seconds.
- The data should be seamlessly interoperable across all legacy silos, including application silos.
- It should be retrievable in ways that literally no one anticipated.
- It must support the most stringent security and compliance requirements in a global and often inconsistent regulatory environment.
So what does the data supporting these requirements look like?
Let’s say a company wants to figure out how to stock its store shelves for a big holiday sale. Traditionally, the manager gets a report on past holiday sales, broken down by product category and store location. But many contextual factors influence sales on those days. The manager wonders how sales might be related to purchases of local advertising. To changes in average income by region? To changes in free time? What about fuel prices, the weather, how the local sports team did, and all the other small causes that can have big impacts?
The traditional data report discouraged active engagement with such questions. It was a glimpse through the slats of a fence. But now, the manager doesn’t want a report on the data so much as a conversation with the data, including the ability to ask questions that the data warehouse isn’t set up to answer. And, of course, the manager wants all this as soon as possible.
Or perhaps that manager has decided that machine learning might uncover hidden statistical relationships that yield more accurate predictions about which products will be in surprisingly high demand. A machine learning model will need a lot of data to analyze, and data that cuts across many areas, including application data. Model training is highly iterative and often involves many requests for data. Any impediment directly slows the pace of innovation, and can in turn seriously reduce business competitiveness.
But there is more at stake here than speed. For example, the data must be able to interoperate with data from any other silo, from applications, from the Internet, from anywhere – not just by sharing a common format, but by being transformed on demand to satisfy the incoming request. The data store starts to look less like a computer’s RAM and more like a massive computer.
In fact, this is not just an analogy. At the core of this new architecture is an API that can retrieve and compile data, draw conclusions from it, and present it in whatever form will be most useful to the person requesting it, whether that is a PDF file, a JSON file designed for a machine learning system, a public-facing website, or an interactive tool for further inquiry and exploration.
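To make the idea concrete, here is a minimal Python sketch of one compiled result being presented in several forms depending on who is asking. Everything in it is an assumption made for illustration: the sample rows, the field names, and the functions `compile_totals`, `render` and `deliver` are hypothetical and do not describe any particular product’s API.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample rows; in practice these would be drawn from many silos.
SALES = [
    {"store": "north", "category": "toys", "units": 120},
    {"store": "north", "category": "games", "units": 45},
    {"store": "south", "category": "toys", "units": 80},
]


def compile_totals(rows, group_by):
    """Sum the units sold for each value of the requested field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += row["units"]
    return dict(totals)


def render(totals, group_by, fmt):
    """Present the same compiled result in the form the requester asked for."""
    if fmt == "json":  # e.g. for a machine learning pipeline
        return json.dumps(totals, sort_keys=True)
    if fmt == "csv":  # e.g. for a spreadsheet
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow([group_by, "units"])
        for key in sorted(totals):
            writer.writerow([key, totals[key]])
        return buf.getvalue()
    if fmt == "text":  # e.g. for a person reading a summary
        return "\n".join(f"{key}: {totals[key]} units" for key in sorted(totals))
    raise ValueError(f"unsupported format: {fmt}")


def deliver(rows, group_by, fmt):
    """Compile the data, then present it in the requested format."""
    return render(compile_totals(rows, group_by), group_by, fmt)


print(deliver(SALES, "category", "json"))
print(deliver(SALES, "store", "text"))
```

The point of the sketch is that the question (grouping by category or by store) and the presentation (JSON, CSV or plain text) are both chosen at request time rather than fixed in advance by whoever built the report.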
This computational power can also ensure that the data delivered meets all relevant requirements for privacy and security. As the regulatory environment is highly fragmented, complex, and ever-changing, automating compliance through a programmable interface is critical.
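As a rough illustration of compliance expressed as code, the Python sketch below strips restricted fields from a record before it is delivered, based on the requester’s region. The region codes, the field names and the `apply_policy` function are all hypothetical; real regulatory rules are far more involved than a per-region list of blocked fields.

```python
# Hypothetical per-region rules: fields that must be removed before delivery.
POLICIES = {
    "eu": {"email", "birth_date"},
    "us": {"birth_date"},
}

# Assumption: a region with no explicit policy gets the strictest treatment,
# i.e. every field that any known policy restricts is removed.
STRICTEST = set().union(*POLICIES.values())


def apply_policy(record, region):
    """Return a copy of the record without the fields blocked for the region."""
    blocked = POLICIES.get(region, STRICTEST)
    return {key: value for key, value in record.items() if key not in blocked}


customer = {
    "id": 1,
    "email": "a@example.com",
    "birth_date": "1990-01-01",
    "basket": 3,
}

print(apply_policy(customer, "eu"))
print(apply_policy(customer, "us"))
```

Because the rules live in one table rather than in each report, a change in regulation becomes a change to `POLICIES` instead of a change to every consumer of the data.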
Since this is an API that mediates between data and its uses, it can even make data smarter by enabling feedback loops that let the entire data set learn from how it is used. It enables the data to talk to data from every other source, and to learn from what those sources are saying to each other.
Perhaps most importantly, this API-centric approach lets us get more out of all our data: because it responds in real time, we are no longer tied to yesterday’s predictions of what would matter to the company today.
All of this is lost in the image of our information environment as a pyramid. In this old model, data is treated as a passive resource, not as an active business driver and agent of change. An organization gets the full value out of its data only when it embraces the data’s connectedness, optimizing, amplifying, focusing and applying it across projects and industries, and increasing its intelligence with each use.