As your organization takes on more data-driven projects, pay attention to the quality of the data they depend on. Whether the data is collected in a fraud-detection analytics database or in a data lake shared by four or five projects, it needs to meet quality standards that make it fit for purpose.
Data quality is an elusive topic: it can defy measurement, yet it is critical enough to derail any project. It is easy to make overly optimistic assumptions about how good the data is. A focus on data quality is a business philosophy that aligns strategy, business culture, company information, and technology in order to manage data for the benefit of the enterprise. Simply put, it is an element of competitive strategy.
Yet many practitioners are asked to justify adding data quality processes and software to their projects.
Improving data quality brings many benefits, but many of them are hard to measure. Benefits such as faster delivery of solutions, a single version of the truth, higher customer satisfaction, better morale, a stronger corporate image, and consistency between systems do accrue. Still, the organization must select the benefits worth analyzing further and translate them into hard dollars, because what ultimately has to be measured is a hard-dollar return on investment.
Measuring the ROI of data quality requires a systematic approach to data quality. Improving data quality is not just another technology implementation: reaping the full rewards requires investment in both technology and organizational change. Data quality sits at the sweet spot of modern business goals, which recognize that whatever business a company is in, it is also in the data business. Companies with more data, cleaner data, more accessible data, and the means to use that data will come out ahead.
Tangible ROI on data quality
A tangible ROI for data quality comes from abstracting quality into a set of agreed-upon rules for the databases in scope and measuring how often those rules are violated. The first step toward a tangible ROI on any data quality investment is defining the data quality rules.
Data quality can be defined as the absence of intolerable defects. Data quality defects fall into a limited set of categories that covers all data quality rules: data presence, referential integrity, expected uniqueness, expected cardinality, correct computations, data within expected limits, and valid values only. The rules created in this step are the ones you want your data to conform to, and they can be applied wherever important data resides.
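The rule categories above can be made concrete with a few generic checks. The following is a minimal Python sketch, not a prescribed implementation: it assumes rows are plain dictionaries, and the helper names (`presence`, `uniqueness`, `referential_integrity`, `within_range`, `valid_values`, `evaluate`) and the sample order data are hypothetical. Each check returns the indexes of violating rows so that violations can be counted later; cardinality and computation rules would follow the same pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: each record is a plain dict keyed by column name.
Row = Dict[str, object]
Check = Callable[[List[Row]], List[int]]


@dataclass
class Rule:
    name: str
    category: str  # one of the defect categories, e.g. "presence", "uniqueness"
    check: Check   # returns the indexes of rows that violate the rule


def presence(column: str) -> Check:
    """Data presence: the value must not be missing or empty."""
    return lambda rows: [i for i, r in enumerate(rows) if r.get(column) in (None, "")]


def uniqueness(column: str) -> Check:
    """Expected uniqueness: flag every repeat of a value already seen."""
    def check(rows: List[Row]) -> List[int]:
        seen, bad = set(), []
        for i, r in enumerate(rows):
            value = r.get(column)
            if value in seen:
                bad.append(i)
            seen.add(value)
        return bad
    return check


def referential_integrity(column: str, parent_keys: set) -> Check:
    """Referential integrity: the value must exist among the parent table's keys."""
    return lambda rows: [i for i, r in enumerate(rows) if r.get(column) not in parent_keys]


def within_range(column: str, low: float, high: float) -> Check:
    """Data within expected limits (missing values are left to presence rules)."""
    return lambda rows: [
        i for i, r in enumerate(rows)
        if r.get(column) is not None and not low <= r[column] <= high
    ]


def valid_values(column: str, allowed: set) -> Check:
    """Only valid data: the value must come from an allowed set."""
    return lambda rows: [i for i, r in enumerate(rows) if r.get(column) not in allowed]


def evaluate(rules: List[Rule], rows: List[Row]) -> Dict[str, List[int]]:
    """Run every rule and map its name to the indexes of violating rows."""
    return {rule.name: rule.check(rows) for rule in rules}


# Hypothetical sample data: an orders table and the set of known customer keys.
customers = {"C1", "C2"}
orders = [
    {"id": 1, "customer": "C1", "amount": 120.0, "status": "paid"},
    {"id": 2, "customer": "C9", "amount": -5.0, "status": "paid"},
    {"id": 2, "customer": "C2", "amount": 40.0, "status": "lost"},
    {"id": 4, "customer": None, "amount": 15.0, "status": "open"},
]
rules = [
    Rule("order id present", "presence", presence("id")),
    Rule("order id unique", "uniqueness", uniqueness("id")),
    Rule("customer exists", "referential integrity", referential_integrity("customer", customers)),
    Rule("amount in range", "limits", within_range("amount", 0, 10_000)),
    Rule("status valid", "validity", valid_values("status", {"open", "paid", "shipped"})),
]
results = evaluate(rules, orders)
print(results)
```

Here the second row fails both referential integrity (unknown customer "C9") and the range rule (negative amount), while the third row repeats order id 2 and carries an invalid status.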
The next step is to measure the current quality of the data by identifying and prioritizing the data that matters. Usually no one can say how clean or dirty a company's data is, and without such a measure of data hygiene, the effectiveness of data quality improvement activities cannot be assessed.
Measuring data quality begins with an inventory. By assessing important data against the many tangible factors that can be used to measure quality, you can start turning vague impressions of poor data into something concrete, and focus effort on the actions that improve the most important quality elements. In practice, improvement work targets a small, carefully chosen subset of data items, since most items already conform to the standard. In other words, data quality initiatives do not attempt to cover every element of a company's data or every possible defect.
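One way to turn the inventory into a prioritized subset is to weight each item's measured violation rate by its business importance. The sketch below assumes a 1-to-5 importance rating and a violation rate obtained from profiling; the `DataItem` type, the `prioritize` function, and the inventory values are hypothetical illustrations rather than a standard method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataItem:
    name: str
    importance: int        # hypothetical business-importance rating, 1 (low) to 5 (high)
    violation_rate: float  # share of values failing their rules, e.g. from profiling


def prioritize(inventory: List[DataItem], top_n: int) -> List[str]:
    """Rank non-conforming items by importance-weighted violation rate; keep the top few."""
    # Items that already conform to the standard need no improvement work.
    candidates = [item for item in inventory if item.violation_rate > 0]
    candidates.sort(key=lambda item: item.importance * item.violation_rate, reverse=True)
    return [item.name for item in candidates[:top_n]]


inventory = [
    DataItem("customer_id", importance=5, violation_rate=0.0),
    DataItem("customer_email", importance=5, violation_rate=0.12),
    DataItem("order_amount", importance=5, violation_rate=0.01),
    DataItem("ship_date", importance=3, violation_rate=0.08),
    DataItem("product_color", importance=1, violation_rate=0.20),
]
print(prioritize(inventory, top_n=2))  # ['customer_email', 'ship_date']
```

The low-importance `product_color` has the highest raw violation rate, but weighting by importance pushes it below `ship_date`, which reflects the idea of choosing the subset carefully rather than chasing every defect.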
Data profiling can then be performed, using either profiling software or queries that show the distribution of values in the affected columns and check them for rule compliance. Once the rules have been defined and the data categorized, the data should be scored. Each rule's score represents the data quality status for that rule. A system's points are the sum of the base (rule) points for that system, and the overall score is a relative, weighted sum of the system points.
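The scoring hierarchy described above (rule, then system, then overall) can be sketched as follows. This is a minimal illustration under stated assumptions: a rule earns base points in proportion to the share of rows that pass, each rule has a hypothetical maximum point value reflecting its importance, and the overall score weights each system's points relative to that system's maximum using hypothetical system weights. The names `RuleResult`, `system_points`, and `overall_score` and all figures are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RuleResult:
    system: str
    rule: str
    rows_checked: int
    violations: int
    max_points: float  # hypothetical weight reflecting how much the rule matters

    @property
    def base_points(self) -> float:
        # Assumption: a rule earns points in proportion to the share of rows that pass.
        if self.rows_checked == 0:
            return 0.0
        return self.max_points * (1 - self.violations / self.rows_checked)


def system_points(results: List[RuleResult], system: str) -> float:
    """System points: the sum of the base points of that system's rules."""
    return sum(r.base_points for r in results if r.system == system)


def overall_score(results: List[RuleResult], system_weights: Dict[str, float]) -> float:
    """Overall score on a 0-100 scale: a weighted sum of each system's points
    relative to that system's maximum possible points."""
    total = 0.0
    for system, weight in system_weights.items():
        max_points = sum(r.max_points for r in results if r.system == system)
        if max_points > 0:
            total += weight * system_points(results, system) / max_points
    return 100 * total / sum(system_weights.values())


results = [
    RuleResult("CRM", "email present", rows_checked=1000, violations=50, max_points=10),
    RuleResult("CRM", "customer id unique", rows_checked=1000, violations=0, max_points=20),
    RuleResult("Billing", "amount in range", rows_checked=500, violations=100, max_points=10),
    RuleResult("Billing", "customer exists", rows_checked=500, violations=25, max_points=30),
]
print(system_points(results, "CRM"), system_points(results, "Billing"))
print(round(overall_score(results, {"CRM": 1.0, "Billing": 3.0}), 2))
```

With these numbers, CRM earns 29.5 of 30 points and Billing 36.5 of 40; because Billing is weighted three times as heavily, the overall score lands at about 93.02.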
Data quality scores
ROI is about aggregating all the returns and investments involved in reaching the desired end result (project construction, maintenance, business activities, and the associated technology) while considering the potential outcomes and the likelihood of each occurring. Every project is different, but all else being equal, different data quality scores for a system lead to different outcomes for that system, and therefore to different ROI.
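To make the aggregation concrete, here is a minimal sketch that treats ROI as the probability-weighted expected return minus the investment, divided by the investment. The two-outcome model, the `Outcome` type, and all probabilities and dollar figures are hypothetical; the example simply assumes that a higher data quality score shifts the likelihood of outcomes, and shows how that changes the ROI of an otherwise identical project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    label: str
    probability: float   # likelihood of this outcome occurring
    gross_return: float  # hypothetical dollars returned if it occurs


def expected_roi(outcomes: List[Outcome], investment: float) -> float:
    """Probability-weighted ROI: (expected return - investment) / investment."""
    if abs(sum(o.probability for o in outcomes) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    expected_return = sum(o.probability * o.gross_return for o in outcomes)
    return (expected_return - investment) / investment


# Same hypothetical project and investment at two data quality levels.
# Assumption: better data raises the chance that the project meets its goals.
low_quality = [Outcome("meets goals", 0.5, 1_000_000), Outcome("falls short", 0.5, 200_000)]
high_quality = [Outcome("meets goals", 0.8, 1_000_000), Outcome("falls short", 0.2, 200_000)]

print(f"{expected_roi(low_quality, 500_000):.0%}")   # 20%
print(f"{expected_roi(high_quality, 500_000):.0%}")  # 68%
```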
Return on investment therefore depends not only on an analytical determination of what the data will look like, but also on what the system's functionality costs when the data lacks quality.
To improve the project's expected return, then, we have to improve the quality of the data. But at what cost? In the final step, you detail the data quality procedures and the cost of applying them to reach the target data quality level. Options include blocking data that contains violations, correcting it, reporting it, or fixing it at the source.
Willingness to spend on data quality improvements should be based entirely on their ability to raise data quality scores, which correlate with project return.
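One way to apply this principle is to estimate, for each remediation option (blocking, correcting, reporting, or fixing at the source), its cost and the score gain it is expected to deliver, and then convert score gains into dollars. The sketch assumes a linear conversion (`return_per_point`) and positive costs, and uses hypothetical option names, costs, and gains; `plan_spend` is an illustrative helper, not an established method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Remediation:
    name: str
    cost: float        # hypothetical cost in dollars (assumed > 0)
    score_gain: float  # hypothetical expected increase in the overall quality score


def plan_spend(options: List[Remediation], return_per_point: float,
               budget: float) -> Tuple[List[str], float]:
    """Greedy sketch: fund options with the highest return per dollar first,
    skipping any whose expected return does not exceed its cost."""
    ranked = sorted(options, key=lambda o: o.score_gain * return_per_point / o.cost,
                    reverse=True)
    chosen, spent = [], 0.0
    for option in ranked:
        expected_return = option.score_gain * return_per_point
        if expected_return > option.cost and spent + option.cost <= budget:
            chosen.append(option.name)
            spent += option.cost
    return chosen, spent


options = [
    Remediation("block violating records", cost=30_000, score_gain=2),
    Remediation("fix at source", cost=100_000, score_gain=8),
    Remediation("manual correction", cost=80_000, score_gain=3),
    Remediation("report only", cost=5_000, score_gain=0.1),
]
# Assumption: each point of overall score is worth $20,000 of project return.
print(plan_spend(options, return_per_point=20_000, budget=120_000))
print(plan_spend(options, return_per_point=20_000, budget=150_000))
```

Options whose expected return does not cover their cost (here, manual correction and report only) are never funded, regardless of budget. A greedy ranking is simple but not guaranteed optimal under a tight budget; treating the choice as a knapsack problem would give an exact answer.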
Assigning value to the data collected in enterprise systems is undoubtedly difficult. Organizations demand tangible returns on their investments, and data quality is no exception.
This is an exciting time. Organizations are beginning to recognize that the data they collect and manage should be treated as a company asset. Ultimately, the quality of your data can be either an advantage or a liability for your projects.