Saturday, March 23, 2013

Data Quality Measurement, Domain-wise - I

I had thought of writing this a long time ago, but it never happens unless you actually sit down and do it.
Right now I am working on a data quality project for one of our corporate clients. Data quality is a totally contextual thing; it can't really be taught until you go through it yourself. I read a lot about data quality on the internet, but I didn't get much help from those articles. Beyond the standard concepts like de-duplication and column-level profiling, there is a lot of stuff you only face when you are actually in the situation. De-duplication and column-level profiling are the easy parts, and you cannot build a good business case or proposal on those alone.

While working on this, the most important thing I felt was domain knowledge. The domain can be anything: suppliers, customers, healthcare or insurance. Business users and developers or analysts come from totally different backgrounds and experience, but to create a good proposal or business case you need solid domain knowledge of the whole system, inside and out.

The final objective is to create a good business case. We developers tend to think only in terms of total records, duplicate records or profiling results, but these alone are not enough to create a good proposal.

For a good data quality project proposal, we need to define business KPIs and link our analysis to them, because senior executives understand cost, revenue, profitability, procurement, logistics, products, customers, suppliers and other important assets. We have to help them identify the processes supporting these KPIs, the data required for those processes to operate effectively and the quality of that data; this is what lets an organization determine the impact of poor quality in tangible terms. Everything depends on ROI. Until someone sees the profit, no one will come forward to seal the deal.... :)

I came across many dead stops during data analysis: after finishing this piece, what is the next thing I should do, and where do I take this analysis? In those moments it was only the client who helped me out, in terms of domain knowledge. Let me tell you, my client is a picky one. With so many years of experience in a single domain, he keeps things in the right place, like different pieces of a puzzle. Even when we were going through millions of records just for a sanity check of my work, he would hit the right spot and point out the exact functional error in my analysis. These errors were not the result of my SQL queries; they were wrong data or wrong functional scenarios that nobody had noticed before. So my point here is domain knowledge. Wherever we feel the need for data quality, the systems involved are usually quite old and contain lots of issues like wrong scenarios and the same information repeated over and over. So the bullet point is: this is not a job for only one kind of person. A developer needs to play with the data, slicing and dicing it every possible way, and on the other side a business user also needs to look at it to find the issues.

I will come up with some of these kinds of scenarios in my next post.
For this post I just want to keep to a theoretical point of view on data analysis and data quality.

Let's talk about measurement: how to streamline your analysis to build a good proposal. Below are some dimensions along which to look at the data and compartmentalize your analysis.

After identifying which data to produce metrics on, the next step is to define which of the many aspects of its quality to measure. These dimensions might include:
  • Structure: Is the data in the right format for it to be usable?
  • Conformity: Does it comply with critical rules?
  • Accuracy: Does it reflect the real world?
  • Completeness: Is business-required information present?
  • Timeliness: Is it sufficiently current?
  • Uniqueness: Are duplicate records creating confusion?
  • Consistency: Is the data the same, regardless of where it resides?
  • Relevance: Is it useful to the business in its pursuit of objectives?
Defining which dimensions are important, prioritizing them and producing data quality metrics that are meaningful for business owners is typically the job of one or more data stewards. Data stewards are individuals who understand the key business processes, the role of data in those processes and the intricacies of what makes good data.
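
To make a couple of these dimensions concrete, here is a minimal SQL sketch of how uniqueness and structure could be profiled. The table and column names (CUSTOMER, CUSTOMER_ID, PHONE) are made up purely for illustration, and the regular expression syntax varies between databases:

    -- Uniqueness: how many customer IDs appear on more than one record?
    SELECT COUNT(*) AS duplicate_ids
    FROM (
        SELECT CUSTOMER_ID
        FROM CUSTOMER
        GROUP BY CUSTOMER_ID
        HAVING COUNT(*) > 1
    ) dup;

    -- Structure: how many phone numbers fail a simple 10-digit pattern?
    -- (REGEXP is MySQL-style; other databases use REGEXP_LIKE or SIMILAR TO)
    SELECT COUNT(*) AS bad_phone_format
    FROM CUSTOMER
    WHERE PHONE IS NOT NULL
      AND PHONE NOT REGEXP '^[0-9]{10}$';

Nothing fancy, but counts like these become metrics the moment you divide them by the total number of records.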

The above metrics can help demonstrate what risks or issues might be presented by a decline in data quality levels, as well as what opportunities might be gained by investing in improvement. Metrics also support objective judgment and reduce the influence of assumptions, politics, emotions and vested interests.

It is important to note that there’s no point in measuring and reporting on all of an organization’s data or every aspect of that data -- be selective. A metric showing that 60 percent of vendor records in a vendor list lack an email ID is likely of little consequence to the KPIs. But if 20 percent are missing a tax code, it could be of some importance: if 10,000 of those vendors are active and regularly dealt with, that is 2,000 vendor records affected. And if the metric referred to a contract or billing database, it could make a strong business case for a data quality project, because invoices worth millions of dollars might not be reaching customers, thus delaying or even threatening receipt of revenue.
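
As a rough illustration, a completeness metric like the missing tax code above could come from a query along these lines. VENDOR, TAX_CODE and STATUS are assumed names, not taken from any real system:

    -- Percentage of active vendor records with no tax code
    SELECT
        COUNT(*) AS active_vendors,
        SUM(CASE WHEN TAX_CODE IS NULL OR TAX_CODE = '' THEN 1 ELSE 0 END) AS missing_tax_code,
        100.0 * SUM(CASE WHEN TAX_CODE IS NULL OR TAX_CODE = '' THEN 1 ELSE 0 END) / COUNT(*) AS pct_missing
    FROM VENDOR
    WHERE STATUS = 'ACTIVE';

The same shape of query works for almost any completeness rule; only the CASE condition changes.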

It is also important to tie the analysis to real-world business scenarios: what is the effect of wrong data on them? Having determined the data to be measured and which dimensions to measure it by, it is then possible to build a set of data quality rules against which to profile the data and compute compliance metrics. For example, if repeat sales from customer service representatives are key, and customer contact information must be present and accurate for the CSRs to make sales calls, then completeness of the fields captured at the original purchase is important, together with the accuracy and structure of telephone numbers. If customers are waiting too long for the delivery of goods that are supposed to be in stock at the time of order, then data consistency should be measured, because there may be a disparity between product codes in the order processing system and the warehouse stock and dispatch system.
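
A consistency check like the product code example could start with something as simple as the query below. ORDER_ITEM and WAREHOUSE_STOCK are hypothetical tables standing in for the two systems; in real life they often sit in different databases and have to be staged side by side first:

    -- Product codes used on orders that have no match in the warehouse system
    SELECT DISTINCT o.PRODUCT_CODE
    FROM ORDER_ITEM o
    LEFT JOIN WAREHOUSE_STOCK w
           ON o.PRODUCT_CODE = w.PRODUCT_CODE
    WHERE w.PRODUCT_CODE IS NULL;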
In producing metrics, it’s best to be very focused at first, concentrating on just a few areas where data appears critical to business performance. It’s also better initially to generate just a small number of metrics on important characteristics that have real meaning to business managers in their roles and responsibilities.

Indicating and proving that the business cannot be confident in the data it relies on for certain important processes, decisions or compliance reports should justify further investigation. Success in one area can then be used as a reference to help communicate the value that could be won from metrics in other areas of the organization.






