Data Analysis Case Study
RJ Corporate has initiated a program to drive a standard
process for activating and inactivating irrelevant data in four source systems:
XXX, YYY, ZZZ and DDD. The data in these systems is then profiled and
cleansed to maintain a single source of truth for all the supplier data
received from these source applications.
The current source applications have limitations in providing
the platform needed to maintain a clean, single place for trusted supplier
data. In addition, the existing data quality issues in these applications
mean that a strong DQ solution needs to be built to fulfil the
program objectives.
This requirement specification provides the data quality rules
that need to be created in order to build and maintain clean supplier data.
Process:-
We started with source system ZZZ, which is, I suppose, the most
up-to-date and important source system, because the activation and
deactivation of records in the other systems depends on ZZZ.
I am the sole owner of all the data analyst activity on this
project; there are other people working on it too, but only on routine
activities.
First I decided the target table structure for ZZZ, because
I was getting all 13 lakh (1.3 million) records from a report generated by GE. I created
a stored procedure to load the target table SI from the staging table where we dumped all the
records from the SI generated reports, and applied some validation rules,
cleansing rules and rejection criteria, so that after loading the main DQ_SI table we could take a good
look at SI_EXCEPTION_TABLE and decide what to do with the rejected
records. We then reviewed the SI_EXCEPTION
table, agreed on some further strategy, and moved some of the exception records
into the main table after additional cleansing and transformation. Some
records were left in the exception table because they could not pass any
of the validation criteria.
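To make this concrete, here is a minimal sketch of what such a load might look like. The staging table name (SI_STAGE), the columns and the specific rules shown are illustrative assumptions on my part; only the DQ_SI and SI_EXCEPTION_TABLE names come from the project.

-- Pass: records that satisfy the basic validation, lightly cleansed on the way in.
INSERT INTO dq_si (ven_si_id, supplier_name, country)
SELECT TRIM(ven_si_id),
       UPPER(TRIM(supplier_name)),
       UPPER(TRIM(country))
FROM   si_stage
WHERE  ven_si_id IS NOT NULL
AND    supplier_name IS NOT NULL;

-- Reject: records that fail validation go to the exception table with a reason,
-- so they can be reviewed and possibly cleansed and reloaded later.
INSERT INTO si_exception_table (ven_si_id, supplier_name, country, reject_reason)
SELECT ven_si_id,
       supplier_name,
       country,
       CASE
         WHEN ven_si_id IS NULL THEN 'MISSING KEY'
         ELSE 'MISSING SUPPLIER NAME'
       END
FROM   si_stage
WHERE  ven_si_id IS NULL
OR     supplier_name IS NULL;

COMMIT;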
We then did the same thing for the other three source systems (XXX, YYY and DDD).
Now we are working on the cross-comparison between these
four systems for activating and inactivating records across all the source
systems.
Tool:- Oracle 11g.
Analysis Process:- The final objective of this analysis
is to find the records in XXX that can be deactivated based on the
SI status flag. But there was no single standard column on which we could base
this decision, so we ran a series of analyses on different
combinations of fields between YYY and XXX: their likely key combinations,
then their supplier names, and then their address, city and
country combinations.
Below is the process step by step.
- First we added an extra column named MATCH to the XXX system, simply to count how many records matched between YYY and XXX for each key-column comparison (there were 10 such scenarios in total). We used an extra column rather than joining YYY and XXX because a straight join would not have given us the exact number of records: a key that matches several rows multiplies the result. (A sketch of this flagging is shown after the list.)
- After that, we extracted all the records for each scenario, based on the join condition, into an Excel sheet.
- Then we did some further comparison to find the exact matches, which was purely a functional requirement: for some records the KEY matched but the supplier name did not, so we did an extra comparison in the sheet to find a larger number of matching records.
- Finally, we collected the KEY column values from all the different scenario sheets into a single sheet, which gave us the final list of keys to deactivate in the XXX system.
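To give an idea of what the MATCH flagging looked like, here is a minimal sketch for one key-combination scenario. The table and column names (DQ_XXX, DQ_YYY, VEN_SI_ID, MATCH_FLAG) are placeholders I am assuming for illustration, not the project's actual names.

-- Flag XXX records whose key also exists in YYY.
-- EXISTS avoids the row multiplication a plain join gives
-- when one key matches several YYY records.
UPDATE dq_xxx x
SET    x.match_flag = 'KEY_MATCH'
WHERE  EXISTS (
         SELECT 1
         FROM   dq_yyy y
         WHERE  y.ven_si_id = x.ven_si_id
       );

-- A count per flag value then gives the exact number of matching records per scenario.
SELECT match_flag, COUNT(*) AS matched_records
FROM   dq_xxx
GROUP  BY match_flag;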
1)
If you are dealing with millions of records in Oracle, it is
really tough to update them in a small amount of time, because an UPDATE
operation is always expensive. For our third system, ZZZ, we needed to add six
extra columns to the main table, with values coming from six different tables.
Since this requirement came in at a later point, UPDATE looked like the only
option. As a first step, we consolidated the attributes from the source tables
into a single table, one row per key, with an INSERT over a UNION ALL:
-- Consolidate the attribute sources into one row per key,
-- using NULL placeholders (rather than '' literals) so the UNION ALL
-- branches keep consistent column datatypes.
INSERT INTO load_attr (ven_id, ven_si_id, org_id, amount, amount_date, payment)
SELECT ven_id, ven_si_id, org_id,
       MAX(amount), MAX(amount_date), MAX(payment)
FROM  (
       SELECT ven_id, ven_si_id, org_id,
              amount,
              NULL AS amount_date,
              NULL AS payment
       FROM   invoice_1
       UNION ALL
       SELECT ven_id, ven_si_id, org_id,
              NULL AS amount,
              amount_date,
              NULL AS payment
       FROM   invoice_2
       UNION ALL
       SELECT ven_id, ven_si_id, org_id,
              NULL AS amount,
              NULL AS amount_date,
              payment
       FROM   invoice_3
      )
GROUP BY ven_id, ven_si_id, org_id;
Then the problem was to update the main master table from this
newly created table. Since the master table itself had some millions of
records, we had the same problem with UPDATE again. So once more we avoided it:
we did a left outer join between the master table and the newly created
table and inserted the resulting records into a new table with the same
structure as the master table (plus the new columns). Again it worked well,
and the whole load completed in just a few seconds.
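As a rough sketch of that join-and-insert step (LOAD_ATTR is the attribute table built above; the master table name DQ_SI_MASTER and the use of CREATE TABLE AS SELECT are my assumptions for illustration):

-- Build a fresh copy of the master with the new attribute columns attached,
-- instead of updating the existing master in place.
CREATE TABLE dq_si_master_new AS
SELECT m.*,
       a.amount,
       a.amount_date,
       a.payment
FROM   dq_si_master m
LEFT OUTER JOIN load_attr a
  ON   a.ven_id    = m.ven_id
  AND  a.ven_si_id = m.ven_si_id
  AND  a.org_id    = m.org_id;

-- Once verified, the new table can be renamed or swapped in to replace the old master.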
So the bottom line is: never run an UPDATE over millions of
records in a relational database such as Oracle or SQL
Server. Instead, try to figure out how to achieve the same result with an
INSERT statement.
Create cursors or stored procedures only if the operation
cannot be performed with plain set-based SQL. That way you can optimize
performance and reduce the overall run time to a great extent.
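For example, the two snippets below do the same work; the row-by-row cursor version is the one to avoid when plain SQL will do (the table names are again just placeholders):

-- Row-by-row PL/SQL loop: easy to write, slow on millions of rows.
BEGIN
  FOR r IN (SELECT ven_id, ven_si_id, org_id, amount FROM load_attr) LOOP
    INSERT INTO dq_si_master_new (ven_id, ven_si_id, org_id, amount)
    VALUES (r.ven_id, r.ven_si_id, r.org_id, r.amount);
  END LOOP;
  COMMIT;
END;
/

-- Set-based equivalent: one INSERT ... SELECT does the whole load in a single pass.
INSERT INTO dq_si_master_new (ven_id, ven_si_id, org_id, amount)
SELECT ven_id, ven_si_id, org_id, amount
FROM   load_attr;
COMMIT;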
Please let me know if you have any new scenarios or views.
Happy Analysis.......
Rohit...