Data cleansing

Issues:

Data cleansing helps you control and improve the quality of your data by checking it against the integrity and management rules of its current system or of the system you wish to migrate to.

Data quality is often overestimated, and data can degrade for many (sometimes very good) reasons:

  • Obsolescence of the system:
    • Lack of control over certain standardized structures, for instance postal addresses that do not comply with postal regulations and need to be standardized
    • No referential integrity enforcement in the data model
    • Lack of application-level controls
  • Duplicates created unintentionally or deliberately: duplicates are quite common for natural persons and legal entities. Users usually create them unintentionally, but duplication can also be deliberate, as a way to work around the shortcomings of an application
  • Misuse of certain structures by users to manage new information
  • Data discrepancies caused by application bugs that were fixed late
  • Missing information, forced values, bypassed controls, etc.

Poor data quality eventually leads to significant costs.

Direct costs:

  • Higher postal charges due to poor address quality, or duplicate mailings
  • Application crashes
  • Inaccurate or even false statistics
  • Inability to consolidate information, some of which may be required by regulation
  • The cost of making data reliable before creating a new application or system
  • Etc.

Indirect costs:

  • Loss of reputation
  • Loss of productivity
  • Etc.

Our Offer:

A data reliability project can be launched independently or as part of a migration to a new system.

In the first case, you should check data quality against the business and integrity rules of the system the data currently runs on. To be efficient, we advise you to integrate the controls you develop into a recurring data quality measurement process.

If you migrate to a new system, you should check the source data against the integrity rules of the target system and start the data reliability project as early as possible. Data reliability is critical, and some lengthy operations may have to be performed by your own teams, which can heavily affect the overall project schedule.

In any case, we favour automating data reliability operations to keep them cost-effective.

Our Tools:

Thanks to our tools, we can automate many operations. Our ‘Recode’ system analysis tools can generate control modules from:

  • the physical data model
  • programs
  • real data
  • use cases

You can then run these control modules on a regular basis.

The reports include general indicators that measure both the progress of the reliability project and the status of the work, including the reasons for rejection classified by department and by frequency of occurrence.

Details of each malfunction or incident are listed and enriched with the data concerning the case, to help you locate it in the source and target applications.

Our toolsets can automate these processes almost completely and deliver results quickly.

Applications:

The controls can be classified into three types:

Format controls: Verification that a value conforms to its type (date, numeric, list of values, etc.) and lies within the authorized range of values. Sentinel values (for example, a date in the year 9999) must be taken into account so as not to report false anomalies.
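
As an illustration, here is a minimal Python sketch of a format control on a date field that accepts the 9999 sentinel instead of flagging it (the column format and sentinel convention are assumptions for the example):

    from datetime import date, datetime

    SENTINEL_YEAR = 9999  # assumed convention for "no end date"

    def check_date(value, fmt="%Y-%m-%d"):
        """Return (is_valid, reason); sentinel dates pass and are not anomalies."""
        try:
            parsed = datetime.strptime(value, fmt).date()
        except ValueError:
            return False, "not a valid date"
        if parsed.year == SENTINEL_YEAR:
            return True, "sentinel date, ignored"
        if not (date(1900, 1, 1) <= parsed <= date.today()):
            return False, "date out of authorized range"
        return True, "ok"

    # Sample run: an impossible date, a sentinel, an out-of-range date, a valid one
    for raw in ["2021-02-30", "9999-12-31", "1899-01-01", "2015-06-01"]:
        print(raw, check_date(raw))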

Integrity checks: Verification of the cardinalities of the conceptual data model. Example: checking that there are no invoices without a corresponding customer.
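
Such a cardinality check can be sketched in a few lines of Python (the table and field names are assumptions for the example):

    # Flag invoices whose customer has no match in the customer table.
    customers = {"C01", "C02"}
    invoices = [
        {"invoice": "F001", "customer": "C01"},
        {"invoice": "F002", "customer": "C99"},  # orphan record
    ]

    for inv in invoices:
        if inv["customer"] not in customers:
            print(f"REJECT {inv['invoice']}: unknown customer {inv['customer']}")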

Application controls: Data verification against the application's management rules. Examples: detection of overlapping periods or gaps between periods, verification of a calculated key (RIB key, social security number key), consistency of postal codes with INSEE commune codes, etc.
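
As a sketch of the period control, the following Python detects overlaps and gaps in a sorted list of coverage periods (the data layout is an assumption):

    from datetime import date

    # Periods for one contract, sorted by start date (assumed layout).
    periods = [
        (date(2020, 1, 1), date(2020, 6, 30)),
        (date(2020, 7, 2), date(2020, 12, 31)),  # leaves 2020-07-01 uncovered
    ]

    for (start_a, end_a), (start_b, end_b) in zip(periods, periods[1:]):
        if start_b <= end_a:
            print(f"OVERLAP: period starting {start_b} begins before {end_a}")
        elif (start_b - end_a).days > 1:
            print(f"GAP: no coverage between {end_a} and {start_b}")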

To allow cross-checks with other applications (accounting, CRM, etc.), counts are also carried out: the number of occurrences of a functional case (for example, the number of customers) or cumulative values (for example, totals per customer and an overall total).
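
A minimal Python sketch of such counts, producing per-customer totals and an overall total for reconciliation with another application (field names are assumptions):

    from collections import defaultdict

    invoices = [
        {"customer": "C01", "amount": 120.0},
        {"customer": "C01", "amount": 80.0},
        {"customer": "C02", "amount": 50.0},
    ]

    # Accumulate one total per customer, then an overall total.
    totals = defaultdict(float)
    for inv in invoices:
        totals[inv["customer"]] += inv["amount"]

    for customer, total in sorted(totals.items()):
        print(f"{customer}: total {total:.2f}")
    print(f"OVERALL: {sum(totals.values()):.2f} across {len(invoices)} invoices")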

Environments:

We work in the following IT environments:

OS: MVS, DOS VSE, VM, GCOS 7, GCOS 8, VMS, ICL, UNIX, AS400, WINDOWS, HP3000…

DBMS: DB2, ORACLE, SYBASE, SQL Server, SQL, INFORMIX, DL1, IDMS, IDS2, TOTAL, ADABAS, DATACOM, IMAGE…

We have strong functional skills and many references in the following areas: banking, insurance, pension systems, life insurance, mass retail, human resources…

Business Cases:

  • GENERALI: Mainframe migration from z/OS-DB2 to Unix-Oracle
  • SAB (banking software publisher): Toolmaker partnership
  • MACIF: Mutual insurer migration from IBM to Cegedim Activ Infinite
  • DIRECT ENERGIE: Data migration to SAP
  • Insurance broker: Data audit and cleansing
  • Banque Populaire: Migration and mergers service center
  • ANFH: Migration and deduplication of the agent repository
  • AGIRC ARRCO: Data loader and migration kits
  • AG2R: Data loader for supplemental pensions