Data Cleansing

The Data Cleansing issue

Data Cleansing is the process of checking and improving the quality of the data in an information system, with respect to its integrity and business rules, or to those of a system to which the data is to be migrated.

The quality of a system’s data is often overestimated, and there are many, sometimes legitimate, reasons why it is compromised:

  1. System age:
    • Lack of control over certain “standardized” fields. Example: postal addresses, which often do not comply with the postal reference system and require an “address standardization” project
    • Lack of referential integrity in the data model
    • Lack of application controls
  2. Unintended or deliberate duplicates: Duplicates are frequent, particularly for natural or legal persons. They are often accidental, but can also be created deliberately by users who have found ways to work around functional shortcomings of the application.
  3. Repurposing of certain fields by users to store new information
  4. Data anomalies caused by application bugs that were corrected late
  5. Incomplete information, forced values, bypassed controls…
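As an illustration, duplicate person records of this kind can often be flagged automatically by normalizing names into comparison keys before grouping. A minimal sketch in Python (the field names and the normalization rule are assumptions for illustration, not a description of our actual tooling):

```python
import re
from collections import defaultdict

def normalize(name: str) -> str:
    """Crude comparison key: lowercase, strip punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def find_duplicates(records):
    """Group records whose normalized names collide; keep only real groups."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec["name"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}

# Hypothetical customer records: two spellings of the same person.
customers = [
    {"id": 1, "name": "Dupont, Jean"},
    {"id": 2, "name": "DUPONT Jean"},
    {"id": 3, "name": "Martin, Claire"},
]
print(find_duplicates(customers))
```

In practice, real deduplication also uses addresses, birth dates, or fuzzy matching; this sketch only shows the normalize-and-group principle.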

Data cleansing is often an indispensable operation, as the costs generated by a lack of data quality are far from negligible:

Direct costs:

  • Breaches of legal reporting obligations
  • Fuzzy or even false statistics
  • Impossibility of consolidating information that may be required by regulation
  • Application crashes
  • Reliability requirements when implementing a new application or system
  • Higher postal rates due to poor-quality addresses, or multiple mailings caused by duplicates
  • Etc.

Indirect costs:

  • Loss of image
  • Loss of productivity
  • Etc.

Our Data Cleansing offer

A Data Cleansing project to make data more reliable can be launched independently or as part of a migration to a new system.

In the first case, data quality must be checked against the business and integrity rules of the system in which the data is used. To be effective, the controls developed should be integrated into a recurring data quality measurement process.

In the case of a migration to a new system, it is preferable to check the source data against the integrity rules of the target system, and to launch the reliability project as early as possible: it lies on the critical path, may involve considerable manual effort, and can strongly impact the overall project schedule.

In all cases, automatic data cleansing is preferred to reduce the cost of these operations.

Our data cleansing tools

Our tool-based approach enables us to automate a large share of the operations required for data cleansing.

Our Recode system analysis tools use:

  • physical data model,
  • programs,
  • real data,
  • use cases,

… to generate control modules that can be run on a regular basis.

Outputs include general dashboards to measure the progress of the reliability project, as well as business reports listing the reasons for rejection, classified by service and frequency of occurrence.

Detailed anomaly lists are enriched with the functional identifier of each record, enabling users to locate it in the source and target applications.

Our workbenches enable advanced automation of this work, with results available quickly.

Data cleansing process

Check categories

A Data Cleansing project involves a number of checks, which can be divided into 3 categories:

  • Format checks: Verify that data conforms to its type (date, numeric, value list, etc.) and falls within the permitted range of values. Sentinel values (e.g. a date set to year 9999) must be taken into account to avoid reporting false anomalies.
  • Integrity checks: Verify the cardinalities of the conceptual data model (MCD). Example: check that there are no invoices without a corresponding customer.
  • Application checks: Verify that data complies with the application’s business rules. Examples: checking for overlaps or gaps between periods, checking a computed key (RIB key, social security number key), checking postal code / INSEE commune code consistency, etc.

To enable cross-checks with other applications (Accounting, CRM, etc.), we also compute counts of occurrences of a functional case (e.g. number of customers) and accumulated values (e.g. total per customer and grand total).
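Such counts and totals can be sketched as follows (the invoice layout is hypothetical; real cross-checks would compare these figures against the other application’s own totals):

```python
from collections import Counter

# Hypothetical invoice extract.
invoices = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 30.0},
]

# Number of occurrences per functional case (here: invoices per customer).
count_per_customer = Counter(inv["customer_id"] for inv in invoices)

# Accumulated values: total per customer and grand total.
total_per_customer: dict[int, float] = {}
for inv in invoices:
    total_per_customer[inv["customer_id"]] = (
        total_per_customer.get(inv["customer_id"], 0.0) + inv["amount"]
    )
grand_total = sum(inv["amount"] for inv in invoices)

print(count_per_customer, total_per_customer, grand_total)
```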

Technical environments

Our technology enables us to operate in all technical environments.

Use cases

  • Background: The SNCF’s purchasing department managed its operations on an MVS, COBOL, DB2 platform with an outdated client-server Easel front. …
  • The project was based on the development of a suite of tools dedicated to migrating the SiPo application from a …
  • Archiving of data and documents from AG2R Group’s Information System applications intended to be decommissioned. Background: For many years, the AG2R La …
