31 Jan
Clean Data = Clean Migration
Organizations periodically need to replace enterprise systems that have reached end of life, lost vendor support, or failed to keep up with evolving business requirements.
One important step in the replacement process is mapping the obsolete system’s data to the new system and migrating it. Ideally, this migration is a straightforward extract-transform-load (ETL) exercise in which existing data is:
- Extracted from the old system
- Transformed to a format supported by the replacement system
- Loaded into the new system
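The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration, not the client's pipeline: the legacy field names (`BARCODE`, `RET_CD`, `REVIEW_DT`, `LOC`) and the in-memory "new RMS" list are hypothetical stand-ins.

```python
from datetime import date

# Hypothetical legacy export; field names are illustrative, not the client's schema.
legacy_rows = [
    {"BARCODE": "B0001", "RET_CD": "FIN-07", "REVIEW_DT": "2025-03-31", "LOC": "HOU-A-12"},
]

def extract(rows):
    """Extract: yield raw records from the old system (here, an in-memory list)."""
    yield from rows

def transform(row):
    """Transform: rename fields and convert types to the target system's format."""
    return {
        "barcode": row["BARCODE"].strip(),
        "retention_code": row["RET_CD"].strip(),
        "review_date": date.fromisoformat(row["REVIEW_DT"]),
        "location": row["LOC"].strip(),
    }

def load(records, target):
    """Load: append transformed records to the target store (a stand-in for the new RMS)."""
    target.extend(records)
    return len(records)

new_rms = []
loaded = load([transform(r) for r in extract(legacy_rows)], new_rms)
print(loaded)  # 1
```

The transform step is where dirty source data surfaces: if `REVIEW_DT` were an empty string, `date.fromisoformat` would raise `ValueError` and the record would fail to load.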
ETL success depends on the accuracy, completeness, and standardization of the source system’s data. Unfortunately, data in long-lived enterprise systems often falls short of these standards because of factors such as:
- The number of administrators and data contributors (users) over time
- Multiple changes in system ownership (e.g., a change of owning department), each with different business needs and motivations
- Geographically dispersed operations
- Evolving business drivers and regulatory environments
- Acquisitions and mergers that contribute dissimilar data to the system
- Lack of consistent governance, configuration standards, and user training
These issues often result in a costly and time-consuming remediation process.
Metadata: Garbage In…Garbage Out
This client was looking to replace its obsolete physical records management system (RMS), implemented in the early 2000s, which managed approximately 25M records. Over the life of this system, multiple migrations took place, often without data validation. One recent migration involved a massive relocation of records from multiple locations to a centralized facility, and the metadata for a substantial portion of those records was not correctly updated to reflect the move. As a result, the system contained incorrect or invalid metadata.
In preparing to migrate the 25M records to the new RMS, the client identified an estimated 8.5M records suspected of having invalid or missing metadata that would prevent them from migrating. Issues included:
- Duplicate barcodes
- Missing or incorrect retention codes
- Missing review dates
- Invalid storage locations
- Litigation holds
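Checks for issues like these are usually scripted against an export of the legacy data. The sketch below is a minimal illustration, assuming each record is a dict with hypothetical fields (`barcode`, `retention_code`, `review_date`, `location`, `legal_hold`) and that valid retention codes and locations come from reference lists; the lists shown are invented. Litigation holds are not errors, so they are flagged for special handling rather than treated as bad data.

```python
from collections import Counter

# Hypothetical reference data; a real RMS would supply these lists.
VALID_RETENTION_CODES = {"FIN-07", "HR-10", "LEG-PERM"}
VALID_LOCATIONS = {"CENTRAL-01", "CENTRAL-02"}

def find_issues(records):
    """Return {record index: [issue labels]} for records that need attention."""
    # Count barcodes once so duplicates can be detected in a single pass.
    barcode_counts = Counter(r.get("barcode") for r in records)
    issues = {}
    for i, r in enumerate(records):
        found = []
        if r.get("barcode") and barcode_counts[r["barcode"]] > 1:
            found.append("duplicate_barcode")
        if r.get("retention_code") not in VALID_RETENTION_CODES:
            found.append("missing_or_incorrect_retention_code")
        if not r.get("review_date"):
            found.append("missing_review_date")
        if r.get("location") not in VALID_LOCATIONS:
            found.append("invalid_storage_location")
        if r.get("legal_hold"):
            # Not an error: records on hold need special handling during migration.
            found.append("litigation_hold")
        if found:
            issues[i] = found
    return issues

records = [
    {"barcode": "B1", "retention_code": "FIN-07", "review_date": "2030-01-01", "location": "CENTRAL-01"},
    {"barcode": "B2", "retention_code": "FIN-07", "review_date": "2030-01-01",
     "location": "CENTRAL-01", "legal_hold": True},
    {"barcode": "B2", "retention_code": None, "review_date": "", "location": "OLD-SITE-9"},
]
issues = find_issues(records)
```

Here the first record is clean and does not appear in `issues`; the other two share barcode `B2`, and the third also fails the retention-code, review-date, and location checks.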
Migrating with roughly one-third of the records still needing remediation was unacceptable, so the client allotted time and resources to find and fix these problem records before migration.
Efforts to Clean the Data Before Migration
Rather than risk ETL failure or incur the expense of prolonged post-ETL remediation, the client chose an alternative course of action: cleanse the data prior to ETL, ensuring a smooth transition into the replacement system.
This process is illustrated below.
Our client, a global energy company with operations in over 40 countries, trusted Access Sciences to cleanse their data prior to ETL. Our successful approach included:
- Sorting problem records into sets by issue type and assigning each set to a dedicated team
- Using tools to manage the large data sets the issue response teams needed to identify and resolve issues
- Using a combination of resources and techniques to resolve issues:
  - Information Management best practices
  - Data gathering
  - The computational capabilities of software
  - Coffee and elbow grease
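Sorting problem records by issue and handing each set to a dedicated team amounts to building one work queue per team. A minimal sketch follows; the issue labels, team names, and record IDs are hypothetical. A record with several issues lands in several queues, and using a set per team keeps a record from appearing twice in the same queue.

```python
from collections import defaultdict

# Hypothetical mapping of issue type to the team that owns it.
TEAM_FOR_ISSUE = {
    "duplicate_barcode": "barcode team",
    "missing_review_date": "retention team",
    "missing_or_incorrect_retention_code": "retention team",
    "invalid_storage_location": "location team",
    "litigation_hold": "legal hold team",
}

def build_work_queues(flagged):
    """Group flagged record IDs into one sorted queue per team.

    flagged maps a record ID to the list of issue labels found on it.
    """
    queues = defaultdict(set)
    for record_id, issue_labels in flagged.items():
        for label in issue_labels:
            # Unrecognized issue types go to a catch-all triage queue.
            team = TEAM_FOR_ISSUE.get(label, "triage team")
            queues[team].add(record_id)
    return {team: sorted(ids) for team, ids in queues.items()}

flagged = {
    "R1": ["duplicate_barcode"],
    "R2": ["missing_review_date", "invalid_storage_location"],
    "R3": ["missing_or_incorrect_retention_code", "missing_review_date"],
}
queues = build_work_queues(flagged)
```

`R3` has two retention-related issues but appears only once in the retention team's queue, so each team works a deduplicated list.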
Our work revealed an additional 4M records requiring correction, raising the total to 12.5M records. This 47% increase in workload (4M on top of the original 8.5M estimate) was completed on time and under budget.
New System Starts With a Solid Records Foundation
After the data cleanse, the client was confident moving forward with the migration to the new system. In six months, we shepherded the company’s 25M records from 50% migration ready to 99.78% migration ready. The remaining 0.22% will require further research by the client.
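The 47% figure follows directly from the two counts reported in the preceding paragraph:

```latex
\frac{4\,\text{M}}{8.5\,\text{M}} \approx 0.47 = 47\%, \qquad 8.5\,\text{M} + 4\,\text{M} = 12.5\,\text{M}
```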
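As a rough consistency check, assuming both readiness percentages are measured against the full 25M records (the source does not state the base), the 50% starting point matches the 12.5M records that needed correction, and the 0.22% remainder works out to about 55,000 records:

```latex
\frac{25\,\text{M} - 12.5\,\text{M}}{25\,\text{M}} = 50\%, \qquad
0.22\% \times 25{,}000{,}000 = 55{,}000 \text{ records}
```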
The table below illustrates the level of effort required as a prerequisite to the ETL process.
Of the files under review, our team identified 1.4M records that were missing, destroyed, permanently withdrawn, or had invalid locations. These records were excluded from the ETL scope, eliminating the need to research and correct them. In addition to the prescribed scope of work, we:
- Identified and resolved 226k boxes that had been erroneously imported as discretionary records
- Drew on knowledge gained from prior projects to recognize large data sets that should not be archived, even though their locations were incorrect
- Identified a more accurate process for notifying record center operations staff of legal holds and hold releases
- Cleaned and consolidated existing legal matters within the old RMS application
- Corrected retention schedules and retention codes
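The scope exclusion of the 1.4M missing, destroyed, permanently withdrawn, or invalid-location records can be pictured as a simple partition applied before ETL. This is a minimal sketch, assuming hypothetical `status` and `location` fields and invented status values; it mirrors the four conditions the text lists and is not the team's actual tooling.

```python
# Hypothetical status values; the legacy RMS's actual codes are not given in the source.
EXCLUDED_STATUSES = {"missing", "destroyed", "permanently_withdrawn"}

def split_etl_scope(records, valid_locations):
    """Partition records into (in_scope, excluded) before the ETL run."""
    in_scope, excluded = [], []
    for r in records:
        # Exclude records whose status or location makes them ineligible for migration.
        if r.get("status") in EXCLUDED_STATUSES or r.get("location") not in valid_locations:
            excluded.append(r)
        else:
            in_scope.append(r)
    return in_scope, excluded

records = [
    {"id": "A", "status": "active", "location": "CENTRAL-01"},
    {"id": "B", "status": "destroyed", "location": "CENTRAL-01"},
    {"id": "C", "status": "active", "location": "UNKNOWN"},
]
in_scope, excluded = split_etl_scope(records, valid_locations={"CENTRAL-01"})
```

Excluding such records up front shrinks the set that has to be researched and corrected, which is the saving the paragraph above describes.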