In a typical
organization, most databases are built with data keyed in by different
people. For example, a CRM database is updated by several sales
reps, and each rep has their own way of entering
data. One may type ‘Road’ while another prefers ‘Rd’.
Similarly, it can be ‘Drive’ or ‘Dr’, and ‘CA’ or ‘California’. Such variations gradually erode the consistency of
the data and can make it unsuitable for reliable analysis and reporting.
The situation gets even worse if the data is aggregated
from multiple data sources where each system follows its own
conventions and processes. To make such data uniform and
credible, periodic data cleansing becomes necessary. Only clean,
sanitized data can yield reliable inferences when analyzed.…
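To make the idea concrete, here is a minimal sketch of how the variations mentioned above (Road/Rd, Drive/Dr, CA/California) might be standardized. The lookup tables and function names are hypothetical and chosen only for illustration; a real cleansing job would need far larger mappings and rules suited to its own data.

```python
# Hypothetical lookup tables: each known variant maps to one canonical form.
STREET_SUFFIXES = {"rd": "Road", "road": "Road", "dr": "Drive", "drive": "Drive"}
STATES = {"ca": "CA", "california": "CA"}


def normalize_token(token, mapping):
    """Return the canonical form of a token, or the trimmed token if unknown."""
    key = token.strip().rstrip(".").lower()
    return mapping.get(key, token.strip())


def normalize_street(street):
    """Standardize only the last word of a street line (assumed to be the suffix)."""
    words = street.split()
    if not words:
        return street
    words[-1] = normalize_token(words[-1], STREET_SUFFIXES)
    return " ".join(words)


def normalize_state(state):
    """Standardize a state name or abbreviation."""
    return normalize_token(state, STATES)


if __name__ == "__main__":
    for street, state in [("12 Oak Rd.", "California"), ("45 Hill drive", "ca")]:
        print(normalize_street(street), "|", normalize_state(state))
```

Only the final word of the street line is treated as a suffix, so a leading ‘Dr’ in a name such as ‘Dr Martin Road’ is left alone. Values that are not in the tables pass through unchanged rather than being guessed at.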


