What is meant by Data cleansing?
The term "data cleansing" (also called "data cleaning") refers to the process of identifying and correcting errors or inconsistencies in a dataset to improve data quality. The goal is to ensure that the data is accurate, consistent, and complete, enabling reliable analysis and informed decision-making.
Typical software functions for data cleansing include:
- Error Detection: Identifying faulty, incomplete, or inconsistent data.
- Duplicate Detection: Finding and merging duplicate records to avoid redundancy.
- Data Validation: Checking data against predefined rules or standards, such as format checks or plausibility checks.
- Error Correction: Automatically or manually fixing errors, such as incorrect values or formatting issues.
- Data Normalization: Standardizing data formats and values, such as converting to uniform units or formats.
- Data Completion: Filling in missing information through data enrichment or other sources.
- Consistency Checking: Ensuring that data is consistent across different datasets, such as matching reference data.
- Batch Data Cleaning: Performing cleaning processes on large volumes of data through automated batch processing.
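Several of the functions above can be combined in a single cleaning pass. The following minimal sketch illustrates duplicate detection, data validation, normalization, and completion on a small set of records; the field names, email pattern, and default value are illustrative assumptions, not a production rule set.

```python
import re

# Hypothetical sample records; fields and values are illustrative only.
records = [
    {"name": "Alice Smith", "email": "alice@example.com", "city": "berlin"},
    {"name": "Alice Smith", "email": "alice@example.com", "city": "Berlin"},  # duplicate
    {"name": "Bob Jones", "email": "bob(at)example.com", "city": None},       # bad email, missing city
]

# Simplified syntactic email pattern (an assumption, not a full RFC check).
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")
DEFAULT_CITY = "unknown"  # placeholder used for data completion

def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        # Data completion + normalization: fill missing city, standardize capitalization.
        city = (row["city"] or DEFAULT_CITY).title()
        # Data validation: flag records whose email fails the format check.
        valid_email = bool(EMAIL_RE.match(row["email"]))
        # Duplicate detection: key on case-normalized name + email,
        # merging duplicates by keeping the first occurrence.
        key = (row["name"].lower(), row["email"].lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**row, "city": city, "email_valid": valid_email})
    return cleaned

result = clean(records)
```

In a real pipeline these steps would typically run as automated batch jobs over much larger volumes, but the per-record logic stays the same.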
Examples of data cleansing:
- Removing Duplicate Entries: Merging records that represent the same entity to avoid redundancy.
- Correcting Typos: Fixing spelling errors in text fields, such as names or addresses.
- Standardizing Address Formats: Aligning addresses to a uniform format, such as postal codes or street names.
- Validating Email Addresses: Checking if email addresses are valid and correctly formatted.
- Completing Missing Values: Filling in missing values with plausible assumptions or data enrichment.
- Normalizing Product Categories: Standardizing product categories and labels to ensure consistency in the data.
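Two of these examples, validating email addresses and normalizing product categories, can be sketched with small helper functions. The regex and the category mapping below are assumptions for illustration; real systems would use a more complete rule set.

```python
import re

# Basic syntactic email pattern (an assumption; it does not verify
# that the mailbox actually exists).
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

def is_valid_email(addr: str) -> bool:
    """Check whether an address is syntactically plausible."""
    return bool(EMAIL_RE.match(addr.strip()))

# Hypothetical mapping of free-text category variants to canonical labels.
CATEGORY_MAP = {
    "laptop": "Notebook",
    "laptops": "Notebook",
    "notebook": "Notebook",
    "phone": "Smartphone",
    "smartphones": "Smartphone",
}

def normalize_category(raw: str) -> str:
    """Map a raw category string to its canonical label, if known."""
    key = raw.strip().lower()
    return CATEGORY_MAP.get(key, raw.strip())
```

For example, `is_valid_email("jane.doe@example.com")` passes while an address without a domain suffix fails, and `normalize_category(" Laptops ")` collapses a spelling variant onto the canonical label "Notebook".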