Introduction to cleaning data with R

Cleaning data involves transforming raw data into consistent, easy-to-understand data. Statistical statements drawn from data are only as reliable as the data behind them, so cleaning improves data quality, makes the resulting statistical statements more trustworthy, and raises overall productivity.

Several steps lead from the initial raw data to consistent, efficient data that can be used as required and that produces precise, accurate statistical results. Because these steps vary from one dataset to another, users should know the data they are working with. Depending on the data being analyzed, messy data can show a number of characteristic symptoms.
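Getting to know a dataset can start with a few base R inspection calls. The sketch below is one possible starting point, not a prescribed procedure: it builds a small, made-up data frame (`raw` is hypothetical example data) and uses only standard base R functions to reveal column types, missing values, duplicate rows, and inconsistent category spellings.

```r
# Hypothetical example data standing in for a freshly imported raw dataset
raw <- data.frame(
  id     = c(1, 2, 2, 3),
  amount = c("1,200", " 350 ", " 350 ", "0"),
  city   = c("Boston", "boston", "boston", NA),
  stringsAsFactors = FALSE
)

str(raw)                          # column types: 'amount' is character, not numeric
summary(raw)                      # quick per-column overview
colSums(is.na(raw))               # count of missing values in each column
sum(duplicated(raw))              # number of fully duplicated rows
table(raw$city, useNA = "ifany")  # spelling/case variants of a category, plus NAs
```

Each call answers a different question about the data, and the answers help decide which cleaning steps are actually needed.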


Characteristics of messy data:

  •   Special characters (e.g. commas in numeric values)
  •   Numeric values stored as text/character data types
  •   Duplicate rows
  •   Misspellings
  •   Inaccuracies
  •   White space
  •   Missing data
  •   Zeros used in place of null (missing) values
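Several of the symptoms above can be handled with base R alone. The following is a minimal sketch on made-up data, assuming that 0 in the `amount` column is a missing-value code and that the hand-built `fixes` lookup table (hypothetical) lists the known misspellings; real datasets will need their own rules. It uses `trimws`, `gsub`, `as.numeric`, `tolower`, and `duplicated`.

```r
# Hypothetical messy input: commas in numbers, stray white space,
# inconsistent spelling/case, a duplicate row, and 0 used as a missing code
raw <- data.frame(
  amount = c("1,200", " 350 ", " 350 ", "0", "75"),
  city   = c("Boston ", "boston", "boston", "Bostn", " Chicago"),
  stringsAsFactors = FALSE
)

clean <- raw

# White space: trim leading/trailing spaces in every character column
chr_cols <- vapply(clean, is.character, logical(1))
clean[chr_cols] <- lapply(clean[chr_cols], trimws)

# Special characters + numbers stored as text: drop commas, then convert
clean$amount <- as.numeric(gsub(",", "", clean$amount, fixed = TRUE))

# Zeros instead of nulls: ASSUMES 0 means "missing" in this column
clean$amount[which(clean$amount == 0)] <- NA

# Misspellings / inconsistent case: normalise case, then map known typos
# (the 'fixes' lookup table is a hypothetical, hand-built example)
clean$city <- tolower(clean$city)
fixes <- c(bostn = "boston")
typo <- clean$city %in% names(fixes)
clean$city[typo] <- unname(fixes[clean$city[typo]])

# Duplicate rows: keep only the first occurrence of each row
clean <- clean[!duplicated(clean), ]

clean
```

The order matters: duplicates are removed last, because rows such as `"Boston "` and `"boston"` only become identical after white space and case have been normalised. Inaccuracies are the one symptom this sketch does not touch; they usually call for domain knowledge, such as checking values against known valid ranges.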

Notes to the reader
This tutorial is aimed at users who have some R programming experience. That is, the reader should be familiar with variable assignment, vectors, lists, and data.frames, with writing simple loops, and perhaps with writing simple functions. More complicated constructs are explained in the text where they are used.
