Dealing with Messy Data

Organizer: Hyunwoo Park (Ohio State University)

Empirical researchers increasingly build unique datasets by combining a variety of secondary data sources. A critical prerequisite for any such research is cleaning those sources. This workshop will explain general data management strategies, introduce typical situations in which messy data arises, and provide practical tutorials on dealing with it. By the end of this workshop, you will be able to (1) identify when and why messy data arises, (2) understand why cleaning data is essential for analysis, and (3) implement strategies for cleaning data using text mining tools and cloud-based services.