Ever since I started working with data, I have had this feeling that I am really more of a janitress of dirty data than an engineer. Really! So much more time goes into cleaning data than into making awesome visualizations out of it. Two modules in my online education actually attest to the fact that I am meant to help sweep away the digital debris so that data analysis can stand on sounder ground.
There is little use in deriving algorithms and making them pop on your PC if your input data is garbage or cluttered.
The most recent exploit of my “Data Sayangtist to Data Scientist” project is the Data Cleaning course on Coursera.
A week before the new year kicked in, I got a non-technical primer from the Data Journalism module on dealing with messy data to make compelling stories:
Apparently, so many people harness the ease of using the Internet, yet only a few souls are committed to keeping it clean and tidy on the backend, or at least on the analysis end. 🙂 It’s a good place to work in because few people are willing to do it. That’s always been my thing: work-wise, I like going where nobody else wants to go. I took a course unfamiliar to most people my age. I took on projects that most people would consider suicide missions. I make unconventional choices, and that has never failed me. Being thrown into the deep end of the learning curve is stressful, but the returns are fantabulous!
On the practical side, I worked on a disaster management information system project, and most of my time went into writing scripts that converted primitively encoded Excel sheets into database-friendly, geocoded CSV files. I also did some work exploring standardization options for government datasets, and BOY, this is still consuming much of my time this year. It is eating my life as we speak. Yep, janitress life, hello hello!
Cleaning is not fun in itself, but the possibilities that open up once the data is clean are what keep me motivated. 🙂
And there is so much data worth cleaning, online and offline. My wish list leans more toward storing huge amounts of analyzable data than toward expensive objects or frequent out-of-town trips. I am actually content to stay behind my computer and spend 80-90% of my time this year studying these things. Of course, an occasional vacation won’t hurt. 😉 That’s why I am also cooking up some travel plans to balance my innately introverted nature, which my choice of work only compounds.
Data cleanliness is really next to data analysis godliness! 😀