Over the past few years we've witnessed a profusion of open data made available for citizens to explore and act on. However, the tools available to end users haven't fully caught up. Spreadsheets offer an entry-level interface to the data but are time-consuming and don't scale, while tools like R or IPython offer flexibility but have a steep learning curve for non-technical users.
Domain experts and citizens need powerful yet easy-to-use interfaces to explore new data sets, normalize them, and process them through innovative services that are often available only via an API. OpenRefine offers the best of both worlds: a self-service, agile, and iterative interface for data discovery and preparation, and an easy-to-learn scripting language.
OpenRefine (formerly Freebase Gridworks and Google Refine) is a five-year-old project that has gained traction with a range of domain experts, including librarians, researchers, data journalists, open data enthusiasts, and semantic web professionals.
During this session we will grab some data from the Toronto Open Data portal and start blending and refining it.
Key Takeaways
- Get started with the OpenRefine interface
- Explore and gain insight from open data
- Clean up duplicates and typos
- Reformat and normalize your data
- Process data against an API without writing a line of code