@thedhmn: Data to the People! Google Refine and @petewarden's Data Science Toolkit
Back in November, super tech-evangelist Bob Waldron shared a news story about Google Refine with us. Today, my pal Mark introduced me to an open-source product called Data Science Toolkit.
Its creator, Pete Warden, reportedly used it to scrape data from 220 million Facebook user profiles. Once he had that massive dataset, he spent $100 on 10 hours of Amazon cloud computing to crunch it down into a usable database for analysis and reporting.
Warden released the toolkit under the GPL.
You can download it or use its web service, which offers APIs such as Street Address to Coordinates, File to Text, IP Address to Coordinates, HTML to Text, and HTML to Story (which extracts just the story from an HTML document).
With tools as powerful as Google Refine and Data Science Toolkit, and all the public data on the internet, what couldn't we learn?
Data to the People!