HNScrape

Some time last week, I decided that, whether or not they were accepted, I would write up each of the talks I submitted to PyCon as a series of blog posts. The first step towards one of those talks, "Enough Machine Learning to Make Hacker News Readable Again", is scraping articles off of Hacker News so there is a set of data to start doing machine learning with.

So I went ahead and cleaned up and published HNScrape, a library that fetches the first two pages of Hacker News and returns the articles as a list of objects. The README on GitHub has more details.

Installing should be as simple as:

pip install git+https://github.com/njl/hnscrape.git#egg=hnscrape

I should point one thing out: don't be a jerk! Don't run this more than once every couple of minutes. Respect the resource that is Hacker News.
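
If you call the scraper from a long-running script, one way to stick to that limit is a small throttle that sleeps until enough time has passed since the previous call. This is only a sketch: the Throttle class, its wait method, and the 120-second default are names and values I made up for illustration, not part of HNScrape, and the actual scraping call (see the README) is deliberately left out.

```python
import time


class Throttle:
    """Allow an action at most once every `min_interval` seconds.

    Hypothetical helper for illustration; not part of HNScrape.
    """

    def __init__(self, min_interval=120.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # assumed default: two minutes
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last_call = None            # None until the first wait()

    def wait(self):
        """Block until `min_interval` seconds have passed since the last wait()."""
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            remaining = self.min_interval - elapsed
            if remaining > 0:
                self._sleep(remaining)
        # Record the time after any sleep, so the next interval starts now.
        self._last_call = self._clock()
```

Create one Throttle for the whole script and call its wait() method right before each scrape. The first call returns immediately; later calls block only for as long as needed to keep the requests at least two minutes apart.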