These recipes may be most helpful to journalists who are trying to learn programming and already know the basics. If you’re already an experienced programmer, you might learn about a new library or tool you haven’t tried yet.
If you are a complete novice and have no short-term plan to learn how to code, it may still be worth your time to find out what it takes to gather data by scraping web sites.
With the exception of Adobe Acrobat Pro, all of the tools we discuss in these guides are free and open-source.
Google Refine (formerly known as Freebase Gridworks) – A sophisticated application that makes data cleaning a snap.
Firebug – A Firefox plug-in that adds a host of useful development tools, including the tracking of parameters and files received from web sites you plan to scrape.
Ruby – The programming language we use the most at ProPublica.
Nokogiri – An essential Ruby library for scraping web pages.
Adobe Acrobat – Can (sometimes) convert PDFs to well-structured HTML.