Orange County Chapter

Program, January 2005

Web Scraping Without a Browser

Monday, January 10, 7-9 pm

Kenytt has made a copy of his slides available.

NOTE: Due to circumstances beyond our control, the Smoothwall demonstration previously scheduled for this month has been postponed. Instead ...

At the January meeting of UUASC-OC, UUASC's own Kenytt Avery returns. Kenytt, whose October 2004 tutorial on avoiding poor Perl scripting proved popular, will show us a variety of techniques for retrieving web pages and extracting the information they contain without opening a browser.

Kenytt's techniques range from simple shell scripts built around wget to HTML parse trees built in Perl.
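As a rough illustration of the parsing end of that range, here is a minimal sketch in Python rather than Perl. It is not taken from Kenytt's slides. It uses the standard library's event-based html.parser instead of building a full parse tree, and the names LinkCollector and extract_links, along with the example.com URLs, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL.

    Hypothetical helper for illustration only.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Made-up page content; a real script would fetch this first.
    sample = '<p><a href="/slides.html">Slides</a> <a href="notes.html">Notes</a></p>'
    for link in extract_links(sample, "http://example.com/meetings/"):
        print(link)
```

The page itself could be fetched with wget or urllib.request beforehand; the sketch only covers the extraction step, so it runs without network access.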

Kenytt will explain what happens behind the scenes when we click the "Submit" button, and along the way, he'll demonstrate a number of Perl modules, write a web proxy server or two, explore the Google API, dip into a bit of Python and RSS, and reveal why a text-mode browser may be our best friend. Examples will include saving documents offline, checking for updated pages, and grabbing popular news photos.
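For readers curious about the "Submit" button, the sketch below shows one common case under stated assumptions: a form with method POST and the default URL-encoded content type. It builds the request a browser would send but does not send it. The function name build_form_post, the field names, and the example.com URL are hypothetical, and this is not taken from the talk itself.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(action_url, fields):
    """Build (but do not send) the POST request for a simple HTML form.

    Hypothetical helper: assumes the form uses method="post" and the
    default application/x-www-form-urlencoded encoding.
    """
    # Form fields become name=value pairs joined by "&", with spaces as "+".
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Made-up form action and fields, for illustration only.
    req = build_form_post("http://example.com/search", {"q": "web scraping", "page": "1"})
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Passing the resulting object to urllib.request.urlopen would actually submit the form. Forms that use GET instead append the same encoded string to the URL after a "?".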

Kenytt taught computer science at CSUF until he realized that he was spending more time in front of a root prompt than in front of a whiteboard. Since then, he has been a programmer in a variety of languages (most of them starting with the letter "P"), a UNIX sysadmin, a training manager, and a consultant. He is looking for a job, so if you like his presentation, you should hire him.
