Initializing the spider

Down near the bottom of the program is this statement:

if __name__ == '__main__':

All the code after that statement runs only when you start the program from the command line.
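A minimal sketch of how a guard of this kind behaves, assuming the statement is the standard if __name__ == '__main__': idiom. The function name run_mode() is hypothetical and is not part of spider.py:

```python
def run_mode(module_name):
    """Report how a module is being used, based on its __name__ value."""
    # Python sets __name__ to "__main__" only in the file that was started
    # directly (for example with: python spider.py).
    if module_name == "__main__":
        return "started from the command line"
    return "imported by another module"


if __name__ == "__main__":
    # This block is skipped when the file is imported by another module.
    print("This file was", run_mode(__name__))
```

When another program imports the file, __name__ holds the module's own name instead, so the guarded block stays quiet and only the functions and classes become available.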

TECHNICAL STUFF In Python programs, this code often contains testing instructions. If ours did, typing python spider.py at the command prompt would cause spider.py to test itself. In our program, the code processes a command-line argument. You can start the program from the command prompt and give it a URL to examine:

% python spider.py http://pythonfood.com/spider-test/
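One way such command-line code might pick up the URL is through sys.argv. This is a sketch under assumptions, not the actual code in spider.py: the helper name get_start_url() and the usage message are invented for illustration.

```python
import sys


def get_start_url(argv):
    """Return the first command-line argument, or None if none was given."""
    # argv[0] is the script name; the URL, if present, is argv[1].
    if len(argv) > 1:
        return argv[1]
    return None


if __name__ == "__main__":
    start_url = get_start_url(sys.argv)
    if start_url is None:
        # No URL was typed after the script name.
        print("usage: python spider.py URL")
    else:
        print("Would start crawling at", start_url)
```

Keeping the argument handling in a small function makes it easy to check on its own, without actually starting the program from the command prompt.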

Between the top (the import statements) and the bottom (the command-line code) lies the meat of the module. The spider.py script contains these blocks of code:

• Three functions: log_stdout(), get_page(), and find_links()

• A class: Spider

If you look back down at the bottom, you notice the following line:

spider = Spider(startURL)

This line creates an instance of the Spider class. This means the Spider class is the heart of this module.
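To see what creating an instance involves, here is a stripped-down stand-in for the class. Only the name Spider and the startURL argument come from the program; the body of the class and the start_url attribute are assumptions made for this sketch.

```python
class Spider:
    """Hypothetical stand-in for the Spider class in spider.py."""

    def __init__(self, start_url):
        # Remember where the crawl should begin (attribute name is assumed).
        self.start_url = start_url


startURL = "http://pythonfood.com/spider-test/"

# Calling the class runs __init__ and hands back the new instance.
spider = Spider(startURL)
print(spider.start_url)
```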

REMEMBER When trying to understand what a Python script does, it's best to start at the bottom and work your way up, because classes and functions have to be created and named before they are called (before the program asks them to run). So in a Python program, def statements (which create functions) and class statements (which create classes) come before the statements that actually run the code inside those functions and classes.
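The ordering rule is easy to try out. The sketch below runs two tiny programs with exec(): one calls a function before its def statement, the other after. The names run_snippet() and greet() are made up for this illustration.

```python
def run_snippet(source):
    """Execute a small program; return None on success or the NameError text."""
    try:
        exec(source, {})
    except NameError as error:
        return str(error)
    return None


# The call comes first, so the name greet does not exist yet when it runs.
call_first = "greet()\ndef greet():\n    return 'hello'\n"

# The def statement runs first, so the later call finds the function.
define_first = "def greet():\n    return 'hello'\ngreet()\n"

print(run_snippet(call_first))    # reports that greet is not defined
print(run_snippet(define_first))  # None, meaning the program ran cleanly
```

What matters is the order in which statements run, not where they sit on the page: a def statement only has to have run by the time the call happens.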
