Designing for expansion

To understand another design decision in the spider.py program, look at the url_in_site() method. It checks whether the link is part of the Web site.

Making url_in_site() a separate method doesn't make it easier to read the program, because the method has only one line of code. We decided to make a separate method for another reason: doing so makes it easier to subclass spider. Subclassing is making a new class that shares most of the features of a class but either overrides or extends some of its functionality. (Chapter 13 has more about subclassing.) One way to subclass spider is to extend the behavior of url_in_site() so that it works with both HTTP and HTTPS URLs at the same time. (The current program handles only one type at a time.) Having url_in_site() separate from process_page() means that such a subclass is easier to write. This illustrates another good programming practice.

REMEMBER When you're designing a program, think about the possibility that someone might want to customize or subclass your program in the future. (But don't overdesign or add features before they're needed. Python programs are easy to change, so you can quickly add features when necessary.) TECHNICAL The process_page() method combines the URLs returned from STUFF find_links() with the base URL specified by url. We could have put this functionality into the find_links() function. But leaving it out makes find_links() more generally useful—we (or others) can reuse it in programs that work with relative URLs in a different way from this one. This is an example of the design philosophy described in the preceding paragraph.

Was this article helpful?

0 0

Post a comment