In data mining and statistics gathering, it is difficult to come up with a single application that suits the requirements of multiple users. Let's take the analysis of Apache web server logs as an example. Each request received by the web server is written to a log file. Each log line contains several data fields, including the timestamp of when the request came in.
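To make the shape of a log line concrete, here is a minimal sketch that splits one line into named fields. It assumes the server writes the Common Log Format; the text above does not say which format is in use, and a custom `LogFormat` directive would need a different pattern. The names `LOG_PATTERN` and `parse_line` and the sample line are illustrative only.

```python
import re
from datetime import datetime

# Assumption: Common Log Format, i.e.
#   host ident user [timestamp] "request line" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the timestamp, for example 10/Oct/2000:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    # A size of "-" means that no response body was sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_line(sample)
```

Once a line is a dictionary, every report the users ask for becomes a different way of aggregating the same fields.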
Let's imagine you've been asked to write an application that analyzes the log files and produces a report. This is typically all the detail you get from users who are interested in the statistical information. Obviously, there is not much you can do with this request, so you ask for more information, such as what exactly the users want to see in their report. Now the hypothetical users are getting more involved in the design phase, and they tell you that they want to see the total number of downloads for a particular file. Well, that's easy to do. But then you get another request that asks for per-hour statistics of the site hits. You script that in. Then there's a request to correlate the time of day with the browser type. And the list goes on and on. Even if you're writing the tools for one particular organization, the requirements are too diverse to capture fully at the requirement-gathering phase. So what should you do in this situation?
Wouldn't it be nice to have a generic application that could be extended with modules that specialize in extracting and processing the information? Each module would be responsible for performing its own calculations and producing its own reports. These modules could be added and removed as required, without affecting the functionality of other modules and, more importantly, without requiring any changes to the main application. This type of modular structure is often referred to as a plug-in architecture.
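The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design developed later: the class names `Plugin`, `HitsPerHour`, `DownloadCounter`, and `LogAnalyzer`, the `process`/`report` methods, and the simplified entry dictionaries are all hypothetical.

```python
class Plugin:
    """Hypothetical base class: a plug-in sees every entry, then reports."""
    name = "base"

    def process(self, entry):
        raise NotImplementedError

    def report(self):
        raise NotImplementedError


class HitsPerHour(Plugin):
    """Counts requests per hour of the day."""
    name = "hits_per_hour"

    def __init__(self):
        self.counts = {}

    def process(self, entry):
        hour = entry["hour"]
        self.counts[hour] = self.counts.get(hour, 0) + 1

    def report(self):
        return dict(sorted(self.counts.items()))


class DownloadCounter(Plugin):
    """Counts requests for one particular file."""
    name = "downloads"

    def __init__(self, path):
        self.path = path
        self.total = 0

    def process(self, entry):
        if entry["path"] == self.path:
            self.total += 1

    def report(self):
        return self.total


class LogAnalyzer:
    """Main application: knows nothing about what each plug-in calculates."""

    def __init__(self):
        self.plugins = {}

    def register(self, plugin):
        self.plugins[plugin.name] = plugin

    def unregister(self, name):
        self.plugins.pop(name, None)

    def run(self, entries):
        # Hand every entry to every registered plug-in, then collect reports
        for entry in entries:
            for plugin in self.plugins.values():
                plugin.process(entry)
        return {name: p.report() for name, p in self.plugins.items()}


# Simplified, made-up entries standing in for parsed log lines
entries = [
    {"hour": 9, "path": "/index.html"},
    {"hour": 9, "path": "/files/tool.zip"},
    {"hour": 14, "path": "/files/tool.zip"},
]
analyzer = LogAnalyzer()
analyzer.register(HitsPerHour())
analyzer.register(DownloadCounter("/files/tool.zip"))
results = analyzer.run(entries)
```

Adding a new report means writing one more `Plugin` subclass and registering it; `LogAnalyzer` and the existing plug-ins stay untouched.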
A plug-in is a small piece of software that extends the functionality of the main application. This technique is very popular and used in many different applications. A good example is the web browser. Most of the popular web browsers on the market support plug-ins. A web page may contain an embedded Adobe Flash movie, but the browser itself doesn't know (and doesn't need to know) how to handle this type of file. So it looks for a plug-in that has the capability to process and display the Adobe Flash file. If it finds such a plug-in, it passes the file object to it for processing. If it can't find a plug-in, the object is simply not displayed to the end user. The absence of the appropriate plug-in does not prevent the web page from being displayed.
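The browser behavior described above comes down to a lookup with a graceful fallback. The following is a rough sketch, not how any real browser is implemented: the `handlers` registry, `register_handler`, and `render_page` are hypothetical names, and a page is modeled as a list of `(content_type, data)` pairs.

```python
# Hypothetical registry mapping a content type to the plug-in that handles it
handlers = {}


def register_handler(content_type, handler):
    handlers[content_type] = handler


def render_page(objects):
    """Render every object that has a handler and silently skip the rest."""
    rendered = []
    for content_type, data in objects:
        handler = handlers.get(content_type)
        if handler is None:
            # No plug-in available: the object is not displayed,
            # but the rest of the page still renders
            continue
        rendered.append(handler(data))
    return rendered


# Only an HTML handler is installed; no Flash plug-in is registered
register_handler("text/html", lambda data: f"<html:{data}>")

page = [
    ("text/html", "intro"),
    ("application/x-shockwave-flash", "movie.swf"),
    ("text/html", "footer"),
]
output = render_page(page)
```

Here `output` contains the two HTML fragments only. Registering a handler for the Flash content type later would make the movie appear without any change to `render_page`.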
We'll use this approach to build the application for analyzing Apache logs. Let's begin with the requirements for the statistical analysis tasks that the application must perform.