In abstract language the algorithm for this function would look like the following:
• If the second line does not match the date stamp pattern, add it to the result string.
• Keep reading lines in and appending until the date stamp pattern is matched.
• Repeat until there is no more data in the file.
As you can see in Listing 7-6, using a generator function here is an obvious choice, because I need to preserve the internal function state after the function returns the resulting string that contains a potential exception stack trace. The function itself accepts another generator function, which it uses to retrieve the lines of text. Using this approach it is possible to replace a file-reading generator with any line> g.next()
other generator that is capable of generating log lines. For example, this might be a database-reading function, or even a function that listens and accepts syslog service messages.
Listing 7-6. A generator function to detect potential exceptions def get_suspect(g): line = g.next() next_line = g.next() while 1:
if not (TS_RE_1.search(next_line) or TS_RE_2.search(next_line)): suspect_body = line while not (TS_RE_1.search(next_line) or TS_RE_2.search(next_line)): suspect_body += next_line next_line = g.next() yield suspect_body else: try:
Obviously this can be replaced with a function that has more advanced logic and a better hit-to-miss ratio, but it is equally effective and lightweight.
Here are a couple of ideas you might want to experiment with:
• Instead of using two predefined patterns for time stamp detection, try defining a list with precompiled patterns that would match the majority of popular formats. Then, as the function runs, it would count successful matches and rearrange the list on the fly so that most popular match gets match first.
• If you have a large number of multiple-line log entries, this simple approach will fail. Try generating hashes of the first line in the log body and store them in a separate data structure. The real exception validator function would update this table with True/False values depending on whether the guess was correct. This function can then check hashes against this table, so it will know which repeating log entries are not really exceptions although they may look like ones.
Was this article helpful?