So far my application can detect unique exceptions and categorize them appropriately. This is quite useful, but there are some issues:
• As with any heuristic algorithm, the current implementation is naïve in the way it detects and compares exceptions. It does a decent job but may struggle even with simple cases such as a File Not Found exception. If the exception is raised in different parts of your Java application, it produces completely different output, and essentially the same type of exception is logged multiple times. One might argue that this is expected behavior because you really need to know where the exception was raised, and that would be a valid comment. In other situations you don't care about these details and would like to combine all File Not Found error messages into one group. At present this is not possible.
• The naming convention is really confusing; all your exception groups are going to have unreadable names such as unrecognised_6c2dc65d7c0bfb0768ddff8cabaccf68.
• If the exception details contain time- or request-specific information, this algorithm sees those exceptions as different, because it has no way of knowing that "File Not Found: file1.txt" and "File Not Found: file2.txt" are effectively the same exception. To verify this behavior, I generated over a thousand exceptions in which the requested file name is the same and a similar number of error messages with unique filenames. Running the application against this sample log file produced one group with over a thousand instances and over a thousand different groups with one or two instances each. In reality, all of these exceptions are of the same type.
• Although I am not comparing large pieces of text, calculating an MD5 hash and then comparing hash strings is still relatively slow.
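The filename problem is easy to reproduce in isolation. The following is only a minimal sketch, assuming the heuristic hashes the exception header text with MD5; the function name heuristic_group and the choice of hashed field are illustrative assumptions, not the actual implementation:

```python
import hashlib


def heuristic_group(header: str) -> str:
    """Hypothetical stand-in for the heuristic: name a group after the MD5 of the header."""
    digest = hashlib.md5(header.encode("utf-8")).hexdigest()
    return "unrecognised_" + digest


a = heuristic_group("File Not Found: file1.txt")
b = heuristic_group("File Not Found: file2.txt")
c = heuristic_group("File Not Found: file1.txt")

print(a == c)  # True: identical text always lands in the same group
print(a == b)  # False: a different filename produces a different group
```

Because a hash changes completely when a single character of its input changes, any request-specific detail in the message is enough to split one logical exception into many groups.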
In light of those issues, I am going to modify the application so that it allows me to define how I want my exceptions detected and categorized.
As you already know, each exception is split into three parts: log line, header, and stack trace body. I am going to allow users to define a regular expression for any of those fields and then use that regular expression to detect exceptions. If any of the defined regular expressions matches, the exception will be categorized accordingly; otherwise, it will be passed on for further processing by the heuristic algorithm I implemented earlier. I am also going to allow users to define any group name they like, so it will be more meaningful than the unrecognised_6c2dc65d7c0bfb0768ddff8cabaccf68 strings.
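The approach just described might look roughly like the sketch below. It is written under several assumptions: the names Rule, categorize, and heuristic_group are hypothetical, rules are checked in the order they are defined, and the fallback simply hashes the header and body, which may differ from what the earlier heuristic actually does.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """A user-defined rule: a readable group name plus optional patterns per field."""
    group: str
    log_line: Optional[str] = None
    header: Optional[str] = None
    body: Optional[str] = None

    def matches(self, log_line: str, header: str, body: str) -> bool:
        # The rule fires if ANY of its defined patterns matches its field.
        pairs = ((self.log_line, log_line), (self.header, header), (self.body, body))
        return any(p is not None and re.search(p, text) for p, text in pairs)


def heuristic_group(header: str, body: str) -> str:
    # Assumed fallback: name the group after an MD5 of the exception text.
    digest = hashlib.md5((header + body).encode("utf-8")).hexdigest()
    return "unrecognised_" + digest


def categorize(rules: List[Rule], log_line: str, header: str, body: str) -> str:
    for rule in rules:
        if rule.matches(log_line, header, body):
            return rule.group
    # No user-defined rule matched, so fall back to the heuristic.
    return heuristic_group(header, body)


rules = [Rule(group="file_not_found", header=r"File Not Found")]

g1 = categorize(rules, "", "File Not Found: file1.txt", "")
g2 = categorize(rules, "", "File Not Found: file2.txt", "")
g3 = categorize(rules, "", "Connection refused", "")
print(g1, g2)  # both exceptions fall into the file_not_found group
```

With a single rule on the header, both File Not Found messages end up in one readable group regardless of the filename, while anything the rules do not cover still receives a hash-based name.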