2/22/22

Planning for Error Logging

The purpose of error logging is to record information about errors in a way that makes issues easy to diagnose and troubleshoot when they arise. While error logging is necessary for all applications, it is indispensable for large distributed applications.

In the past, I have written some error logging code, but I had never worked on an extensive error logging requirement, let alone one built from scratch. Last month, as I started work on error logging for a large batch project, I quickly realized that a lack of planning and insight can lead to logs that are hard to track, analyze and troubleshoot, which could have disastrous consequences for the outcome of the project.

Here are some questions and thoughts to keep in mind while planning and coding error logging:

  • Questions
    • Why do we want to log errors?
    • What do we want to do with the logs?
    • Are we going to just search and view them in a tool like Splunk?
    • Are we going to build dashboards on top of the logs? If yes, what types of statistics do we want to show on the dashboards?
    • What are the different types of errors that need to be logged to support the statistics we are interested in?
    • What information needs to be logged with each error?
    • If we are going to build dashboards, we need to be able to query the logs. So what format makes sense for querying?
  • Thoughts
    • Ensure that the format and structure of the logged data are concise and meaningful.
    • Ensure that the messages and formats are consistent across the board.
    • Try to use common libraries and classes to centralize access to the logging code, for consistency and ease of maintenance.
    • Create specific exception classes for each scenario, so that we can track, analyze and troubleshoot errors faster.
    • Write extensive unit tests with excellent coverage, including tests on specific error messages, because it is very difficult to manually retest every scenario each time the error logging code changes.
    • If you are dealing with microservices, make sure to log all the information required to track errors that span several services. A RequestId is one example: it can be used to connect a request across services and help troubleshoot issues.
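
To make a few of these thoughts concrete, here is a minimal sketch in Python, using only the standard library. It is not the code from my project. Every name in it (BatchError, InputValidationError, DownstreamTimeoutError, JsonFormatter, get_logger, log_error, and the request_id and error_type fields) is a hypothetical choice for illustration. It shows one shared logging helper, one JSON object per log line so the logs can be queried, a specific exception class per scenario, and a request ID attached to every error.

```python
import json
import logging
from datetime import datetime, timezone


class BatchError(Exception):
    """Base class for application errors (hypothetical name)."""
    error_type = "batch_error"


class InputValidationError(BatchError):
    """Raised when an input record fails validation (hypothetical scenario)."""
    error_type = "input_validation"


class DownstreamTimeoutError(BatchError):
    """Raised when a call to another service times out (hypothetical scenario)."""
    error_type = "downstream_timeout"


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object with a fixed set of fields."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # These two fields arrive through the `extra` argument in log_error.
            "request_id": getattr(record, "request_id", None),
            "error_type": getattr(record, "error_type", None),
        }
        return json.dumps(entry)


def get_logger(name, stream=None):
    """Single entry point for loggers so every caller gets the same format."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler(stream)  # defaults to stderr
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


def log_error(logger, exc, request_id):
    """Log an exception together with its type and the originating request ID."""
    error_type = getattr(exc, "error_type", "unclassified")
    logger.error(
        str(exc),
        extra={"request_id": request_id, "error_type": error_type},
    )


if __name__ == "__main__":
    log = get_logger("batch.orders")
    try:
        raise InputValidationError("missing customer id")
    except BatchError as exc:
        log_error(log, exc, request_id="req-123")
```

Because every line is a JSON object with the same keys, a dashboard query can group by error_type or follow one request_id across services, and a unit test can assert on the exact fields and message that were logged.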

