Saturday, November 12, 2016

Go Log Centralization/Pipeline

Go concurrency is silly.  It's so effective, and so easy to use, that programs can quickly become a jumble of goroutines spawning goroutines, with channels distributed throughout the application.  I feel like static typing, combined with the Go toolchain, provides a strong degree of confidence that heavily concurrent code compiles, runs, and is reasonably correct.

The Go logging ecosystem looks pretty fractured.  The standard library `log` package is very different from what I would expect a standard library to provide, as I'm coming from a more "traditional" Python/Java logging ecosystem, where log levels, handlers, and formatters are the norm.  There are some Go logging packages which have adopted the same concepts.  The issue is that I had already begun development on an MVP, with standard library logging calls throughout the code, when I realized that file logging, syslog, or multiple handlers driven by configuration options would be helpful.

If I chose to use a logging package in Go, it would require:

- Passing around a log.Logger instance (which is thread-safe).
- Making log-specific formatting / text representation the responsibility of each file, which is a tight coupling.

Because of Go's asynchronous nature, logs often contain important information that can help a test validate that operations/transitions are occurring correctly.  If a test wants to make assertions on log output using a "traditional" logging library, a logger interface has to be created, and a mock implementation provided by the test.  Go promotes many concurrently executing tasks, which makes code more prone to non-determinism.  If state changes to an object were logged, and a test wanted to make sure certain asynchronous state changes occurred in response to an input, the test would have to provide a mock logger that recorded all calls to it, and then make assertions based on that record.
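With an event channel, the test needs no mock logger at all: it just drains the channel and asserts on the events. A minimal sketch (the `Event` and `worker` names are my own, not from any library):

```go
package main

import "fmt"

// Event is a minimal log-worthy event; the fields are illustrative.
type Event struct {
	Source string
	State  string
}

// worker is a hypothetical object that reports its state transitions
// on the events channel instead of calling a logger directly.
func worker(events chan<- Event) {
	events <- Event{Source: "worker", State: "started"}
	events <- Event{Source: "worker", State: "finished"}
	close(events)
}

func main() {
	events := make(chan Event)
	go worker(events)

	// A test can drain the channel and assert on the exact sequence
	// of state changes, even though the worker runs asynchronously.
	var states []string
	for e := range events {
		states = append(states, e.State)
	}
	fmt.Println(states) // [started finished]
}
```

Ranging over the channel until it is closed gives the test a deterministic, ordered record of what happened.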

A pattern I found, which addresses the above issues, is having each object submit any log-worthy events to an event channel.  For my first pass, there is a single global event channel that is initialized at the main level.  It is then passed to all lower functions, and each object writes to it.
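The wiring might look something like this (a sketch; `fetcher` is a stand-in for any component):

```go
package main

import "fmt"

// Event is a hypothetical shape for anything log-worthy.
type Event struct {
	Source string
	Msg    string
}

// fetcher stands in for any lower-level component: it receives the
// shared channel from main and publishes events instead of logging.
func fetcher(events chan<- Event) {
	events <- Event{Source: "fetcher", Msg: "fetch started"}
	events <- Event{Source: "fetcher", Msg: "fetch finished"}
	close(events)
}

func main() {
	// The single global event channel, initialized at the main level
	// and passed to all lower functions.
	events := make(chan Event)
	go fetcher(events)

	// main (or a dedicated goroutine) drains the channel; this is
	// where the log representation gets decided.
	for e := range events {
		fmt.Printf("%s: %s\n", e.Source, e.Msg)
	}
}
```

In a real program main would not own the drain loop directly, and the channel would usually outlive any single producer, but the shape is the same.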

With centralized logging (or even per-package logging), the log representation is easily configurable, i.e. JSON, syslog, stdout/stderr, file, etc.  Instead of the representation being defined through the log library API (perhaps as a handler), the application controls that logic.  A centralized, application-level logger also enables application-specific behavior, such as writing each event of a certain type to a specific file.
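A sketch of a consumer that owns the representation (the `format`/`consume` split and the JSON flag are my own illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is an illustrative shape; real events could carry timestamps,
// levels, or arbitrary payloads.
type Event struct {
	Type string `json:"type"`
	Msg  string `json:"msg"`
}

// format renders a single event; the application, not a logging
// library, decides what an event looks like on the wire.
func format(e Event, asJSON bool) string {
	if asJSON {
		b, _ := json.Marshal(e)
		return string(b)
	}
	return fmt.Sprintf("%s: %s", e.Type, e.Msg)
}

// consume drains the event channel; this is also the natural place to
// route certain event types to their own files or to syslog.
func consume(events <-chan Event, asJSON bool, done chan<- struct{}) {
	for e := range events {
		fmt.Println(format(e, asJSON))
	}
	close(done)
}

func main() {
	events := make(chan Event)
	done := make(chan struct{})
	go consume(events, true, done)

	events <- Event{Type: "state", Msg: "worker started"}
	close(events)
	<-done // wait for the consumer to finish before exiting
}
```

Swapping JSON for plain text, or stdout for a file, is a change to this one function rather than to every call site.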

There are tradeoffs:

- Logging is decoupled from the application, so corner cases and unforeseen errors could create conditions where events leading up to a crash never make it into the log.
- An unbuffered (blocking) channel is used by default; if there are multiple publishers to the channel, a buffered channel should probably be used.  It may be difficult to reason about what the buffer size should be, since one channel may be shared across the whole application.
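One way to hedge the blocking concern is a buffered channel plus a non-blocking send; the buffer size here is an arbitrary guess, which is exactly the difficulty:

```go
package main

import "fmt"

// Event is an illustrative shape.
type Event struct{ Msg string }

func main() {
	// Buffered channel: publishers don't block until the buffer is
	// full. 64 is a guess; the "right" size is hard to reason about
	// when one channel is shared application-wide.
	events := make(chan Event, 64)

	// Non-blocking send: if the consumer falls behind and the buffer
	// fills, drop the event rather than stall the publisher.
	select {
	case events <- Event{Msg: "state changed"}:
	default:
		// Event dropped; incrementing a counter here would at least
		// make the loss observable.
	}

	fmt.Println(len(events)) // 1
}
```

Dropping events makes the crash-visibility problem from the first bullet worse, so this is a tradeoff, not a fix.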

Everything proposed above is achievable with more traditional logging libraries (Python, Java) and even some third-party Go logging libraries, like logrus.  But a centralized event stream is a lightweight alternative to a third-party library, and it allows for easier testing.