17 March 2021

Thoughts on Python logging

How to use logging in your Python project without being annoyed by libraries that do logging

During development, or when dealing with complex software, you often put logger.debug() statements in your code.  (This assumes you've run logger = logging.getLogger("blah") or equivalent.)  This method allows you to see what's going on deep in your program to discover what isn't working as expected.  Unlike print statements, it has the advantage of being able to change the log level on the fly, change formatting and even use log files easily; the Logging HOWTO talks about this.

However, this approach has a problem, and it has to do with libraries.  One of the great strengths of Python is that it's very easy to install external libraries (either with pip or your OS's package manager) and add them to your program.  However, library developers need to debug their code too and that's where you can run into conflicts with logging.

For example when developing a program that uses the boto3 library to call the AWS API, this is one of the many messages you'll see if you run logging.basicConfig(level=logging.DEBUG) so you can see your code's own debug messages:

DEBUG:botocore.loaders:Loading JSON file: /usr/lib/python3/dist-packages/boto3/data/ec2/2016-11-15/resources-1.json

There are also stacks of messages from modules such as botocore.endpoint and urllib3.connectionpool including all HTTP requests and responses.  This is a potential security risk as well as being annoying.

Explainer -- numeric log levels

Each of the level constants and the corresponding convenience functions from the logging module represent a numeric value.  This is similar in concept to syslog priorities from Unix.


LevelFunction
Value

DEBUG
10

INFO
20

WARNING
30

ERROR
40

CRITICAL
50

If the current logging threshold (default WARNING) is numerically greater than that of the function you called, the message will get ignored; if it's less than or equal, the message will be processed.  Note that the functions are available in the logging module as well as any logger object you create.

Alternative approach -- module-based threshold tweaking

This approach is a bit finicky; it involves bumping up the logging threshold of various libraries so that their own debug messages get filtered out.  This takes advantage of the Python logging module's hierarchical nature, which is that submodules inherit their parent modules' logging threshold.  (It actually works via the name of the logger so technically it's a separate string-based hierarchy with levels separated by dots.)

For example:

_bl = logging.getLogger("botocore")
_bl.setLevel(logging.DEBUG + 1)

You can think of this as raising a wall around this module (and its submodules) slightly higher than the wall that represents the global logging threshold, so debug messages can't get out.  Other modules can still emit debug messages if the global logging threshold is set to logging.DEBUG .  All info messages will get out regardless, because their numeric value is higher than logging.DEBUG + 1 .

However, this approach falls down when you find yourself playing whack-a-mole with all the different module hierarchies of all the libraries your project uses.

Smart logging

Instead of using logger.debug() you can use logger.log(level, msg) .  You would normally pass logging.DEBUG as the level for example, because a given log message's level shouldn't vary.  (Hence why you probably use the convenience functions instead.)  In the event of an error condition, for example, you'd just use conditional logic in your code emit a different message with a different level.

The essence of this approach is that once you start using logger.log()you can define your own log level constants and use these instead of the ones in the logging module.  For example, in the root of your package, you could create a file called logging as follows:

import logging

DEBUG = logging.DEBUG + 1
INFO = logging.INFO + 1
NOTICE = logging.INFO + 5
# etc.

Then your code can first use import xyz.logging (assuming your package is called xyz) and then use statements like logger.log(xyz.logging.DEBUG, msg) .  This way your messages' level will be above the global logging threshold and so will be emitted if you have run logging.basicConfig(level=xyz.logging.DEBUG), unlike debug level messages from libraries.  Personally, I think replacing all logger.debug() statements in your code is less of a pain than playing setLevel whack-a-mole with libaries.

This also works at different metaphorical levels (regarding intent rather than numerical value).  For example, boto3.resources emits INFO level messages describing each API request.  This is a bit of a pain because it essentially includes the JSON of the request body.  But if you use xyz.logging.INFO for both your info messages and the global logging threshold, these somewhat annoying Boto messages get filtered out.  If you set the global logging threshold to xyz.logging.DEBUG instead, Boto's INFO level messages implicitly get treated more like debug messages and its annoying debug messages get filtered out.

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home