木鸟杂记

大规模数据系统

Python Logging Library Analysis

python-learn.pngpython-learn.png

Introduction

Before using Logging, let’s first sort out our common logging output requirements. As the saying goes, design without considering requirements is hooliganism.

  1. Be able to locate the origin (code file & line number) and generation time of an Event, for debugging and tracing.
  2. A single log can be simultaneously sent to multiple destination outputs.
  3. Log output can be filtered by different levels or finer-grained conditions.
  4. Third-party module log output can be conveniently controlled.
  5. While achieving all of the above, configuration/setup should be as simple as possible.

Python’s Logging module perfectly achieves the above five points through its magical modular design, organized in a tree structure.

Author: Woodpecker Miscellany https://www.qtmuniao.com, please indicate the source when reposting

Module Design

Python’s logger abstracts several subordinate meta-concepts—Logger, Handler, Filter, and Formatter—as the cornerstone of the logger implementation.

  1. Loggers provide the interfaces that application code can directly use.
  2. Handlers dispatch (logger-generated) log records to appropriate outputs.
  3. Filters provide fine-grained control over which logs are output.
  4. Formatters determine the final appearance of log records in the output.

Loggers

The Logger class serves as the basic module of the logging library and can do three small things:

  • It is the class called in user code for log output, exposing interfaces such as debug(), info(), etc.;
  • Based on the log level and filter class, it decides whether to create a log record (LogRecord) and pass the log output event when the user calls the class (logger.info(msg));
  • As a node in the logging module tree, it passes the event upward toward the root. All Handlers added to all logger nodes on the propagation path (this logger -> parent logger -> … -> root logger) will respond to the event. Of course, setting a logger’s propagate = False or having no parent logger can prevent event propagation.

Handlers

Handler is responsible for dispatching log messages that meet the level requirements to various target outputs specified by the handler. Logger instances can add zero or more Handlers via the addHandler method. Common Handler subclasses that can be added include StreamHandler, FileHandler, BaseRotatingHandler, etc.

Imagine a scenario: an application wants to output all level log information to a file, display error level and above logs on standard output, and send all severity level and above logs via email alerts. Then three handlers can be set up, each responsible for capturing messages of different levels and sending them to different target outputs.

The most commonly used methods of Handler:

  • setLevel() — Like the same-name method of logger, it sets the minimum effective level. The difference is that a logger’s behavior is to send log messages above this level to all its handlers, while a handler’s behavior is to output log messages above this level to the corresponding target output.
  • setFormatter() — Selects the layout and format of this handler’s output.
  • addFilter() and removeFilter — Add or remove filtering rules to determine whether a certain log message should be filtered out.

Applications should not directly use the Handler class; this is just a base class that defines some interfaces and default behaviors.

Formatters

Unlike Handler, user programs can directly instantiate and use the logging.Formatter class:

1
logging.Formatter.__init__(fmt=None, datefmt=None, style='%')

Where fmt specifies the message layout and format, datefmt specifies the date display format, and style specifies the variable parsing method in the fmt string.

A few points worth mentioning:

  • fmt can not only specify the format; when outputting to a terminal, ColoredFormatter can be used to set a color scheme, making command-line output look cool.
  • style is one of %, '{' or '$', with the default being %. However, Python 3 recommends using the str.format style for string formatting, so you can change it to style = '{'.

Filters

Fine-grained log filtering control won’t be covered here; advanced players can refer to the official documentation themselves.

Tree Organization

The Logging library leverages the naturally existing tree-like hierarchical structure among Python Modules to build a logging system with Loggers as tree nodes. This is manifested in:

  1. Loggers and module_name have a one-to-one correspondence.
  2. A logger’s “effective level” is determined by traversing upward in the tree.
  3. Loggers pass events (writing a certain log) upward along the tree structure, and all logger nodes containing handlers will respond.

In addition, the logging module has a built-in root node, which provides a default implementation for all module log output behaviors. It can be obtained via root_logger = Logging.getLogger(), i.e., module_name = None. This root_logger has a default effective log level of WARNING, and has a default handler added, which is bound to a simple formatter.

That is, users don’t need any configuration; as long as they obtain the root_logger via root_logger = Logging.getLogger(), they can perform log output. Users with additional needs can then finely configure the output location and layout format for different modules through the aforementioned mechanisms. This out-of-the-box + incremental fine-tuning design is a common technique in program frameworks, which can simultaneously cater to the different needs of beginners and advanced players.

One-to-One Correspondence

Each logger is a singleton corresponding one-to-one with a Python module_name. That is, in the same process, the smap_logger obtained from different locations via:

1
smap_logger = logging.getLogger()

is the same instance.

Effective Level

Loggers have the concept of an effective level. When a logger has not been explicitly set to a log level (setLevel), it will use its parent logger’s log level. If its parent logger’s log level has also not been explicitly set, it will continue to look at its grandparent node. This repeats until a logger node with an explicitly set effective level is found. Of course, no matter what, this search process will end at the root node, which has a default level of WARNING.

When a logger receives a log output event, it determines whether the event will be dispatched to all handlers added to that logger based on its effective level.

Event Propagation

Child loggers pass log events along the path in the tree toward their ancestors. Therefore, it is not necessary to bind handlers to all loggers; just adding handlers to a few appropriate key top-level nodes can cause all log nodes on their child branches to perform log output according to that handler’s behavior.

Of course, you can also prevent a node from propagating log events upward by setting that logger’s propagate attribute to False.

The log output flow can be illustrated with the following diagram:

Logging FlowLogging Flow

Practice

An Example

A most basic example provided by the official documentation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import logging

# create logger
logger = logging.getLogger('simple_example')
logger.setLevel(logging.DEBUG)

# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# add formatter to ch
ch.setFormatter(formatter)

# add ch to logger
logger.addHandler(ch)

# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
logger.error('error message')
logger.critical('critical message')

In addition, like Java logging frameworks such as log4j, these configurations can be done via configuration files. Including basic format and YAML format. The official documentation covers all of this.

My Own Example

When I was writing the smap (slice-map, a computing framework that shards datasets and distributes them to multiple machines for map execution) Python library, I wanted to implement a feature that would allow users to dynamically set the library’s log level when using it. At the time, I went to great lengths to pass the log level as a parameter into the framework, and then set the module’s logger to that effective level before each use.

If I had known earlier about the logging library’s module-level singleton and tree organization design, then basically nothing would need to be done to implement this feature; I would just need to tell users that I am using the logging library. Users could then simply do:

1
2
3
4
import logging

smap_logger = logging.getLogger()
smap_logger.setLevel(logging.INFO)

And that’s it.

Notes

Tree: A commonly used data structure in computers, used to simulate a data set with tree-like structural properties. It looks like an upside-down tree with the root facing up and the leaves facing down. See Wikipedia for details.

Parent Node, Ancestor Node」: Both are concepts within the scope of the tree data structure.

Log Event: Generated by API calls such as logger.info(), sent to the logging system, allowing the system to decide how to respond to the event and output it at an appropriate location with an appropriate layout and format. Other terms such as Message and LogRecord essentially mean similar things.

Destination Output: Various output destinations, such as standard output (stdout), file (File), database (db), email (email), centralized log service (log service), etc.

References

  1. Logging facility for Python, most of this article is derived from this, just translated and reorganized.

我是青藤木鸟,一个喜欢摄影、专注大规模数据系统的程序员,欢迎关注我的公众号:“木鸟杂记”,有更多的分布式系统、存储和数据库相关的文章,欢迎关注。 关注公众号后,回复“资料”可以获取我总结一份分布式数据库学习资料。 回复“优惠券”可以获取我的大规模数据系统付费专栏《系统日知录》的八折优惠券。

我们还有相关的分布式系统和数据库的群,可以添加我的微信号:qtmuniao,我拉你入群。加我时记得备注:“分布式系统群”。 另外,如果你不想加群,还有一个分布式系统和数据库的论坛(点这里),欢迎来玩耍。

wx-distributed-system-s.jpg