The "Pitfalls" of Python Function Default Parameters

Introduction

After falling into the “pit” of Python’s default parameters several times, I decided to write a dedicated blog post about it. Then I came across an excellent English article (Default Parameter Values in Python, Fredrik Lundh | July 17, 2008 | based on a comp.lang.python post) that is incisive and to the point. With such a gem already available, there is little reason to write my own version from scratch, so, partly out of laziness, I offer a simple translation here in the hope that more people will see it.

The following is a fairly free translation with some personal additions, so it does not strictly follow the original text. The syntax is based on Python 3.

Author: Muniao’s Notes https://www.qtmuniao.com, please indicate the source when reposting

Main Text

The way Python handles default parameter values is one of the few issues that can trip up most beginners (though usually only once).

The confusing behavior usually appears when you use a “mutable” object as a function’s default parameter value, that is, an object that can be modified in place, such as a list or dictionary.

An example:

>>> def function(data=[]):
...     data.append(1)
...     return data
...
>>> function()
[1]
>>> function()
[1, 1]
>>> function()
[1, 1, 1]

As shown in the code, the returned list gets longer and longer, instead of being [1] every time as one might imagine. Try checking the ID of the returned list each time, and you’ll find it hasn’t changed at all.

>>> id(function())
12516768
>>> id(function())
12516768
>>> id(function())
12516768

The reason is simple: function() uses the same list object on every call, so our modification (data.append(1)) persists from one call to the next.

Why Does This Happen

The answer is: default parameter values are evaluated when the def statement that defines the function is executed, and only once. See the relevant section of The Python Language Reference:

https://docs.python.org/zh-cn/3.7/reference/compound_stmts.html#function-definitions

Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once when the function is defined, and the same “precomputed” value is used for each call.

Note that def is an executable statement in Python, and the default parameter values are evaluated as part of executing it. If you execute the same def statement multiple times, Python creates a new function object each time (and re-evaluates the default values each time). We will see this in the following examples.
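To make this concrete, here is a minimal sketch in which the same def statement is executed twice. The names counter, make_function, and show are made up for illustration; the default expression draws the next number from a counter so that each evaluation is visible.

```python
import itertools

counter = itertools.count(1)  # hypothetical helper: yields 1, 2, 3, ...

def make_function():
    # Each call executes the inner def statement again, so the default
    # expression next(counter) is evaluated again as well.
    def show(stamp=next(counter)):
        return stamp
    return show

first = make_function()
second = make_function()

print(first(), first())  # 1 1  (the default was evaluated once, at def time)
print(second())          # 2    (a second def execution re-evaluated it)
print(first is second)   # False (two distinct function objects)
```

Calling first() repeatedly never advances the counter; only executing the def statement again does.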

So What Should We Do

The workaround, as others have also mentioned, is to use a placeholder value as the default instead of a mutable object that gets modified on every call. None is a commonly used placeholder:

def myfunc(value=None):
    if value is None:
        value = []
    # modify value here
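As a runnable illustration of the None placeholder, here is the earlier function() example rewritten in that style. The name append_one is made up for this sketch.

```python
def append_one(data=None):
    if data is None:
        data = []  # a fresh list is created on every call that omits data
    data.append(1)
    return data

print(append_one())  # [1]
print(append_one())  # [1]  (no state carried over between calls)

existing = [0]
print(append_one(existing))  # [0, 1]  (a caller-supplied list is still modified in place)
```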

If you need to handle arbitrary types of data (including None), you can use a sentinel instance:

sentinel = object()

def myfunc(value=sentinel):
    if value is sentinel:
        value = expression  # any expression that produces the real default
    # use/modify value here
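A small runnable sketch of the sentinel approach, for the case where None is itself a meaningful argument. The names _MISSING and describe are hypothetical and exist only for this example.

```python
_MISSING = object()  # unique object that no caller can pass by accident

def describe(value=_MISSING):
    # Identity comparison distinguishes "argument omitted" from
    # "None passed explicitly".
    if value is _MISSING:
        return "no argument given"
    return "got {!r}".format(value)

print(describe())      # no argument given
print(describe(None))  # got None
print(describe([]))    # got []
```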

Of course, in some old code, before object was introduced into Python, the following statement was also commonly used to create a unique instance with a non-false value:

sentinel = ['placeholder']

This works because evaluating a list literal creates a new list object every time, so the sentinel has a unique identity.

Proper Ways to Leverage This

It’s worth mentioning that some advanced Python code often deliberately takes advantage of this feature. For example, if you want to create a bunch of buttons through a loop, you might do this:

for i in range(10):
    def callback():
        print("clicked button", i)
    UI.Button("button %s" % i, callback)

Unfortunately, you will find that every callback prints the same value (in the example above, most likely 9). The reason is that a nested function in Python binds to the outer variable itself, not to its value at definition time, so every callback sees the final value of i. The fix is explicit binding: capture the current value as a default parameter when the inner function is defined.

for i in range(10):
    def callback(i=i):
        print("clicked button", i)
    UI.Button("button %s" % i, callback)

The i=i default takes advantage of the fact that the def statement re-evaluates its default values every time it is executed, binding the current value of the outer i to the local variable (i.e., the formal parameter) i.
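Since UI.Button is not available outside a GUI toolkit, the same effect can be reproduced with plain lists of callbacks. This is only a sketch; the names late, bound, late_callback, and bound_callback are invented for the comparison.

```python
late = []
bound = []

for i in range(3):
    def late_callback():
        return i  # i is looked up when the callback is called

    def bound_callback(i=i):
        return i  # i was captured when this def statement was executed

    late.append(late_callback)
    bound.append(bound_callback)

print([f() for f in late])   # [2, 2, 2]
print([f() for f in bound])  # [0, 1, 2]
```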

There are two other possible uses. One is result caching/memoization:

def calculate(a, b, c, memo={}):
    try:
        value = memo[a, b, c]  # reuse an already calculated value
    except KeyError:
        value = heavy_calculation(a, b, c)
        memo[a, b, c] = value  # update the memo dictionary
    return value

This usage is very useful in certain recursive functions (such as memoized search).
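As one possible illustration of the recursive case, here is a memoized Fibonacci function that keeps its cache in a mutable default. This is a sketch rather than a recommended pattern; the function name fib is chosen for the example.

```python
def fib(n, memo={}):
    # memo is created once, when the def statement runs, and is then
    # shared by every call, including the recursive ones.
    if n < 2:
        return n
    if n not in memo:
        memo[n] = fib(n - 1) + fib(n - 2)
    return memo[n]

print(fib(50))                   # 12586269025
print(len(fib.__defaults__[0]))  # 49 (cached entries for n = 2..50)
```

Without the shared dictionary, the naive recursion would recompute the same subproblems an exponential number of times.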

The second is for highly optimized code: you can bind global names to local ones through default parameters to speed up lookups:

import math

def this_one_must_be_fast(x, sin=math.sin, cos=math.cos):
    ...
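A filled-in sketch of the same idea follows. The function polar_to_xy is hypothetical; any actual speedup is typically small and should be measured before relying on it.

```python
import math

def polar_to_xy(r, theta, sin=math.sin, cos=math.cos):
    # sin and cos are local names here, bound once at def time, so the
    # body needs no global lookup of math or attribute lookup per call.
    return r * cos(theta), r * sin(theta)

print(polar_to_xy(2.0, 0.0))                             # (2.0, 0.0)
print(polar_to_xy.__defaults__ == (math.sin, math.cos))  # True
```

One trade-off of this style is that the extra parameters become part of the signature, so a caller could override them by accident.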

A Detailed Explanation of the Principle

When Python executes a def statement (i.e., a function definition), it uses some pieces that already exist (such as the compiled function body, exposed as __code__, and the current global namespace, exposed as __globals__) to construct a new function object. While doing so, it also evaluates the default parameter values and stores them as an attribute of the function object.

Of course, these environments can all be accessed through the function object’s attributes:

>>> function.__name__
'function'
>>> function.__code__
<code object function at 0x00BEC770, file "<stdin>", line 1>
>>> function.__defaults__
([1, 1, 1],)
>>> function.__globals__
{'function': <function function at 0x00BF1C30>,
 '__builtins__': <module 'builtins' (built-in)>,
 '__name__': '__main__', '__doc__': None, ...}

Since you can access the default values, you can of course modify them:

>>> function.__defaults__[0][:] = []
>>> function()
[1]
>>> function.__defaults__
([1],)

However, you had better not do this (modifying things you don’t fully understand, such as private or system variables, tends to have surprising consequences).

Another way to reset the default parameters is to execute the same def statement again. When you do this, Python creates a new function object from the function body’s code object, re-evaluates the default values, and binds the new function object to the name function once again. To emphasize once more, only do this when you clearly understand the consequences.

Of course, you can also construct function objects yourself through the function class in the new module (the new module was removed in Python 3, where types.FunctionType serves the same purpose).
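For Python 3, a minimal sketch of constructing a function object by hand uses types.FunctionType, passing the code object, the globals, a name, and a tuple of defaults positionally. The name fresh is made up for the example.

```python
import types

def function(data=[]):
    data.append(1)
    return data

function()
function()  # the shared default list is now [1, 1]

# Build a second function object from the same code and globals,
# but give it its own, fresh default list.
fresh = types.FunctionType(
    function.__code__, function.__globals__, "fresh", ([],)
)

print(fresh())                              # [1]
print(function())                           # [1, 1, 1]
print(fresh.__code__ is function.__code__)  # True
```

Both functions share one code object; only the stored defaults differ, which is exactly the state that the def statement would otherwise set up.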

Summary

The root of it all is that Python is a dynamic language. Defining a function is itself a runtime action that binds a name to a function object, just like assigning an ordinary variable. The default-value expressions in the function header are evaluated only once, at that binding, and the results are stored as part of the function object (in its attributes). Afterwards, calling the function through that name only executes the statements in the function body (the code object referenced by __code__).

In static languages where functions are not first-class citizens, function definitions are handled at compile time and cannot be re-bound at runtime. On each call, the actual arguments are bound to the formal parameters, and default values are assigned anew.

