About   cv   Etc   Now   Zettelkästen  
Initializing nested lists correctly

Initializing nested lists correctly

This post is part of the programming series.

If you want to initialize a list with a certain size in Python, you can use the following clean syntax:

>>> arr = [None]*3
>>> arr
[None, None, None]

We can then fill the list with elements:

>>> arr[1] = 1
>>> arr
[None, 1, None]

But watch what happens when we try to use the same syntax for declaring a 2D dynamic array:

>>> arr2 = [[]]*3
>>> arr2
[[], [], []]
>>> arr2[1].append(1)
>>> arr2
[[1], [1], [1]]

The desired result here was the ragged array [[], [1], []], but instead we accidentally filled all nested lists… What happened?! Well, we observe here that the * operator creates a list where the elements reference the same underlying object!

>>> id(arr2[0])
1474142748360
>>> id(arr2[1])
1474142748360
>>> id(arr2[2])
1474142748360

So that explains why adjusting one sublist affects all sublists: they are the same object. But why did we then not have the same issue when initializing the flat empty list with None objects? Actually, the * operator works exactly the same here and also creates a reference to the same object.

>>> id(arr[0])
140720689536224
>>> id(arr[2])
140720689536224

But if we inspect the element where we filled in a value, we do see that it is a new object:

>>> id(arr[1])
140720690012576

The crucial difference is that this NoneType object is immutable. The referenced object cannot be changed but is rather replaced with a new object. The same reasoning holds when we have a list of integers or strings. In case of a list of lists, or a list of dictionaries (any mutable data structure) however, we can adjust the referenced object and then the change will reflect onto all sublists. Because something like [1]*3 works as expected, it can be hard to spot the difference in behavior when working with nested mutable data structures.

If we explicitly replace a whole sublist with a new object, there’s no issue:

>>> arr2[1] = [2]
>>> arr2
[[1], [2], [1]]

This is not a practical solution though, because we want to be able to use functions like append() on sublists correctly. The general solution is to force Python to make a new object for each sublist, which means - however nice and clean the syntax looks - we have to avoid using * for this! Instead, create the sublists in an explicit loop or for example a list comprehension:

>>> arr3 = [[] for _ in range(3)]
>>> arr3
[[], [], []]
>>> arr3[1].append(1)
>>> arr3
[[], [1], []]

This way each of the sublists is its own object, rather than being a reference to the same list, because we force Python to evaluate [] 3 times instead of only once.