Using defaultdict in Python

Wednesday, January 1, 2025


Master defaultdict and other advanced data structures in our Introduction to Python and Advanced Python courses.

Dictionaries are a convenient way to store data for later retrieval by name (key). Keys must be unique and hashable (in practice, usually immutable objects such as strings, numbers, or tuples), and they are typically strings. The values in a dictionary can be anything. For many applications the values are simple types such as integers and strings.

It gets more interesting when the values in a dictionary are collections (lists, dicts, etc.). In this case, the value (for example, an empty list or dict) must be initialized the first time a given key is used. While this is relatively easy to do manually, the defaultdict type automates and simplifies these kinds of operations.

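To make that comparison concrete, here is a minimal sketch that groups words by their first letter twice: once with a plain dict, where each list has to be created by hand the first time its key appears, and once with defaultdict(list). The function names and sample words are illustrative only.

```python
from collections import defaultdict


def group_manual(words):
    """Group words by first letter using a plain dict."""
    groups = {}
    for word in words:
        key = word[0]
        if key not in groups:   # first time this key is seen:
            groups[key] = []    # create the empty list by hand
        groups[key].append(word)
    return groups


def group_with_defaultdict(words):
    """Same grouping; defaultdict(list) creates each empty list automatically."""
    groups = defaultdict(list)
    for word in words:
        groups[word[0]].append(word)
    return groups


words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
print(group_manual(words))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(dict(group_with_defaultdict(words)))
# same contents as above
```

A plain dict can also shorten the manual version with groups.setdefault(key, []).append(word); defaultdict simply makes the default part of the dictionary itself.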
defaultdict is a subclass of dict and works like a normal dict, except that it is initialized with a function (the “default factory”) that takes no arguments and provides the default value for a nonexistent key.

As long as a default factory is set, looking up a missing key with square brackets never raises a KeyError: the key is added to the dictionary with the value returned by the default factory.

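The sketch below shows exactly when the default factory is used: a d[key] lookup of a missing key calls the factory and stores the result, while get() and the in operator do not call it, and a defaultdict created without a factory raises KeyError like a plain dict. The variable names are illustrative.

```python
from collections import defaultdict

scores = defaultdict(int)

# Reading a missing key with [] calls the factory AND stores the result.
print(scores['alice'])      # 0
print('alice' in scores)    # True: the lookup inserted the key

# get() and the "in" operator neither call the factory nor insert anything.
print(scores.get('bob'))    # None
print('bob' in scores)      # False

# With no factory (default_factory is None), a missing key raises KeyError,
# just like a plain dict.
plain = defaultdict()
try:
    plain['missing']
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'missing'
```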
>>> from collections import defaultdict
>>> ice_cream = defaultdict(lambda: 'Vanilla')
>>> ice_cream['Sarah'] = 'Chunky Monkey'
>>> ice_cream['Abdul'] = 'Butter Pecan'
>>> print(ice_cream['Sarah'])
Chunky Monkey
>>> print(ice_cream['Joe'])
Vanilla

Be sure to pass the function object itself to defaultdict() rather than calling it: write defaultdict(func), not defaultdict(func()).

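Here is a small sketch of what goes wrong otherwise. Writing list() calls list immediately, so defaultdict receives an empty list object instead of a function, and it rejects any first argument that is neither callable nor None with a TypeError.

```python
from collections import defaultdict

# Correct: pass the callable itself; defaultdict calls it for each new key.
good = defaultdict(list)
good['a'].append(1)
print(dict(good))  # {'a': [1]}

# Wrong: list() runs right away and produces [], which is not callable.
try:
    bad = defaultdict(list())
except TypeError as exc:
    print('TypeError:', exc)
```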
In the following example, a defaultdict is used for counting. The default factory is int: calling int() with no arguments returns 0, so every new key starts at zero. (lambda: 0 would also work in this situation.) For each food in the list, the count stored under that food's key is incremented by one. We do not need to check whether the food is already a key; a missing food simply starts at the default value of zero.

>>> from collections import defaultdict
>>> food_list = 'spam spam spam spam spam spam eggs spam'.split()
>>> food_count = defaultdict(int)  # int() returns 0, so each new food starts at 0
>>> for food in food_list:
...     food_count[food] += 1  # increment this food's count by 1
...
>>> food_count
defaultdict(<class 'int'>, {'spam': 7, 'eggs': 1})

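Once counting is finished, it can help to copy the defaultdict into an ordinary dict, for example before passing it to code that should not silently get 0 for an unknown food. A minimal sketch using the same food_list; sorting by frequency is just one illustrative follow-up step.

```python
from collections import defaultdict

food_list = 'spam spam spam spam spam spam eggs spam'.split()

food_count = defaultdict(int)   # int() returns 0, the starting count
for food in food_list:
    food_count[food] += 1

# A plain dict copy no longer creates entries for unknown keys.
counts = dict(food_count)
print(counts)  # {'spam': 7, 'eggs': 1}

# Sort foods from most to least common.
by_frequency = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(by_frequency)  # [('spam', 7), ('eggs', 1)]

print(food_count['ham'])  # 0, and 'ham' is now a key in food_count
print('ham' in counts)    # False: the plain copy is unaffected
```

Setting food_count.default_factory = None has a similar effect in place: after that, missing keys raise KeyError.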
In the next example, we start with a list of (state, city) pairs. We want to build a dictionary where the keys are the state abbreviations and the values are lists of all cities in that state. To build this dictionary of lists, we use a defaultdict with list as the default factory, so a new empty list is created for each new key.

>>> from collections import defaultdict
>>> city_list = [('TX', 'Austin'), ('TX', 'Houston'), ('NY', 'Albany'),
...              ('NY', 'Syracuse'), ('NY', 'Buffalo'), ('NY', 'Rochester'),
...              ('TX', 'Dallas'), ('CA', 'Sacramento'), ('CA', 'Palo Alto'),
...              ('GA', 'Atlanta')]
>>> cities_by_state = defaultdict(list)  # each new state starts with an empty list
>>> for state, city in city_list:
...     cities_by_state[state].append(city)
...
>>> for state, cities in cities_by_state.items():
...     print(state, ', '.join(cities))
...
TX Austin, Houston, Dallas
NY Albany, Syracuse, Buffalo, Rochester
CA Sacramento, Palo Alto
GA Atlanta

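The same pattern works for any list of (key, value) pairs. In this sketch, group_pairs is a hypothetical helper name used only for illustration; it returns a plain dict with each state's cities sorted alphabetically, and the loop prints the states in alphabetical order.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Hypothetical helper: group (key, value) pairs into {key: [values]}."""
    grouped = defaultdict(list)  # each new key starts with an empty list
    for key, value in pairs:
        grouped[key].append(value)
    # Return a plain dict whose value lists are sorted alphabetically.
    return {key: sorted(values) for key, values in grouped.items()}


city_list = [('TX', 'Austin'), ('TX', 'Houston'), ('NY', 'Albany'),
             ('NY', 'Syracuse'), ('NY', 'Buffalo'), ('NY', 'Rochester'),
             ('TX', 'Dallas'), ('CA', 'Sacramento'), ('CA', 'Palo Alto'),
             ('GA', 'Atlanta')]

cities_by_state = group_pairs(city_list)
for state in sorted(cities_by_state):
    print(state, ', '.join(cities_by_state[state]))
# CA Palo Alto, Sacramento
# GA Atlanta
# NY Albany, Buffalo, Rochester, Syracuse
# TX Austin, Dallas, Houston
```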
In conclusion, whenever you need a dictionary in which each new key's value should start from a default value, use a defaultdict.

Browse our Python Training courses. Ascendient Learning offers courses for teams and individuals, and all training is hands-on and live.
