Replacing the pytz dependency with native zoneinfo

By Will Thong, Wed 02 August 2023, in category Programming

programming, python

For my first foray into contributing to open source projects, I thought there’d be nothing better than making it meta by helping out the brilliant project which this blog is built with: the Pelican static site generator! So I looked in its Issues for a good-first-issue tag. This issue immediately jumped out at me: it was replacing an existing feature, so there was a clear yardstick for success, and guardrails to prevent me from inflicting too much damage. I also thought it would teach me something about how Python handles time zones. I actually ended up learning a lot more than I’d expected, and hopefully the following can help anyone else hoping to replace pytz with Python’s native alternatives.

The problem

Pelican relied on a third-party dependency, pytz. This was a popular library which defined timezones, allowing us (for instance) to accurately work out which of an article published at 7.15pm in New York time and an article published at 9.00pm in London time was published first. However, Python 3.9 brought this functionality into Python’s standard library through zoneinfo and the developers wanted the external dependency removed.
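To make the New York versus London example concrete, here is a minimal sketch using zoneinfo (Python 3.9 or later). The date and the variable names are my own illustrative choices, not anything from Pelican.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical publication times on the same calendar date.
new_york_article = datetime(2023, 8, 2, 19, 15, tzinfo=ZoneInfo("America/New_York"))
london_article = datetime(2023, 8, 2, 21, 0, tzinfo=ZoneInfo("Europe/London"))

# Timezone-aware datetimes compare by the instant they represent,
# not by the wall-clock time, so the earlier article is simply the minimum.
first = min(new_york_article, london_article)
print(first.tzinfo.key)  # Europe/London
```

On that date 9.00pm in London is 8.00pm UTC, while 7.15pm in New York is 11.15pm UTC, so the London article comes first even though its wall-clock time looks later.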

How I went about fixing it

Having forked and branched the project, I got to reading the source code. I began by grepping[1] for pytz, and in each case working out why pytz was being used.

First, I identified that pytz was being used in contents.py. Its timezone method was used to convert the user-selected ‘TIMEZONE’ setting from a string in the list of TZ database time zones (see Pelican documentation) into a pytz.timezone object. This object would then be used to specify a timezone for the timestamp for each piece of content. Looking at the zoneinfo documentation, it seemed that its own ZoneInfo class could be initialised using the same string. I tested this in a Python shell before replacing the timezone method with the ZoneInfo class. Similarly, contents.py was also using pytz to define a default timezone to work out if a draft was in the future or the past. If the draft had a timezone in it, pytz was needed to establish the timezone-aware current time (that is, the time when the contents were being generated). Fixing this was a simple matter of replacing the pytz.utc timezone with native Python’s timezone.utc.
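The two replacements described above can be sketched roughly as follows. This is not Pelican’s actual code: the settings dictionary and the is_in_future helper are hypothetical names I am using for illustration, and the comments only indicate where the pytz calls would have been.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical settings; the value is any name from the TZ database.
settings = {"TIMEZONE": "Europe/Paris"}

# Where pytz.timezone(name) was used, ZoneInfo(name) accepts the same string.
tz = ZoneInfo(settings["TIMEZONE"])


def is_in_future(date):
    """Return True if `date` is later than the moment of generation."""
    if date.tzinfo is None:
        # Naive dates can only be compared with a naive "now".
        now = datetime.now()
    else:
        # Where pytz.utc was used, the standard library's timezone.utc works.
        now = datetime.now(timezone.utc)
    return date > now


print(is_in_future(datetime(2999, 1, 1, tzinfo=tz)))
```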

The next place pytz cropped up was in pelican_quickstart.py: when the user runs the script after first installing Pelican, it generates a list of timezone options to offer the user.[2] Originally, a list comprehension was used to iterate through the list of timezones in pytz.all_timezones. It would also store the timezones in a list so that, having compared lower-cased versions of the user’s input and the timezone, it could store the properly-capitalised version (a UI feature which permits the user latitude in how they type the timezone). The equivalent of pytz.all_timezones in native zoneinfo was the available_timezones() function. However, upon replacing the list comprehension and lookups, I encountered a bug. pytz.all_timezones is a LazyList, whereas zoneinfo’s available_timezones() function returns a set. As sets are unordered and can’t be indexed, I couldn’t look up the matching timezone by position as the original code did. Instead, I chose a dictionary to record lowercase versions of each timezone (the key) against its proper capitalisation (the value), the latter being finally stored in the settings.
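The dictionary approach can be sketched like this. The function names are hypothetical and the real quickstart script is structured differently; the optional names parameter is only there so the sketch can be tried with a small, fixed set.

```python
from zoneinfo import available_timezones


def build_timezone_lookup(names=None):
    """Map each lower-cased timezone name to its properly capitalised form."""
    if names is None:
        # available_timezones() returns a set, so there is no index to rely on.
        names = available_timezones()
    return {name.lower(): name for name in names}


def normalise_timezone(user_input, lookup):
    """Return the canonical name for the user's input, or None if unknown."""
    return lookup.get(user_input.strip().lower())


lookup = build_timezone_lookup({"Europe/London", "America/New_York"})
print(normalise_timezone("europe/london", lookup))  # Europe/London
```

A dictionary also makes the case-insensitive match a single lookup rather than a scan through every timezone.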

The final place pytz was used was utils.py, where the set_date_tzinfo function converted timezone-naïve dates to timezone-aware dates. Similarly to contents.py, this was a simple drop-in replacement.
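As a rough illustration of converting a naive date into an aware one with zoneinfo, here is a minimal sketch. make_aware is a hypothetical stand-in, not the real set_date_tzinfo, whose signature and return type I am not reproducing here.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def make_aware(date, tz_name=None):
    """Attach the named timezone to `date` if it does not already have one."""
    if tz_name and date.tzinfo is None:
        # With zoneinfo, attaching the zone via replace() gives the correct
        # offset; with pytz the usual idiom for this step is tz.localize(date).
        date = date.replace(tzinfo=ZoneInfo(tz_name))
    return date


aware = make_aware(datetime(2023, 8, 2, 12, 0), "Europe/London")
print(aware.utcoffset())  # 1:00:00
```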

Tests

Having made my changes, I tested running pelican on my system’s default version of Python: 3.11. The first problem was a complaint that tzlocal was missing, so I simply imported it in the offending script (pelican_quickstart.py).

Then, conscious that Pelican also supports Python 3.7 and 3.8 (neither of which includes zoneinfo), I tested pelican in those Python versions using pyenv to select the appropriate Python version and poetry to install Pelican’s dependencies. Of course I got an error when the various scripts tried to import zoneinfo. Luckily (and as referenced in the initial issue), a backport exists so I added it to the Poetry configuration for Python versions earlier than 3.9 (I later learned I should’ve put it into setup.py too). I initially used a try/except block to import either zoneinfo or backports.zoneinfo for earlier versions of Python, and then further try/except blocks to call the relevant module. However, one of Pelican’s maintainers, Deniz, helpfully pointed out that it’d be simpler and more efficient to import the backport under the same name in the initial try/except block, thus obviating the need for any further error handling.
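The import pattern Deniz suggested looks roughly like this. It assumes the backports.zoneinfo package is installed on Python versions earlier than 3.9.

```python
try:
    # Python 3.9+: zoneinfo is part of the standard library.
    import zoneinfo
except ModuleNotFoundError:
    # Python 3.7 and 3.8: fall back to the backport, bound to the same name.
    from backports import zoneinfo

# Everything after the import can use `zoneinfo` without caring which one it got.
tz = zoneinfo.ZoneInfo("Europe/London")
print(tz.key)  # Europe/London
```

Because both branches bind the same name, no later code needs its own try/except.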

Unanticipated challenges

Having pushed changes to my fork and made a pull request, I was surprised to notice that my commit had changed nearly every single line in contents.py. This made it much harder for code reviewers to see what changes I had made. I discovered that these changes were automatically made by my editor, Neovim, upon save, according to the code formatter I had installed, Black. The lesson was twofold. First, when working with others’ code, it is important not to format the entire file! An easy way of doing this in Vim is to temporarily not run any post-save scripts with :noa w. Second, it is essential to check with git status before making any commits! Doing so would have shown me that I had inadvertently made a lot of changes.

Another challenge popped up after making my pull request. Unlike Linux and macOS, Windows does not provide the IANA database of timezones. To fix this, I used backports-zoneinfo[tzdata], whose tzdata extra installs the database as a Python package.

What I learned

I really enjoyed learning more about how Pelican and Python work under the hood. Alongside the lessons detailed in the above section, this bug fix also taught me lots about the open source software development process. I learned how to use pyenv to ensure compatibility across different Python versions and how to use poetry to handle dependencies. I learned some of the nuances of importing dependencies into Python programs. Working with GitHub Actions, meanwhile, gave me a practical understanding of how continuous integration and continuous delivery, specifically automated testing, ensure that code changes can be integrated back into a codebase.

Pelican’s maintainers, in particular Justin Mayer, were incredibly helpful and encouraging through the process of my first open source contribution. This was a great experience and I’d recommend it to anyone!


  1. I think we still use the grep verb even if the program we’re actually using is the faster ripgrep.

  2. Logically, I suppose I should have dealt with this before looking at contents.py, but in the event, as it was a marginally more complex problem, I’m happy that rg’s alphabetic order presented them to me the ‘wrong’ way round.