Python is widely touted as one of the most developer-friendly languages, thanks to the fast feedback loop that comes from not needing to compile. However, when used at scale in Instagram Server, we've found a major usability problem when developing locally: every single change to any Python file requires a slow server reload (~50 seconds on average!) before developers can observe the effects of their change.
At Meta, we've tackled this problem by creating Lazy Imports: a Python runtime feature that provides a transparent and robust mechanism for lazy loading. Using this technique, we've saved hundreds of developer hours per day by reducing the cost of reloads by ~70%, allowing developers to iterate more quickly.
It all starts one morning. You wake up, pour yourself a hot cup of coffee and head to your laptop to start a productive day. You have a ton of great ideas about the things you are going to accomplish during the day. You rebase and the server is reloading while you take a sip of coffee. The day begins!
As usual, you edit a few files, so the server needs to reload. It takes some time to restart and we're all good... until... there's this bug that gives you an error, one of those obscure things you know nothing about or where it comes from. You need to add some logging, so you modify one of the files listed in the traceback... ten seconds, twenty, sixty seconds... server still reloading... bang! Syntax error in your logging line! You fix the error and save the file... server starts reloading again... reloading... reloading some more... After two minutes, you are ready to see your logs. An hour later, you finally nail the bug: it was that one line you removed two days ago, an import, which unfortunately triggered an obscure import cycle after fetching and rebasing the latest code.
At this point, you've burned a couple hours of your morning and gotten distracted from what you were supposed to get done today. Worst of all, your coffee is now cold!
You get the picture: waiting times for server reloads pile up throughout your day, and everyone else's for that matter. Soon minutes become hours and hours become days, all wasted time.
When starting Instagram Server, we spend a large amount of time loading modules. The modules are often highly entwined, which makes it hard to stop an Import Domino Effect once anything is imported.
A server reload took around 25 seconds in late 2021. Historically, this time has been steadily regressing, an ongoing battle for years: if we don't pay close attention to keeping it optimized, reload times climb quickly, and through 2021 they reached new heights. At the peak, by the end of the year, some reloads were taking as long as 1.5 minutes. That, unfortunately, is the perfect amount of time for engineers to get distracted by something shiny and forget what they are doing.
Why is the server so slow?
The main reason for slow reloads is the increasingly complex codebase we have in Instagram, together with the sheer number of modules and the dense web of references between them.
If you have never seen an image of how complex the dependency graph of Instagram Server code is, Joshua Lear spent a full day preparing one. After 3 hours of running a modified dependency visualization script, he came back to a "large, black ball." At first he thought the dependency analyzer had a bug, but it turns out Instagram Server’s dependency graph was a giant circle.
Recreation (artistic interpretation) of Instagram Dependency Graph, by Joshua Lear
In all truth, the dependency graph in the Instagram codebase is a big ugly mesh; everything is very tightly connected. Just starting the server automatically triggers loading a huge number of modules, about 28,000, and most of that startup time is spent, literally, just importing modules, creating Python functions and class objects. A nicer looking dependency graph was first provided by Benjamin Woodruff and updated to reflect the current state:
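One way to see where that startup time goes is CPython's -X importtime flag, which prints a per-module import cost tree to stderr. Here's a small sketch driving it from Python (importing json stands in for any heavier dependency):

```python
import subprocess
import sys

# -X importtime makes CPython report, on stderr, the self and cumulative
# microseconds spent importing each module during interpreter startup.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    capture_output=True,
    text=True,
)

# Each line looks like: "import time: <self us> | <cumulative us> | <module>"
for line in result.stderr.splitlines():
    if line.endswith("| json"):
        print(line)  # the top-level line covers json and everything it pulled in
```

Summing the cumulative column over a real entry point is a quick way to confirm that most of a server's startup is spent in imports.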
Real Instagram Dependency Graph, January 2022
So what's the problem? Just figure out the heavy dependencies and remove them from the code in the hot path, right? Not quite.
Highly complex code and entwined dependencies are a recipe for disaster. Refactoring to keep dependencies clean and minimal sounds like the obvious fix, but the biggest point of friction is circular imports. As soon as you start trying to refactor, import cycles pop up everywhere.
Import cycles make refactoring harder and have historically produced several outages; even changing the import order can trigger an import cycle somewhere, either immediately or pretty soon for someone else.
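To make that failure mode concrete, here's a minimal, self-contained sketch of a two-module cycle; the module names mod_a and mod_b are hypothetical stand-ins for real entwined modules:

```python
import os
import sys
import tempfile
import textwrap

# Two hypothetical modules that import each other at the top level.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "mod_a.py"), "w") as f:
    f.write(textwrap.dedent("""\
        import mod_b          # starts executing mod_b before VALUE exists
        VALUE = 1
    """))
with open(os.path.join(tmp, "mod_b.py"), "w") as f:
    f.write(textwrap.dedent("""\
        import mod_a          # mod_a is only partially initialized here
        print(mod_a.VALUE)    # fails: VALUE hasn't been bound yet
    """))

sys.path.insert(0, tmp)
try:
    import mod_a
    cycle_error = None
except AttributeError as exc:
    cycle_error = str(exc)
    print("import cycle error:", cycle_error)
```

Which module hits the error depends entirely on which one happens to be imported first, which is why reordering imports elsewhere can suddenly break code like this.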
In the past we've tried to refactor modules to break import cycles and simplify the dependency graph. We've tried carefully tailored solutions that make expensive subsystems lazy, e.g., Django Urls, Notifications, Observers, even Regular Expressions. This works to a certain extent, but produces fragile solutions. Through the years, countless hours were spent manually profiling, refactoring, and cleaning things up, only to realize that much of it goes down the drain soon after, as code and complexity continue to grow. This process is hard, fragile, and does not scale well.
What we needed was a robust way of lazifying all things.
Two-toed sloth courtesy of Geoff Gallice via Creative Commons
We needed a more transparent, automatic, reliable, and permanent way to make things lazy, instead of manually making things lazy with inner imports or __import__(). The envisioned project was ambitious and risky, but I rolled up my sleeves, dove deep into CPython, and started implementing Lazy Imports in Cinder.
Lazy Imports changes the mechanics of how imports work in Python so that modules are imported only when they are used. At its core, every single import (e.g., import foo) won't immediately load and execute the module; instead it binds the name to a "deferred object". That name internally remains an instance of a deferred object until it is used, which could be on the line right after the import, or in a deep call stack many hours later.
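Cinder implements this inside the interpreter itself, but the observable semantics can be approximated in pure Python with the stdlib's importlib.util.LazyLoader; this sketch follows the recipe from the importlib documentation:

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose body executes only on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up laziness; does not run the module body
    return module

difflib = lazy_import("difflib")  # no module code has run yet
# The first attribute access, possibly much later, triggers the real import.
print(difflib.SequenceMatcher(None, "abc", "abd").ratio())
```

The key difference is that LazyLoader must be wired up explicitly per import, while Lazy Imports makes every plain import statement behave this way transparently.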
After a few weeks working on it, I was able to get a prototype. It was working, it was good and very promising; little did I know of the uphill battle that lay ahead. The hard part was going to be making things rock solid, making the implementation super efficient and rolling it out without too many hiccups. Changing the Python semantics, the way this feature does, would prove to be much more complex than I initially thought, and there were a lot of unexpected wrinkles to discover and fix along the way.
There are many quirks and nuances in the way Python works internally, and the Lazy Imports deferred objects unexpectedly leaked out of the C world into Python. After some very productive discussions with Carl Meyer and Dino Viehland, I decided to redesign the machinery and move most of it deeper, into the heart of Python: the dictionary internals. I was very excited, but modifying the highly optimized implementation of dictionaries could lead to a really bad performance penalty, so I took a lot of care on this part and optimizations took a fair amount of time.
At last, I was able to get a reliable and efficient version working. I enabled Lazy Imports in tens of thousands of Instagram Server modules and started running performance experiments to see if it would make any performance difference in production (it shouldn't). Sure enough, the net result looked like almost a wash: we didn't see any clear signal that the implementation would negatively affect production, and I finally had a perf-neutral build too.
In early January 2022, we rolled out to thousands of production and development hosts with no major issues, and we could immediately see the difference in Instagram Server start times in the graphs:
By loading ~12x fewer modules, we measured a ~70% reduction in p50 reload time and a ~60% reduction in p90 reload time for Instagram development servers. At the same time, it virtually eliminated the import cycle error events we had been seeing every day. Other servers and tools consistently saw improvements between 50% and 70%, and memory usage reductions of 20% to 40%.
Can you guess when Lazy Imports was enabled in the graph?
See additional results here.
Along the way, I ran into many obstacles, too many to list in this post. Some were more complex than others, but all of them were interesting and challenging. I can recall a couple of bugs in CPython (bpo-41249, related to TypedDict), some libraries that I had to remove, and a whole bunch of tests that I had to fix.
In my journey making codebases compatible with Lazy Imports, the most common problems when adopting Lazy Imports are:
- Import errors (e.g., ModuleNotFoundError) surface where the imported name is first used rather than at the import statement, which might complicate debugging.
- Typing annotations can force modules to load eagerly unless annotation evaluation is deferred with from __future__ import annotations.
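The annotations issue has a simple fix: with PEP 563's from __future__ import annotations, annotations are stored as strings and never trigger an import on their own. A sketch, where heavy_analytics is a hypothetical module that is never actually imported:

```python
from __future__ import annotations  # PEP 563: annotations become lazy strings

# heavy_analytics is a hypothetical, never-imported module; with the
# future import above, referencing it in an annotation is harmless.
def summarize(frame: heavy_analytics.DataFrame) -> str:
    return "ok"

print(summarize.__annotations__["frame"])  # stored as an unevaluated string
print(summarize(None))
```

Without the future import, defining summarize would raise a NameError at module load time, because the annotation expression would be evaluated eagerly.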
For more comprehensive issues and gotchas, see here.
Even though the concept of lazy imports is not entirely new and is conceptually simple (i.e., deferring module loading until imported names are used), we are not aware of any other low level implementation directly in CPython internals and none of the previous efforts matches our current implementation in Cinder. Some of its highlights are:
We also ran pyperformance three times, and observed the following most significant results with Lazy Imports enabled vs. without the patch:
This project was a huge undertaking, and it wouldn't have been possible without the help of the many engineers who gave their time and effort toward something big. I truly want to take the time to thank everyone involved: for your code reviews and suggestions, for the help rolling it out and spreading the use of Lazy Imports beyond Instagram, and for the ideas, suggestions, and reviews while writing this post. Anirudh Padmarao, Ben Green, Benjamin Woodruff, Carl Meyer, Dino Viehland, Itamar Ostricher, Jacky Zhang, Joshua Lear, Krys Jurgowski, Lisa Roach, Loren Arthur, Miguel Gaiowski, Perry Randall, Xiaoya Xiang, and everyone else involved, thank you to all of you!