Finding modifications to Immutable data-structures via Infer

October 22, 2021ByLucas Daher

This article was written by Lucas Daher, Ezgi Çiçek, and Jules Villard.

Facebook Lite is a slimmer version of the Facebook app that is optimized to run in lower end or older phones and with restricted internet access, such as 2G networks.

There is an ongoing refactoring effort to migrate the whole view hierarchy of the Facebook Lite for Android app to an immutable paradigm. In this post, we want to highlight how Infer, a powerful and open source static analysis tool, is being used as a crucial part of this project.

Towards immutable data structures

It is no mystery why this migration is being done; the benefits of immutability are many and well known, especially when working in multi-threaded environments.

With immutable data structures, there are no race conditions: there are no write operations and all the read operations can run in parallel without any risk. As a contrast, if we have a mutable list and two threads in parallel accessing it, one wants to add a new element to the end, and the other wants to read the last element. In this case, we have a race condition: depending on what gets executed first, the thread reading the last element will get a different result.

Since there are no race conditions, we avoid complicated mutexes. How do we solve the race condition example above without immutability? With a mutex, specifically in this case a readers-writer lock. Ok, one case is solved, but in a complex application we can easily have hundreds or thousands of those to solve. With so many locks, deadlocks are a possibility, performance becomes an issue, etc. Handling mutexes in large systems gets very complicated.

This brings us to the next point: immutable paradigms are easier to reason about. Saying that handling locks is complicated is an understatement; the number of interleavings grows exponentially with execution. A nice example can be found in the "how hard is hard?" section of this post (spoiler: unbelievably hard).

With so much complexity, the best way to handle it is to not care. If there are no data races and no locks, then we do not care that the program is nondeterministic with practically endless execution possibilities—we can just write code as if in a single threaded environment and it will just work.

All the points described above result in better software that can be objectively measured in fewer bugs and crashes.

The problem

The problem is famous and highly discussed in Java circles: arrays do not cooperate with an immutable paradigm. The standard way of making an object immutable in Java is to declare it as final, but that is not enough for an array. We can modify the elements of an array like this:

And the compiler accepts it. The reason is that making it final just guarantees that it always points to the same array object. Therefore, this code won’t compile:

But of course this is not the desired behaviour: we want an immutable array with immutable elements.

Common solutions

There are several approaches to address this problem:

migrate from arrays to another data structure: native, third party, or custom implemented
change the way that the array is accessed:
- make a copy every time a getter is called
- never expose it directly, e.g., instead of having a getArray method, have getElement and getSize.

These approaches (except the copy one that has its own performance issue) have one big problem: as a refactor, they cause a lot of disruption. For example, imagine a method that receives an array and does something with it. The code is refactored and now in some calls the parameter that used to be an array becomes an ImmutableList while other calls remain unchanged, or there is no longer access to the array. It's annoying, but solvable. Now imagine a huge and very complex codebase with countless cases like this! The annoyances pile up and might become a big problem and a time sink of human hours.

Is there a way to leave most of the code intact without a huge refactoring effort, and only fix the parts that are problematic, i.e., modify immutable data-structures?

Static Analysis with Infer

Enter static analysis, which can help us find modifications to immutable data structures by analyzing the code statically without running it.

At Facebook, we develop and use the open-source Infer static analysis platform, which includes a powerful memory analysis called Pulse that can detect invalid memory manipulations such as null pointer exceptions, use-after-free errors (in C/C++), and many more. For this work, we collaborated with the Infer team to extend Pulse to also detect modifications to the immutable data structures that are used in the Facebook Lite view hierarchy (e.g., arrays annotated by @Immutable or hash maps with immutable arrays as values).

Pulse is an inter-procedural analysis based on Separation Logic. It analyzes the program method by method, inferring for each method a “summary” describing the effect of the method in terms of pre- and post-conditions that each describe sets of concrete machine states (possible values of parameters, heap structure, etc.). Then our immutability analysis post-processes these function summaries in order to flag modifications to immutable data-structures. For each such issue detected, it reports a trace through the code explaining how the modification can occur.

Furthermore, a fundamental characteristic of Pulse is that it performs a so-called under-approximate analysis, meaning that any time it reports an issue on a code path, that path is feasible in an actual run of the program (caveats apply, e.g., the analysis makes optimistic assumptions about library code to which it doesn’t have access and it ignores concurrency). This has important consequences for the immutability analysis: virtually all reports of immutability violations are true positives (i.e., actionable issues that must be fixed), at the expense of some code paths that may be missed by the analysis and lead to false negatives (i.e., the analysis may not find all bugs).

Using the Immutability Analysis in your Project

The immutability analysis is part of the Impurity analysis of Infer and activated by a flag. To use it, annotate fields meant to be immutable with @Immutable, then run Infer with

--impurity-only --impurity-report-immutable-modifications

For example, if your project uses Maven, you can run:

Infer will then report the modifications to immutable collections as errors. Here is a complete simple example involving two files:

Learn more about Infer on its website.

Conclusion

The novel solution presented here is minimally invasive as it requires only a small refactoring of the code: annotate immutable data-structures, run Infer's static analysis, fix the issues infer found (i.e., modifications to immutable data-structures), and it is done. There is no need to refactor the whole app to switch to different data structures, change thousands of lines, etc.

Static analysis is also attractive performance-wise for the app. All the analysis is performed statically ahead of time, so there is no run time performance regression. Arrays are fast and compact, with less overhead than immutable lists. Needless to say, not having to make copies every time a getter is called is much more performant than having to copy potentially mutable data. And there is only the need to import a new annotation in a small package, so the apk size regression is minimal.

To learn more about Facebook Open Source, visit our open source site, subscribe to our YouTube channel, or follow us on Twitter and Facebook.

Interested in working with open source technologies at Facebook? Check out our open source-related job postings on our career page.

Tags:

Developer Tools Open Source 2021

More News