In this post, we explain Pysa, a static analysis tool to detect and prevent security issues in Python code, in a way that is super simple to understand (or as it’s commonly known online, ELI5). If you're interested in learning by watching or listening, check out a video about this open source project on our Facebook Open Source YouTube channel.
Consider how large code bases are built; in a single day, many new changes could be proposed, and each change could have many touch points. It is important to test if there are any bugs in the code, especially bugs which could cause security issues.
Pysa, which stands for Python Static Analyzer, was developed to help catch these issues. It tracks data as it flows through a program to quickly detect if there is a bug and highlight all the affected code. If an issue is found, Pysa alerts the software engineers or security engineers so they can fix the bug before the code change is ever merged into the codebase.
Here’s how it works. First off, Pysa is a static analyzer, which means it can analyze code without needing to run it. To use it, the user needs to define sources (places where the data we are interested in originates) as well as sinks (dangerous locations where data from sources could end up).
Let’s look at an example. Say that we want to detect remote code execution (RCE), a well-known vulnerability in web applications. A source would be a place where user-controlled data enters the code, such as an access to request.GET. A possible sink would be a place where code gets executed, such as a call to subprocess.run(). Pysa would track the flow of user-controlled data to see if it makes it into subprocess.run().
This tracking is done by performing iterative rounds of analysis. Each round builds out summaries that track which functions return data from the source and which functions eventually pass data to the sink. If Pysa does detect that the source connects to the sink, it will report an issue.
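To make the idea of iterative rounds and summaries more concrete, here is a minimal toy sketch in Python. It is not Pysa's actual implementation or API. It assumes a made-up, heavily simplified program model in which each function either returns what another function returns or passes its argument on to another function. All function names other than request.GET and subprocess.run (get_command, run_command, view, and so on) are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set, Tuple

SOURCE = "request.GET"     # where user-controlled data enters
SINK = "subprocess.run"    # dangerous place where it must not end up

# Hypothetical program model: function -> functions whose result it returns.
RETURNS_FROM: Dict[str, Set[str]] = {
    "get_command": {"request.GET"},
    "get_user_input": {"get_command"},
}

# Hypothetical program model: function -> functions it passes its argument to.
PASSES_ARG_TO: Dict[str, Set[str]] = {
    "run_command": {"subprocess.run"},
    "execute": {"run_command"},
}

# Call sites of the form: caller runs consumer(producer()).
CALL_SITES: List[Tuple[str, str, str]] = [
    ("view", "get_user_input", "execute"),
    ("health_check", "get_version", "execute"),
]


def propagate(seeds: Set[str], edges: Dict[str, Set[str]]) -> Set[str]:
    """Grow a summary set in rounds until a round adds nothing new."""
    summary = set(seeds)
    changed = True
    while changed:
        changed = False
        for function, callees in edges.items():
            if function not in summary and callees & summary:
                summary.add(function)
                changed = True
    return summary


def find_issues(
    returns_from: Dict[str, Set[str]],
    passes_arg_to: Dict[str, Set[str]],
    call_sites: List[Tuple[str, str, str]],
) -> List[str]:
    """Report callers where source data can flow into the sink."""
    # Summary 1: functions whose return value may carry source data.
    returns_source = propagate({SOURCE}, returns_from)
    # Summary 2: functions whose argument may eventually reach the sink.
    reaches_sink = propagate({SINK}, passes_arg_to)
    return [
        caller
        for caller, producer, consumer in call_sites
        if producer in returns_source and consumer in reaches_sink
    ]


if __name__ == "__main__":
    print(find_issues(RETURNS_FROM, PASSES_ARG_TO, CALL_SITES))
```

In this toy model only view is reported, because it feeds the result of a function that (indirectly) returns request.GET into a function that (indirectly) calls subprocess.run. The health_check caller is not reported, since get_version never returns source data. A real analyzer works on actual Python code and tracks much more detail, but the loop that repeats until the summaries stop changing is the same basic idea.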
Pysa was first open sourced in early 2018 as part of the Pyre project. At Facebook, we use Pysa extensively on Instagram's code base. In the first half of 2020, 44% of Instagram server issues found by the security team were found using Pysa. Outside of Facebook, Pysa has been incorporated into open source projects such as Zulip. Pysa has detected security issues such as CVE-2019-19775, as well as remote code execution (RCE), server-side request forgery (SSRF), cross-site scripting (XSS), and open redirection vulnerabilities.
To learn more about Pysa, visit the project’s website. It contains documentation for those who are just starting out or want to use more advanced features. If you would like to see Pysa in action, the project’s GitHub repo has several Pysa tutorials and an accompanying video to walk you through them.
If you have any questions, you can file an issue on the GitHub repo.
If you have any further questions about Pysa, let us know on our YouTube channel, or by tweeting at us. We always want to hear from you and hope you will find this open source project and the new ELI5 series useful.
In a series of short videos (~1 min in length), one of our Developer Advocates on the Facebook Open Source team explains a Facebook open source project in a way that is easy to understand and use.
We will write an accompanying blog post (like the one you're reading right now) for each of these videos, which you can find on our YouTube channel.
Interested in working with open source at Facebook? Check out our open source-related job postings on our career page by taking this quick survey.