
ELI5: Captum - Easily Interpret AI Models

July 6, 2021 · By Jessica Lin

Artificial Intelligence is often described as a “black box”: we plug inputs in and the AI model gives us an output back, but we don’t have an easy way to understand how the model came to produce that output. The process of training AI generally goes like this: we feed inputs into a computer, train the computer to obtain an AI model, and use that model to predict our outputs. But how does that model actually reach its conclusions?

Captum, from the Latin word for “comprehension,” aims to help us peer behind the curtain so that we can better interpret how the AI we program is “thinking,” giving us critical info on how to improve it.

In this blog post, we explain Captum in a way that is simple to understand (or, as it’s commonly known online, ELI5). If you're interested in learning by watching or listening, check out the linked video about Captum on our Facebook Open Source YouTube channel.

What is Captum?

Captum is an open-source tool built on PyTorch that helps researchers and developers easily understand which features contribute to an AI model’s prediction outputs. What Captum does is called model interpretability, because it interprets the features an AI model uses to make its predictions.

For example, say we want to train an AI model to identify pictures of dogs. We would feed lots of pictures of dogs into the computer so that it can start to learn the defining features of a dog. The computer then creates a model that can recognize a dog the next time it sees one; however, we can’t easily tell what the computer is looking at in an image when it decides whether or not there is a dog. Captum can help us see exactly which features the AI model is relying on to recognize the dog in our picture.

In this example, Captum shows us that the green-highlighted pixels are the ones most critical to identifying the dog in the image.
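
To make that concrete, here is a minimal sketch of what feature attribution can look like in code, using Captum’s Integrated Gradients algorithm. The pretrained classifier and the random “photo” tensor below are hypothetical stand-ins for illustration, not part of the original example.

```python
# Minimal attribution sketch; the model and image are illustrative stand-ins.
import torch
from torchvision import models
from captum.attr import IntegratedGradients

# A pretrained image classifier playing the role of our "dog detector".
model = models.resnet18(pretrained=True).eval()

# A hypothetical preprocessed photo: 1 image, 3 color channels, 224x224 pixels.
image = torch.rand(1, 3, 224, 224)

# Ask the model for its prediction, then ask Captum which pixels drove it.
pred_class = model(image).argmax(dim=1).item()
ig = IntegratedGradients(model)
attributions = ig.attribute(image, target=pred_class)

# `attributions` has the same shape as the image; larger values mark pixels
# that contributed more to the predicted class -- the "green pixels" above.
print(attributions.shape)  # torch.Size([1, 3, 224, 224])
```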

Captum is multimodal, meaning it can be used for any type of AI, across domains, including images, videos, and text. Captum is also extensible, which means you are able to add new algorithms and features on top of what already exists.

Since it can be difficult to interpret models without the proper visualization, Captum comes with a web interface called Captum Insights, which works across images, text, and other features to help users easily visualize and understand feature attribution.

We recommend Captum to model developers looking to improve their work, interpretability researchers focused on multi-modal models, and application engineers who are using trained models in production.
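
As a rough sketch of how launching Captum Insights can look in a notebook, the snippet below wires a tiny stand-in model and some random “photos” into the visualizer. The model, class names, and data are made up for illustration, and the exact module paths and arguments may vary by Captum version, so treat the official Captum Insights tutorials as the authoritative reference.

```python
# Hedged sketch of Captum Insights; the model and data are illustrative only,
# and module paths/arguments follow the Captum tutorials at the time of writing.
import torch
import torch.nn as nn
from captum.insights import AttributionVisualizer, Batch
from captum.insights.attr_vis.features import ImageFeature

# Tiny stand-in classifier: flattens a 3x32x32 image into 10 class scores.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
classes = ["class_%d" % i for i in range(10)]

def data_iterator():
    # Yield small batches of hypothetical images with random labels.
    for _ in range(4):
        yield Batch(inputs=torch.rand(2, 3, 32, 32),
                    labels=torch.randint(0, 10, (2,)))

visualizer = AttributionVisualizer(
    models=[model],
    score_func=lambda out: torch.nn.functional.softmax(out, dim=1),
    classes=classes,
    features=[ImageFeature("Photo",
                           baseline_transforms=[lambda x: x * 0],
                           input_transforms=[])],
    dataset=data_iterator(),
)

# Renders the interactive Insights UI inside a Jupyter notebook.
visualizer.render()
```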

Where is Captum used?

Captum was open sourced in mid-October 2019 during the PyTorch Developer Conference.

Within Facebook, we use Captum to better understand how our AI models interpret Profile and other Facebook pages, which often host a variety of text, audio, picture, video, and linked content.

Where can I learn more?

To learn more about Captum, visit the Captum website, which contains documentation, tutorials, and API references. Captum’s GitHub repo has step-by-step instructions for installation and getting started.

If you have any further questions about Captum, let us know on our YouTube channel, or by tweeting at us. We always want to hear from you and hope you will find this open source project and the new ELI5 series useful.

About the ELI5 series

In a series of short videos (~1 min in length), one of our Developer Advocates on the Facebook Open Source team explains a Facebook open source project in a way that is easy to understand and use.

We will write an accompanying blog post (like the one you're reading right now) for each of these videos, which you can find on our YouTube channel.

To learn more about Facebook Open Source, visit our open source site, subscribe to our YouTube channel, or follow us on Twitter and Facebook.

Interested in working with open source at Facebook? Check out our open source-related job postings on our career page by taking this quick survey.