Meta Web Crawlers

Meta uses web crawlers (software that fetches content from websites or web apps) for a number of different purposes. This page lists the User Agent (UA) strings that identify Meta's most common web crawlers and explains what each crawler is used for. It also provides guidance on how to configure your robots.txt file so that our crawlers interact properly with your site.

FacebookExternalHit

The primary purpose of FacebookExternalHit is to crawl the content of an app or website that was shared on one of Meta’s family of apps, such as Facebook, Instagram, or Messenger. The link might have been shared by copying and pasting or by using the Facebook social plugin. This crawler gathers, caches, and displays information about the app or website such as its title, description, and thumbnail image.

The specific UA string that you will see in your log files will be similar to one of the following:

  • facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
  • facebookexternalhit/1.1
  • facebookcatalog/1.0

Note that the FacebookExternalHit crawler might bypass robots.txt when performing security or integrity checks, such as checking for malware or malicious content.

Crawler Requirements

  • Your server must support gzip and deflate encodings.
  • All Open Graph properties must appear within the first 1 MB of your website or app's response, or they will be cut off.
  • Ensure that the crawler can retrieve your content within a few seconds, or Facebook will be unable to display it.
  • Your app or website should either return a response containing all required properties within the byte range specified in the crawler's Range header, or ignore the Range header altogether (see the sketch after this list).
  • Add either the crawler's user agent strings or its IP addresses (more secure) to your allow list.
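As a concrete illustration of the Range and Open Graph points above, here is a minimal sketch using Python's standard library. It ignores the crawler's Range header (one of the two accepted behaviors) and places the Open Graph properties at the very start of the response; the tag values, URL, and port are placeholders, not values from this page.

# Minimal sketch: serve Open Graph tags early in the response and ignore the
# crawler's Range header (one of the two behaviors the crawler accepts).
# The titles, URLs, and port below are placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = """<!doctype html>
<html>
<head>
  <meta property="og:title" content="Example Title" />
  <meta property="og:description" content="Example description." />
  <meta property="og:image" content="https://example.com/thumbnail.jpg" />
</head>
<body>Page body goes here.</body>
</html>"""

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")
        # The Range header (self.headers.get("Range")) is deliberately ignored:
        # the full page is always returned, so the Open Graph properties are
        # guaranteed to fall within the first 1 MB.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()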

Troubleshooting

If your app or website content is not available at the time of crawling, you can force a crawl once it becomes available either by passing the URL through the Sharing Debugger tool or by using the Sharing API.
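For the Sharing API route, a re-scrape can be triggered with a POST request to the Graph API. The following is a minimal sketch using Python and the requests package; the Graph API version, access token, and target URL are placeholders that you must supply for your own app.

# Minimal sketch: ask the Graph API to re-scrape a URL so the cached share
# preview is refreshed. The API version, token, and URL below are placeholders.
import requests

ACCESS_TOKEN = "YOUR_APP_ACCESS_TOKEN"   # placeholder
TARGET_URL = "https://example.com/page"  # placeholder

resp = requests.post(
    "https://graph.facebook.com/v19.0/",
    data={
        "id": TARGET_URL,
        "scrape": "true",
        "access_token": ACCESS_TOKEN,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # refreshed Open Graph properties for the URL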

You can simulate a crawler request with the following curl command:

curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" "$URL"

Meta-ExternalAgent

The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.

The specific UA string that you will see in your log files will be similar to one of the following:

  • meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
  • meta-externalagent/1.1

This crawler will roll out gradually over the next few weeks, expected to complete by 10/31/2024.

Meta-ExternalFetcher

The Meta-ExternalFetcher crawler performs user-initiated fetches of individual links to support specific product functions. Because the fetch was initiated by a user, this crawler may bypass robots.txt rules.

The specific UA string that you will see in your log files will be similar to one of the following:

  • meta-externalfetcher/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
  • meta-externalfetcher/1.1
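To confirm which of these crawlers are reaching your site, you can scan your access logs for the UA strings listed above. The following is a minimal sketch using only Python's standard library; the log file path is an assumption for illustration.

# Minimal sketch: count requests from Meta crawler user agents in an access log.
# The log path is a placeholder; adjust it for your server.
from collections import Counter

META_CRAWLER_UAS = (
    "facebookexternalhit",
    "facebookcatalog",
    "meta-externalagent",
    "meta-externalfetcher",
)

counts = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        lowered = line.lower()
        for ua in META_CRAWLER_UAS:
            if ua in lowered:
                counts[ua] += 1

for ua, n in counts.most_common():
    print(f"{ua}: {n}")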

The robots.txt file

By configuring your website's robots.txt file, you can specify how you would prefer Meta's web crawlers to interact with your site. To block one of these crawlers, add a Disallow rule for its user agent to robots.txt. Note that the Meta-ExternalFetcher crawler may bypass robots.txt because it performs fetches requested by a user, and the FacebookExternalHit crawler might bypass robots.txt when performing security or integrity checks.

User-agent: meta-externalagent
Allow: /                    # Allow everything
Disallow: /private/         # Disallow a specific directory
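To confirm that rules like the ones above behave as intended, you can evaluate them with Python's built-in robots.txt parser. The following is a minimal sketch; the site URL and paths are placeholders.

# Minimal sketch: check how a robots.txt file applies to the meta-externalagent
# user agent. The site URL and test paths below are placeholders.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetches and parses the live robots.txt

for path in ("https://example.com/", "https://example.com/private/page"):
    allowed = parser.can_fetch("meta-externalagent", path)
    print(f"{path} -> {'allowed' if allowed else 'disallowed'}")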

Crawler IPs

If a crawler's source IP address appears on the list generated by the following command, the crawler is coming from Meta.

whois -h whois.radb.net -- '-i origin AS32934' | grep ^route  

Note that these IP addresses change often. For more information, please go to our Peering webpage or the related downloadable data (CSV format).

Example Response

...
route:      69.63.176.0/21
route:      69.63.184.0/21
route:      66.220.144.0/20
route:      69.63.176.0/20
route6:     2620:0:1c00::/40
route6:     2a03:2880::/32
route6:     2a03:2880:fffe::/48
route6:     2a03:2880:ffff::/48
route6:     2620:0:1cff::/48
... 
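To automate this check, you can fetch the route list and test a source IP against it. The following is a minimal sketch using Python's standard library; it assumes a whois client is installed locally, and the sample IP address is a placeholder. Because the routes change often, the list is fetched fresh on each run.

# Minimal sketch: check whether a source IP falls inside Meta's announced
# routes (AS32934). Requires a local whois client; the sample IP is a placeholder.
import ipaddress
import subprocess

def meta_networks():
    out = subprocess.run(
        ["whois", "-h", "whois.radb.net", "--", "-i origin AS32934"],
        capture_output=True, text=True, check=True,
    ).stdout
    nets = []
    for line in out.splitlines():
        # Keep both IPv4 ("route:") and IPv6 ("route6:") entries.
        if line.startswith(("route:", "route6:")):
            nets.append(ipaddress.ip_network(line.split()[1]))
    return nets

def is_meta_crawler_ip(ip, nets):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)

nets = meta_networks()
print(is_meta_crawler_ip("69.63.176.4", nets))  # placeholder IP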

Contact us

If you have questions or concerns, please contact us at webmasters@meta.com (Meta Web Masters).