Meta crawlers are not sending the Accept-Encoding header when crawling our site

We noticed a few weeks ago that the Meta crawlers, for example facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php), were receiving uncompressed responses when crawling our site, leading to significantly more data transfer than for other bots crawling the same pages.

After checking the compression settings in our application, load balancer, and CDN (Fastly), we noticed that the requests coming from FB were either not sending the Accept-Encoding header or sending it empty, which is why the responses are uncompressed.

The attached image shows the difference between the FB crawler and other clients: the Accept-Encoding header is empty, and the response size is 570 KB vs 84 KB.
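For anyone who wants to reproduce the comparison, a minimal Python sketch along these lines (the URL is a placeholder, and the FB user agent string is only spoofed for illustration; requests omits any header whose value is set to None, which simulates the missing Accept-Encoding) shows the size difference:

```python
import requests

URL = "https://example.com/some-page"  # placeholder, not our actual page

def probe(accept_encoding):
    # Passing None as a header value makes requests drop the header entirely,
    # so we can compare "gzip, deflate" vs. no Accept-Encoding at all.
    headers = {
        "User-Agent": "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
        "Accept-Encoding": accept_encoding,
    }
    r = requests.get(URL, headers=headers)
    return r.headers.get("Content-Encoding"), r.headers.get("Content-Length")

print("Accept-Encoding: gzip, deflate ->", probe("gzip, deflate"))
print("Accept-Encoding absent          ->", probe(None))
```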

According to the documentation (https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/), we should expect requests with an Accept-Encoding header (or similar) asking for a compressed response. We also tested with a different page in a different environment, and the behavior is the same.

We were able to reproduce this by running the FB Sharing Debugger. There is a slight difference, though: the crawler's first request includes Accept-Encoding: gzip, deflate and a Range header, but immediately after that, depending on the page size, the additional requests come without Accept-Encoding (or with it empty).
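If it helps to reproduce, a throwaway endpoint like this (Python standard library only, port chosen arbitrarily) is enough to see the pattern when pointing the Sharing Debugger at it:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HeaderLogger(BaseHTTPRequestHandler):
    def do_GET(self):
        # Log the headers relevant to the issue for every incoming request,
        # so the follow-up fetches without Accept-Encoding become visible.
        print(
            f"{self.client_address[0]} {self.path} "
            f"Accept-Encoding={self.headers.get('Accept-Encoding')!r} "
            f"Range={self.headers.get('Range')!r} "
            f"UA={self.headers.get('User-Agent')!r}"
        )
        body = b"<html><head><title>test</title></head><body>ok</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HeaderLogger).serve_forever()
```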

Is this the expected behavior? Is there some way to control it? We don't want to block the FB crawlers, but the bandwidth usage and cost are significant: we estimate the FB crawlers use around 1 TB per month, vs. 150-200 GB for other crawlers making similar requests.
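For scale, a quick back-of-the-envelope check using only the numbers above (one sample page and our monthly totals) shows the two observations are consistent:

```python
# Per-page overhead from the sample response sizes (570 KB vs 84 KB).
uncompressed_kb, compressed_kb = 570, 84
print(f"per-page overhead: ~{uncompressed_kb / compressed_kb:.1f}x")

# Monthly totals: ~1 TB for the FB crawlers vs 150-200 GB for comparable crawlers.
fb_monthly_gb = 1000
other_monthly_gb = (150, 200)
print(f"bandwidth overhead: ~{fb_monthly_gb / other_monthly_gb[1]:.1f}x to "
      f"~{fb_monthly_gb / other_monthly_gb[0]:.1f}x")
```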

Thanks

Héctor
Asked on Thursday