Facebook's link crawler (facebookexternalhit/1.1) is making an extreme number of requests to our site

Does anyone have any idea who to contact about the Facebook link preview crawler (user agent http://www.facebook.com/externalhit_uatext.php)? It's constantly making requests to our site in what seems to be a loop of sorts. Things have gotten bad enough that I've had to rate limit it! At the rate things are going, I'm sure it's blocking legitimate requests for previews of links on our site.

Eric
Asked about 2 months ago
Eric

I've started rate limiting the crawler. I'll report back and see what kind of effect that has on us. I'm also going to look into whether or not temporarily disabling ads has any effect.
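For anyone wanting to try the same thing, here is a minimal sketch of rate limiting by user agent. It is not the setup used on our site: the `SlidingWindowLimiter` class, the `status_for` helper, and the limit of 10 requests per 60 seconds are all hypothetical names and values chosen for illustration. In practice this is usually done in the web server or CDN configuration rather than in application code.

```python
import time
from collections import deque

# Substring of the crawler's User-Agent, as given in the question title.
CRAWLER_TOKEN = "facebookexternalhit"


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds interval."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False


# Hypothetical limit: 10 crawler requests per 60 seconds.
limiter = SlidingWindowLimiter(max_requests=10, window_seconds=60)


def status_for(user_agent, now=None):
    """Return 429 for throttled crawler requests and 200 for everything else."""
    if CRAWLER_TOKEN in user_agent.lower():
        return 200 if limiter.allow(now) else 429
    return 200
```

Only requests whose user agent contains the crawler token are counted, so ordinary visitors are never throttled by this check.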

March 25 at 07:42
Eric

Disabling ads had no effect!

April 4 at 09:34
Darin

We're having the same issue. We started using Facebook Ads a few months ago, so I'm suspicious about the connection.

May 6 at 09:29
Selected answer

We are a web host and this popped up for a customer. It's something that's been going on for many years.

Independent testing has shown that Facebook's system is open to abuse: attackers can easily use it to send relatively high amounts of traffic to a site. They may not be able to take down serious infrastructure with it, but it can easily chew up resources for a small business. Facebook, as far as I know, has never offered any discussion or features to help secure their system against this abuse.

Most likely it's not happening enough to get high on their radar. Of course, the people who use it for abuse love that, because they fly under the radar. When it comes to a serious enough situation, Facebook will most likely get used by a state-based actor who can leverage the situation enough to do real damage to someone at the worst possible time. That's my prediction anyway. You read it here first ;)

There will be various ways to defeat it, all of which will have consequences on your Facebook interactions.

Consider one thing: if there is an actual loop in your website, where a dumb crawler will think that essentially the same page is multiple pages, then Facebook's crawler apparently has no logic to detect that. We had this happen where a customer had a system of article tags on their site, and Facebook would cascade through the tags, thinking every combination of tags would result in a different page. The crawler appears to have no throttling to slow it down and no way to be told to exclude certain pages. It might be best to avoid such a setup on your websites anyway.
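To illustrate the tag problem, here is a rough sketch of collapsing every ordering of the same tags into one canonical URL, which a site could then redirect to or advertise as the canonical address. The URL scheme (`/articles?tags=a,b`), the `tags` parameter name, and the `canonical_tag_url` function are assumptions made up for this example, not the customer's actual setup, and I can't promise the crawler will honor it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def canonical_tag_url(url, param="tags"):
    """Sort and de-duplicate the comma-separated tag list in the query string."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)

    # Merge repeated tag parameters, then normalize case, order and duplicates.
    raw = ",".join(query.get(param, []))
    tags = sorted({t.strip().lower() for t in raw.split(",") if t.strip()})

    # Keep all other parameters, and re-add the tag list only if it is non-empty.
    other = {k: v for k, v in query.items() if k != param}
    if tags:
        other[param] = [",".join(tags)]

    new_query = urlencode(sorted(other.items()), doseq=True, safe=",")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, new_query, ""))
```

With this, `?tags=python,news` and `?tags=news,python,news` both map to the same address, so a crawler following redirects sees one page instead of one page per combination.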

Best of luck!

May 10 at 05:05
Steve
Steve

A solution has been found here, using the HTTP_ACCEPT_ENCODING and HTTP_RANGE headers sent by the real Facebook crawler: https://www.c2script.com/scripts/bloquer-les-faux-partages-facebook-venant-de-facebookexternalhit-s66.html
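The general idea can be sketched as follows. HTTP_ACCEPT_ENCODING and HTTP_RANGE are the server-side names for the `Accept-Encoding` and `Range` request headers. This sketch only checks that both headers are present on requests claiming to be the crawler; the exact values the linked script expects are not reproduced here, and the function name is hypothetical. Headers can be spoofed, so treat this as a filter for lazy fakes, not as authentication.

```python
# Substring of the crawler's User-Agent, as given in the question title.
CRAWLER_TOKEN = "facebookexternalhit"


def looks_like_real_facebook_crawler(headers):
    """Return True if a request claiming to be the crawler carries both headers.

    `headers` is a mapping of request header names to values.
    Assumption: the genuine crawler always sends Accept-Encoding and Range.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    user_agent = normalized.get("user-agent", "").lower()
    if CRAWLER_TOKEN not in user_agent:
        return False  # Not claiming to be the Facebook crawler at all.

    has_encoding = bool(normalized.get("accept-encoding"))
    has_range = bool(normalized.get("range"))
    return has_encoding and has_range
```

A request with the crawler's user agent that fails this check could be rejected or rate limited more aggressively, while requests that pass are served normally.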

May 17 at 06:43
Steve

Thanks for this. We are generally not in control of the code for our customers' sites, but we will pass this on to customers who encounter this issue.

May 17 at 07:22