[Fixed] Google banned from crawling the forum

Discussion about the board itself, forums organization and site bugs.
ivank
Posts: 8
Joined: Sun Jan 23, 2011 5:06 pm

[Fixed] Google banned from crawling the forum

Post by ivank »

I was trying to search the forum with site:forums.informaction.com today, and noticed no useful results. Searching for 'noscript' finds this:

InformAction Forums • Information - NoScript
forums.informaction.com/ - Cached - Similar
Information. You have been permanently banned from this board. Please contact the Board Administrator for more information. A ban has been issued on your ...

https://encrypted.google.com/search?out ... .com&gbv=1
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0
therube
Ambassador
Posts: 7924
Joined: Thu Mar 19, 2009 4:17 pm
Location: Maryland USA

Re: Google banned from crawling the forum

Post by therube »

You have been permanently banned from this board.
I noticed that on a different board & was like, huh?
I figured it was just one more thing going on with phpBB boards, as if there weren't enough already.
I just ignored that message & logged in normally.

Or maybe that message is just meant for robots.txt?
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14 Pinball NoScript FlashGot AdblockPlus
Mozilla/5.0 (Windows NT 5.1; rv:26.0) Gecko/20100101 SeaMonkey/2.23a2
GµårÐïåñ
Lieutenant Colonel
Posts: 3365
Joined: Fri Mar 20, 2009 5:19 am
Location: PST - USA

Re: Google banned from crawling the forum

Post by GµårÐïåñ »

This may indicate that at some point crawling by bots was open (i.e. either no robots.txt at all, or one that applied to * with no deny rules), and that a robots.txt with deny items was added later, or updated to deny bots of certain types, so that future updates to the cache were met with that message. It should update or flush over time as it propagates.
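
For reference, here is a minimal sketch of how robots.txt rules decide whether a well-behaved crawler may fetch a page, using Python's standard urllib.robotparser. The two rule sets and the helper name may_crawl are made up for illustration; they are not this forum's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets, NOT the forum's real robots.txt.
OPEN_RULES = "User-agent: *\nDisallow:\n"  # empty Disallow = nothing is denied
DENY_GOOGLEBOT = (
    "User-agent: Googlebot\n"
    "Disallow: /\n"
    "\n"
    "User-agent: *\n"
    "Disallow:\n"
)

FORUM_URL = "https://forums.informaction.com/"


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl(OPEN_RULES, "Googlebot", FORUM_URL))
print(may_crawl(DENY_GOOGLEBOT, "Googlebot", FORUM_URL))
```

Keep in mind that robots.txt is only advice that the crawler chooses to follow; the server itself does not enforce it.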
~.:[ Lï£ê ï§ å Lêmðñ åñÐ Ì Wåñ† M¥ Mðñê¥ ßå¢k ]:.~
________________ .: [ Major Mike's ] :. ________________
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.1.0.0 Safari/537.36
access2godzilla
Senior Member
Posts: 109
Joined: Sun May 20, 2012 5:09 pm

Re: Google banned from crawling the forum

Post by access2godzilla »

The issue still exists; it must be resolved ASAP. Could one of the mods, or maybe Giorgio, look into the problem?

@Guardian: it can't be about robots.txt; it entirely depends upon the application to respect it. It must be phpBB's internal blacklist that's behind this.
Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20130401 Firefox/21.0
Giorgio Maone
Site Admin
Posts: 9454
Joined: Wed Mar 18, 2009 11:22 pm
Location: Palermo - Italy

Re: Google banned from crawling the forum

Post by Giorgio Maone »

I'm checking the blacklisted IPs.
It might take a while: they're thousands and I need to perform a reverse DNS lookup for each :(
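
A rough sketch of that kind of check in Python, assuming the banned IPs have already been exported to a plain list. The function names, the sample addresses (reserved documentation ranges) and the fake hostnames are all made up for illustration; only socket.gethostbyaddr is a real call, and this is not the script actually used here.

```python
import socket
from typing import Callable, Iterable, List, Optional

# Domains that Google's crawler hostnames are documented to end with.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def find_google_ips(
    banned_ips: Iterable[str],
    lookup: Callable[[str], Optional[str]] = reverse_lookup,
) -> List[str]:
    """Return the banned IPs whose reverse DNS name looks like a Google host."""
    hits = []
    for ip in banned_ips:
        host = lookup(ip)
        if host and host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            hits.append(ip)
    return hits


# Offline demo with a fake resolver, so nothing here touches the network.
fake_dns = {
    "192.0.2.1": "crawl-192-0-2-1.googlebot.com",  # invented hostname
    "198.51.100.7": "host.example.net",
}
print(find_google_ips(["192.0.2.1", "198.51.100.7", "203.0.113.5"], lookup=fake_dns.get))
```

A PTR record alone can be faked, so Google's own guidance is to confirm any hit with a forward lookup of the returned hostname as well.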

[EDIT]:
actually there were just 3 IP bans, and almost 40,000 user bans.
Yet, I've got no idea how Googlebot got banned, exactly, since as far as I know it shouldn't try to log in (or should it?)

[EDIT2]:
OK, I've found it. Apparently known search bots get assigned a conventional user account by phpBB, and Googlebot's (userid=16) has been banned by someone (not me) with reason: spam :(
Reactivating...
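
To make the mechanism concrete, here is a simplified model in Python of what seems to be happening. It is not phpBB's actual code: the table layout, the function names and the matching rule (a user-agent substring) are assumptions for illustration. Only the user ID 16, the account name and the ban message come from this thread.

```python
from typing import Dict, Optional, Set, Tuple

BAN_MESSAGE = "You have been permanently banned from this board."

# Hypothetical stand-in for the board's bot list:
# user-agent fragment -> (user id, account name).
BOT_ACCOUNTS: Dict[str, Tuple[int, str]] = {
    "Googlebot": (16, "Google [Bot]"),
}


def resolve_bot_account(user_agent: str) -> Optional[Tuple[int, str]]:
    """Map a request's user-agent to a built-in bot account, if any matches."""
    for fragment, account in BOT_ACCOUNTS.items():
        if fragment.lower() in user_agent.lower():
            return account
    return None


def serve_page(user_agent: str, banned_user_ids: Set[int]) -> str:
    """Return what the visitor would see: the ban notice or the normal page."""
    account = resolve_bot_account(user_agent)
    if account is not None and account[0] in banned_user_ids:
        # The crawler is treated as a banned user, so it indexes the ban notice.
        return BAN_MESSAGE
    return "forum index"


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(serve_page(googlebot_ua, banned_user_ids={16}))
print(serve_page(googlebot_ua, banned_user_ids=set()))
```

With the ban on user 16 in place, the bot gets the ban notice instead of the page, which matches the snippet Google was showing.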
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0
GµårÐïåñ
Lieutenant Colonel
Posts: 3365
Joined: Fri Mar 20, 2009 5:19 am
Location: PST - USA

Re: Google banned from crawling the forum

Post by GµårÐïåñ »

access2godzilla wrote:@Guardian: it can't be about robots.txt; it entirely depends upon the application to respect it. It must be phpBB's internal blacklist that's behind this.
Read what I said: I didn't say robots.txt was the issue. I said it might have been any number of changes to the way crawlers behave on the site, which turned out to be true, since Giorgio just posted that it was due to Googlebot being blocked by someone, which is a very rookie mistake. So it's been resolved, let's move on.
~.:[ Lï£ê ï§ å Lêmðñ åñÐ Ì Wåñ† M¥ Mðñê¥ ßå¢k ]:.~
________________ .: [ Major Mike's ] :. ________________
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0
GµårÐïåñ
Lieutenant Colonel
Posts: 3365
Joined: Fri Mar 20, 2009 5:19 am
Location: PST - USA

Re: Google banned from crawling the forum

Post by GµårÐïåñ »

Giorgio Maone wrote:[EDIT2]:
OK, I've found it. Apparently known search bots get assigned a conventional user account by phpBB, and Googlebot's (userid=16) has been banned by someone (not me) with reason: spam :(
Reactivating...
This is why I only ban people who ACTUALLY post SPAM, unlike some of my fellow mods who think that, since the account has been created, we should get them before they post. Sometimes, yeah, waiting a day has resulted in having to delete 30 spam posts, but at least we don't end up crippling our users like this. So the lesson is that jumping the gun can have unintended consequences. And not to sound like a broken record, but IP blocking is valid only in VERY rare circumstances and should not be used as the primary method of banning.

Thank you Giorgio for taking care of this, sorry for the hassle :(
~.:[ Lï£ê ï§ å Lêmðñ åñÐ Ì Wåñ† M¥ Mðñê¥ ßå¢k ]:.~
________________ .: [ Major Mike's ] :. ________________
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0
therube
Ambassador
Posts: 7924
Joined: Thu Mar 19, 2009 4:17 pm
Location: Maryland USA

Re: [Fixed] Google banned from crawling the forum

Post by therube »

Yet just who is "Google [Bot]" ?
And how do you ban the "user" if the user doesn't show as one, & cannot, seemingly, "post"?
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14 Pinball NoScript FlashGot AdblockPlus
Mozilla/5.0 (Windows NT 5.1; rv:26.0) Gecko/20100101 SeaMonkey/2.23a2
Giorgio Maone
Site Admin
Posts: 9454
Joined: Wed Mar 18, 2009 11:22 pm
Location: Palermo - Italy

Re: [Fixed] Google banned from crawling the forum

Post by Giorgio Maone »

therube wrote:Yet just who is "Google [Bot]" ?
It's the name given by phpBB to its built-in Google Bot account, reserved for usage by the bot.
There are about 50 of them, one for each known bot and crawler, all with user IDs < 50.
therube wrote: And how do you ban the "user" if the user doesn't show as one, & cannot, seemingly, "post"?
You just need to type "Google [Bot]" in the "ban by username" field (there may be other ways, though, I guess).
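
With almost 40,000 user bans, a quick way to spot this kind of accident is to filter the ban list for low user IDs. A minimal sketch in Python, assuming the bans have been exported as (user id, username, reason) records; the Ban type, the function name and the sample rows other than Google [Bot] are invented for illustration.

```python
from typing import Iterable, List, NamedTuple


class Ban(NamedTuple):
    user_id: int
    username: str
    reason: str


def bans_on_reserved_accounts(bans: Iterable[Ban], id_limit: int = 50) -> List[Ban]:
    """Return bans whose user ID falls in the low range used by built-in accounts."""
    return [ban for ban in bans if 0 < ban.user_id < id_limit]


# Invented sample export; only the Google [Bot] row mirrors this thread.
sample_bans = [
    Ban(16, "Google [Bot]", "spam"),
    Ban(38112, "cheap_pills_4u", "spam"),
    Ban(39540, "seo_backlinks", "spam"),
]

for ban in bans_on_reserved_accounts(sample_bans):
    print(ban.user_id, ban.username, ban.reason)
```

The cutoff of 50 is just the figure mentioned above; the low IDs may also include a few non-bot accounts, so treat the result as a list to review rather than one to unban blindly.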
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0
GµårÐïåñ
Lieutenant Colonel
Posts: 3365
Joined: Fri Mar 20, 2009 5:19 am
Location: PST - USA

Re: [Fixed] Google banned from crawling the forum

Post by GµårÐïåñ »

therube wrote:Yet just who is "Google [Bot]" ?
And how do you ban the "user" if the user doesn't show as one, & cannot, seemingly, "post"?
According to Giorgio, Google Bot actually has an account. Anyone familiar with Google Analytics and Webmaster Tools knows that, given login access, it can provide better crawling. That is obviously not the case here, since Giorgio never gave them explicit access. Instead, the bot can detect certain platforms, such as our forum software, and when the ability exists it will create an account or use an existing one provided by the platform (i.e. phpBB), so that as a user it has access to more information to crawl than would probably be available if it were accessing the site anonymously. The account it uses for crawling is what seems to have been banned, hence the bot's issues. It believed that it was being explicitly prohibited from crawling the site, so it "broke" it. Does that help?

EDIT: When I posted this I noticed Giorgio had posted at the same time; he gave you the same answer, just a bit more concisely.
~.:[ Lï£ê ï§ å Lêmðñ åñÐ Ì Wåñ† M¥ Mðñê¥ ßå¢k ]:.~
________________ .: [ Major Mike's ] :. ________________
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0