
[Fixed] Google banned from crawling the forum

Posted: Fri Nov 01, 2013 3:28 pm
by ivank
I was trying to search the forum with site:forums.informaction.com today, and noticed no useful results. Searching for 'noscript' finds this:

InformAction Forums • Information - NoScript
forums.informaction.com/ - Cached - Similar
Information. You have been permanently banned from this board. Please contact the Board Administrator for more information. A ban has been issued on your ...

https://encrypted.google.com/search?out ... .com&gbv=1

Re: Google banned from crawling the forum

Posted: Fri Nov 01, 2013 7:22 pm
by therube
You have been permanently banned from this board.
I noticed that on a different board & was like, huh?
I figured it was just something going on with phpBB boards, as if there weren't enough going on already.
I just ignored that message & logged in normally.

Or maybe that message is just meant for robots.txt?

Re: Google banned from crawling the forum

Posted: Fri Nov 01, 2013 10:58 pm
by GµårÐïåñ
This may indicate that at some point crawling was open to all bots (either there was no robots.txt at all, or one with "User-agent: *" and no Disallow rules), and that later a robots.txt was added, or updated, with rules denying bots of certain types, so subsequent refreshes of the cache were met with that message. It should update or flush over time as the change propagates.
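For reference, this is the kind of per-bot deny rule being described. The robots.txt below is hypothetical (it is not the forum's actual file): it shuts out Googlebot while leaving every other crawler unrestricted. The helper `allowed` is an illustrative name; it checks the rules with Python's standard `urllib.robotparser`. Keep in mind that robots.txt is purely advisory and only works if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: deny Googlebot everything, allow all other crawlers.
# (An empty "Disallow:" line means "nothing is disallowed".)
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://forums.informaction.com/viewforum.php?f=1"
    for agent in ("Googlebot", "Bingbot"):
        print(agent, allowed(ROBOTS_TXT, agent, url))
```

Note that a rule like this makes a well-behaved crawler stop fetching pages; it does not by itself make the site serve a ban message.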

Re: Google banned from crawling the forum

Posted: Fri Nov 15, 2013 11:48 am
by access2godzilla
The issue still exists; it must be resolved ASAP. Could one of the mods, or maybe Giorgio, look into the problem?

@Guardian: it can't be about robots.txt; whether robots.txt is respected depends entirely on the crawler, and the ban message comes from the board itself. It must be phpBB's internal blacklist that's behind this.

Re: Google banned from crawling the forum

Posted: Fri Nov 15, 2013 12:12 pm
by Giorgio Maone
I'm checking the blacklisted IPs.
It might take a while: there are thousands of them and I need to perform a reverse DNS lookup for each :(
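The check Giorgio describes can be sketched as follows. This is a minimal sketch, assuming IPv4 addresses and the verification procedure Google documents for its crawlers: reverse-resolve the IP, require a hostname under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. The function name `is_googlebot_ip` is hypothetical, and the resolvers are parameters (defaulting to the standard `socket` functions) so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

# Both socket.gethostbyaddr and socket.gethostbyname_ex return (name, aliases, addresses).
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]


def is_googlebot_ip(ip: str,
                    reverse: Lookup = socket.gethostbyaddr,
                    forward: Lookup = socket.gethostbyname_ex) -> bool:
    """Reverse-resolve `ip`, check the domain, then forward-resolve to confirm."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / gaierror: no PTR record or lookup failure
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        _name, _aliases, addrs = forward(hostname)  # IPv4-only resolver
    except OSError:
        return False
    # The forward lookup must point back at the same IP; otherwise the PTR
    # record could simply be faked by whoever controls the IP's reverse zone.
    return ip in addrs
```

Against a ban list this would be something like `[ip for ip in banned_ips if is_googlebot_ip(ip)]`, where `banned_ips` is a hypothetical list. Each call does two blocking DNS lookups, which is why thousands of entries take a while.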

[EDIT]:
Actually, there were just 3 IP bans, and almost 40,000 user bans.
Yet I've got no idea how Googlebot got banned, exactly, since as far as I know it shouldn't try to log in (or should it?)

[EDIT2]:
OK, I've found it. Apparently phpBB assigns each known search bot a regular user account, and Googlebot's (userid=16) has been banned by someone (not me) with reason: spam :(
Reactivating...
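A simplified model of the mechanism Giorgio found, written in Python and not taken from phpBB's source: each known bot is registered with a User-Agent substring and a reserved user account, a matching request is treated as that user, and the ordinary user-ban check then applies to it. Only the name "Google [Bot]" and user ID 16 come from this thread; the "Googlebot" agent substring, "Example [Bot]" and its ID, the guest ID of 1, and the case-insensitive matching are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Set

ANONYMOUS_ID = 1  # assumption: guests share a single anonymous account


@dataclass
class Bot:
    name: str     # display name shown on the board
    agent: str    # substring looked for in the User-Agent header
    user_id: int  # reserved account the bot is treated as


# Hypothetical registry; only the Google entry's name and user_id come from the thread.
BOTS: List[Bot] = [
    Bot("Google [Bot]", "Googlebot", 16),
    Bot("Example [Bot]", "ExampleBot", 42),
]

BAN_MESSAGE = "You have been permanently banned from this board."


def resolve_user(user_agent: str, bots: List[Bot]) -> int:
    """Map a request to a bot's reserved account, or to the guest account."""
    ua = user_agent.lower()
    for bot in bots:
        if bot.agent.lower() in ua:  # case-insensitive substring match
            return bot.user_id
    return ANONYMOUS_ID


def handle_request(user_agent: str, bots: List[Bot], banned_user_ids: Set[int]) -> str:
    """Apply the ordinary user-ban check to whichever account the request maps to."""
    user_id = resolve_user(user_agent, bots)
    if user_id in banned_user_ids:
        return BAN_MESSAGE
    return f"forum page rendered for user {user_id}"


if __name__ == "__main__":
    googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(handle_request(googlebot_ua, BOTS, banned_user_ids={16}))
```

In this model the bot never logs in, yet it is still mapped onto an account the ban check can see, so a username ban on that account locks the crawler out of every page.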

Re: Google banned from crawling the forum

Posted: Fri Nov 15, 2013 7:03 pm
by GµårÐïåñ
access2godzilla wrote:@Guardian: it can't be about robots.txt; it entirely depends upon the application to respect it. It must be phpBB's internal blacklist that's behind this.
Read what I said: I didn't say robots.txt was the issue. I said it might have been any number of changes to the way crawlers are handled on the site, which turned out to be true, since Giorgio just posted that Googlebot had been banned by someone, which is a very rookie mistake. So it's been resolved; let's move on.

Re: Google banned from crawling the forum

Posted: Fri Nov 15, 2013 7:08 pm
by GµårÐïåñ
Giorgio Maone wrote:[EDIT2]:
OK, I've found it. Apparently known search bots get assigned a conventional user account by phpBB, and Googlebot's (userid=16) has been banned by someone (not me) with reason: spam :(
Reactivating...
This is why I only ban people who ACTUALLY post SPAM, unlike some of my fellow mods who think that as soon as an account is created, we should get them before they post. Sure, waiting a day has sometimes meant deleting 30 spam posts, but at least we don't end up crippling our users like this. Lesson learned: jumping the gun can have unintended consequences. And not to sound like a broken record, but IP blocking is valid only in VERY rare circumstances and should not be used as the primary method of banning.

Thank you Giorgio for taking care of this, sorry for the hassle :(

Re: [Fixed] Google banned from crawling the forum

Posted: Fri Nov 15, 2013 8:12 pm
by therube
Yet just who is "Google [Bot]"?
And how do you ban the "user" if the user doesn't show as one, & cannot, seemingly, "post"?

Re: [Fixed] Google banned from crawling the forum

Posted: Fri Nov 15, 2013 8:51 pm
by Giorgio Maone
therube wrote:Yet just who is "Google [Bot]"?
It's the name given by phpBB to its built-in Google Bot account, reserved for use by the bot.
There are about 50 of these, one for each known bot and crawler, all with user IDs < 50.
therube wrote: And how do you ban the "user" if the user doesn't show as one, & cannot, seemingly, "post"?
You just need to type "Google [Bot]" in the "ban by username" field (there may be other ways, though, I guess).
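To catch this kind of mistake before it shows up in search results, an admin could list every bot account that has a user ban attached. The sketch below runs against a tiny in-memory SQLite database with hypothetical rows. The table and column names are modeled loosely on phpBB 3.0's `phpbb_users`, `phpbb_bots` and `phpbb_banlist`, but the schema here is a simplified assumption; the real tables have more columns, and ban expiry and exclusions are ignored.

```python
import sqlite3

# Simplified, assumed schema loosely modeled on phpBB 3.0 table names.
SCHEMA = """
CREATE TABLE phpbb_users   (user_id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE phpbb_bots    (bot_id INTEGER PRIMARY KEY, bot_name TEXT,
                            user_id INTEGER, bot_agent TEXT);
CREATE TABLE phpbb_banlist (ban_id INTEGER PRIMARY KEY, ban_userid INTEGER,
                            ban_reason TEXT);
"""

# Bot accounts that have a user ban pointing at them.
BANNED_BOTS_SQL = """
SELECT b.bot_name, u.user_id, l.ban_reason
FROM phpbb_bots AS b
JOIN phpbb_users AS u   ON u.user_id = b.user_id
JOIN phpbb_banlist AS l ON l.ban_userid = u.user_id
ORDER BY u.user_id
"""


def banned_bots(conn: sqlite3.Connection) -> list:
    """Return (bot_name, user_id, ban_reason) for every banned bot account."""
    return conn.execute(BANNED_BOTS_SQL).fetchall()


def demo() -> list:
    """Build a throwaway database with hypothetical rows and run the audit."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO phpbb_users VALUES (?, ?)",
                     [(16, "Google [Bot]"), (42, "Example [Bot]"), (100, "spammer")])
    conn.executemany("INSERT INTO phpbb_bots VALUES (?, ?, ?, ?)",
                     [(1, "Google [Bot]", 16, "Googlebot"),
                      (2, "Example [Bot]", 42, "ExampleBot")])
    # One mistaken ban on the Google bot account, one legitimate ban on a spammer.
    conn.executemany("INSERT INTO phpbb_banlist VALUES (?, ?, ?)",
                     [(1, 16, "spam"), (2, 100, "spam")])
    rows = banned_bots(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Only the Google bot row comes back here; the spammer's ban is ignored because that account is not linked to any bot.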

Re: [Fixed] Google banned from crawling the forum

Posted: Fri Nov 15, 2013 9:06 pm
by GµårÐïåñ
therube wrote:Yet just who is "Google [Bot]"?
And how do you ban the "user" if the user doesn't show as one, & cannot, seemingly, "post"?
According to Giorgio, Google Bot actually has an account. Anyone familiar with Google Analytics and Webmaster Tools knows that, given login access, it can crawl better. That obviously isn't the case here, since Giorgio didn't give it explicit access; instead, the bot can detect certain platforms, such as our forum software, and where the ability exists it will create an account or use an existing one provided by the platform (i.e. phpBB), so that as a user it can reach more content than it probably could if it were crawling anonymously. The account it uses for crawling is what seems to have been banned, hence the bot's issues: it concluded that it was explicitly prohibited from crawling the site, so the results "broke". Does that help?

EDIT: When I posted this, I noticed Giorgio had posted at the same time; he gave you the same answer, just a bit more concisely.