Recaptcha ABE Rule Recipe (s)

Discussions about the Application Boundaries Enforcer (ABE) module
Mad_Man_Moon
Senior Member
Posts: 74
Joined: Fri Oct 27, 2017 12:02 pm


Post by Mad_Man_Moon »

I thought it might be useful for those still on the old NoScript with ABE to have my Google/Recaptcha ABE recipe. I've built this up over time through trial and error, so it's messy.

Obvious Caveat: NoScript Classic / ABE Is Not Supported

It requires all three rules. Where there is an ellipsis (...), that stands for many more URLs that aren't worth listing or might trigger forum filters ;) .

Goes without saying that any improvements to this are gratefully received!

20220422160200 - Updated from fatboy's great advice to cleaner (and safer) regexes.

Rule 1 - Recaptcha ONLY Locations

Code: Select all

#recaptcha
Site .recaptcha.com .recaptcha.net .google.com/recaptcha/* .gstatic.com/recaptcha/* https://www.google.com/recaptcha/api2/
Accept from https://greasyfork.org/* .stackoverflow.com ...
Sandbox from .itch.io
Anonymize from .yahoo.com .fightcade.com
#pron
Accept from pornhub.com .phncdn.com
Deny INC

Rule 2 - Google Excluding Recaptcha

Code: Select all

## This rule allows Google scripts objects and frames to be included only
## from Google pages and apps
Site ^\Qhttps://\E(?:[^\/:]+\.)?google\.com((?!recaptcha).)*$ .google.co.uk ...
Accept from .google.com .gstatic.com .google.co.uk ...
# pron
Anonymize from .pornhub.com .phncdn.com
Deny INC
regex101 - https://regex101.com/r/N6rFFf/3

Rule 3 - Google Static
If you already have a gstatic rule, just change the Site to this.

Code: Select all

# google static
# gstatic is used for a lot of stuff everywhere other than recaptcha
Site ^\Qhttps://\E(?:[^\/:]+\.)?gstatic\.(?:net|com)((?!recaptcha).)*$
Accept from .google.com .google.co.uk ...
# Pron
Anonymize from .pornhub.com .phncdn.com
Deny
regex101 - https://regex101.com/r/j5CBKR/1
---

Some thoughts:

Anonymize Rules
Where possible I prefer to use Anonymize, and at the time I added them, the sites in the Anonymize sections worked from there. Otherwise recaptcha appears to be picky about knowing where the requests are coming from, and it rarely connects to the recaptcha service unless the origin is in a full Accept line.

REGEX Tidying
I am ridiculously new to regex and I think that I've immediately identified one change that I need to make in tidying up, as there are two different versions of regexxy stuff before domains:

Code: Select all

^(.*)gstatic\.com
and

Code: Select all

^[^\/]+:\/\/[^\/]*?\.?(gstatic\.com\/)
As far as I can see via regex101 they do the same thing, the only difference being that for some reason I have gstatic in a selection for acting on. :roll:
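
For anyone who wants to check the "same thing" claim outside regex101, here is a minimal sketch in Python. It uses Python's re module rather than ABE's own regex engine, and the test URLs (including evil.example) are made up for illustration, so treat it as a rough comparison only.

```python
import re

# The two prefix styles quoted above, copied verbatim.
loose = re.compile(r"^(.*)gstatic\.com")
strict = re.compile(r"^[^\/]+:\/\/[^\/]*?\.?(gstatic\.com\/)")

# Hypothetical test URLs, not taken from any real rule set.
urls = [
    "https://www.gstatic.com/images/x.png",
    "https://gstatic.com/",
    "https://evil.example/gstatic.com/x",
    "https://notgstatic.com/x",
]

# Map each URL to (loose matched?, strict matched?).
results = {u: (bool(loose.search(u)), bool(strict.search(u))) for u in urls}

for url, (loose_hit, strict_hit) in results.items():
    print(f"{url:40} loose={loose_hit} strict={strict_hit}")
```

In this sketch the two patterns disagree only on the third URL, where gstatic.com sits in the path of another host. Both still accept notgstatic.com, because nothing in either pattern forces a dot (or the start of the host) right before the name.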

Rules Tidying
I think I could include another simple fix / tidy by replacing all the ostensibly same domains with:

Code: Select all

SELF++
But I'm not sure that I fully understand that.

Methods Tidying
This is all still a foreign language to me, and I've no doubt that the whole thing could be locked down even further to purely allow the type of content required for recaptcha, but ... yeah ... methods. :oops:
Old Abe PDF wrote:
1.2. Methods
The <method> component of a <predicate> can be any HTTP method (GET, POST, HEAD, PUT, DELETE, TRACE, OPTIONS) with the addition of 3 “pseudo” methods:
  • ALL – the <action> of this <predicate> must be enforced independently from the HTTP method of the requests (i.e. for all methods)
  • SUB – the <action> of this <predicate> must be enforced only if this is a subdocument request, i.e. if the requested resource is going to be shown in a frame or iframe
  • INCLUSION (alias INC) – the <action> of this <predicate> must be enforced only if this is an inclusion sub-request (i.e. not a top-level load). The inclusion type(s) to match can be listed as optional comma-separated arguments inside parentheses, e.g. INCLUSION(SCRIPT, OBJ).
    If no type is specified, this pseudo-method matches any sub-request.
    Valid types are SCRIPT, CSS, IMAGE, OBJ (plugin objects and sub-requests from plugin objects), OBJSUB (just sub-request from plugin objects), MEDIA, FONT, SUBDOC (subdocuments, i.e. documents loaded in frames and iframes), XBL, PING, XHR (***ObfuscateForFilter*** and Fetch loads), DTD, OTHER and UNKNOWN. Starting with NoScript version 5.0.8 all the “external” types defined in Mozilla’s nsIContentPolicy.TYPE_* constants are dynamically supported: the name of the ABE type is the same as the constant’s, but without the TYPE_ prefix. UNKNOWN matches any request not mapped yet to a specific nsIContentPolicy.TYPE_ (like TYPE_OTHER), while OTHER, for historical/compatibility reasons, matches the same requests as UNKNOWN plus any TYPE_* (like TYPE_WEBSOCKET or TYPE_CSP_REPORT) not matched by the original “static” ABE types.
Even down to my (probably not quite right) usage of the Deny INC that flits and flitters between rules.

I'd have hidden many of these paragraphs, but I'm dumb in that I can't find the 'hide'/'spoiler' function. Anyway ... that's about it!
Last edited by Mad_Man_Moon on Fri Apr 22, 2022 3:03 pm, edited 1 time in total.
fatboy
Senior Member
Posts: 79
Joined: Fri Jul 25, 2014 6:56 am

Re: Recaptcha ABE Rule Recipe (s)

Post by fatboy »

Rule 2
Site ^https?://(?:[^/:]+\.)?google\.com/(?!recaptcha)(?:\S+|$)
...

Rule 3
Site ^https?://(?:[^/:]+\.)?gstatic\.(?:net|com)/(?!recaptcha)(?:\S+|$)
...

Re: Recaptcha ABE Rule Recipe (s)

Post by Mad_Man_Moon »

Ooh, nice, thanks. :mrgreen:

Would I be right in assuming that the inclusion of the https at the beginning is to ensure secure connection there and that's all? I mean, I know that my one allows any protocol at all, and any subdomain ... in my dumb head I'd thought that prudent here ... but it IS a dumb head.

Also, I realise now that the extra brackets are superfluous (they're not, they contain the any character '.' which makes it work) ...

Code: Select all

^(.*)google\.com((?!recaptcha).)*$
However, I am curious ... what difference does:

Code: Select all

(?:\S+|$)
... make from ...

Code: Select all

*$
?

I'm playing now on regex101, but I'm sure it'll be useful for others even if I do work it out. :-)
fatboy wrote: Wed Apr 13, 2022 6:17 pm
Site ^https?://(?:[^/:]+\.)?google\.com/(?!recaptcha)(?:\S+|$)
EDIT

OK, so I'm seeing:
  • It's separating all other characters from an end of line.
  • You've placed them in a non-capturing group, to contain the OR stanchion.
  • Also the \S is any non-whitespace character, and + is the same as * but greedy, but I don't really know what that means.
So ... I now recognise that I should maybe have a '.' before my '*', but (purely out of curiosity) why would I:
  1. ... use a "non-whitespace" instead of "any" character?
  2. ... go greedy instead of 'non'?
  3. ... need to 'or' the EOL? Since surely any amount of characters includes zero characters.
Also, if I move to the 'or', shouldn't the "\S+" be "\S+$"?

EDIT 2

OK, '+' differs from '*' because there must be at least one of the preceding character. Surely this needs to be open to zero, no?
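
A minimal Python sketch of the '+' versus "open to zero" question, using fatboy's Rule 2 pattern as quoted above plus a variant with the "|$" alternative removed. The variant and the two test URLs are made up for illustration, and Python's re is standing in for ABE's own regex engine.

```python
import re

# fatboy's pattern as posted, and a hypothetical variant without "|$".
with_eol = re.compile(r"^https?://(?:[^/:]+\.)?google\.com/(?!recaptcha)(?:\S+|$)")
plus_only = re.compile(r"^https?://(?:[^/:]+\.)?google\.com/(?!recaptcha)\S+")

bare = "https://www.google.com/"          # nothing after the slash
longer = "https://www.google.com/search"  # at least one character after it

# \S+ needs at least one character, so only the "|$" branch
# lets the bare URL through.
checks = {
    "with_eol/bare": bool(with_eol.search(bare)),
    "plus_only/bare": bool(plus_only.search(bare)),
    "with_eol/longer": bool(with_eol.search(longer)),
    "plus_only/longer": bool(plus_only.search(longer)),
}

for name, hit in checks.items():
    print(name, hit)
```

So in that pattern the "|$" is what covers the zero-characters case that '+' on its own would refuse.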

I am fully aware I'm the stupid one here, but surely the negative look ahead is doing our 'worst timeline' OR function here, all that needs to come after that is 'anything else' ... right?

I'm questioning everything right now :lol: ... although ... ... OK ... no, mine does work, as it catches 'recaptcha' ANYWHERE in the text following the main domain/tld/etc.
Mine https://regex101.com/r/N6rFFf/1

I've just put yours into that checker and it will allow URL/URIs with recaptcha in them:
Yours https://regex101.com/r/YYJhyH/1
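
The same comparison can be sketched in Python for anyone who can't open the regex101 links. Both patterns are copied from the posts above; the three test URLs are made up for illustration, and Python's re may not behave identically to the engine ABE uses.

```python
import re

# "Mine": the lookahead is re-checked before every remaining character.
mine = re.compile(r"^(.*)google\.com((?!recaptcha).)*$")
# "Yours": the lookahead is checked once, right after the first slash.
yours = re.compile(r"^https?://(?:[^/:]+\.)?google\.com/(?!recaptcha)(?:\S+|$)")

# Hypothetical test URLs.
urls = [
    "https://www.google.com/search?q=abe",
    "https://www.google.com/recaptcha/api.js",
    "https://www.google.com/foo/recaptcha/api.js",
]

# Map each URL to (mine matched?, yours matched?).
results = {u: (bool(mine.search(u)), bool(yours.search(u))) for u in urls}

for url, (mine_hit, yours_hit) in results.items():
    print(f"{url:45} mine={mine_hit} yours={yours_hit}")
```

The third URL is the one that separates them: "recaptcha" appears deeper in the path, so the single lookahead after the slash never sees it, while the per-character lookahead does.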

I think that (considering mine has been working) the safest bet would be...

For those that don't otherwise differentiate between protocols use this:

Code: Select all

^(.*)google\.com((?!recaptcha).)*$
For security, then this works:

Code: Select all

^\Qhttps://\E(.*)google\.com((?!recaptcha).)*$
... or to save one character:

Code: Select all

^https\:\/\/(.*)google\.com((?!recaptcha).)*$
However, the one thing (which I'm guessing your "(?:[^\/:]+\.)?" bit was about) is around the subdomains, as I'm well aware that you could use the following to get around my very lax "(.*)":

Code: Select all

https://sdfsdfsdf.boogle.com/www.google.com/asd
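
A minimal Python sketch of that workaround, comparing the lax "(.*)" form with the "(?:[^\/:]+\.)?" form from fatboy's pattern. Python's re has no \Q...\E, so the escaped "https\:\/\/" spelling from above is used; the variable names are just for illustration.

```python
import re

# Lax: (.*) can swallow slashes, so it can run past the real host.
lax = re.compile(r"^https\:\/\/(.*)google\.com((?!recaptcha).)*$")
# Tight: the optional prefix cannot contain "/" or ":", so it stays in the host.
tight = re.compile(r"^https\:\/\/(?:[^\/:]+\.)?google\.com((?!recaptcha).)*$")

spoof = "https://sdfsdfsdf.boogle.com/www.google.com/asd"
real = "https://www.google.com/asd"

# Map each pattern/URL pair to whether it matched.
outcome = {
    "lax/spoof": bool(lax.search(spoof)),
    "tight/spoof": bool(tight.search(spoof)),
    "lax/real": bool(lax.search(real)),
    "tight/real": bool(tight.search(real)),
}

for name, hit in outcome.items():
    print(name, hit)
```

Only the lax pattern accepts the boogle.com URL, since its "(.*)" is free to eat "sdfsdfsdf.boogle.com/www." before reaching google.com.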
Last edited by barbaz on Fri Apr 22, 2022 1:52 pm, edited 1 time in total.
Reason: kill board-generated link

Re: Recaptcha ABE Rule Recipe (s)

Post by Mad_Man_Moon »

(thanks for the url removal, barbaz! - although apparently now I can't edit the post without spammy :lol: )

So here's an updated set:
If you trust google not to be hacked:

Code: Select all

^.*google\.com((?!recaptcha).)*$
For security, then this works:

Code: Select all

^\Qhttps://\E(?:[^\/:]+\.)?google\.com((?!recaptcha).)*$
... or to save one character:

Code: Select all

^https\:\/\/(?:[^\/:]+\.)?google\.com((?!recaptcha).)*$
I think I'll be using one of the security ones, I'll update the main one accordingly, presently.

I'm also now understanding the need for the package after the protocol slashes, and that's also in the above:

Code: Select all

(?:[^\/:]+\.)?
  • ()?
    The trailing ? makes the group optional: what's inside must match zero or one times. If a subdomain IS there, it gets checked against what's inside ... if there's nothing, then great.
  • ?:
    This denotes that it is a non-capturing group, so it can't be referenced later.
  • [^\/:]+
    This matches one or more characters, none of which may be a / or a colon.
  • \.
    This last part is a literal dot, a full stop, a period.
I guess that my only question is that colon in there ... is it to prevent port stuff, or something?
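
Not an answer to what fatboy intended, but here is a rough Python sketch of what the colon changes in practice. It compares the prefix with and without ':' in the character class. The no-colon variant and both test URLs are hypothetical, and whether ABE ever sees a URL with a user:password part in this form is an assumption.

```python
import re

# Prefix as used above, and a hypothetical variant without the colon.
with_colon = re.compile(r"^https\:\/\/(?:[^\/:]+\.)?google\.com")
without_colon = re.compile(r"^https\:\/\/(?:[^\/]+\.)?google\.com")

# Made-up URLs: one hides google.com inside the userinfo part,
# the other has an explicit port after the real host.
userinfo_trick = "https://user:pw.google.com@evil.example/"
with_port = "https://www.google.com:8443/x"

# Map each pattern/URL pair to whether the prefix matched.
outcome = {
    "with_colon/userinfo": bool(with_colon.search(userinfo_trick)),
    "without_colon/userinfo": bool(without_colon.search(userinfo_trick)),
    "with_colon/port": bool(with_colon.search(with_port)),
    "without_colon/port": bool(without_colon.search(with_port)),
}

for name, hit in outcome.items():
    print(name, hit)
```

In this sketch the colon makes no difference for a port that comes after the host, but it does stop the optional prefix from reaching across a "user:password" style section in front of it.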

Oh! Sorry, I got so caught up in my questions that I didn't also thank you, fatboy, for the net/com thing. THANKS! :)

Re: Recaptcha ABE Rule Recipe (s)

Post by fatboy »

Your regular expression is not quite correct, although in this case it works as expected.
…example\.com((?!recaptcha).)*… also matches "example.com.br/", which shouldn't be the case.
Shouldn't there be a "/" between "com" and "recaptcha"?
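
A minimal Python sketch of that point. The first pattern is the "security" one from the previous post (escaped-slash spelling, since Python's re has no \Q...\E). The second is a hypothetical variant that requires either the end of the URL or a "/" straight after "com"; it is an illustration only, not a rule anyone in this thread has tested in ABE.

```python
import re

# Pattern from the thread: nothing pins down what follows "com".
thread = re.compile(r"^https\:\/\/(?:[^\/:]+\.)?google\.com((?!recaptcha).)*$")
# Hypothetical variant: "com" must be followed by end-of-URL or a slash.
bounded = re.compile(
    r"^https\:\/\/(?:[^\/:]+\.)?google\.com(?:\/((?!recaptcha).)*)?$"
)

# Made-up test URLs.
urls = [
    "https://www.google.com.br/",
    "https://www.google.com/search",
    "https://www.google.com/recaptcha/api.js",
]

# Map each URL to (thread matched?, bounded matched?).
results = {u: (bool(thread.search(u)), bool(bounded.search(u))) for u in urls}

for url, (thread_hit, bounded_hit) in results.items():
    print(f"{url:42} thread={thread_hit} bounded={bounded_hit}")
```

The bounded variant still refuses "recaptcha" anywhere in the path, but it no longer accepts google.com.br. One side effect worth knowing: it would also refuse a URL with an explicit port such as google.com:443.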

Re: Recaptcha ABE Rule Recipe (s)

Post by Mad_Man_Moon »

Oh, no, I do get that I'm probably off in parts, but you're right that it's somewhat fit for purpose, too.

I mean there *should* be a slash there, in any normal url. But this is trying to ensure that *anything* that has 'recaptcha' after google.com is not accepted.

I can certainly edit that in. I'll play around with it tomorrow ... have you been testing it?

Re: Recaptcha ABE Rule Recipe (s)

Post by fatboy »

>have you been testing it?
https://ipic.su/img/img7/fs/2022-04-23_ ... 701674.png
 
>But this is trying to ensure that *anything* that has 'recaptcha' after google.com is not accepted.
Maybe so (not just google):
Rule 2+3

Code: Select all

Site .*/recaptcha* ...
Accept from .google.com .gstatic.com .google.co.uk ...
# pron
Anonymize from .pornhub.com .phncdn.com
Deny INC
or Rule 1+2+3

Code: Select all

Site .recaptcha.* .*/recaptcha*
Accept from .recaptcha.* .google.* .gstatic.com 
https://greasyfork.org/* .stackoverflow.com ...
# pron
Anonymize from .pornhub.com .phncdn.com 
.yahoo.com .fightcade.com
Deny INC