Ooh, nice, thanks.
Would I be right in assuming that the https at the beginning is there to ensure a secure connection and that's all? I mean, I know that mine allows any protocol at all, and any subdomain ... in my dumb head I'd thought that prudent here ... but it IS a dumb head.
Also, I'd first thought the extra brackets were superfluous, but they're not: they contain the any-character '.' which makes it work ...
^(.*)google\.com((?!recaptcha).)*$
However, I am curious ... what difference does:
(?:\S+|$)
... make from ...
*$
?
I'm playing now on regex101, but I'm sure it'll be useful for others even if I do work it out.
fatboy wrote: ↑Wed Apr 13, 2022 6:17 pm
Site ^https?://(?:[^/:]+\.)?google\.com/(?!recaptcha)(?:\S+|$)
EDIT
OK, so I'm seeing:
- It's separating all other characters from an end of line.
- You've placed them in a non-capturing group, to contain the OR stanchion.
- Also, \S is any non-whitespace character. (I'd first written that + is the same as * but greedy, but I don't really know what that means, and it isn't right; see EDIT 2.)
So ... I now recognise that I should maybe have a '.' before my '*', but (purely out of curiosity) why would I:
- ... use a "non-whitespace" instead of "any" character?
- ... go greedy instead of 'non'?
- ... need to 'or' the EOL? (I'd first assumed that any amount of characters surely includes zero characters, but see EDIT 2.)
Also, if I move to the 'or', shouldn't the "\S+" be "\S+$"?
EDIT 2
OK, '+' differs from '*' because there must be at least one of the preceding character. Surely this needs to be open to zero, no?
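For anyone following along outside regex101, here is a rough Python sketch of the '+' versus '*' point. The two patterns are cut-down, hypothetical tails of the quoted rule (not the full thing), and the sample strings are made up. It suggests why the "or EOL" is there: \S+ on its own needs at least one character, so the |$ branch covers the bare "google.com/" case, while \S* would also accept a space straight after the slash.

```python
import re

# Hypothetical, cut-down versions of the tail of the quoted pattern.
PLUS_OR_EOL = re.compile(r"^google\.com/(?:\S+|$)")  # one or more non-space chars, OR end of line
STAR_ONLY = re.compile(r"^google\.com/\S*")          # zero or more non-space chars


def matches(pattern, text):
    """Return True if the pattern is found in the text."""
    return pattern.search(text) is not None


# Made-up sample inputs: nothing after the slash, a path, and a space after the slash.
samples = ["google.com/", "google.com/maps", "google.com/ maps"]

# Map each sample to (result with \S+|$, result with \S*).
results = {s: (matches(PLUS_OR_EOL, s), matches(STAR_ONLY, s)) for s in samples}

for sample, verdict in results.items():
    print(repr(sample), verdict)
```

The only sample where the two disagree is the one with a space right after the slash: the (?:\S+|$) form rejects it and the \S* form accepts it.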
I am fully aware I'm the stupid one here, but surely the negative look ahead is doing our 'worst timeline' OR function here, all that needs to come after that is 'anything else' ... right?
I'm questioning everything right now ... although ... OK ... no, mine does work, as it catches recaptcha ANYWHERE in the text following the main domain/TLD/etc.
Mine https://regex101.com/r/N6rFFf/1
I've just put yours into that checker and it will allow URL/URIs with recaptcha in them:
Yours https://regex101.com/r/YYJhyH/1
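The same comparison can be sketched in Python for anyone who can't open the regex101 links. Both patterns are copied from this thread; the three test URLs are made up for illustration, and Python's re module is an assumption here (the tool the rule is for may use a different regex flavour).

```python
import re

# Both patterns as posted in this thread.
MINE = re.compile(r"^(.*)google\.com((?!recaptcha).)*$")
YOURS = re.compile(r"^https?://(?:[^/:]+\.)?google\.com/(?!recaptcha)(?:\S+|$)")

# Made-up URLs: no recaptcha, recaptcha right after the domain, recaptcha deeper in the path.
urls = [
    "https://www.google.com/search?q=x",
    "https://www.google.com/recaptcha/api.js",
    "https://www.google.com/foo/recaptcha/api.js",
]


def matches(pattern, url):
    """Return True if the pattern matches the URL."""
    return pattern.search(url) is not None


# Map each URL to (matched by MINE, matched by YOURS).
verdicts = {u: (matches(MINE, u), matches(YOURS, u)) for u in urls}

for url, verdict in verdicts.items():
    print(url, verdict)
```

The difference shows up on the third URL. In mine, the (?!recaptcha) check is repeated before every remaining character, so recaptcha is caught wherever it appears. In yours, the lookahead is tested once, immediately after "google.com/", so a recaptcha further along the path gets through.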
I think that (considering mine has been working) the safest bet would be...
For those that don't otherwise differentiate between protocols use this:
^(.*)google\.com((?!recaptcha).)*$
For security, this works:
^\Qhttps://\E(.*)google\.com((?!recaptcha).)*$
... or to save one character:
^https\:\/\/(.*)google\.com((?!recaptcha).)*$
However, the one thing (which I'm guessing your "(?:[^\/:]+\.)?" bit was about) is around the subdomains, as I'm well aware that you could use the following to get around my very lax "(.*)":
https://sdfsdfsdf.boogle.com/www.google.com/asd
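A rough Python sketch of that loophole, for what it's worth. MINE and YOURS are the patterns from this thread; COMBINED is purely a hypothetical attempt of my own to marry the stricter host part with the "recaptcha anywhere" check, untested in the actual tool, and all URLs other than the boogle one are made up.

```python
import re

# Patterns as posted in this thread.
MINE = re.compile(r"^(.*)google\.com((?!recaptcha).)*$")
YOURS = re.compile(r"^https?://(?:[^/:]+\.)?google\.com/(?!recaptcha)(?:\S+|$)")

# HYPOTHETICAL combination (an assumption, not from the thread):
# https only, host limited to [subdomain.]google.com, recaptcha refused anywhere in the path.
COMBINED = re.compile(r"^https://(?:[^/:]+\.)?google\.com(?:/((?!recaptcha).)*)?$")

BYPASS = "https://sdfsdfsdf.boogle.com/www.google.com/asd"

urls = [
    BYPASS,
    "https://www.google.com/search?q=x",
    "https://www.google.com/foo/recaptcha/api.js",
    "https://google.com",
]


def matches(pattern, url):
    """Return True if the pattern matches the URL."""
    return pattern.search(url) is not None


# Map each URL to (MINE, YOURS, COMBINED).
table = {u: (matches(MINE, u), matches(YOURS, u), matches(COMBINED, u)) for u in urls}

for url, row in table.items():
    print(url, row)
```

The lax (.*) happily swallows "https://sdfsdfsdf.boogle.com/www." before it reaches google.com, so mine matches the boogle URL. The [^/:]+ in the host part cannot cross a "/", which is what keeps the other two from being fooled.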