
Bug in XSS pattern matching sample?

Posted: Thu May 22, 2014 12:44 pm
by HappyNoScriptUser
First, thank you very very much for making NoScript!
I uninstalled my anti-virus software several years ago, when I found out that NoScript (plus a firewall) did the job.

As I wrote in the topic, I wonder if there is a bug in the parser for the RegEx in the XSS pattern matching sample.

As an example, if I want all of these URLs to be excluded from XSS protection:

Code: Select all

http://sourceforge.net
https://sourceforge.net/
https://pre.sourceforge.net/post
https://preA2.sourceforge.net/postF5
..I would assume the correct entry should be

Code: Select all

^https?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-zA-Z0-9]+)?
However, I noticed that also this [a-z] entry

Code: Select all

^https?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-z]+)?
and this [a] entry

Code: Select all

^https?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a]+)?
and even this [] entry

Code: Select all

^https?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([]+)?
will match e.g. this URL

Code: Select all

https://preA2.sourceforge.net/postF5
At first I thought that maybe the first [a-zA-Z0-9] term somehow got inserted also into the second term..
..but that does not seem to be the case either, since..

..this entry

Code: Select all

^http://sourceforge\.net/?([a]+)?
and also even

Code: Select all

^http://sourceforge\.net/[]?
will match this URL

Code: Select all

http://sourceforge.net/postF5
I am really a very new newbie to this "RegEx world",
so it may be that there is nothing wrong with this behavior at all,
and instead it's just my interpretation that is wrong!
If so, I'm really sorry to have taken your time for no reason!

BTW, I am using Firefox v29.0.1 with NoScript 2.6.8.25rc2

Re: Bug in XSS pattern matching sample?

Posted: Thu May 22, 2014 1:41 pm
by barbaz
HappyNoScriptUser wrote:I uninstalled my anti-virus software several years ago, when I found out that NoScript (plus a firewall) did the job.
I'm not sure that was a good idea. NoScript, firewall, and anti-virus software all perform very different functions, and defense-in-depth is always useful.
HappyNoScriptUser wrote:I am really a very new newbie to this "RegEx world",
so it may be that there is nothing wrong with this behavior at all,
and instead it's just my interpretation that is wrong!
Without an explicit start or end anchor, the regex effectively expands to

Code: Select all

.*YOUR_REGEX.*
so what you're seeing is expected behavior.
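
The effect can be reproduced outside the browser. The following is a minimal Python sketch, not NoScript's own code: NoScript uses JavaScript regular expressions, and the sketch assumes Python's `re.search` behaves the same way for this simple pattern. The variable names are made up for illustration.

```python
import re

# One of the patterns from the first post; only the start is anchored.
pattern = re.compile(r"^http://sourceforge\.net/?([a]+)?")
url = "http://sourceforge.net/postF5"

# search() succeeds as soon as some part of the URL satisfies the pattern.
m = pattern.search(url)
matched_text = m.group(0)      # the part the pattern actually consumed
ignored_text = url[m.end():]   # the rest is never checked against the pattern

print(repr(matched_text))  # 'http://sourceforge.net/'
print(repr(ignored_text))  # 'postF5'
```

The optional `([a]+)?` group simply matches nothing here, and the trailing `postF5` is left over without causing a failure.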

For your example, if those are *exact* URLs, this should work

Code: Select all

^https?://(?:[^/:]+\.)?sourceforge\.net/(?:$|post[0-9A-Za-z]*/?$)
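
A quick way to check this pattern against the four example URLs is a small Python sketch like the one below. It is only an approximation of what NoScript does (NoScript uses JavaScript regular expressions). It also assumes the bare-host URL reaches the filter with a trailing slash, as browsers normally write it; the last URL is a made-up example that should be rejected.

```python
import re

# The suggested pattern, anchored at both ends.
pattern = re.compile(
    r"^https?://(?:[^/:]+\.)?sourceforge\.net/(?:$|post[0-9A-Za-z]*/?$)"
)

urls = [
    "http://sourceforge.net/",   # bare host, written with the trailing slash
    "https://sourceforge.net/",
    "https://pre.sourceforge.net/post",
    "https://preA2.sourceforge.net/postF5",
    "https://sourceforge.net/postF5?extra",  # hypothetical URL, not from the post
]

# Map each URL to True (pattern matches) or False (it does not).
results = {url: bool(pattern.search(url)) for url in urls}
for url, ok in results.items():
    print(ok, url)
```

The first four print `True` and the last prints `False`. Without the trailing slash, `http://sourceforge.net` would not match, because the pattern requires a `/` after `sourceforge.net`.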

Re: Bug in XSS pattern matching sample?

Posted: Fri May 23, 2014 6:02 pm
by HappyNoScriptUser
(I am sorry, I had to use "h++p://" as URI scheme name to get around the 'Ooops, something in your posting triggered my antispam filter...')

No, I am sorry - those URLs were not exact at all! :oops:
They were just general examples - of me, trying to..

1) ..first, make an entry that would also match any child domain of sourceforge.net, if present (which may contain lowercase letters, uppercase letters and numbers).
This, I achieved by using

Code: Select all

^h++ps?://([a-zA-Z0-9]+)?\.?sourceforge\.net
That ([a-zA-Z0-9]+)? term is, for me, very easy to understand:
[a-zA-Z0-9] : Any one lowercase letter a-z / UPPERCASE LETTER A-Z / number 0-9.
[a-zA-Z0-9]+ : One or more repetitions of this rule; the plus sign +.
([a-zA-Z0-9]+)? : This whole thing (in parentheses) may, or may not, be present; the question mark ?.
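
The breakdown above can be checked by looking at what the parenthesised group captures. This is a minimal Python sketch, assuming Python's `re` module treats this pattern like NoScript's JavaScript engine does. It writes `http` where this post uses the `h++p` workaround; the `subdomain` helper and the third URL are hypothetical.

```python
import re

pattern = re.compile(r"^https?://([a-zA-Z0-9]+)?\.?sourceforge\.net")

def subdomain(url):
    """Return what the optional group captured (None if skipped or no match)."""
    m = pattern.search(url)
    return m.group(1) if m else None

print(subdomain("http://smallBIG78.sourceforge.net"))  # smallBIG78
print(subdomain("http://sourceforge.net"))             # None (group not used)
print(subdomain("http://evilsourceforge.net"))         # evil
```

The third call shows a side effect worth knowing: because the group and the dot are each optional on their own, a different host whose name merely ends in `sourceforge.net` also matches. The `(?:[^/:]+\.)?` form from barbaz's reply avoids this by making the dot part of the optional group.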

I tested this out, and found out that e.g. having only lowercase letters [a-z] in this term, like

Code: Select all

^h++ps?://([a-z]+)?\.?sourceforge\.net
would not match a URL such as

Code: Select all

h++p://smallBIG78.sourceforge.net
I absolutely had to have the [a-zA-Z0-9] there.

2) ..then, expanding the entry further, so that it would also match any sub-folder (which also may have both lowercase, uppercase and numbers)..
- I obviously thought having the same term there would work:

Code: Select all

^h++ps?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-zA-Z0-9]+)?
And yes, indeed it did work, as I expected.

How..EVER..! :shock:

It was then, when I tried to experiment more, that I found out that..
..while, as I wrote, using only lowercase letters [a-z] in the first (child-domain) term, like

Code: Select all

^h++ps?://([a-z]+)?\.?sourceforge\.net
would not match a URL such as

Code: Select all

h++p://smallBIG123.sourceforge.net
..versus..

..using only lowercase letters [a-z] in the second (sub-folder) term, like

Code: Select all

^h++ps?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-z]+)?
would in fact, for some reason I cannot understand, match a URL such as

Code: Select all

h++p://www.sourceforge.net/smallBIG123
This thing, that there is a difference in the behavior in these two situations, is what I can not understand. :?:
This is the reason why I made this topic.
I would very much like to learn why this behavior is as it is,
or if that is too much work, for some kind soul to point me in the right direction where I may learn why this is.

Thank you, barbaz, for creating a term that would work for the URLs I listed,
but these were just examples, and even if I will indeed look into, and hopefully learn, that (?:[^/:]+\.) syntax which you used,
I am sorry to say that I am a bit of a slow learner, so I need to take this one step at a time,
and I feel that I can not move on until I understand why this behavior is so different between those two terms... :(

That is, why that child-domain term needs to have ([a-zA-Z0-9]+)? - but the sub-folder term does not need this.
Or, in another way, why that ([a-z]+)? will work with the sub-folder term, but not with the child-domain term.


:!: UPDATE #1: :!:

In fact, it seems the different behavior is not dependent on whether it is placed in the first or the second term,
rather, it is dependent on whether the term is placed at the very end of the entry, or not..! :o :shock:

If it is at the end, then ([a-z]+)? will match also UPPERCASE LETTERS and numbers..:

Code: Select all

^h++p://sourceforge\.net/?([a-z]+)?
will match

Code: Select all

h++p://sourceforge.net/smallBIG123
..versus..

If it is not at the end, then ([a-z]+)? will match neither UPPERCASE LETTERS nor numbers..:

Code: Select all

^h++p://sourceforge\.net/?([a-z]+)?/evenmore
will not match

Code: Select all

h++p://sourceforge.net/smallBIG123/evenmore
- it will only match something e.g. like

Code: Select all

h++p://sourceforge.net/small/evenmore
So.. this is the behavior that I am trying to understand, then..! :)

:idea: :!: UPDATE #2: :idea: :!:

Ahh!! :D :D

Now I believe I understand this!

The entry

Code: Select all

^h++p://sourceforge\.net/?([a-z]+)?
will match
h++p://sourceforge.net/smallBIG123
because it matches the first characters;
h++p://sourceforge.net/smallBIG123
- and as long as it does match these, it simply doesn't care about the /smallBIG123 that comes afterwards..!

So, that is why an entry e.g. like

Code: Select all

^h++p://sourceforge\.net/([a-z]+)
will match
h++p://sourceforge.net/smallBIG123
but it will not match
h++p://sourceforge.net/BIGsmall123
- because the B character is not a lowercase [a-z] character!
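
The same experiment can be repeated in a few lines of Python by printing the text each entry actually matched. This is only a sketch: it assumes Python's `re.search` behaves like NoScript's JavaScript engine for these patterns, it writes `http` in place of `h++p`, and the `matched` helper is made up for illustration.

```python
import re

optional_tail = re.compile(r"^http://sourceforge\.net/?([a-z]+)?")
required_tail = re.compile(r"^http://sourceforge\.net/([a-z]+)")
followed_tail = re.compile(r"^http://sourceforge\.net/?([a-z]+)?/evenmore")

def matched(pattern, url):
    """Return the matched portion of url, or None when there is no match."""
    m = pattern.search(url)
    return m.group(0) if m else None

# Optional group at the end: the match stops where [a-z] stops, or earlier.
print(matched(optional_tail, "http://sourceforge.net/smallBIG123"))  # http://sourceforge.net/small
print(matched(optional_tail, "http://sourceforge.net/BIGsmall123"))  # http://sourceforge.net/

# Required group: at least one lowercase letter must follow the slash.
print(matched(required_tail, "http://sourceforge.net/smallBIG123"))  # http://sourceforge.net/small
print(matched(required_tail, "http://sourceforge.net/BIGsmall123"))  # None

# Something must follow the group, so BIG123 can no longer be skipped.
print(matched(followed_tail, "http://sourceforge.net/smallBIG123/evenmore"))  # None
```

In every successful case the match is only a prefix of the URL; the characters after it are never compared against the entry.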

I have the PowerGREP program, which I remembered has a RegEx search type,
so I created some text files and tested, and found out about this!

Thank you very much, barbaz, for telling me that this was the correct behavior,
- even if I had a tiny amount of doubt about it,
your statement kept me testing and trying, since I wanted to know what piece I was missing of the picture, so to say..! :)

Re: Bug in XSS pattern matching sample?

Posted: Sat May 24, 2014 12:09 am
by Thrawn
If you want your regex to describe the entire URL, then end it with a dollar sign.
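
As a rough illustration, here is the entry from the earlier post with and without the trailing `$`. This is a Python sketch under the assumption that `re.search` mirrors NoScript's JavaScript matching for these patterns; the variable names are hypothetical.

```python
import re

open_ended = re.compile(
    r"^https?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-z]+)?"
)
closed = re.compile(
    r"^https?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-z]+)?$"
)

url_mixed = "http://www.sourceforge.net/smallBIG123"
url_lower = "http://www.sourceforge.net/small"

# Without "$" the mixed-case path slips through as an unchecked remainder.
open_mixed = bool(open_ended.search(url_mixed))
# With "$" every character up to the end of the URL has to fit the pattern.
closed_mixed = bool(closed.search(url_mixed))
closed_lower = bool(closed.search(url_lower))

print(open_mixed, closed_mixed, closed_lower)  # True False True
```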