(I am sorry, I had to use "h++p://" as URI scheme name to get around the
'Ooops, something in your posting triggered my antispam filter...')
No, I am sorry - those URLs were not exact at all!
They were just general examples - of me, trying to..
1) ..first, make an entry that would also match, if present, any child domain (which may contain lowercase letters, uppercase letters and digits) of sourceforge.net.
This, I achieved by using
Code: Select all
^h++ps?://([a-zA-Z0-9]+)?\.?sourceforge\.net
That ([a-zA-Z0-9]+)? term is, for me, very easy to understand...:
[a-zA-Z0-9] : Any lowercase letter a-z / UPPERCASE LETTER A-Z / digit 0-9.
[a-zA-Z0-9]+ : One or more repetitions of this rule; the plus sign +.
([a-zA-Z0-9]+)? : This whole thing (in parentheses) may, or may not, be present; the question mark ?.
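The three pieces above can be tried out one at a time. This is only a small sketch, assuming Python's re module as a stand-in for the filter's regex engine (the variable names are made up for illustration):

```python
import re

# Assumption: Python's re module stands in for the filter's regex engine.
chars = re.compile(r"[a-zA-Z0-9]")         # exactly one letter or digit
run = re.compile(r"[a-zA-Z0-9]+")          # one or more of them in a row
optional = re.compile(r"([a-zA-Z0-9]+)?")  # the whole run may be absent

one = chars.match("smallBIG123").group(0)   # a single character
many = run.match("smallBIG123").group(0)    # the whole run
none = optional.match("...").group(0)       # nothing to match, still succeeds

print(repr(one), repr(many), repr(none))
```

The last case is the important one: an optional group never fails on its own, it simply matches nothing when the text does not fit.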
I tested this out, and found that having only lowercase letters [a-z] in this term, like
Code: Select all
^h++ps?://([a-z]+)?\.?sourceforge\.net
would not match a URL like the h++p://smallBIG123.sourceforge.net example shown further down.
I absolutely had to have the [a-zA-Z0-9] there.
2) ..then, expanding the entry further, so that it would also match any sub-folder (which may also contain lowercase letters, uppercase letters and digits)..
- I naturally thought having the same term there would work:
Code: Select all
^h++ps?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-zA-Z0-9]+)?
And yes, indeed it did work, as I expected.
How..EVER..!
It was then, when I tried to experiment more, that I found out that..
..while, as I wrote, using only lowercase letters [a-z] in the first (child-domain) term, like
Code: Select all
^h++ps?://([a-z]+)?\.?sourceforge\.net
would not match a URL like
Code: Select all
h++p://smallBIG123.sourceforge.net
..versus..
..using only lowercase letters [a-z] in the second (sub-folder) term, like
Code: Select all
^h++ps?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-z]+)?
would in fact, for some reason I cannot understand, match a URL like
Code: Select all
h++p://www.sourceforge.net/smallBIG123
This difference in behavior between the two situations is what I cannot understand, and it is the reason why I made this topic.
I would very much like to learn why the behavior is as it is,
or, if that is too much work, for some kind soul to point me in the right direction where I may learn why.
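The two situations described above can be reproduced outside the filter. A rough sketch, assuming the real scheme http in place of the h++p placeholder and Python's re.search as a stand-in for the filter's matching (the variable names are hypothetical):

```python
import re

# Assumption: "http" replaces the "h++p" placeholder, and re.search
# stands in for how the filter applies an entry to a URL.
child_lower = r"^https?://([a-z]+)?\.?sourceforge\.net"
folder_lower = r"^https?://([a-zA-Z0-9]+)?\.?sourceforge\.net/?([a-z]+)?"

# [a-z] only in the child-domain term
child_hit = re.search(child_lower, "http://smallBIG123.sourceforge.net") is not None

# [a-z] only in the sub-folder term
folder_hit = re.search(folder_lower, "http://www.sourceforge.net/smallBIG123") is not None

print(child_hit, folder_hit)
```

Under these assumptions the first test reports no match and the second reports a match, which is the same difference described above.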
Thank you, barbaz, for creating a term that would work for the URLs I listed, but these were just examples.
I will indeed look into, and hopefully learn, that (?:[^/:]+\.) syntax which you used,
but I am sorry to say that I am a bit of a slow learner, so I need to take this one step at a time,
and I feel that I cannot move on until I understand why the behavior is so different between those two terms...
That is, why the child-domain term needs to have ([a-zA-Z0-9]+)? but the sub-folder term does not.
Or, put another way, why ([a-z]+)? will work as the sub-folder term, but not as the child-domain term.
UPDATE #1:
In fact, it seems the different behavior does not depend on whether the term is placed first or second;
rather, it depends on whether the term is placed at the very end of the entry, or not..!
If it is at the end, then ([a-z]+)? will also match UPPERCASE LETTERS and numbers..:
Code: Select all
^h++p://sourceforge\.net/?([a-z]+)?
will match
Code: Select all
h++p://sourceforge.net/smallBIG123
..versus..
If it is not at the end, then ([a-z]+)? will match neither UPPERCASE LETTERS nor numbers..:
Code: Select all
^h++p://sourceforge\.net/?([a-z]+)?/evenmore
will not match
Code: Select all
h++p://sourceforge.net/smallBIG123/evenmore
- it will only match something e.g. like
Code: Select all
h++p://sourceforge.net/small/evenmore
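The three cases from this update can be checked in one go. Again only a sketch, assuming http in place of h++p and Python's re.search in place of the filter (the names at_end and not_at_end are made up here):

```python
import re

# Assumption: "http" replaces the "h++p" placeholder.
at_end = r"^http://sourceforge\.net/?([a-z]+)?"
not_at_end = r"^http://sourceforge\.net/?([a-z]+)?/evenmore"

def hits(pattern, url):
    """Return True if the pattern matches somewhere in the URL."""
    return re.search(pattern, url) is not None

end_mixed = hits(at_end, "http://sourceforge.net/smallBIG123")
middle_mixed = hits(not_at_end, "http://sourceforge.net/smallBIG123/evenmore")
middle_lower = hits(not_at_end, "http://sourceforge.net/small/evenmore")

print(end_mixed, middle_mixed, middle_lower)
```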
So.. this is the behavior that I am trying to understand, then..!
UPDATE #2:
Ahh!!
Now I believe I understand this!
The entry
Code: Select all
^h++p://sourceforge\.net/?([a-z]+)?
will match
h++p://sourceforge.net/smallBIG123
because it matches the first characters,
h++p://sourceforge.net/small
- and as long as it does match these, it simply doesn't care about the BIG123 that comes afterwards..!
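To see exactly which characters the entry matched, one can ask the regex engine for the matched text. A minimal sketch, assuming http in place of h++p and Python's re module. The extra $ anchor in the last step is my own addition for contrast, not something from the entries above:

```python
import re

# Assumption: "http" replaces the "h++p" placeholder.
pattern = r"^http://sourceforge\.net/?([a-z]+)?"
url = "http://sourceforge.net/smallBIG123"

m = re.search(pattern, url)
matched = m.group(0)       # the part of the URL the entry actually consumed
ignored = url[m.end():]    # whatever follows the match is never examined

# Illustration only: anchoring the end with $ forces the whole URL to fit.
strict = re.search(pattern + r"$", url)

print(matched, ignored, strict)
```

Without an end anchor, the entry only has to match a prefix of the URL, so the trailing uppercase letters and digits are simply left over.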
So, that is why an entry e.g. like
will match
h++p://sourceforge.net/smallBIG123
but it will not match
h++p://sourceforge.net/BIGsmall123
- because the B character is not a lowercase [a-z] character!
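The entry itself did not survive in the text above, so the following is only a guess at the kind of entry that behaves this way: a hypothetical pattern where the sub-folder term is mandatory (no trailing ?), compared with the optional form used earlier. It assumes http in place of h++p and Python's re module:

```python
import re

# Hypothetical entry (NOT taken from the post): sub-folder term is mandatory.
mandatory = r"^http://sourceforge\.net/[a-z]+"
# The optional form used earlier in the post.
optional = r"^http://sourceforge\.net/?([a-z]+)?"

lower_first = re.search(mandatory, "http://sourceforge.net/smallBIG123") is not None
upper_first = re.search(mandatory, "http://sourceforge.net/BIGsmall123") is not None

# With the optional form, the same URL still matches: the group matches nothing.
optional_upper_first = re.search(optional, "http://sourceforge.net/BIGsmall123") is not None

print(lower_first, upper_first, optional_upper_first)
```

With the mandatory form, at least one lowercase letter must follow the slash, so a folder starting with B is rejected. With the optional form, the entry is satisfied by the host name alone.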
I have the PowerGREP program, which I remembered has a RegEx search type,
so I created some text files and tested, and found out about this!
Thank you very much, barbaz, for telling me that this was the correct behavior.
Even though I had a tiny amount of doubt about it,
your statement kept me testing and trying, since I wanted to know what piece of the picture I was missing, so to say..!
