shouldn't chrome: pages be excluded from surrogate matching

al_9x
Master Bug Buster
Posts: 931
Joined: Thu Mar 19, 2009 4:52 pm

Post by al_9x »

First, a question: does "@*", which turns into /.*/, result in scanning the whole string to consume the greedy *, or is it optimized away? Perhaps an empty pattern "@" could also mean everything.

I wanted a page-level surrogate for all pages ("@*") but noticed that it also matched chrome: and about: URLs (chrome://venkman/content/venkman-output-window.html, about:config). Is that by design?
Last edited by al_9x on Tue Jun 08, 2010 3:47 am, edited 1 time in total.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
al_9x
Master Bug Buster
Posts: 931
Joined: Thu Mar 19, 2009 4:52 pm

Re: shouldn't chrome: pages be excluded from surrogate match

Post by al_9x »

The above question was in reference to a surrogate, which some may find useful.

When DOM storage is disabled, window.localStorage throws an unexpected exception instead of returning null/nothing, which, for example, breaks http://www.apple.com.

The following surrogate takes care of that.

user_pref("noscript.surrogate.localStorage.replacement", "__defineGetter__('localStorage', function() {});");
user_pref("noscript.surrogate.localStorage.sources", "@^https?:");
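
To illustrate the effect of that replacement outside the browser, here is a minimal sketch in plain JavaScript. `fakeWindow` and `probeStorage` are hypothetical names invented for this example, and the throwing getter only imitates the behaviour described above; it is not how Firefox implements the property. The standard `Object.defineProperty` is used in place of the legacy `__defineGetter__`.

```javascript
// Hypothetical stand-in for a window whose localStorage access throws,
// imitating the disabled-DOM-storage behaviour described in the post.
var fakeWindow = {};
Object.defineProperty(fakeWindow, "localStorage", {
  configurable: true,
  get: function () { throw new Error("DOM storage disabled"); }
});

// The kind of feature detection a page might perform.
function probeStorage(win) {
  try {
    return win.localStorage ? "available" : "unavailable";
  } catch (e) {
    return "threw";
  }
}

var before = probeStorage(fakeWindow);

// Same idea as the surrogate: shadow the property with a getter
// that returns undefined instead of throwing.
Object.defineProperty(fakeWindow, "localStorage", {
  configurable: true,
  get: function () {}
});

var after = probeStorage(fakeWindow);
```

With the getter replaced, the page's detection code sees a falsy value and can fall back gracefully instead of aborting on an exception.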
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Giorgio Maone
Site Admin
Posts: 9524
Joined: Wed Mar 18, 2009 11:22 pm
Location: Palermo - Italy

Re: shouldn't chrome: pages be excluded from surrogate match

Post by Giorgio Maone »

al_9x wrote:First a question, does "@*" which turns into /.*/ , result in the scanning of the whole string to consume the greedy * or is it optimized away.
No, it's currently not optimized (it translates to .*). But it's a good idea, so I've just turned it into an even faster path, replacing the test() method of the AddressMatcher instance with { return true; } and skipping regular expression evaluation entirely.
al_9x wrote: Perhaps an empty pattern "@" could also mean everything.
Nope, it would be error prone (currently empty patterns match nothing to be fool-proof).
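
The two special cases discussed above (a match-all pattern that bypasses the regexp, and an empty pattern that matches nothing) can be sketched roughly as follows. This is not NoScript's actual AddressMatcher code: `makeMatcher` is a hypothetical helper, and the sketch assumes that "*" is the only wildcard and that every other character is literal.

```javascript
// Hypothetical helper, NOT NoScript's real AddressMatcher.
function makeMatcher(pattern) {
  if (pattern === "") {
    // Empty patterns match nothing, to stay fool-proof.
    return { test: function () { return false; } };
  }
  if (pattern === "*") {
    // Fast path: no regular expression is evaluated at all.
    return { test: function () { return true; } };
  }
  // Assumption for this sketch: escape regexp metacharacters,
  // then turn each "*" into ".*".
  var escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  var rx = new RegExp(escaped.replace(/\*/g, ".*"));
  return { test: function (url) { return rx.test(url); } };
}

var matchAll = makeMatcher("*");
var matchNone = makeMatcher("");
var matchSite = makeMatcher("http://example.com/*");
```

Because every matcher exposes the same test() method, callers do not need to know whether a regexp sits behind it.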
al_9x wrote: I wanted a page level surrogate for all pages ("@*") but noticed that it matched chrome: and about: urls (chrome://venkman/content/venkman-output-window.html, about:config). Is that by design?
Yes, it is by design, because NoScript's AddressMatcher "class" is reused in many places (and should be used in even more places where it currently isn't, e.g. in XSS exceptions), so it needs to be as flexible as possible.
al_9x wrote: When DOM storage is disabled, window.localStorage instead of returning null/nothing, throws an unexpected exception which, for example, breaks http://www.apple.com

The following surrogate, takes care of that.
Good call, thanks for the hint.
Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
al_9x
Master Bug Buster
Posts: 931
Joined: Thu Mar 19, 2009 4:52 pm

Re: shouldn't chrome: pages be excluded from surrogate match

Post by al_9x »

Giorgio Maone wrote:
al_9x wrote:First a question, does "@*" which turns into /.*/ , result in the scanning of the whole string to consume the greedy * or is it optimized away.
No, it's currently not optimized (translates to .*).
No, that was clear; I was actually wondering whether the regexp engine itself optimizes it. For a test rather than a match, it doesn't need to scan the full input string to satisfy a trailing * in a pattern.
Giorgio Maone wrote:
al_9x wrote: I wanted a page level surrogate for all pages ("@*") but noticed that it matched chrome: and about: urls (chrome://venkman/content/venkman-output-window.html, about:config). Is that by design?
Yes, it is by design because NoScript's AddressMatcher "class" is reused in many places (and should be used even in more places where it isn't, e.g. in XSS exceptions) so it needs to be as much flexible as possible.
I am not suggesting that AddressMatcher shouldn't be capable of matching certain URLs. The question is whether chrome and about pages should be available for page-level surrogate injection, or whether they should be prefiltered before the surrogate source pattern gets a chance to match them. In other words, did you intend for page-level surrogates to be injectable into chrome/about pages? And if so, I am curious, for what scenario?
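
To make the prefiltering idea concrete, here is a rough sketch of what I mean. The names `PRIVILEGED_SCHEMES` and `shouldInject` are made up for illustration and do not correspond to anything in NoScript; the matcher is assumed to be any object with a test() method.

```javascript
// Hypothetical pre-filter: schemes this sketch treats as off-limits
// for page-level surrogate injection.
var PRIVILEGED_SCHEMES = /^(?:chrome|about):/i;

// Decide whether a page-level surrogate may be injected into `url`.
// `sourcesMatcher` stands in for whatever matches the "sources" pref.
function shouldInject(url, sourcesMatcher) {
  if (PRIVILEGED_SCHEMES.test(url)) {
    return false; // filtered out before the source pattern is consulted
  }
  return sourcesMatcher.test(url);
}

// Stand-in for a match-all source pattern such as "@*".
var matchEverything = { test: function () { return true; } };

var injectWeb = shouldInject("http://www.apple.com/", matchEverything);
var injectAbout = shouldInject("about:config", matchEverything);
var injectChrome = shouldInject(
  "chrome://venkman/content/venkman-output-window.html", matchEverything);
```

Without such a filter, the same result can be had by restricting the source pattern itself, as in "@^https?:".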
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Giorgio Maone
Site Admin
Posts: 9524
Joined: Wed Mar 18, 2009 11:22 pm
Location: Palermo - Italy

Re: shouldn't chrome: pages be excluded from surrogate match

Post by Giorgio Maone »

al_9x wrote: No, that was clear, I was actually wondering if the rx engine optimized it. In the case of a test rather than a match, it doesn't need to scan the full input string for the last * in a pattern.
If you've got time, you may want to investigate the source.
However, I tried running the following on Fx 3.6.3 with a 2.6 GHz CPU:

var iterations = 500000;
var r = /.*/;
// var r = /(?:)/; // empty regexp variant
// var r = { test: function() { return true; } }; // dummy variant

// Build a random 10KB test string, one character at a time.
var arr = [];
for (var j = 10 * 1024; j-- > 0;) {
  arr.push(String.fromCharCode(Math.round(Math.random() * 255)));
}
var str = arr.join("");

// Time the test() calls and report the average in milliseconds per call.
var t = Date.now();
for (var j = iterations; j-- > 0;) {
  r.test(str);
}

alert((Date.now() - t) / iterations);


It looks like no optimization is done, since /.*/ runs about 4 times slower (on average; its timings vary a lot) than /(?:)/, whose timings are also much more repeatable. Obviously, the dummy no-regexp test runs 2 or 3 times faster than the fastest regexp.
However, the difference is practically negligible for the AddressMatcher use cases, since all the approaches run in the microsecond range even on 10KB strings, which are rather unusual as URLs.
al_9x wrote: did you intend for page level surrogates to be injectable into chrome/about pages?
Page-level surrogates are meant to be injected into content pages matching their "sources" preference. So yes, even into chrome/about pages.
al_9x wrote: And if so, I am curious, for what scenario?
Live customization?
Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3