
shouldn't chrome: pages be excluded from surrogate matching

Posted: Tue Jun 08, 2010 12:02 am
by al_9x
First, a question: does "@*", which turns into /.*/, result in scanning the whole string to consume the greedy *, or is that optimized away? Perhaps an empty pattern "@" could also mean everything.

I wanted a page-level surrogate for all pages ("@*") but noticed that it matched chrome: and about: URLs (chrome://venkman/content/venkman-output-window.html, about:config). Is that by design?

Re: shouldn't chrome: pages be excluded from surrogate match

Posted: Tue Jun 08, 2010 12:41 am
by al_9x
The above question came up while I was writing a surrogate that some may find useful.

When DOM storage is disabled, window.localStorage, instead of returning null or nothing, throws an unexpected exception, which breaks sites such as http://www.apple.com.

The following surrogate takes care of that:


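// Surrogate script: shadow localStorage with a getter that returns undefined instead of throwing
user_pref("noscript.surrogate.localStorage.replacement", "__defineGetter__('localStorage', function() {});");
// Inject it only into pages whose URL starts with http: or https:
user_pref("noscript.surrogate.localStorage.sources", "@^https?:");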
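To see why a getter that returns nothing helps, here is a small simulation that runs outside the browser. `fakeWindow`, `pageScript` and `run` are hypothetical names: `fakeWindow` stands in for the page's global object, and its throwing getter only imitates the exception described above (the actual Firefox error is not reproduced). Page code that feature-tests `window.localStorage` breaks on the exception, but falls back cleanly once the same `__defineGetter__` call as the surrogate makes the property read as undefined.

```javascript
// Hypothetical stand-in for the page's window object.
const fakeWindow = {};

// Simulate disabled DOM storage: reading localStorage throws.
Object.defineProperty(fakeWindow, "localStorage", {
  configurable: true,
  get() { throw new Error("DOM storage disabled"); },
});

// Typical page code that feature-tests storage before using it.
function pageScript(win) {
  if (win.localStorage) {
    return "using storage";
  }
  return "storage unavailable, falling back";
}

// Run the page code, reporting an uncaught exception as a broken page.
function run(win) {
  try {
    return pageScript(win);
  } catch (e) {
    return "page broken: " + e.message;
  }
}

const before = run(fakeWindow);

// Same call as the surrogate, applied to fakeWindow instead of the page's
// global object: the new getter returns undefined rather than throwing.
fakeWindow.__defineGetter__("localStorage", function () {});

const after = run(fakeWindow);
console.log(before); // page broken: DOM storage disabled
console.log(after);  // storage unavailable, falling back
```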

Re: shouldn't chrome: pages be excluded from surrogate match

Posted: Tue Jun 08, 2010 7:07 am
by Giorgio Maone
al_9x wrote: First a question, does "@*" which turns into /.*/ , result in the scanning of the whole string to consume the greedy * or is it optimized away.
No, it's currently not optimized (it translates to .*). But it's a good idea, so I've just turned it into an even faster path, replacing the test() method of the AddressMatcher instance with { return true; } and skipping regular expression evaluation entirely.
al_9x wrote: Perhaps an empty pattern "@" could also mean everything.
Nope, that would be error-prone (currently, empty patterns match nothing, to be foolproof).
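To make the two answers above concrete, here is a minimal sketch of a matcher factory with both special cases. `makeMatcher` is a hypothetical name, not NoScript's actual AddressMatcher API, and the sketch assumes every pattern is "@" followed by a JavaScript regular expression source. Only the exact pattern "@*" gets the match-everything fast path; any other pattern translation NoScript performs is not modeled.

```javascript
// Hypothetical sketch, not NoScript's actual AddressMatcher code.
// Assumes patterns are "@" followed by a regular expression source.
function makeMatcher(pattern) {
  if (pattern === "" || pattern === "@") {
    // Empty patterns match nothing, to be foolproof.
    return { test: function (url) { return false; } };
  }
  if (pattern === "@*") {
    // Match-everything fast path: no regular expression evaluation at all.
    return { test: function (url) { return true; } };
  }
  if (pattern.charAt(0) !== "@") {
    throw new Error("only @-prefixed patterns are handled in this sketch");
  }
  // Everything after "@" is used as the regular expression source.
  return new RegExp(pattern.slice(1));
}

const http = makeMatcher("@^https?:");
console.log(http.test("http://www.apple.com")); // true
console.log(http.test("about:config"));         // false
```

Returning a plain object with a test() method lets the fast paths and the RegExp case be used interchangeably by callers.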
al_9x wrote: I wanted a page level surrogate for all pages ("@*") but noticed that it matched chrome: and about: urls (chrome://venkman/content/venkman-output-window.html, about:config). Is that by design?
Yes, it is by design: NoScript's AddressMatcher "class" is reused in many places (and should be used in even more places where it isn't yet, e.g. in XSS exceptions), so it needs to be as flexible as possible.
al_9x wrote: When DOM storage is disabled, window.localStorage instead of returning null/nothing, throws an unexpected exception which, for example, breaks http://www.apple.com

The following surrogate, takes care of that.
Good call, thanks for the hint.

Re: shouldn't chrome: pages be excluded from surrogate match

Posted: Tue Jun 08, 2010 8:03 am
by al_9x
Giorgio Maone wrote:
al_9x wrote: First a question, does "@*" which turns into /.*/ , result in the scanning of the whole string to consume the greedy * or is it optimized away.
No, it's currently not optimized (translates to .*).
No, that much was clear; I was actually wondering whether the regex engine optimizes it. For a test rather than a match, it doesn't need to scan the full input string when * is the last element of the pattern.
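A minimal sketch of the shortcut described above, assuming a plain JavaScript RegExp with no flags: because a trailing `.*` can always match the empty string, /R.*/ and /R/ give the same test() result for any input. `stripTrailingDotStar` is a hypothetical helper; nothing here claims that SpiderMonkey or NoScript performs this rewrite, and it only handles an unescaped greedy `.*` at the very end of the source.

```javascript
// Hypothetical helper: for test() purposes, R.* matches wherever R matches,
// because a trailing .* can always match the empty string.
function stripTrailingDotStar(source) {
  if (!source.endsWith(".*")) return source; // also leaves lazy ".*?" alone
  // Count the backslashes immediately before the "." to see if it is escaped.
  let i = source.length - 3;
  let backslashes = 0;
  while (i >= 0 && source[i] === "\\") {
    backslashes++;
    i--;
  }
  // An odd count means "\." (a literal dot quantified by *): keep it.
  if (backslashes % 2 === 1) return source;
  return source.slice(0, -2);
}

const full = /^https?:.*/;
const trimmed = new RegExp(stripTrailingDotStar(full.source));
console.log(trimmed.source); // ^https?:

// "@*" becomes /.*/, which reduces to the empty regexp.
console.log(new RegExp(stripTrailingDotStar(".*")).source); // (?:)
```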
Giorgio Maone wrote:
al_9x wrote: I wanted a page level surrogate for all pages ("@*") but noticed that it matched chrome: and about: urls (chrome://venkman/content/venkman-output-window.html, about:config). Is that by design?
Yes, it is by design because NoScript's AddressMatcher "class" is reused in many places (and should be used even in more places where it isn't, e.g. in XSS exceptions) so it needs to be as much flexible as possible.
I am not suggesting that AddressMatcher shouldn't be capable of matching certain URLs. The question is whether chrome: and about: pages should be available for page-level surrogate injection, or whether they should be prefiltered before the surrogate's source pattern gets a chance to match them. In other words, did you intend for page level surrogates to be injectable into chrome/about pages? And if so, I am curious, for what scenario?
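For comparison, the prefiltering asked about above could look roughly like this sketch, which rejects chrome: and about: URLs before a surrogate's "sources" matcher is consulted. `shouldInjectSurrogate`, `EXCLUDED_SCHEMES` and `matchAll` are hypothetical names, and this is not how NoScript behaves. The filter sits at the surrogate layer, so the matcher itself stays generic.

```javascript
// Hypothetical prefilter illustrating the proposal; not NoScript's behavior.
const EXCLUDED_SCHEMES = ["chrome:", "about:"];

function shouldInjectSurrogate(url, sourcesMatcher) {
  const lower = url.toLowerCase(); // URL schemes are case-insensitive
  // Drop internal pages before the "sources" pattern gets a chance to match.
  for (const scheme of EXCLUDED_SCHEMES) {
    if (lower.startsWith(scheme)) return false;
  }
  return sourcesMatcher.test(url);
}

// A match-everything matcher standing in for "@*".
const matchAll = { test: function (url) { return true; } };

console.log(shouldInjectSurrogate("about:config", matchAll));         // false
console.log(shouldInjectSurrogate("http://www.apple.com", matchAll)); // true
```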

Re: shouldn't chrome: pages be excluded from surrogate match

Posted: Tue Jun 08, 2010 9:34 am
by Giorgio Maone
al_9x wrote: No, that was clear, I was actually wondering if the rx engine optimized it. In the case of a test rather than a match, it doesn't need to scan the full input string for the last * in a pattern.
If you've got time, you may want to investigate the source.
However, I ran the following on Fx 3.6.3 with a 2.6 GHz CPU:


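var iterations = 500000;
var r = /.*/;                                      // greedy "match everything" regexp
// var r = /(?:)/;                                 // empty regexp variant
// var r = { test: function() { return true; } };  // dummy no-regexp variant

// Build a string of 10 * 1024 random characters with codes 0-255.
// Note: "." does not match line terminators, and codes 10 (\n) and 13 (\r)
// are all but certain to appear, so /.*/ stops at the first one
// rather than consuming the whole string.
var arr = [];
for (var j = 10 * 1024; j-- > 0;) {
  arr.push(String.fromCharCode(Math.round(Math.random() * 255)));
}
var str = arr.join("");

// Time the test() calls and report the average milliseconds per call.
var t = Date.now();
for (var j = iterations; j-- > 0;) {
  r.test(str);
}

alert((Date.now() - t) / iterations);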


It looks like no optimization is done: /.*/ runs about 4 times slower than /(?:)/ on average (its timings are very variable), while /(?:)/ timings are much more repeatable. As expected, the dummy no-regexp test runs 2 or 3 times faster than the fastest regexp.
However, the difference is practically negligible for AddressMatcher's use cases, since all the approaches run in the microsecond range even on 10 KB strings, which are rather unusual as URLs.
al_9x wrote: did you intend for page level surrogates to be injectable into chrome/about pages?
Page-level surrogates are meant to be injected into content pages matching their "sources" preference, so yes, even into chrome/about pages.
And if so, I am curious, for what scenario?
Live customization?