Blocking *all* JS on archived pages on the Wayback Machine?

Sophira
Posts: 9
Joined: Wed Mar 21, 2012 5:00 am

Post by Sophira »

Hi. :)

I have a bit of a problem that I can't seem to solve completely with NoScript and ABE. I use the Internet Archive Wayback Machine quite extensively, as well as the Internet Archive in general.

Unfortunately, the Wayback Machine's main interface for selecting the archived pages/dates requires JavaScript, so I pretty much have to enable JavaScript for web.archive.org. However, this has the very bad side effect of also enabling JavaScript for all the archived sites, since they're displayed under the web.archive.org domain.

I've tried to mitigate this as best I can using this ABE rule:

Code:

Site web.archive.org
Deny INCLUSION(SCRIPT) from ^https?://web\.archive\.org\/web\/[0-9]+\/
Accept from web.archive.org
Deny INCLUSION(SCRIPT)

This causes NoScript to block archived pages from loading additional external JavaScript that's hosted on the Wayback Machine, while still allowing the main interface to work. It also stops any other sites from loading archived JavaScript.
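
To see which request origins the `from` pattern in that rule would catch, here is a minimal Python sketch. It only exercises the regular expression and does not emulate ABE itself; the function name and the example URLs are made up for illustration.

```python
import re

# Same pattern as the "from" clause in the ABE rule above.
ARCHIVED_PAGE = re.compile(r"^https?://web\.archive\.org\/web\/[0-9]+\/")

def is_archived_page(url: str) -> bool:
    """Return True if the URL has a numeric timestamp directly followed by '/'."""
    return ARCHIVED_PAGE.match(url) is not None

# Illustrative URLs only (example.com is a placeholder site).
examples = [
    "https://web.archive.org/web/20140101000000/http://example.com/",  # archived page
    "https://web.archive.org/web/*/example.com",                       # date-picker interface
    "https://web.archive.org/",                                        # front page
]

for url in examples:
    print(is_archived_page(url), url)
```

Only the first URL has a timestamp segment, so only requests originating from pages like it would fall under the first Deny line; the interface pages fall through to the Accept line.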

Unfortunately, it doesn't protect against inline JavaScript that sits directly on the archived page itself in <SCRIPT> tags. Is there a way to protect against this? I understand that this is somewhat of an edge case, but I'd love it if this were possible.

Thanks for any help anybody can give!
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
barbaz
Senior Member
Posts: 11118
Joined: Sat Aug 03, 2013 5:45 pm

Post by barbaz »

Sandbox... try this?

Code:

Site web.archive.org
Deny INCLUSION(SCRIPT, OBJ, FONT, XHR, MEDIA) from ^https?://web\.archive\.org\/web\/[0-9]+\/
Sandbox

If you need JS after doing this, you'll have to make surrogate scripts for the site's scripts: download the JS you want, then set it up to run as a surrogate (the replacement pref is a file: URL pointing to the downloaded script).
The only problem with this approach is that you can't directly surrogate inline scripts. I think the best you can do is paste their contents into a !@ surrogate...

Also does anything in viewtopic.php?f=7&t=19174 help at all?
*Always* check the changelogs BEFORE updating that important software!
-
Sophira
Posts: 9
Joined: Wed Mar 21, 2012 5:00 am

Post by Sophira »

Thanks! I hadn't really realised that sandboxing was an option. It also turns out I wasn't fully understanding how the Wayback Machine did some things. I now have a much more comprehensive collection of rules that does exactly what I want (order is important!):

Code:

Site ^https?://web\.archive\.org\/web\/[0-9]+\/
# this first line allows us to view source on archived pages(!)
Accept INCLUSION() from chrome://browser/content/browser.xul
Accept INCLUSION(CSS, IMAGE) from ^https?://web\.archive\.org\/web\/[0-9]+\/ ^https?://web\.archive\.org\/web\/[0-9]+cs_\/
Deny INCLUSION()
Sandbox

Site ^https?://web\.archive\.org\/web\/[0-9]+cs_\/
Accept INCLUSION(CSS) from ^https?://web\.archive\.org\/web\/[0-9]+\/ ^https?://web\.archive\.org\/web\/[0-9]+cs_\/
Deny INCLUSION()
Sandbox

Site ^https?://web\.archive\.org\/web\/[0-9]+js_\/
Deny

Site ^https?://web\.archive\.org\/web\/[0-9]+im_\/
Accept INCLUSION(IMAGE)
Deny INCLUSION()

Site web.archive.org
Accept from web.archive.org
Deny INCLUSION(SCRIPT)

# deny archived pages from including anything else
Site ALL
Deny ALL from ^https?://web\.archive\.org\/web\/[0-9]+\/

It turns out that the Wayback Machine parses archived pages for certain types of includes and serves them from different URLs (marked by cs_, js_ or im_ after the timestamp, as in the Site patterns above), but it doesn't do so completely reliably. These rules also contain a lot of failsafes, but I think the redundancy is worth it.
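
As a rough illustration of how those URL forms differ, the Python sketch below sorts a URL into the categories the Site patterns above distinguish. The labels and the `classify` helper are hypothetical, and only the three markers that appear in the rules (cs_, js_, im_) are recognised; anything else is reported as "other".

```python
import re

# Timestamp, then an optional two-letter marker plus underscore, then '/'.
WAYBACK = re.compile(r"^https?://web\.archive\.org/web/([0-9]+)([a-z]{2}_)?/")

# Hypothetical labels for the markers used in the rules above.
KINDS = {None: "page", "cs_": "stylesheet", "js_": "script", "im_": "image"}

def classify(url: str) -> str:
    """Label a URL by the marker that follows its Wayback timestamp."""
    match = WAYBACK.match(url)
    if match is None:
        return "not-archived"  # e.g. the main interface or another site
    return KINDS.get(match.group(2), "other")

# Illustrative URLs only (example.com is a placeholder site).
for url in [
    "http://web.archive.org/web/20140101000000/http://example.com/",
    "http://web.archive.org/web/20140101000000cs_/http://example.com/a.css",
    "http://web.archive.org/web/20140101000000js_/http://example.com/a.js",
    "http://web.archive.org/web/20140101000000im_/http://example.com/a.png",
]:
    print(classify(url), url)
```

A script that the Wayback Machine fails to rewrite would still be requested under the plain "page" form, which is presumably the kind of case the failsafe Deny lines are there to cover.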

As far as I can tell, these rules should pretty reliably protect against any rogue JS and other potentially abusable items on archived pages (and prevent other sites from using them), while still allowing pages to display properly and the main Wayback Machine interface to work. Note that these rules (like my original rules) will disable the top bar that the Wayback Machine overlays on archived pages, but I think that's an acceptable trade-off.

Thank you again!
barbaz
Senior Member
Posts: 11118
Joined: Sat Aug 03, 2013 5:45 pm

Post by barbaz »

You're welcome, thanks for posting that ruleset! :)