web.archive.org files not downloading correctly

Ask for help about FlashGot, no registration needed to post
msjs
Posts: 3
Joined: Sun Feb 02, 2014 8:57 am

web.archive.org files not downloading correctly

Post by msjs »

I'm trying to download more than 500 archived pages of a website that disappeared over 10 years ago, but FlashGot strips the web.archive.org part of the URL.
This is what I want: http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm
This is what I get: http://www.herbweb.com/herbage/1-A.htm

Is there any way to get this working?
Last edited by msjs on Mon Feb 03, 2014 11:52 am, edited 1 time in total.
Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0
Thrawn
Master Bug Buster
Posts: 3106
Joined: Mon Jan 16, 2012 3:46 am
Location: Australia

Re: web.archive.org files not downloading correctly

Post by Thrawn »

I don't think that

http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm

is a valid URL.
======
Thrawn
------------
Religion is not the opium of the masses. Daily life is the opium of the masses.

True religion, which dares to acknowledge death and challenge the way we live, is an attempt to wake up.
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0
msjs
Posts: 3
Joined: Sun Feb 02, 2014 8:57 am

Re: web.archive.org files not downloading correctly

Post by msjs »

Click on it, you will see it is!
Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0
therube
Ambassador
Posts: 7969
Joined: Thu Mar 19, 2009 4:17 pm
Location: Maryland USA

Re: web.archive.org files not downloading correctly

Post by therube »

about:config

flashgot.redir.generic.enabled
&
flashgot.redir.generic.exceptions

Toggling the first will make things work, but that is likely to be overly broad.

Setting an exception is the preferred method and should also work, I would think.
I'm just not sure what to set it to at this time.
(I'm sure barbaz will come up with the correct string ;-).)
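
If you do go with the first option and want it to persist, one way is a user.js file in the profile folder. This is only a sketch: it assumes flashgot.redir.generic.enabled is a boolean pref (which "toggling" suggests) and that false is the value that makes the archive URLs work.

```javascript
// user.js in the Firefox/SeaMonkey profile folder (read at startup).
// Assumption: this is a boolean pref and false turns generic redirect fixing off.
user_pref("flashgot.redir.generic.enabled", false);
```

Note that this switches the feature off for every site, not just web.archive.org.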

---

I'll note that it saved the HTML "page", but not the associated picture...

I would expect (?) File | Save As (outside of FlashGot) to save "everything".
Likewise, a "spider"-type program might do something similar in an automated fashion.
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14 Pinball NoScript FlashGot AdblockPlus
Mozilla/5.0 (Windows NT 5.1; rv:26.0) Gecko/20100101 SeaMonkey/2.23
barbaz
Senior Member
Posts: 11064
Joined: Sat Aug 03, 2013 5:45 pm

Re: web.archive.org files not downloading correctly

Post by barbaz »

therube wrote:
(I'm sure barbaz will come up with the correct string ;-).)

Not without documentation... Is this like NoScript's AddressMatcher, is it one big regexp, or what?
Probably worth adding web.archive.org as a default exception anyway.

therube wrote:
Would expect ? a File | Save As (outside of FlashGot) to save "everything".

In my case it didn't. Standalone Wget was also useless :?:
Then again, I was trying to download an archive of a partly script-generated page, so some resources weren't called directly by the HTML source.
*Always* check the changelogs BEFORE updating that important software!
Mozilla/5.0 (X11; Linux i686; rv:30.0) Gecko/20100101 Firefox/30.0 SeaMonkey/2.27a1
barbaz
Senior Member
Posts: 11064
Joined: Sat Aug 03, 2013 5:45 pm

Re: web.archive.org files not downloading correctly

Post by barbaz »

OK, I looked at the source: the pref is a list of space-separated regexes. Adding this pattern should work:

^https?://web\.archive\.org/web/\d+

However, there appears to be a bug in FlashGot: when this pref is set, it controls where *to* apply the redirect fixing rather than where to skip it.
The bug can be fixed by changing line 1123 of RedirectContext.js to

      var m = (!context.genericExceptionsRx || !context.genericExceptionsRx.test(url)) &&

but I did only basic testing, so I'm not sure that it doesn't break something else...

EDIT: I decided to do further testing, and it looks like the patch I had originally posted actually disables the whole feature. :oops:
Corrected above, sorry about that.
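
For anyone following along, here is a minimal sketch of the intended behaviour. It is not FlashGot's actual code: the helper names compileExceptions and shouldFixRedirect are made up for illustration, and joining the space-separated patterns into a single alternation is only an assumption about how such a list could be compiled. The second test URL is a made-up example.

```javascript
// Hypothetical sketch; not FlashGot's actual implementation.
// Compile a space-separated list of regex sources into one RegExp,
// or return null when the list is empty.
function compileExceptions(prefValue) {
  const parts = prefValue.split(/\s+/).filter(Boolean);
  if (parts.length === 0) return null;
  return new RegExp(parts.map((p) => "(?:" + p + ")").join("|"));
}

// Intended logic: apply generic redirect fixing only when the URL
// does NOT match one of the exceptions.
function shouldFixRedirect(url, exceptionsRx) {
  return !exceptionsRx || !exceptionsRx.test(url);
}

const rx = compileExceptions("^https?://web\\.archive\\.org/web/\\d+");

// Wayback Machine snapshot URL from the first post: exempt, so left alone.
console.log(shouldFixRedirect(
  "http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm", rx)); // false

// Made-up redirector URL: not an exception, so redirect fixing applies.
console.log(shouldFixRedirect(
  "http://example.com/redirect?url=http://example.org/file.zip", rx)); // true
```

With an empty pref value the sketch returns null for the compiled pattern, so every URL gets redirect fixing, which matches the first half of the corrected condition.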
Mozilla/5.0 (X11; Linux i686; rv:30.0) Gecko/20100101 Firefox/30.0 SeaMonkey/2.27a1
barbaz
Senior Member
Posts: 11064
Joined: Sat Aug 03, 2013 5:45 pm

Re: web.archive.org files not downloading correctly

Post by barbaz »

Until this bug gets fixed, you can use this value for flashgot.redir.generic.exceptions as a temporary workaround:

^https?://(?!web\.archive\.org/web/\d+)
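
A quick way to sanity-check the workaround pattern, as a sketch: because of the bug the pref currently acts as the list of URLs to apply redirect fixing *to*, so the pattern has to match every http(s) URL except Wayback Machine snapshot URLs. The third sample URL is a made-up example.

```javascript
// Workaround pattern from the post: matches any http(s) URL unless
// the host and path continue with web.archive.org/web/<digits>.
const workaroundRx = /^https?:\/\/(?!web\.archive\.org\/web\/\d+)/;

const samples = [
  "http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm",
  "http://www.herbweb.com/herbage/1-A.htm",
  "https://example.com/redirect?url=http://example.org/file.zip", // made-up URL
];

// Prints false for the snapshot URL and true for the other two.
for (const url of samples) {
  console.log(workaroundRx.test(url), url);
}
```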
Mozilla/5.0 (X11; Linux i686; rv:30.0) Gecko/20100101 Firefox/30.0 SeaMonkey/2.27a1
msjs
Posts: 3
Joined: Sun Feb 02, 2014 8:57 am

Re: web.archive.org files not downloading correctly

Post by msjs »

Haven't had time to get back to this until now. Thanks for the help.
Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0
Giorgio Maone
Site Admin
Posts: 9524
Joined: Wed Mar 18, 2009 11:22 pm
Location: Palermo - Italy

Re: web.archive.org files not downloading correctly

Post by Giorgio Maone »

Please check the latest development build, 1.5.5.97rc2. Thank you.
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0