web.archive.org files not downloading correctly
Posted: Mon Feb 03, 2014 12:31 am
by msjs
I'm trying to download more than 500 pages from a website that disappeared over 10 years ago. Flashgot removes the web.archive.org part of the URL.
This is what I want
http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm
This is what I get
http://www.herbweb.com/herbage/1-A.htm
Is there any way to get this working?
Re: web.archive.org files not downloading correctly
Posted: Mon Feb 03, 2014 1:36 am
by Thrawn
I don't think that
Code:
http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm
is a valid URL.
Re: web.archive.org files not downloading correctly
Posted: Mon Feb 03, 2014 2:01 am
by msjs
Click on it, you will see it is!
Re: web.archive.org files not downloading correctly
Posted: Wed Feb 05, 2014 5:22 pm
by therube
about:config
flashgot.redir.generic.enabled
&
flashgot.redir.generic.exceptions
Toggling the first will then make things work.
But likely will be overly broad.
Setting an exception, the preferred method, should also work, I would think.
I'm just not sure what to set it to at this time.
(I'm sure barbaz will come up with the correct string.)
---
I'll note that it saved the html "page", but not the associated picture...
Would expect ? a File | Save As (outside of FlashGot) to save "everything".
Likewise a "spider" type program might do similar in an automated fashion.
Re: web.archive.org files not downloading correctly
Posted: Wed Feb 05, 2014 6:31 pm
by barbaz
therube wrote:(I'm sure barbaz will come up with the correct string.)
Not without documentation... is this like NoScript's AddressMatcher, is it one big regexp, or what?
Probably worth adding web.archive.org as a default exception anyway.
therube wrote:Would expect ? a File | Save As (outside of FlashGot) to save "everything".
In my case it didn't. Standalone Wget was also useless.
Then again, I was trying to download an archive of a partly script-generated page, so some resources weren't called directly by the HTML source.
Re: web.archive.org files not downloading correctly
Posted: Wed Feb 05, 2014 11:38 pm
by barbaz
OK, I looked at the source and the pref is a list of space-separated regexes. This addition should work:
Code:
^https?://web\.archive\.org/web/\d+
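For illustration, here is a minimal sketch of how a space-separated list of regexes like this pref could be combined and checked against the two URLs from the first post. The helper name `buildExceptionsRx` is hypothetical and this is not FlashGot's actual code, only an assumption about how such a pref might be parsed.

```javascript
// Hypothetical helper (not FlashGot's code): combine a space-separated
// list of regex sources into one alternation, or return null if empty.
function buildExceptionsRx(prefValue) {
  const sources = prefValue.split(/\s+/).filter(Boolean);
  if (sources.length === 0) return null;
  return new RegExp(sources.map((s) => `(?:${s})`).join("|"));
}

// The pref value suggested above, written as a JavaScript string literal.
const prefValue = "^https?://web\\.archive\\.org/web/\\d+";
const exceptionsRx = buildExceptionsRx(prefValue);

const archivedUrl =
  "http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm";
const plainUrl = "http://www.herbweb.com/herbage/1-A.htm";

console.log(exceptionsRx.test(archivedUrl)); // true
console.log(exceptionsRx.test(plainUrl));    // false
```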
However, there appears to be a bug in FlashGot such that this pref, if set, controls where *to* apply the redirect fixing...
The bug can be fixed by changing line 1123 of RedirectContext.js to
Code:
var m = (!context.genericExceptionsRx || !context.genericExceptionsRx.test(url)) &&
but I did only basic testing so I'm not sure that doesn't screw something else up...
EDIT: I decided to do further testing, and it looks like the patch I had originally posted actually disables the whole feature.
Corrected above, sorry about that.
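To make the intent of the corrected condition easier to see, here is a small standalone sketch. The function name `shouldFixRedirect` is made up for illustration; only the boolean expression mirrors the patched line, and the rest of RedirectContext.js is not reproduced here.

```javascript
// Hypothetical stand-in for the guard in the patched line: redirect fixing
// proceeds only when no exceptions regex is set, or the URL does not match it.
function shouldFixRedirect(url, genericExceptionsRx) {
  return !genericExceptionsRx || !genericExceptionsRx.test(url);
}

const exceptionRx = /^https?:\/\/web\.archive\.org\/web\/\d+/;
const waybackUrl =
  "http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm";
const originalUrl = "http://www.herbweb.com/herbage/1-A.htm";

console.log(shouldFixRedirect(waybackUrl, exceptionRx));  // false: exception applies
console.log(shouldFixRedirect(originalUrl, exceptionRx)); // true: no exception
console.log(shouldFixRedirect(originalUrl, null));        // true: pref not set
```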
Re: web.archive.org files not downloading correctly
Posted: Fri Feb 07, 2014 1:17 pm
by barbaz
And you can temporarily use this value for flashgot.redir.generic.exceptions until this bug gets fixed:
Code:
^https?://(?!web\.archive\.org/web/\d+)
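A quick way to check what this workaround pattern matches, as a sketch outside of FlashGot: because of the negative lookahead it matches any http(s) URL except Wayback Machine snapshot URLs, which is the inverse of the intended exception and so compensates for the pref currently being applied the wrong way round. The variable names are only for illustration.

```javascript
// The workaround pattern: matches http(s) URLs that do NOT continue with
// "web.archive.org/web/<digits>" right after the scheme.
const workaroundRx = /^https?:\/\/(?!web\.archive\.org\/web\/\d+)/;

const snapshotUrl =
  "http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm";
const liveUrl = "http://www.herbweb.com/herbage/1-A.htm";

console.log(workaroundRx.test(snapshotUrl)); // false
console.log(workaroundRx.test(liveUrl));     // true
```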
Re: web.archive.org files not downloading correctly
Posted: Mon Feb 17, 2014 2:26 am
by msjs
Haven't had time to get back to this until now. Thanks for the help.
Re: web.archive.org files not downloading correctly
Posted: Mon Feb 17, 2014 12:58 pm
by Giorgio Maone
Please check latest development build 1.5.5.97rc2, thank you.