
web.archive.org files not downloading correctly

Posted: Mon Feb 03, 2014 12:31 am
by msjs
I'm trying to download more than 500 pages from a website that disappeared over 10 years ago. FlashGot strips the web.archive.org part of the URL.
This is what I want: http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm
This is what I get: http://www.herbweb.com/herbage/1-A.htm

Is there any way to get this working?
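
To illustrate the symptom: the sketch below shows how a generic "redirect unwrapping" step could turn the first URL into the second. This is only an assumption about the kind of logic involved, not FlashGot's actual code, and the function name `unwrapEmbeddedUrl` is made up for the example.

```javascript
// Hypothetical sketch, NOT FlashGot's real implementation:
// if a URL embeds another absolute URL, treat the outer part as a
// redirector and keep only the embedded target.
function unwrapEmbeddedUrl(url) {
  // Lazily skip the wrapper, then capture the embedded http(s) URL.
  const m = /^https?:\/\/.+?(https?:\/\/.+)$/.exec(url);
  return m ? m[1] : url;
}

const wanted = "http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm";
const got = unwrapEmbeddedUrl(wanted);
console.log(got);
```

A Wayback Machine address looks exactly like such a wrapper, which would explain why the archive prefix gets dropped.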

Re: web.archive.org files not downloading correctly

Posted: Mon Feb 03, 2014 1:36 am
by Thrawn
I don't think that

Code:

http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm
is a valid URL.

Re: web.archive.org files not downloading correctly

Posted: Mon Feb 03, 2014 2:01 am
by msjs
Click on it, you will see it is!

Re: web.archive.org files not downloading correctly

Posted: Wed Feb 05, 2014 5:22 pm
by therube
about:config

flashgot.redir.generic.enabled
&
flashgot.redir.generic.exceptions

Toggling the first will make things work,
but it is likely to be overly broad.

Setting an exception, the preferred method, should also work, I would think.
I'm just not sure what to set it to at this time.
(I'm sure barbaz will come up with the correct string ;-).)

---

I'll note that it saved the html "page", but not the associated picture...

I would expect (?) a File | Save As (outside of FlashGot) to save "everything".
Likewise, a "spider"-type program might do something similar in an automated fashion.

Re: web.archive.org files not downloading correctly

Posted: Wed Feb 05, 2014 6:31 pm
by barbaz
therube wrote: (I'm sure barbaz will come up with the correct string ;-).)
Not without documentation... is this like NoScript's AddressMatcher, is it one big regexp, or what?
Probably worth adding web.archive.org as a default exception anyway.
therube wrote: Would expect ? a File | Save As (outside of FlashGot) to save "everything".
In my case it didn't. Standalone Wget was also useless :?:
Then again, I was trying to download an archive of a partly script-generated page, so some resources weren't called directly by the HTML source.

Re: web.archive.org files not downloading correctly

Posted: Wed Feb 05, 2014 11:38 pm
by barbaz
OK, I looked at the source and the pref is a list of space-separated regexes. This addition should work:

Code:

^https?://web\.archive\.org/web/\d+
However, there appears to be a bug in FlashGot such that this pref, if set, controls where *to* apply the redirect fixing...
The bug can be fixed by changing line 1123 of RedirectContext.js to

Code:

      var m = (!context.genericExceptionsRx || !context.genericExceptionsRx.test(url)) &&
but I did only basic testing so I'm not sure that doesn't screw something else up...

EDIT: I decided to do further testing, and it looks like the patch I had originally posted actually disables the whole feature. :oops:
Corrected above, sorry about that.
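
For anyone who wants to check the exception string outside the browser, here is a minimal sketch of the intended behaviour. It assumes the pref value is split on whitespace and each piece is treated as a regex, and that redirect fixing should be skipped when any exception matches. The helper names `buildExceptionsRx` and `shouldFixRedirect` are hypothetical and do not come from RedirectContext.js.

```javascript
// Hypothetical helper: combine a space-separated list of regexes
// into a single RegExp, or return null when the pref is empty.
function buildExceptionsRx(prefValue) {
  const parts = prefValue.split(/\s+/).filter(Boolean);
  if (parts.length === 0) return null;
  return new RegExp(parts.map((p) => "(?:" + p + ")").join("|"));
}

// Hypothetical check mirroring the corrected condition:
// fix redirects only when no exception regex matches the URL.
function shouldFixRedirect(url, exceptionsRx) {
  return !exceptionsRx || !exceptionsRx.test(url);
}

const exceptionsRx = buildExceptionsRx("^https?://web\\.archive\\.org/web/\\d+");
const archived = "http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm";
const plain = "http://www.herbweb.com/herbage/1-A.htm";

console.log(shouldFixRedirect(archived, exceptionsRx)); // archive URL is exempt
console.log(shouldFixRedirect(plain, exceptionsRx));    // other URLs are still processed
```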

Re: web.archive.org files not downloading correctly

Posted: Fri Feb 07, 2014 1:17 pm
by barbaz
And you can temporarily use this value for flashgot.redir.generic.exceptions until this bug gets fixed:

Code:

^https?://(?!web\.archive\.org/web/\d+)
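
A quick sketch of why the workaround helps, assuming the bug behaves as described earlier in the thread (the pref currently selects the URLs that *do* get redirect fixing). The function `buggyShouldFix` is a hypothetical stand-in for that inverted check, not actual FlashGot code.

```javascript
// Workaround value: matches any http(s) URL that is NOT a Wayback Machine capture.
const workaroundRx = /^https?:\/\/(?!web\.archive\.org\/web\/\d+)/;

// Hypothetical model of the buggy behaviour: fixing is applied
// only where the "exceptions" regex matches.
function buggyShouldFix(url, rx) {
  return !rx || rx.test(url);
}

const archivedUrl = "http://web.archive.org/web/20020206062707/http://www.herbweb.com/herbage/1-A.htm";
const otherUrl = "http://example.com/redirect?u=http://www.herbweb.com/herbage/1-A.htm";

console.log(buggyShouldFix(archivedUrl, workaroundRx)); // left alone
console.log(buggyShouldFix(otherUrl, workaroundRx));    // still gets redirect fixing
```

Once the bug is fixed, this value would presumably have the opposite effect, so it should then be swapped for the plain web.archive.org regex.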

Re: web.archive.org files not downloading correctly

Posted: Mon Feb 17, 2014 2:26 am
by msjs
Haven't had time to get back to this until now. Thanks for the help.

Re: web.archive.org files not downloading correctly

Posted: Mon Feb 17, 2014 12:58 pm
by Giorgio Maone
Please check the latest development build, 1.5.5.97rc2. Thank you.