Downloading from different pages of the same website

Posted: Wed Apr 22, 2009 6:08 pm
by simster
Hello all!

I need some help here. I have been given the task of downloading files (mostly PDF and Excel files) from a website. I will try to explain how the website works:
(1) The website is password protected.
(2) The website has 2000 "sections", each containing a PDF file that I am trying to download.
(3) The main page has links to all 2000 sections. I can click each link, get taken to that section, and download the PDF file, one at a time.
(4) Every PDF file has the same web address except for a number, which is indicated by * in the following link:
https://village.trialwebsite.com/doc/(*)
This is a sample link for illustration purposes only.


Is there a way to do this faster? I tried using FlashGet and FlashGot, but when I started downloading, all I got were the pictures and some strange files with weird extensions... basically everything but the PDF files!

Thanks everyone for your help

Re: Downloading from different pages of the same website

Posted: Wed Apr 22, 2009 7:51 pm
by therube
Depending on the layout of the website (how the paths and files are named), FlashGot's Build Gallery may help.

So perhaps something like:

https://village.trialwebsite.com/doc/section[1-2000;1]/somesequenceofnumbers[1-3;1].pdf
https://village.trialwebsite.com/doc/section[1-2000;1]/somesequenceofnumbers[1-3;1].xls
If there is no workable pattern to the naming, then you might look at a program like HTTrack Website Copier.
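
If you would rather generate the list of links yourself, the range patterns above can be expanded with a short script. This is only a rough sketch in Python, assuming that [start-end;step] means an inclusive range counted in steps of "step"; the function name expand_pattern is made up for illustration, and "section" and "somesequenceofnumbers" are the same placeholders used in the patterns above, not real paths on the site.

```python
import itertools
import re

# Matches a range token such as [1-2000;1]: start, end, step.
RANGE_TOKEN = re.compile(r"\[(\d+)-(\d+);(\d+)\]")


def expand_pattern(pattern):
    """Expand every [start-end;step] token in pattern into concrete URLs.

    Assumption: the end value is inclusive and step is a positive integer.
    """
    # With three capture groups, re.split returns:
    # literal, start, end, step, literal, start, end, step, ..., literal
    parts = RANGE_TOKEN.split(pattern)
    literals = parts[0::4]
    ranges = [
        range(int(start), int(end) + 1, int(step))
        for start, end, step in zip(parts[1::4], parts[2::4], parts[3::4])
    ]

    urls = []
    # itertools.product walks every combination of the range values.
    for combo in itertools.product(*ranges):
        pieces = [literals[0]]
        for number, literal in zip(combo, literals[1:]):
            pieces.append(str(number))
            pieces.append(literal)
        urls.append("".join(pieces))
    return urls


if __name__ == "__main__":
    # Placeholder pattern taken from the post above, not a real address.
    pattern = (
        "https://village.trialwebsite.com/doc/"
        "section[1-2000;1]/somesequenceofnumbers[1-3;1].pdf"
    )
    for url in expand_pattern(pattern):
        print(url)
```

Redirecting the output to a text file gives one link per line, which most download managers can import. Since the site is password protected, the downloads themselves would still need to go through a logged-in browser session or a tool that can send your credentials.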