Freeware mass file downloader



Agent_24
19-03-2011, 12:25 PM
Not sure exactly what you call these things. Lightning Download has a feature called "Site Storm" and GetRight has something similar... (obviously, since they are basically the same program)

Basically the need is to download an entire directory from a website, including all subdirectories and all the files in them.


DownThemAll is halfway there: it can download all files from one page/directory, but it won't work through subdirectories etc.


Does anyone know of a free program that can do this?

Snorkbox
19-03-2011, 12:53 PM
I think you may be looking for WebReaper, which does work under Win7 64-bit as it happens.

http://www.webreaper.net/

HTH.

fred_fish
19-03-2011, 12:53 PM
wget?

from 'man wget':

Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as "recursive downloading." While doing that, Wget respects the Robot Exclusion Standard (`/robots.txt'). Wget can be instructed to convert the links in downloaded files to point at the local files, for offline viewing
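
For example (just a sketch; the URL is a placeholder and the exact flags you want will depend on the site):

# grab everything under /files/, don't climb up to the parent directory,
# fetch page requisites (images/CSS), and rewrite links for offline viewing
wget --recursive --no-parent --page-requisites --convert-links http://example.com/files/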

jwil1
19-03-2011, 05:15 PM
HTTrack Website Copier (http://www.httrack.com/) is what you want.
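
It has a Windows GUI (WinHTTrack), and there's a command-line version too. A rough sketch of the CLI usage (URL, output folder and filter are placeholders; check the filter syntax against the HTTrack docs):

# mirror everything under /files/ into ./mirror, restricting the crawl to that path
httrack "http://example.com/files/" -O ./mirror "+*example.com/files/*" -v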

Erayd
19-03-2011, 05:46 PM
wget?

from 'man wget':

Yep, wget is perfect if the OP is happy to use a CLI tool.

fred_fish
19-03-2011, 06:15 PM
I didn't pick Agent_24 as someone who would wet his pants at the sight of a command line :cool:

Agent_24
19-03-2011, 08:10 PM
:lol: No, I don't mind CLI, although some programs can be annoying to use.

Didn't know wget could do this, actually, but I was really looking for something that can do this on Windows.

That WebReaper program looked promising, but it threw a lot of errors trying to download some of the files, which I can download just fine manually. Not sure if it's a bug or what, but it doesn't seem useful in this particular instance. I think I will keep it for future use though...

Have yet to try HTTrack...

Erayd
20-03-2011, 05:34 PM
...but I was really looking for something that can do this on Windows.

Wget for Windows (http://gnuwin32.sourceforge.net/packages/wget.htm).

Agent_24
20-03-2011, 06:26 PM
Oooooo :D

Hopefully that will work, then!

kahawai chaser
26-03-2011, 09:40 AM
I think it can be done manually with Google Docs (Spreadsheet). Use the formula =IMPORTXML("url", "query"), sometimes together with Google Apps Script (http://code.google.com/googleapps/appsscript/) and Google Code (http://code.google.com/). You can host your own scripts, if you want, on Google App Engine (http://code.google.com/appengine/). I have extracted URLs/sub-URLs from websites for competitive analysis against my sites, or to search for my content that's been scraped, and then quickly looked through the content at those URLs.

The tricky and tedious part is working out which elements to target (e.g. div, class, etc.) and how to express that in the "query" part of the formula; you can get the elements from the source code of the website (or search results) in question. It helps, of course, if you know HTML and a bit of scripting. It works quickly once the script (or series of scripts) is set up, though you may then need to filter/merge with spreadsheet commands. You should be able to build your own scripts with Excel/Docs; there's a basic tutorial with examples at Distilled UK (http://www.distilled.co.uk/blog/seo/how-to-build-agile-seo-tools-using-google-docs/).
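
As a minimal example of that approach (assuming the directory listing is an ordinary HTML page of <a> links; the URL is a placeholder):

=IMPORTXML("http://example.com/files/", "//a/@href")

That pulls every link href on the page into a column, which you can then filter or feed into a downloader.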

Agent_24
26-03-2011, 10:15 AM
Wget for Windows did the trick where WebReaper could not :)

Not sure about HTTrack Website Copier, but I will remember it for future reference also.

Erayd
26-03-2011, 10:20 AM
I don't think that's what he's after, Kahawai Chaser - he's looking for something to mirror an entire site.