Web Site Download Tool

Discussion in 'Tech Talk' started by Team Greenbaum, Dec 4, 2004.

  1. Team Greenbaum

    Team Greenbaum Millennium Member

    Joined:
    Jul 30, 1999
    Messages:
    90
    Likes Received:
    0
    Does anyone know a good web site download tool? One that downloads media and archive files? I've tried a few, most recently "7 Download Service". It worked great on HTML and image files, but I never could get it to download .mpg, .mp3, or .zip files...
     
  2. chevrofreak

    chevrofreak Senior Member

    Joined:
    Dec 27, 2001
    Messages:
    2,696
    Likes Received:
    0
    Location:
    Billings, Montana
    I like Net Transport from Xi
     

  3. Team Greenbaum

    Team Greenbaum Millennium Member

    Joined:
    Jul 30, 1999
    Messages:
    90
    Likes Received:
    0
    Is there a way to make it download the whole web site? I could only get it to download 1 file at a time. I'm looking for something that will crawl a site and download the whole thing.
     
  4. Sinister Angel

    Sinister Angel I'd Hit It!

    Joined:
    Oct 11, 2004
    Messages:
    252
    Likes Received:
    0
    Location:
    Traverse City, Michigan
    Do they have a port of wget for Windows?
     
  5. Team Greenbaum

    Team Greenbaum Millennium Member

    Joined:
    Jul 30, 1999
    Messages:
    90
    Likes Received:
    0
  6. Sinister Angel

    Sinister Angel I'd Hit It!

    Joined:
    Oct 11, 2004
    Messages:
    252
    Likes Received:
    0
    Location:
    Traverse City, Michigan
    Glad to help! I know it works wonders on my linux box.
     
  7. lomfs24

    lomfs24

    Joined:
    Apr 19, 2003
    Messages:
    2,388
    Likes Received:
    144
    Location:
    Montana
    How do you make wget pull an entire website? I have worked with it a little but mostly as a single file transport tool.
     
  8. grantglock

    grantglock /dev/null

    Joined:
    Feb 20, 2004
    Messages:
    219
    Likes Received:
    0
    Location:
    Iowa

    wget -r http://glocktalk.com
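A fuller mirroring command usually combines -r with a few more of wget's standard options; this is just a sketch of commonly used flags from the wget manual, and http://www.example.com is a placeholder, not a site from this thread:

```shell
# Mirror a site: recurse to unlimited depth, rewrite links so the
# local copy browses offline, and grab page requisites (images, CSS).
wget --recursive --level=inf --convert-links --page-requisites \
     --wait=1 http://www.example.com/

# --mirror (-m) is shorthand that turns on recursion, infinite
# depth, and timestamping in one flag:
wget --mirror --convert-links --page-requisites http://www.example.com/
```

The --wait=1 pause between requests is just polite; it keeps the crawl from hammering the server.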
     
  9. lomfs24

    lomfs24

    Joined:
    Apr 19, 2003
    Messages:
    2,388
    Likes Received:
    144
    Location:
    Montana
  10. Team Greenbaum

    Team Greenbaum Millennium Member

    Joined:
    Jul 30, 1999
    Messages:
    90
    Likes Received:
    0
    Wget rocks! Almost.

    It worked great on the first site I tried it on. It downloaded everything, including media files, and changed all URLs to local links. However, on the second site I tried, it immediately got a 302 redirect to a completely different site. It's as if the web server is recognizing that I'm using wget instead of a browser and responding with the 302 redirect. Any ideas on what I can do to fix this?

    Here are the options I'm using:
    wget --output-file="wget.log" --recursive --level=inf --timestamping --convert-links --wait=1 --random-wait https://user:password@www.website.com
     
  11. HerrGlock

    HerrGlock Scouts Out CLM

    Joined:
    Dec 28, 2000
    Messages:
    23,801
    Likes Received:
    254
    You're right, the web site is seeing that you have a getter instead of a browser.

    There are things you can do, but most of the people who really don't want you to slurp their site already know those tricks and have something to counter them too.

    There is, however, a plugin for Firefox/Mozilla that would work, since then it's a browser doing the slurping.

    Just something to think about.
    DanH
     
  12. Sinister Angel

    Sinister Angel I'd Hit It!

    Joined:
    Oct 11, 2004
    Messages:
    252
    Likes Received:
    0
    Location:
    Traverse City, Michigan
    Actually, you can have wget send a forged User-Agent header, and maybe a Referer header as well.

    --referer=url
    Include `Referer: url' header in HTTP request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them.


    and


    -U agent-string
    --user-agent=agent-string
    Identify as agent-string to the HTTP server.

    The HTTP protocol allows the clients to identify themselves using a "User-Agent" header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as Wget/version, version being the current version number of Wget.

    However, some sites have been known to impose the policy of tailoring the output according to the "User-Agent"-supplied information. While conceptually this is not such a bad idea, it has been abused by servers denying information to clients other than "Mozilla" or Microsoft "Internet Explorer". This option allows you to change the "User-Agent" line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.

    Hope this helps!
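Putting those two options together with the command posted earlier, a masquerading fetch might look like this; the agent string, referer, and URL are placeholders for illustration, not values from the thread:

```shell
# Present a browser-like User-Agent (and a Referer) so a server that
# sniffs for wget serves real pages instead of answering with a 302.
wget --recursive --level=inf --convert-links --wait=1 --random-wait \
     --user-agent="Mozilla/5.0 (Windows NT 5.1)" \
     --referer="http://www.example.com/" \
     http://www.example.com/
```

As the man page excerpt warns, spoofing the User-Agent is discouraged unless you know what you're doing; some sites block getters for a reason.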