Web Site Download Tool

Discussion in 'Tech Talk' started by Team Greenbaum, Dec 4, 2004.


  1. Team Greenbaum

    Team Greenbaum
    Millennium Member

    Joined:
    Jul 30, 1999
    90
    0
    Does anyone know a good web site download tool? One that downloads media and archive files? I've tried a few, most recently "7 Download Service". It worked great on HTML and image files, but I never could get it to download .mpg, .mp3 or .zip files...
     

  2. chevrofreak

    chevrofreak
    Senior Member

    Joined:
    Dec 27, 2001
    2,696
    0
    Location:
    Billings, Montana
    I like Net Transport from Xi
     

  3. Team Greenbaum

    Team Greenbaum
    Millennium Member

    Joined:
    Jul 30, 1999
    90
    0
    Is there a way to make it download the whole web site? I could only get it to download 1 file at a time. I'm looking for something that will crawl a site and download the whole thing.
     
  4. Sinister Angel

    Sinister Angel
    I'd Hit It!

    Joined:
    Oct 11, 2004
    252
    0
    Location:
    Traverse City, Michigan
    Do they have a port of wget for Windows?
     
  5. Team Greenbaum

    Team Greenbaum
    Millennium Member

    Joined:
    Jul 30, 1999
    90
    0
  6. Sinister Angel

    Sinister Angel
    I'd Hit It!

    Joined:
    Oct 11, 2004
    252
    0
    Location:
    Traverse City, Michigan
    Glad to help! I know it works wonders on my Linux box.
     
  7. lomfs24

    lomfs24

    Joined:
    Apr 19, 2003
    2,028
    0
    Location:
    Montana
    How do you make wget pull an entire website? I have worked with it a little but mostly as a single file transport tool.
     
  8. grantglock

    grantglock
    /dev/null

    Joined:
    Feb 20, 2004
    219
    0
    Location:
    Iowa

    wget -r http://glocktalk.com
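
A plain -r stops at wget's default depth of 5 and leaves the links pointing at the live site. Here is a rough sketch of a fuller mirror command, assuming a stock GNU wget. The helper name mirror_cmd is made up for illustration, and it only prints the command so nothing gets downloaded until you paste it into a shell yourself.

```shell
#!/usr/bin/env bash
# Sketch only: mirror_cmd is a hypothetical helper, not part of wget.
# It prints a recursive mirror command for the given URL without running it.
mirror_cmd() {
  local url="$1"
  # --recursive        follow links and download what they point to
  # --level=inf        no depth limit (wget's default depth is 5)
  # --no-parent        never climb above the starting directory
  # --page-requisites  also fetch images, CSS and other files a page needs
  # --convert-links    rewrite links so the local copy browses offline
  # --wait=1           pause one second between requests
  printf '%s\n' "wget --recursive --level=inf --no-parent --page-requisites --convert-links --wait=1 $url"
}

mirror_cmd "http://glocktalk.com"
```

Copy the printed line into your shell to start the download. Drop --no-parent if you want the crawl to wander above the starting directory.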
     
  9. lomfs24

    lomfs24

    Joined:
    Apr 19, 2003
    2,028
    0
    Location:
    Montana
  10. Team Greenbaum

    Team Greenbaum
    Millennium Member

    Joined:
    Jul 30, 1999
    90
    0
    Wget rocks! Almost.

    It worked great on the first site I tried it on. It downloaded everything, including media files and changed all URLs to local links. However, on the second site I tried, it immediately gets a 302 redirect to a completely different site. It's as if the web server is recognizing that I'm using wget instead of a browser and responding with the 302 redirect. Any ideas on what I can do to fix this?

    Here are the options I'm using:
    wget --output-file="wget.log" --recursive --level=inf --timestamping --convert-links --wait=1 --random-wait https://user:password@www.website.com
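
One way to see what the server is actually doing is to look at the first response without following the redirect. This is a sketch, assuming a wget build recent enough to have --max-redirect (older builds may not). probe_cmd is a made-up helper name that only prints the command, and the URL is the placeholder from the post above with the login left off.

```shell
#!/usr/bin/env bash
# Sketch only: probe_cmd is a hypothetical helper, not part of wget.
# It prints a command that shows the server's first response headers.
probe_cmd() {
  local url="$1"
  # --spider           check the page but do not save it
  # --server-response  print the HTTP headers the server sends back
  # --max-redirect=0   stop at the first redirect instead of following it
  printf '%s\n' "wget --spider --server-response --max-redirect=0 $url"
}

probe_cmd "https://www.website.com"
```

When you run the printed command, the Location header in the 302 response shows where the server is sending wget. Keep in mind that some servers answer a --spider check differently from a normal download.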
     
  11. HerrGlock

    HerrGlock
    Scouts Out
    CLM

    Joined:
    Dec 28, 2000
    23,791
    178
    You are right: the web site is seeing that you are using a getter instead of a browser.

    There are things you can do, but most of the people who really don't want you to slurp their site already know them and have something to counter that too.

    There is, however, a plugin for Firefox/Mozilla that would work, since it's a browser doing the slurping.

    Just something to think about.
    DanH
     
  12. Sinister Angel

    Sinister Angel
    I'd Hit It!

    Joined:
    Oct 11, 2004
    252
    0
    Location:
    Traverse City, Michigan
    Actually, you can have wget send a forged User-Agent header, and maybe a Referer header as well.

    --referer=url
    Include `Referer: url' header in HTTP request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them.


    and


    -U agent-string
    --user-agent=agent-string
    Identify as agent-string to the HTTP server.

    The HTTP protocol allows the clients to identify themselves using a "User-Agent" header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as Wget/version, version being the current version number of Wget.

    However, some sites have been known to impose the policy of tailoring the output according to the "User-Agent"-supplied information. While conceptually this is not such a bad idea, it has been abused by servers denying information to clients other than "Mozilla" or Microsoft "Internet Explorer". This option allows you to change the "User-Agent" line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.
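
Putting the two options together with the command posted earlier might look something like this. It is only a sketch: browser_cmd is a made-up helper name, the agent string is just an example of what a browser might send (any browser-like string should do), and setting the referer to the start URL is a guess at what this particular server wants. The helper prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch only: browser_cmd is a hypothetical helper, not part of wget.
# It prints a recursive wget command that presents browser-like headers.
browser_cmd() {
  local url="$1"
  # Example agent string (an assumption); substitute your own browser's.
  local agent="Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
  # --user-agent  replaces the default "Wget/version" identification
  # --referer     claims the request was linked from the given page
  printf '%s\n' "wget --user-agent=\"$agent\" --referer=\"$url\" --recursive --level=inf --convert-links --wait=1 $url"
}

browser_cmd "https://www.website.com/"
```

If the server still answers with the 302, it is probably keying on something other than these two headers, such as cookies.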

    Hope this helps!
     