Web Site Download Tool

Discussion in 'Tech Talk' started by Team Greenbaum, Dec 4, 2004.

  1. Team Greenbaum, "Millennium Member" (joined Jul 30, 1999)

    Does anyone know a good website download tool, one that downloads media and archive files too? I've tried a few, most recently "7 Download Service". It worked great on HTML and image files, but I could never get it to download .mpg, .mp3, or .zip files.
     

  2. Team Greenbaum, "Millennium Member" (joined Jul 30, 1999)

    Is there a way to make it download the whole website? I could only get it to download one file at a time. I'm looking for something that will crawl a site and download the whole thing.
     
  3. Sinister Angel, "I'd Hit It!" (joined Oct 11, 2004; Traverse City, Michigan)

    Glad to help! I know it works wonders on my Linux box.
     
  4. lomfs24 (joined Apr 19, 2003; Montana)

    How do you make wget pull an entire website? I've worked with it a little, but mostly as a single-file transfer tool.
     
  5. grantglock, "/dev/null" (joined Feb 20, 2004; Iowa)

    wget -r http://glocktalk.com
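    A slightly fuller version of that command, as a minimal sketch assuming GNU Wget and a POSIX shell. Plain -r stops after five levels of links by default; the extra flags below are standard Wget options added for illustration, not part of the original reply. mirror_site is a hypothetical helper name, and setting DRY_RUN only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: mirror_site URL
# Builds a recursive GNU Wget command; set DRY_RUN=1 to print it instead of running it.
mirror_site() {
  site_url=$1
  set -- --recursive --level=inf      # -r -l inf: follow links with no depth limit
  set -- "$@" --no-parent             # -np: never climb above the starting directory
  set -- "$@" --page-requisites       # -p: also fetch the images/CSS each page needs
  set -- "$@" --convert-links         # -k: rewrite links so the copy browses offline
  set -- "$@" --wait=1                # pause one second between requests
  if [ -n "${DRY_RUN:-}" ]; then
    echo wget "$@" "$site_url"        # dry run: show the command line only
  else
    wget "$@" "$site_url"
  fi
}
```

    For example, DRY_RUN=1 mirror_site http://glocktalk.com prints the full command; leaving out DRY_RUN=1 actually starts the download.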
     
  6. lomfs24 (joined Apr 19, 2003; Montana)

  7. Team Greenbaum, "Millennium Member" (joined Jul 30, 1999)

    Wget rocks! Almost.

    It worked great on the first site I tried. It downloaded everything, including media files, and converted all URLs to local links. However, on the second site I tried, it immediately got a 302 redirect to a completely different site. It's as if the web server recognizes that I'm using wget instead of a browser and responds with the 302 redirect. Any ideas on how to fix this?

    Here are the options I'm using:
    wget --output-file="wget.log" --recursive --level=inf --timestamping --convert-links --wait=1 --random-wait https://user:password@www.website.com
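    One way to check what the server is actually doing, sketched under the assumption of a current GNU Wget (--max-redirect is newer than this 2004 thread): request only the start page, print the server's response headers, and refuse to follow the redirect. The Location: header then shows where the 302 points. show_first_response and the DRY_RUN switch are hypothetical names for illustration; any arguments after the URL are passed through as extra wget options.

```shell
#!/bin/sh
# Hypothetical helper: show_first_response URL [extra wget options...]
# Prints the HTTP status and headers without following any redirect.
show_first_response() {
  target_url=$1
  shift                                    # remaining arguments are extra wget options
  set -- "$@" --server-response            # -S: print the status line and response headers
  set -- "$@" --max-redirect=0             # stop at the first 3xx instead of following it
  set -- "$@" --output-document=/dev/null  # discard any response body
  if [ -n "${DRY_RUN:-}" ]; then
    echo wget "$@" "$target_url"           # dry run: show the command line only
  else
    wget "$@" "$target_url"                # headers appear in wget's log output (stderr)
  fi
}
```

    Running show_first_response https://www.website.com once as-is and once with a browser-style --user-agent=... appended shows whether the 302 depends on how the client identifies itself. Wget may report an error when it reaches the redirect limit; that is expected here.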
     
  8. HerrGlock, "Scouts Out CLM" (joined Dec 28, 2000)

    You're right: the website is seeing that you're using a getter instead of a browser.

    There are things you can do, but most people who really don't want you to slurp their site already know those tricks and have countermeasures for them too.

    There is, however, a plugin for Firefox/Mozilla that would work, since it's the browser itself doing the slurping.

    Just something to think about.
    DanH
     
  9. Sinister Angel, "I'd Hit It!" (joined Oct 11, 2004; Traverse City, Michigan)

    Actually, you can have wget send a forged User-Agent header, or possibly a Referer header as well.

    --referer=url
    Include `Referer: url' header in HTTP request. Useful for retrieving documents with server-side processing that assume they are always being retrieved by interactive web browsers and only come out properly when Referer is set to one of the pages that point to them.

    and

    -U agent-string
    --user-agent=agent-string
    Identify as agent-string to the HTTP server.

    The HTTP protocol allows the clients to identify themselves using a "User-Agent" header field. This enables distinguishing the WWW software, usually for statistical purposes or for tracing of protocol violations. Wget normally identifies as Wget/version, version being the current version number of Wget.

    However, some sites have been known to impose the policy of tailoring the output according to the "User-Agent"-supplied information. While conceptually this is not such a bad idea, it has been abused by servers denying information to clients other than "Mozilla" or Microsoft "Internet Explorer". This option allows you to change the "User-Agent" line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.

    Hope this helps!
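    Combining those two options with the recursive command from earlier in the thread gives something like the sketch below. It assumes GNU Wget; the Firefox User-Agent string is only an example (copying the exact string your own browser sends is safer), and browser_like_mirror is a hypothetical helper. As HerrGlock points out, a site that is determined to block getters may still refuse it.

```shell
#!/bin/sh
# Hypothetical helper: browser_like_mirror START_URL REFERER_URL
# Runs the recursive download while identifying as a browser; DRY_RUN=1 prints the arguments.
browser_like_mirror() {
  start_url=$1
  referer_url=$2
  # Example browser User-Agent string (an assumption; use your own browser's)
  agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0'
  set -- --user-agent="$agent"           # -U: replace the default Wget/version identification
  set -- "$@" --referer="$referer_url"   # Referer to send, e.g. a page linking to the start URL
  set -- "$@" --recursive --level=inf --timestamping --convert-links
  set -- "$@" --wait=1 --random-wait     # same politeness settings as the original command
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' wget "$@" "$start_url" # dry run: one argument per line
  else
    wget "$@" "$start_url"
  fi
}
```

    For the site in question, that would be something like browser_like_mirror https://user:password@www.website.com https://www.website.com/ (both URLs are placeholders from the thread).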