Linux Users' Group of Davis (LUGOD)
Page last updated: 2011 May 25 13:46

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] how to modify .htaccess to prevent wget or the likes from downloading my site?

On 05/25/2011 12:10 PM, Chanoch (Ken) Bloom wrote:
> On Wed, 2011-05-25 at 14:50 -0400, Hai Yi wrote:
>> Hello all:
>>
>> I first asked my web host's support team this question, and they
>> pointed me to this link:
>> http://www.webhostingtalk.com/showthread.php?t=437549
>>
>> and the snippet on that page looks like:
>>
>>
>> SetEnvIfNoCase User-Agent "^Wget" bad_bot
>>
>> <Limit GET POST>
>>    Order Allow,Deny
>>    Allow from all
>>    Deny from env=bad_bot
>> </Limit>
> 
> This snippet will only block wget if wget deigns to identify itself as
> wget by saying so in the User-Agent string.
> 
>>
>> I copied and pasted it into the .htaccess file under /public_html. Still,
>> I am able to use this command to fetch my site:
>>
>> wget --wait=20 --limit-rate=20K -r -p -U Mozilla www.my_site.com
> 
> Yup. Wget decided to identify itself as Mozilla in the user-agent
> string. That means you have no way at all of knowing that someone's
> trying to use Wget to download from your site.
> 
>> However, if I run the same wget command with one change (dropping
>> "-U Mozilla"):
>>
>>  wget --wait=20 --limit-rate=20K -r -p www.my_site.com
>>
>> I get this:
>>
>> --2011-05-25 14:30:36--  http://www.my_site.com/
>> Resolving www.my_site.com... xxx.xx.xxx.xx
>> Connecting to www.my_site.com|xxx.xx.xxx.xx|:80... connected.
>> HTTP request sent, awaiting response... 403 Forbidden
>> 2011-05-25 14:30:37 ERROR 403: Forbidden.
> 
> Wget deigned to identify itself as wget this time.
> 
>> Now I have three questions:
> 
>> 1. Why didn't the code in .htaccess prevent the downloading? Did I
>> miss something?
> 
> (See my explanation above.)
> 
>> 2. Are there other tools that act like wget? How can we prevent them
>> all from downloading the site content?
> 
> There are other tools that act like wget. You can't prevent them *all*
> from downloading, though you could blacklist specific ones the way you
> did with Wget. Of course, they may also decide to change the User-Agent
> string, and then you have no way of telling at all.
> 
>> 3. If someone is downloading, is there a log file that would expose the
>> downloader's info?
> 
> Your web server logs will have their IP address, but I doubt you could
> do anything useful with that information. If your users log in to the
> site, you could try to keep track of that yourself somehow, but that
> could be very complex depending on what you're trying to prevent.
> 
> ...
> 
> 
> In other words, the protection you're asking for is basically impossible
> against a determined downloader.
> 
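To see why the rule caught the plain wget run but not the "-U Mozilla" one, here is a small Python approximation of the matching that SetEnvIfNoCase performs: a case-insensitive regular expression tested against the User-Agent header. It is a sketch, not Apache's own code, and the string "Wget/1.12 (linux-gnu)" is only an assumed example of wget's default "Wget/<version> (<os>)" header.

```python
import re

# Approximates: SetEnvIfNoCase User-Agent "^Wget" bad_bot
# i.e. a case-insensitive regex anchored at the start of the header value.
BAD_BOT = re.compile(r"^Wget", re.IGNORECASE)


def is_bad_bot(user_agent: str) -> bool:
    """Return True if this User-Agent value would set the bad_bot variable."""
    return BAD_BOT.search(user_agent) is not None


# Assumed example of wget's default "Wget/<version> (<os>)" header.
DEFAULT_UA = "Wget/1.12 (linux-gnu)"
# With "-U Mozilla", wget sends the string "Mozilla" instead.
SPOOFED_UA = "Mozilla"

for ua in (DEFAULT_UA, SPOOFED_UA):
    verdict = "denied (403)" if is_bad_bot(ua) else "allowed"
    print(f"{ua!r}: {verdict}")
```

The rule only ever sees the header text, so the spoofed request is indistinguishable from a browser that happens to send "Mozilla".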

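Ken's point about the server logs can be illustrated with a short Python sketch that tallies requests and User-Agent strings per client IP. It assumes Apache's "combined" LogFormat and uses made-up sample lines with documentation-range addresses; your host's log location and format may differ. The User-Agent field is logged exactly as the client sent it, so a spoofed "Mozilla" is recorded as "Mozilla".

```python
import re
from collections import Counter, defaultdict

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Return {ip: (request_count, set_of_user_agents)} for parseable lines."""
    counts = Counter()
    agents = defaultdict(set)
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines written in some other format
        counts[m.group("ip")] += 1
        agents[m.group("ip")].add(m.group("agent"))
    return {ip: (n, agents[ip]) for ip, n in counts.items()}


# Made-up sample lines; for a real log you would pass an open file instead.
SAMPLE_LOG = [
    '192.0.2.10 - - [25/May/2011:14:30:36 -0400] "GET / HTTP/1.1" 403 202 "-" "Wget/1.12 (linux-gnu)"',
    '198.51.100.23 - - [25/May/2011:14:31:02 -0400] "GET / HTTP/1.1" 200 5120 "-" "Mozilla"',
    '198.51.100.23 - - [25/May/2011:14:31:22 -0400] "GET /about.html HTTP/1.1" 200 3310 "http://www.my_site.com/" "Mozilla"',
]

for ip, (n, uas) in summarize(SAMPLE_LOG).items():
    print(ip, n, sorted(uas))
```

As Ken says, an address alone rarely identifies a person, and many users can share one.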
If you're using Apache, there are per-IP connection limiters you can use
to restrict download accelerators. Just be careful not to clamp down too
far, because you can end up limiting legitimate users who merely appear
to share one IP address, for example because they sit behind the same
NAT gateway or proxy.
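Alex does not name a specific module, so rather than guess at Apache directives, here is a conceptual Python sketch of what a per-IP connection limiter does: it counts open connections per client address and refuses new ones past a cap. The class and method names are hypothetical, and acquire()/release() stand in for whatever hooks a real server uses when a connection opens and closes.

```python
from collections import defaultdict


class PerIPConnectionLimiter:
    """Hypothetical limiter: at most max_per_ip concurrent connections per address."""

    def __init__(self, max_per_ip: int):
        self.max_per_ip = max_per_ip
        self.active = defaultdict(int)  # ip -> number of open connections

    def acquire(self, ip: str) -> bool:
        """Admit a new connection from ip, or return False if it is at the cap."""
        if self.active[ip] >= self.max_per_ip:
            return False
        self.active[ip] += 1
        return True

    def release(self, ip: str) -> None:
        """Record that a connection from ip has closed."""
        if self.active[ip] > 0:
            self.active[ip] -= 1
        if self.active[ip] == 0:
            del self.active[ip]  # drop idle addresses so the table stays small


limiter = PerIPConnectionLimiter(max_per_ip=2)
# A download accelerator opening four parallel connections from one address:
results = [limiter.acquire("203.0.113.7") for _ in range(4)]
print(results)  # [True, True, False, False]
```

The cap is keyed only on the address, so everyone behind one NAT gateway or proxy shares it; that is exactly the over-blocking risk described above.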

You can also throttle bandwidth; look up material on QoS (quality of
service) for how to handle that.
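One common shaping technique behind bandwidth throttling is a token bucket: tokens accumulate at a fixed rate up to a burst size, and sending n bytes spends n tokens. The Python below is a generic sketch of that idea, not how any particular QoS tool or Apache module is implemented; the 20 KiB/s rate (borrowed from the wget command's --limit-rate=20K) and the fake clock used for the demo are assumptions.

```python
import time


class TokenBucket:
    """Token-bucket limiter measured in bytes (generic sketch)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # bytes of allowance added per second
        self.capacity = capacity    # maximum burst, in bytes
        self.tokens = capacity      # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last call, up to capacity."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes: int) -> bool:
        """Spend nbytes tokens if available; return whether the send may proceed."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Deterministic demo with a fake clock: 20 KiB/s, bursts of up to 20 KiB.
fake_now = [0.0]
bucket = TokenBucket(rate=20 * 1024, capacity=20 * 1024, clock=lambda: fake_now[0])
print(bucket.try_consume(16 * 1024))  # True: the full bucket covers it
print(bucket.try_consume(16 * 1024))  # False: only 4 KiB left
fake_now[0] = 0.75                    # 0.75 s later, 15 KiB refilled (19 KiB total)
print(bucket.try_consume(16 * 1024))  # True
```

A server would call try_consume() before writing each chunk and pause when it returns False; chunks must be no larger than the capacity or they can never be sent.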

Enjoy,
Alex

_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


