As part of tracking the current spam problem (and the changes I made to the MT cgi directory yesterday did abruptly cut off all the error log messages — and, I’ll note, all the junk trackbacks that have gotten through — more on that later), I noted in my access logs a lot — I mean a lot — of image theft. Folks — mostly in myspace.com — linking directly through to images on my blog, rather than hosting the images on their own site (and using their own bandwidth).
Now, most of those images were copied down here by me for fair use purposes, so I can’t object too strenuously on IP grounds. The problem isn’t their using the images per se; it’s the “bandwidth theft” and the server traffic impact on me. Someone is, essentially, making their page all pretty on the back of my account’s bandwidth, interfering to some degree with your ability to come here and see my pretty page.
I have no idea of the impact of this, though, anecdotally, it can be a serious problem on some sites. But it’s a known discourtesy, and folks have come up with any number of ways to prevent it.
E.g.:
- Preventing Image Bandwidth Theft With .htaccess (The Site Wizard)
- Bandwidth Theft (HTMLSource)
- Preventing image hotlinking: An improved tutorial (Underscorebleach.net)
- Stop Hotlinking and Bandwidth Theft with HTACCESS (Altlab). Includes a helpful tester to see if the image is showing up.
- Using .htaccess to stop remote image linking (hotlinking) and bandwidth theft (Tom Rafferty)
- Block hotlinkers but allow some sites remote access to images using .htaccess (Tom Rafferty)
- Preventing Image Bandwidth Theft (Islandnet.com)
All of the above discuss how to deal with the problem, which is basically modifying the .htaccess file in your images directory …
… you do have all your images in their own directory, don’t you? That would certainly be convenient …
… so that requesters end up with zilcho (a broken image icon) when they point to it.
There are essentially two ways of doing this. Most sources use mod_rewrite to check out the request (“RewriteCond %{HTTP_REFERER}”) and block it. The HTMLSource article above is an example of this.
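A minimal sketch of the mod_rewrite flavor might look something like the following. It assumes mod_rewrite is enabled for the directory and that only my own domain should be let through; the extension list is just illustrative, so treat it as an untested sketch rather than a drop-in.

```apache
# Turn the rewrite engine on for this directory
RewriteEngine On
# Skip blocking when the Referer is blank (direct requests, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Skip blocking when the request comes from a page on my own domain
RewriteCond %{HTTP_REFERER} !^http://(www\.)?hill-kleerup\.org [NC]
# Anything else asking for an image file gets a 403 Forbidden ([F])
RewriteRule \.(gif|png|jpe?g|bmp)$ - [NC,F]
```

Both RewriteCond lines have to hold before the RewriteRule fires, so a request is only refused when the Referer is neither blank nor mine.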
A couple of sites, though, suggest using SetEnvIfNoCase (Site Wizard and Islandnet, above). From what I’ve seen elsewhere, this is becoming the preferred mechanism for doing this sort of checking, if your server supports it (not all do, or did).
(By the way, if I commit a technical gaffe here, please feel free to correct me. I’m learning here by example.)
So here’s what I’ve ended up putting in the .htaccess file of my /blog/images/ directory. (I do have other places where images reside, but they’re a lot less likely to be stolen from):
# Blank Referer (direct requests, privacy proxies)
SetEnvIfNoCase Referer "^$" locally_linked=1
# Meant to pass firewall-inserted, non-URL Referers. Note that SetEnvIf
# patterns have no "!" negation, so as written this line may never match.
SetEnvIfNoCase Referer "!^http://.$" locally_linked=1
# My own domain
SetEnvIfNoCase Referer "^http://(www.)?hill-kleerup.org" locally_linked=1
# Search engine caches, translators, and aggregators showing my pages
SetEnvIfNoCase Referer "^http://216.239.(3[2-9]|[45][0-9]|6[0-3]).*(www.)?hill-kleerup.org" locally_linked=1
SetEnvIfNoCase Referer "^http://babel.altavista.com/.*(www.)?hill-kleerup.org" locally_linked=1
SetEnvIfNoCase Referer "^http://216.243.113.1/cgi/" locally_linked=1
SetEnvIfNoCase Referer "^http://search.*.cometsystems.com/search.*(www.)?hill-kleerup.org" locally_linked=1
SetEnvIfNoCase Referer "^http://.*searchhippo.com.*(www.)?hill-kleerup.org" locally_linked=1
SetEnvIfNoCase Referer "^http://[^./]*\.bloglines\.com" locally_linked=1
SetEnvIfNoCase Referer "^http://[^./]*\.search\?q=cache" locally_linked=1
SetEnvIfNoCase Referer "^http://[^./]*\.talkr\.com" locally_linked=1
SetEnvIfNoCase Referer "^http://[^./]*\.google\." locally_linked=1
# Image files are served only when one of the lines above set locally_linked
<FilesMatch "\.(gif|png|jpe?g|bmp)$">
  Order Allow,Deny
  Allow from env=locally_linked
</FilesMatch>
Essentially, any request to a file in that directory (or anything below it) will be evaluated as to where it’s coming from (the “Referer” (sic) value). If the Referer is blank, or contains some firewall-inserted text, or (most importantly) is from my own domain (i.e., an image is being called by a web page on my account), then we want that to pass. (Ditto for some domains that may have images I’m intentionally hosting here; I’ve left those off the above listing.)
There are also some bits under that to allow some search engines and RSS aggregators (and the Babelfish translator) to show the images properly. That’s okay by me, because that’s someone actually looking at my page. But anyone else is Denied access to the image.
The result of all this? Well, some folks (“don’t be a carbon copy. be original. be yourself.”) who were linking to images directly off my page (either themselves, or commenters), instead of doing the polite thing and downloading said images and putting them on their own site directly, now have little broken graphics instead. Huzzah!
Note that some solutions to this problem instead redirect the image request to a different image (e.g., something with eye-splitting colors, or something embarrassing, or something that says “I AM A THIEF!,” etc.). Those solutions can be found in the above links, too. I’ve decided not to do that, but instead just leave them with a broken image showing up. If I were to do it, it would probably be a tasteful little colored block (say, green with yellow text) that says, “If you like the image, then host it on your own server, please; don’t steal my bandwidth.”
Yes, I’m far too polite for this line of work.
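For the record, the swap-in-a-different-image variation might be sketched like this with mod_rewrite. The file name nohotlink.png is made up for illustration, and the sketch assumes the replacement image lives in the same /blog/images/ directory; it is an untested outline, not something I’m running.

```apache
RewriteEngine On
# Leave blank Referers and my own pages alone
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?hill-kleerup\.org [NC]
# Don't rewrite requests for the replacement image itself,
# or the substitute would get caught by this same rule
RewriteCond %{REQUEST_URI} !/nohotlink\.png$ [NC]
# Serve the (hypothetical) polite little block instead of the requested image
RewriteRule \.(gif|png|jpe?g|bmp)$ /blog/images/nohotlink.png [NC,L]
```

The extra RewriteCond on REQUEST_URI matters because the hotlinker’s browser asks for the substitute with the same outside Referer, so without the exclusion the replacement would be rewritten (or blocked) right along with everything else.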
Some people watch their referrer logs and tailor replacement images. Some people even make it a game. But that’s waaaay too labor-intensive for me.
Here’s an automated tool for generating a .htaccess file to do the above sort of thing (using the mod_rewrite method).
Your site may also, through cPanel, have a “Hotlink Protection” function. Now that I’ve done all this work, I discover that, hey, my host has that. On the other hand, that’s a blanket block across the entire site; I kind of like (at this point) tailoring it for a specific directory. I also like the nuances of the above list (for aggregators, Google images, etc.) that might or might not (as I try to translate the query language) come into play there.
Whatever you do, test it before and after to confirm that you haven’t messed things up. And as part of that testing via thieving sites, be sure and flush your cache so that you’re not just seeing images stored on your PC.
Oddly enough, the most common hotlinked graphic from here is … Hello Kitty?
But I want to see what images are being stolen.
Though this one has also been popular to steal bandwidth for:
Again, none of these are graphics I’ve created, but if folks want to post them or use them on their sites or boards or sig lines or whatever, they should host them themselves.
And this one seems oddly popular, as well.
And I’ll note that nearly all the bandwidth theft has been from myspace.com. Damned kids these days! Hey! You! Off my image files!
As you probably know from reading Blogula, I feel your pain. I’m one of those gamesters; I supply an “I’m Too Stoopid 4 The Intenets” image featuring the Ha! Ha! man to our friends at Myspace.
Apparently, Myspace pretty much supplies the code for people to copy images – they fill in a little box with the image link and get a little box of copy-paste goodness. From what I’ve heard, Myspace doesn’t provide image hosting. Many of the li’l darlings use Photobucket, but exceed their bandwidth too quickly.
My most-stolen images are a “Happy Chrismahanakwanzakuh” thing I copied from a holiday card, and also an image of Carter Oosterhouse from Trading Spaces, go figure. It seems like one person will take an image and paste it on all their friends’ comments pages, and all their friends steal it and paste it on THEIR friends’ pages… man, it’s lame.
The best one was the girl that stole one of my vacation photos of a red sandstone wall for the background image on her personal page. That one got pulled pretty quick when I turned it into a little puce box that said “I stole this image and I’m still stealing bandwidth.”
🙂
Other popular images from my site (now that I’m at home and looking at the error logs):
But the Hello Kitty and the Mondays images are still at the top of the list, by far.
If I read my AWSTATS page right, “hello kitty” garnered 2013 search hits to my page, or 18.6% of the total. Egads.
Okay, one disadvantage to the “deny from” approach is that it generates a 403 error (Forbidden), which clogs the error log something fierce. Since I’m getting 3-4 errors a minute, that’s a problem.
The default setting for the mod_rewrite approach “[F]” does the same — but if you change it to a “[G]” for “Gone” it generates a 410, which does not (I understand) get written in the error log.
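In mod_rewrite terms that is a one-flag change. A minimal sketch, again assuming only my own domain is let through; whether the 410 really stays out of the error log is something to check on your own server.

```apache
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?hill-kleerup\.org [NC]
# [G] answers 410 Gone where [F] would answer 403 Forbidden
RewriteRule \.(gif|png|jpe?g|bmp)$ - [NC,G]
```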
I may change over for just that reason. We’ll see.
Gave some long, serious thought to modifying the “Hello Kitty” graphic to include the text “hates bandwidth thieves” and have that come up for such attempts. That would properly horrify both the HK fans and the HK haters.
We’ll see how I feel tonight.
Why not just host the images at Flickr.com? Unless you need big images or want to organize them there, it’s free. Even the “pro” package is only $25/year.
I like having the images myself. Flickr’s reliability — and long-term viability — feels like more of a question mark than my own (trusted) host.
But I realize that’s just my opinion, and lots of folks go with the Flickr solution.
Things seem to be pretty stable right now.