Posted October 24, 2008 and filed under Technology

In the past I have created a couple of extensions (UserAgentBlocking and GateKeeper) that allow me to filter incoming traffic based on user agents and/or IP addresses. I’ve had great success blocking crawlers and spiders that were not following my robots.txt directives. Recently I started thinking about all of the uses for a URL rewriter, and I wondered whether it would provide a more flexible way to control access to my site. I definitely wanted to block incoming requests that have an empty user agent, unruly crawlers, abusive IP addresses, etc. I also wanted to automatically switch over to HTTPS for secure pages such as login.aspx and the Password-Maker page. Let’s also not forget about redirecting inbound connections that are missing the www. prefix.

But what about other functionality? What if I want to take advantage of jQuery’s ability to retrieve a JSON feed? Or pull XML content from external websites to populate a table? Or dynamically pull in an RSS feed without using an ASP.NET UserControl? Lastly, how about blocking external sites that hotlink to content on my site?

Without a URL rewriter, the functionality above is almost impossible to achieve. So I decided to invest some time in configuring URL rewriting for my website. It’s not all that hard to implement, but it did take a little effort and research to find the right tool for the job.

My original need was really centered on switching between HTTPS and HTTP. For that, I recommend having a look at SanibelLogic’s SSLRedirect product. While it is not free ($39.95 for the DLL, $59.95 for the SDK), you are definitely getting your money’s worth. I have been using their product for over two years without any problems, and the configuration and management couldn’t be easier.

My needs were considerably broader, so I started looking at ManagedFusion’s URLRewriter product. Even though it is free, it offers significantly more powerful features than SanibelLogic’s product. Because it is based on the Apache mod_rewrite module, I can do everything I previously mentioned. Almost all of the mod_rewrite functionality has been ported over, providing an extremely flexible (almost too flexible) configuration model. While the rewriting and redirecting features are impressive enough, the best part is the proxy service, which allows me to respond to web requests with content received from a web server outside of my domain.

For example, suppose I wanted to include some content from last.fm on my website but didn’t want to create an ASP.NET UserControl to grab and then display the data. I simply wrote a custom rewrite rule and specified the proxy flag (P):

RewriteRule ^/lastfm-recenttracks\.(.*) http://ws.audioscrobbler.com/2.0/?method=user.getrecenttracks&user=dscoduc&api_key=123 [NC,P]

For example, have a look at my recent tracks on Last.fm, which use the URL rewriter to proxy the request to last.fm. After the page loads, you will notice that the address is still on my domain, but the content is coming directly from last.fm.
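To make the matching step concrete, here is a minimal Python sketch (not ManagedFusion's code) of what the proxy rule decides: which local paths map to the upstream last.fm URL. It only models the pattern match; with the P flag the rewriter itself fetches the target and relays the response. The function name `proxy_target` is hypothetical, and `api_key=123` is the placeholder value from the rule.

```python
import re
from typing import Optional

# Pattern and target copied from the RewriteRule; re.IGNORECASE mirrors [NC].
PATTERN = re.compile(r"^/lastfm-recenttracks\.(.*)", re.IGNORECASE)
TARGET = ("http://ws.audioscrobbler.com/2.0/"
          "?method=user.getrecenttracks&user=dscoduc&api_key=123")


def proxy_target(path: str) -> Optional[str]:
    """Return the upstream URL to fetch for a local path, or None if the rule does not apply.

    The (.*) group is never used in TARGET, so any extension after
    "lastfm-recenttracks." is proxied to the same last.fm call.
    """
    if PATTERN.match(path):
        return TARGET
    return None


print(proxy_target("/lastfm-recenttracks.xml"))
print(proxy_target("/LastFM-RecentTracks.json"))
print(proxy_target("/default.aspx"))
```

Because the browser only ever requests `/lastfm-recenttracks.xml` from my own domain, a local jQuery script can load the feed without running into cross-domain restrictions.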


How about automatically switching to HTTPS for those pages requiring security, and back to HTTP for all of the other pages?  Here’s what it takes to achieve that goal:

# HTTP REQUIRED PAGES
RewriteCond %{HTTPS} =on
RewriteCond %{REQUEST_URI} !(/login\.aspx|/securepage\.aspx).*$ [NC]
RewriteCond %{HTTP_HOST} (.+)
# %1 is the host name captured by the (.+) group of the last RewriteCond
RewriteRule ^(.*)$ http://%1$1 [R=301,L]

# HTTPS REQUIRED PAGES
RewriteCond %{HTTPS} =off
RewriteCond %{REQUEST_URI} (/login\.aspx|/securepage\.aspx).*$ [NC]
RewriteCond %{HTTP_HOST} (.+)
# %1 is the host name captured by the (.+) group of the last RewriteCond
RewriteRule ^(.*)$ https://%1$1 [R=301,L]
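The decision the two blocks make can be sketched in Python as follows. This is a minimal model, not ManagedFusion's implementation: `scheme_redirect` and the host `www.example.com` are hypothetical, the query string is ignored, and the host is joined in directly instead of modeling how `%N` backreferences are numbered.

```python
import re
from typing import Optional

# Pages that must be served over HTTPS (the REQUEST_URI conditions above).
# re.search mirrors mod_rewrite's unanchored matching; IGNORECASE mirrors [NC].
SECURE_PAGES = re.compile(r"(/login\.aspx|/securepage\.aspx).*$", re.IGNORECASE)


def scheme_redirect(https_on: bool, request_uri: str, host: str) -> Optional[str]:
    """Return the 301 Location header for a request, or None if no redirect applies."""
    if not host:
        return None  # RewriteCond %{HTTP_HOST} (.+) fails on an empty Host header
    is_secure_page = SECURE_PAGES.search(request_uri) is not None
    if https_on and not is_secure_page:
        return "http://" + host + request_uri   # HTTP REQUIRED PAGES block
    if not https_on and is_secure_page:
        return "https://" + host + request_uri  # HTTPS REQUIRED PAGES block
    return None  # already on the right scheme


print(scheme_redirect(False, "/login.aspx", "www.example.com"))
print(scheme_redirect(True, "/default.aspx", "www.example.com"))
print(scheme_redirect(True, "/login.aspx", "www.example.com"))
```

Each request ends up on exactly one scheme, and a page that is already on the correct scheme is never redirected, which is what keeps the two blocks from bouncing a request back and forth.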

Next, we can stop hotlinking with the following rule:

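# BLOCK HOTLINKING
# Let requests with no Referer header through (direct visits, privacy proxies)
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^((http|https)://)?(www\.)?domain\.com/.*$ [NC]
RewriteRule \.(gif|jpe?g|png|bmp|js)$ - [F,L]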
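Here is a rough Python model of the hotlinking check, useful for reasoning about which referers get a 403. It is a sketch, not the rewriter's code: `is_hotlink_blocked` is a hypothetical name, `domain.com` is the placeholder from the rule, and treating a missing Referer as allowed is an assumption borrowed from the common `!^$` variant of this rule (drop that check to mirror a rule without it).

```python
import re

# Referers from our own site; IGNORECASE mirrors the [NC] on the condition.
OWN_SITE = re.compile(r"^((http|https)://)?(www\.)?domain\.com/.*$", re.IGNORECASE)
# File types the RewriteRule protects (no [NC] on the rule, so case-sensitive).
PROTECTED = re.compile(r"\.(gif|jpe?g|png|bmp|js)$")


def is_hotlink_blocked(path: str, referer: str) -> bool:
    """True if the request would be answered with 403 Forbidden ([F])."""
    if not PROTECTED.search(path):
        return False  # rule pattern does not match, so the rule is skipped
    if referer == "":
        return False  # assumption: empty Referer allowed (the !^$ variant)
    return OWN_SITE.match(referer) is None  # foreign referer: block


print(is_hotlink_blocked("/images/logo.png", "http://other-site.com/page"))
print(is_hotlink_blocked("/images/logo.png", "http://www.domain.com/blog/"))
```

One side effect worth noticing: because the rule pattern has no [NC] flag, an upper-case extension such as `/photo.JPG` is not covered by the check.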

And what about blocking spammers and connections with an empty user agent? It takes just a couple of rules:

# BLOCK EMPTY USERAGENTS EXCEPT FOR THE IP ADDRESSES IN THE 2ND CONDITION
RewriteCond %{HTTP_USER_AGENT} !^.+$
RewriteCond %{REMOTE_ADDR} !^12\.34\.56\.78$
RewriteRule .* - [F,L]

# BLOCK BOTS/SPAMMERS BASED ON THEIR USERAGENT
RewriteCond %{HTTP_USER_AGENT} (sogou|baiduspider|sosospider|larbin) [NC]
RewriteCond %{REMOTE_ADDR} !^12\.34\.56\.78$
RewriteRule .* - [F,L]
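A minimal Python sketch of how the two blocks combine, assuming a missing User-Agent header arrives as an empty string (as `%{HTTP_USER_AGENT}` does in Apache when the header is absent). `is_forbidden` is a hypothetical name, and `12.34.56.78` is the placeholder address from the rules.

```python
import re

# Exempt address from the second RewriteCond of each block.
WHITELIST = re.compile(r"^12\.34\.56\.78$")
# Crawler names from the BLOCK BOTS/SPAMMERS condition; IGNORECASE mirrors [NC].
BAD_BOTS = re.compile(r"(sogou|baiduspider|sosospider|larbin)", re.IGNORECASE)


def is_forbidden(user_agent: str, remote_addr: str) -> bool:
    """True if either block would answer 403 Forbidden ([F])."""
    if WHITELIST.match(remote_addr):
        return False  # both blocks exempt this address
    # Block 1: the negated !^.+$ condition succeeds only for an empty User-Agent.
    if re.match(r"^.+$", user_agent) is None:
        return True
    # Block 2: a known crawler name anywhere in the User-Agent.
    return BAD_BOTS.search(user_agent) is not None


print(is_forbidden("", "98.76.54.32"))
print(is_forbidden("Mozilla/5.0 (compatible; Baiduspider/2.0)", "98.76.54.32"))
print(is_forbidden("Mozilla/5.0 (Windows NT 6.0)", "98.76.54.32"))
```

Checking the whitelist first is equivalent to repeating the REMOTE_ADDR condition in each block, because both blocks exempt the same address.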

It took a little while to get the rewrite rules and conditions correct, but I am very happy with the end result. I have been able to remove the GateKeeper extension I wrote and the SanibelLogic product, block hotlinking, and provide external content to local jQuery scripts.

What do you think?  Is URL Rewriting for you?
