Jacob Gillespie has worked on a post concerning URL Guidelines, that underwent much revision and was posted as a guest post on CSS-Tricks named Guidelines for URI Design.
Clean URIs are one component of a clean website, and it is an important one. The majority of end-user access to the Internet involves a URI, and whether or not the user actually enters the URI, they are working with one nonetheless.
Here is an outtake of the general principles of the article:
- A URI must represent an object, uniquely and permanently - The URI must be unique so that it is a one-to-one match – one URI per one data object.
- Be as human-friendly as possible - URIs should be designed with the end user in mind. SEO and ease of development should come second.
- Consistency - URIs across a site must be consistent in format. Once you pick your URI structure, be consistent and follow it!
- “Hackable” URIs - Related to consistency, URIs should be structured so that they are intelligibly “hackable” or changeable.
- Keywords - The URI should be composed of keywords that are important to the content of the page. So, if the URI is for a blog post that has a long title, only the words important to the content of the page should be in the URI.
When it comes to technical details, here are their concerned bullet-points:
- No WWW - The www. should be dropped from the website URI, as it is unnecessary typing and violates the rules of being as human-friendly as possible and not including unnecessary information in the URI.
- Format - Google News has some interesting requirements for webpages that want to be listed in the Google News results – Google requires at least a 3-digit unique number.
- All lowercase - All characters must be lowercase. Attempting to describe a URI to someone when mixed case is involved is next to impossible.
- URI identifiers should be made URI friendly - A URI might contain the title of a post, and that title might contain characters that are not URI-friendly. That post title must therefore be made URI friendly. [...] Spaces should be replaced with hyphens.
Scott Mitchell has written an article on 4GuysFromRolla.com about Techniques for Preventing Duplicate URLs in Your Website.
A key tenet of search engine optimization is URL normalization, or URL canonicalization. URL normalization is the process of eliminating duplicate URLs in your website. This article explores four different ways to implement URL normalization in your ASP.NET website.
The important subjects of this article are the following:
- First Things First: Deciding on a Canonical URL Format - Before we examine techniques for normalizing URLs, and certainly before such techniques can be implemented, we must first decide on a canonical URL format.
- URL Normalization Using Permanent Redirects - [...] when a search engine spider receives a 301 status it updates its index with the new URL. Therefore, if anytime a request comes in for a non-canonical URL we immediately issue a permanent redirect to the same page but use the canonical form then a search engine spider crawling our site will only maintain the canonical form in its index.
- Issuing Permanent Redirects From ASP.NET - Every time an incoming request is handled by the ASP.NET engine, it raises the BeginRequest event. You can execute code in response to this event by creating an HTTP Module or by creating the Application_BeginRequest event handler in Global.asax.
- Rewriting URLs Into Canonical Form Using IIS 7's URL Rewrite Module - Shortly after releasing IIS 7, Microsoft created and released a free URL Rewrite Module. The URL Rewrite Module makes it easy to define URL rewriting rules in your Web.config file.
- Rewriting URLs Into Canonical Form Using ISAPI_Rewrite - Microsoft's URL Rewriter Module is a great choice if you are using IIS 7, but if you are using previous version of IIS you're out of luck.
- Telling Search Engine Spiders Your Canonical Form In Markup - Consider a URL that may include querystring parameters that don't affect the content rendered on the page or only affect non-essential parts of the page.
In the case of YouTube, all video pages specify a element like so, regardless of whether the querystring includes just the videoId or the videoId and other parameters:
<link rel\="canonical" href\="/watch?v=videoId"\>