The URL of a page is an important SEO factor that influences the page’s relevance and its position in the search results. Since the URL is also displayed in each result, users quickly scan it to assess the page’s significance. This article shows how to design and structure the URLs of your pages in a way that is helpful to both people and search engines. We also discuss solutions to recover from missing URLs.
First of all, what do we understand by the term “URL”? URL is short for “Uniform Resource Locator” and identifies resources on the network. To ensure that a resource can be found through a URL, the URL has to consist of three mandatory elements: protocol, host name and resource path. Let’s look at an example:
The mandatory elements in the above example are
As you can see, URLs are complete path statements which lead browsers to specific resources on the network, much like the paths you use in your file manager.
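To make the three mandatory elements concrete, the following minimal sketch splits a URL into protocol, host name and resource path using Python’s standard library. The URL itself is a hypothetical one chosen for illustration and does not come from this article.

```python
from urllib.parse import urlsplit

# Hypothetical example URL, chosen only for illustration.
url = "http://www.example.com/products/garden/tools.html"

parts = urlsplit(url)

protocol = parts.scheme       # the protocol, here "http"
host_name = parts.hostname    # the host name, here "www.example.com"
resource_path = parts.path    # the resource path, here "/products/garden/tools.html"

print("protocol:     ", protocol)
print("host name:    ", host_name)
print("resource path:", resource_path)
```

The resource path is the part that the rest of this article is concerned with.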
Assuming that you already have a protocol and a host name, we will now cover the “resource path” part of URLs and how to name your resources.
The resource path
The resource path consists of one or more path names and an optional file name. The path names can represent a directory structure when the web page is located in a file system or a context path when it is stored in a database.
The resource path is the most important part of a URL for a web page because it identifies the page’s content on the network, and search engines consider filenames a very short summary of the files’ contents.
Besides, URLs appear on search engines’ results pages (SERPs), and people also judge a web page’s relevance by looking at them. Therefore, we have to make sure that our URLs describe the resources they represent as precisely as possible. The following rules will give you some guidance:
- Use meaningful names for your resources.
Name your paths and files (HTML, images etc.) according to the content they represent. Imagine an HTML file which represents an article with the title “Why filenames are so important for search engines”. An appropriate filename would be “why-filename-important-search-engine.html” or “why-filename-important.html”.
- Do not use underscores (“_”) in resource names; use hyphens (“-”) instead.
Search engines treat words joined by hyphens as separate words, whilst two words joined by an underscore appear to them as a single word.
- Avoid deep nesting of directories.
Search engines currently descend only a limited number of subdirectory levels when indexing your documents. That’s why it is advisable to use no more than four subdirectories.
- Pass only a minimum of parameters to URLs.
Modern search engine robots can handle up to 30 different parameters. However, the chances that a URL is recognized are higher by orders of magnitude for URLs without (many) parameters. To avoid duplicates, you also have to consider permutations of parameters (i.e. changes in the order in which they appear). So keep it simple: use only as many parameters as you really need and always use the same order.
- Whether to use subdomains (subdomain.example.com) or subdirectories is a matter of dispute.
In any case, do not provide duplicate content under different subdomains, since that will not improve your position in the search engine results pages (SERPs). Preferring subdirectories over subdomains is mostly a good idea, since they are easier to manage and help your website as a whole. Use subdomains to split off content that is completely different.
- Do not use session ids in the URL unless you need them.
If you cannot get around using many URL parameters or session ids, you should provide the canonical URL in your web pages. Let’s say, there is a web page displaying the products of your company
which would deliver the same content as:
but sorted the other way round. For search engines, the content would be duplicate, though. If you provided the tag
<link rel="canonical" href="http://www.example.com/products" />
in the web page’s <head> section, search engines would treat both URLs as the same document.
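The advice on parameters and session ids can also be applied when your own code generates links. The following is a minimal sketch, not a definitive implementation: it removes session parameters and puts the remaining parameters into a fixed (alphabetical) order, so that permutations of the same parameters end up as one URL. The function name `normalize_url`, the session parameter names and the example URLs are assumptions made for illustration. It complements the canonical tag and does not replace it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of session parameters; adjust to what your application uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def normalize_url(url: str) -> str:
    """Return the URL without session ids and with parameters in sorted order."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    params.sort()  # always emit parameters in the same order
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


# Two hypothetical URLs that differ only in parameter order and a session id.
a = normalize_url("http://www.example.com/products?color=red&size=m&sid=abc123")
b = normalize_url("http://www.example.com/products?size=m&color=red")

print(a)
print(a == b)
```

Both calls yield the same string, so only one variant of the URL is ever linked to.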
URLs tell a website’s visitor where s/he is and what the current document is all about. You should really define meaningful URLs. Here are some bad examples:
Meaningful URLs, on the other hand, help search engines (and people, of course) to classify the content of your web pages. Here are some good examples:
In the days before the rise of the search engines, people had to type URLs directly into the address bar of their browser. When a URL is very short or people know the address of a page they want to visit, they do so even today. However, there is a risk that URLs are misspelled, and that is one of the reasons why web servers display error pages.
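A meaningful filename can often be derived from the title of the page. The sketch below is one possible approach under stated assumptions: the helper name `slugify`, the list of filler words and the word limit are illustrative choices, not rules from this article. It lowercases the title, drops filler words and joins the remaining words with hyphens, so underscores never appear in the result. Unlike the hand-written example given earlier, it does not turn plurals into singulars.

```python
import re
import unicodedata

# Assumed list of filler words; this article does not prescribe one.
STOP_WORDS = {"a", "an", "and", "are", "for", "of", "so", "the", "to"}


def slugify(title: str, max_words: int = 6) -> str:
    """Turn a page title into a short, hyphen-separated resource name."""
    # Strip accents so that e.g. "résumé" becomes "resume".
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase and keep runs of letters and digits; everything else,
    # including underscores, acts as a separator.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    keywords = [word for word in words if word not in STOP_WORDS]
    return "-".join(keywords[:max_words])


print(slugify("Why filenames are so important for search engines") + ".html")
```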
Error pages help visitors find out what went wrong with their request, and they also help them find what they were looking for. Error pages are shown when a web server detects some kind of erroneous HTTP state. A number of error states are defined in RFC 2616.
There is no need to know which kind of error each number represents. But you should at least know that all 4xx errors indicate that a client has done something wrong, that all 5xx errors indicate errors on a server, and that 404 means that a requested resource could not be found by the web server. The remaining discussion focuses on 4xx errors; the 5xx errors are left to sysadmins.
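The distinction between client and server errors can be read directly from the first digit of the status code. The following sketch illustrates this with Python’s standard `http.HTTPStatus` enumeration. The helper name `describe_status` is an assumption made for this example.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Classify an HTTP status code as client error, server error or neither."""
    if 400 <= code <= 499:
        side = "client error"
    elif 500 <= code <= 599:
        side = "server error"
    else:
        side = "not an error"
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, if known
    except ValueError:
        phrase = "unknown status"
    return f"{code} {phrase} ({side})"


for code in (404, 403, 500, 200):
    print(describe_status(code))
```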
Web servers normally come with a set of error pages covering the different HTTP error states. See Figure 1 for two example error pages of an Apache web server.
There are a lot of good reasons to implement custom error pages that fit your website. Some of them are:
- Standard error pages provide too little or wrong information
- Sometimes they even expose critical internal information such as internal hostnames, network structure etc.
- They do not implement your company’s style sheets and navigation
- They do not provide helpful links to content the visitor might have been looking for
An error page should include your website’s navigation and style sheets. Otherwise people do not even know that they are on your site. It should display a message which describes the reason for the error in simple terms.
To be more helpful, it should offer links to the home page and, for example, to the home pages of the company’s departments or product lines. It should also offer a search box with which people can search through your website’s content.
Finally, if you can do full-text searches on your site, you should also display links to suggested pages based on the address the visitor tried to view. Figure 2 shows such a suggestion from the online encyclopedia “Wikipedia”.
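Where no full-text search is available, a simpler technique can still produce suggestions: compare the requested address with the addresses that actually exist and offer the closest ones. The sketch below uses fuzzy string matching from Python’s standard `difflib` module, which is a stand-in for a real site search and not the method Wikipedia uses. The list of known paths, the function name `suggest_pages` and the similarity cutoff are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical list of paths that exist on the site.
KNOWN_PATHS = [
    "/products/garden-tools.html",
    "/products/garden-furniture.html",
    "/company/contact.html",
    "/blog/why-filename-important.html",
]


def suggest_pages(requested_path: str, limit: int = 3) -> list[str]:
    """Return up to `limit` existing paths that resemble the requested one."""
    # The cutoff of 0.6 is an assumed threshold; lower values give more,
    # but less relevant, suggestions.
    return get_close_matches(requested_path.lower(), KNOWN_PATHS, n=limit, cutoff=0.6)


# A misspelled address: underscore instead of hyphen, truncated extension.
print(suggest_pages("/products/garden_tools.htm"))
```

The best match is returned first, so a custom 404 page can show it most prominently and list the remaining suggestions below it.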