In one case, the popular Magento eCommerce product generated thousands of duplicated pages on a website of only 25 products. The culprit was the “order by price” function, which created countless alternate versions of each page as part of its programmed behavior. In many cases, the culprit is a function added to make the site easier to manage, but if the programmer does not take the search engines into account, that single function can create hundreds or even thousands of new URLs.
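To illustrate, a single category page can be crawled under several parameterized URLs like the following (the parameter names here are hypothetical; the exact syntax varies by shopping cart and version):

    www.website.com/shoes
    www.website.com/shoes?order=price&dir=asc
    www.website.com/shoes?order=price&dir=desc
    www.website.com/shoes?order=name&dir=asc

To a search engine spider, each of these is a distinct page, even though all four display the same products.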
There are a few primary places to find duplicated URLs. One is the home page logo. On most websites, the logo is a link to the home page. Every once in a while, the logo will be programmed to point to the home page file (typically www.website.com/index.php), which is a different URL than the root level (typically www.website.com/). When that happens, the logo creates a link to a second home page URL from every page of the website. Be sure that the logo links to the preferred version of the domain, based on the redirect instructions from Wednesday’s section.
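If the logo link cannot be changed in the template, a server-side 301 redirect will consolidate the two home page URLs. The following is a minimal sketch for an Apache server with mod_rewrite enabled (other servers use different syntax), placed in the .htaccess file:

    RewriteEngine On
    # Redirect any direct browser request for /index.php to the root URL
    RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php
    RewriteRule ^index\.php$ / [R=301,L]

The RewriteCond line checks the original browser request, which prevents a redirect loop on servers that internally serve index.php for the root URL.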
Another culprit is usually found on ecommerce websites. When the shopping cart is programmed and added to the website, it typically moves visitors to https, the secure version of the site, and the search engines treat http:// and https:// addresses as separate URLs. Visitors expect to see the security measures in the browser, such as the lock icon and the https in the address bar.
Let’s take a moment to think about the types of links programmers use when building navigation. There is the absolute link, which includes the entire path to the document, such as http://www.website.com/products/detail/shoe. Then there is the relative link, which uses only the directory structure, without the protocol or domain, such as /products/detail/shoe. The relative method is a time-saver in some cases, because programmers do not have to type the complete URL for the website to function properly.
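In the HTML itself, the only difference is what goes into the href attribute; the markup below is a simple sketch using the example URL from this paragraph:

    <!-- Absolute link: includes the protocol and domain -->
    <a href="http://www.website.com/products/detail/shoe">Shoe detail</a>

    <!-- Relative link: protocol and domain are filled in from the current page -->
    <a href="/products/detail/shoe">Shoe detail</a>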
However, if the navigation or other links within the secure portion of the site employ relative URLs, the entire site can be duplicated. For example, if you are in the secure shopping portion of a website, https://www.website.com/cart/purchase, and there is a relative home page link on that page (/home.php), the server fills in the rest of the link based on the secure setting (https://). As a result, every URL on the site can be duplicated: http://www.website.com/ is duplicated as https://www.website.com/, and so on. Based on this, I always recommend using absolute URLs to manage website links, especially in the ecommerce cart or on any subdomain.
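To see why, follow how the same relative link resolves depending on the page the visitor is viewing:

    Page being viewed                          Where /home.php resolves
    http://www.website.com/about               http://www.website.com/home.php
    https://www.website.com/cart/purchase      https://www.website.com/home.php

The markup is identical in both cases; only the visitor’s current protocol changes, and a spider that follows the second link begins crawling an https:// copy of the entire site.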
Each CMS has the potential to create duplicate content, and website managers have to be constantly vigilant about this. Whenever adding programming or functions, you need to be aware of how the new function will create or manage pages. The ways that CMSs create duplicated pages are as unique and as plentiful as the CMSs themselves.
If the problem cannot be corrected, then it may be managed either via a redirect on the server or by blocking the pages from the search engines by use of the robots.txt file, which is Friday’s topic.
Friday: Utilize Robots.txt to Welcome Search Engines
Nothing is as confusing to a website owner as the robots.txt file. The robots.txt file is a protocol (a set of agreed-upon rules) between a web server and the search engine spider; it contains instructions for compliant spiders to follow. Unfortunately, although search engines understand the file, humans have a difficult time grasping its machine-oriented syntax, especially if they rarely deal with it.
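For all the confusion, the file itself is short and plain. A minimal robots.txt that welcomes every compliant spider while keeping it out of a shopping cart directory might look like this (the /cart/ path is just an example):

    User-agent: *
    Disallow: /cart/

The User-agent: * line addresses all spiders, and each Disallow line names a path they should not crawl; a Disallow: line with nothing after it opens the entire site.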
The Google blog ran a two-part series on understanding robots.txt and the robots meta tag. Both articles, while providing a lot of great in-depth information, go into far more detail than most site owners or managers want. Especially when you start talking technology, bots, spiders,