Archive for the ‘Duplicate Content’ Category

Google Clears up the Duplicate Content Myth

Greg Grothaus of Google's Search Quality Team has posted a video (along with a presentation on the Webmaster Central Blog) covering the duplicate content and multiple-site issues that webmasters continue to face when trying to rank well in Google.

Greg begins by clearing up a popular myth about duplicate content: that Google penalizes sites for having it. This is not the case. That's not to say that duplicate content can't have a negative impact on your rankings, but Google itself is not penalizing you for it.

Greg stresses that duplicate content is simply a factor on a "by query" basis. "What's actually happening, is that we're looking at the query that the user's doing, and we're saying that we want diversity in the results we're going to show a user," says Grothaus. He says those who think their content is being omitted because it is a duplicate will likely find that it shows up in results after all if they adjust their query to reflect the missing piece more specifically.

Google recognizes that most duplicate content is not created to be deceptive. There are of course exceptions, which are considered spam. Grothaus says even spam sites aren't being penalized for having duplicate content, though. They're being penalized for being spam. He compares it to bold tags: some spammers use them, but Google doesn't penalize people just for using them, and it doesn't penalize people just for having duplicate content.

Duplicate Content:

    * example.com/
    * example.com/?
    * example.com/index.html
    * example.com/Home.aspx
    * www.example.com/
    * www.example.com/?
    * www.example.com/index.html
    * www.example.com/Home.aspx

The above list from Grothaus's presentation shows examples of URLs that are different but show the same content. Google will recognize that they're the same and will try to pick the right one (although it sometimes picks the wrong one). Greg says webmasters are the best people to know which one is best, so it helps to use only one.
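As a rough illustration of "only use one", here is a minimal Python sketch that maps the eight variants in the list to a single URL. The preferred host (www.example.com), the http:// scheme, and the set of default documents are assumptions made for this example; they are a hypothetical site policy, not anything Google prescribes.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site policy (assumptions for illustration only).
PREFERRED_HOST = "www.example.com"
SITE_HOSTS = {"example.com", "www.example.com"}
DEFAULT_DOCUMENTS = {"/index.html", "/home.aspx"}


def normalize(url: str) -> str:
    """Map duplicate URL variants of a page to one preferred URL."""
    # The list above omits the scheme, so assume http:// when none is given.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # Fold the bare and www hosts of this site into the preferred host.
    host = parts.netloc.lower()
    if host in SITE_HOSTS:
        host = PREFERRED_HOST

    # Treat default documents as the directory root (case-insensitive match).
    path = parts.path or "/"
    if path.lower() in DEFAULT_DOCUMENTS:
        path = "/"

    # An empty query ("?") disappears; a real query string is kept as-is.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "example.com/", "example.com/?", "example.com/index.html",
    "example.com/Home.aspx", "www.example.com/", "www.example.com/?",
    "www.example.com/index.html", "www.example.com/Home.aspx",
]
unique = {normalize(u) for u in variants}
print(unique)
```

All eight variants collapse to the same URL here. On a real site the same policy would normally be enforced with redirects and consistent internal links rather than in a script.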

You will not be penalized for using more than one, but there are some issues that can arise that may negatively affect your rankings. For one, your link popularity will be diluted. Backlinks pointing to several different URL versions of the same content will make it harder to accumulate link juice for one URL. Greg says that user-unfriendly URLs in search results may offset branding efforts and decrease usability as well. Plus, with multiple versions of the same thing, Google will spend more time crawling the same content, meaning it will have less time to go deeper into your site, and you run the risk of having content not get indexed.

***

I really don't think this brings anything that new to the duplicate content question. It certainly doesn't 'bust' any myths. I mean, the bottom line is that if you use duplicate content, you won't get ranked for multiple pages or sites. Google is basically saying here that you're not going to get banned, but it's not going to help your overall cause. It's always a good idea to use unique content on different pages, and if you have more than one site, you should not reuse the content from your main site. You will never have multiple domains coming up for the same set of content; it must be unique.

Google, Yahoo & Microsoft Unite On “Canonical Tag” To Reduce Duplicate Content Clutter

The web is full of duplicate content. Search engines try to index and display the original or “canonical” version. Searchers only want to see one version in results. And site owners worry that if search engines find multiple versions of a page, their link credit will be diluted and they’ll lose ranking.

Today, Google, Yahoo and Microsoft (links are to their separate announcements) have united to offer a way to reduce duplicate content clutter and make things easier for everyone. Webmasters rejoice! Worried about duplicate content on your site? Want to know what “canonical” means? Read on for more details.

Multiple URLs, one page

Duplicate content comes in different forms, but a major scenario is multiple URLs that point to the same page. This can come up for lots of reasons. An ecommerce site might allow various sort orders for a page (by lowest price, highest rated…), or the marketing department might want tracking codes added to URLs for analytics. You could end up with 100 pages, but 10 URLs for each page. Suddenly search engines have to sort through 1,000 URLs.

This can be a problem for a couple of reasons.

  • Less of the site may get crawled. Search engine crawlers use a limited amount of bandwidth on each site (based on numerous factors). If the crawler is only able to crawl 100 pages of your site in a single visit, you want it to be 100 unique pages, not 10 pages 10 times each.
  • Each page may not get full link credit. If a page has 10 URLs that point to it, then other sites can link to it 10 different ways. One link to each URL dilutes the value the page could have if all 10 links pointed to a single URL.

Using the new canonical tag

Specify the canonical version using a tag in the head section of the page as follows:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>

That’s it!

  • You can only use the tag on pages within a single site (subdomains and subfolders are fine).
  • You can use relative or absolute links, but the search engines recommend absolute links.
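To see what the relative-versus-absolute choice means in practice, here is a small Python sketch (standard library only) that reads the tag from a page and resolves a relative href against the page's own URL. The page URL with the extra `sort=price` parameter and the `find_canonical` helper are made up for this example; this is not how any search engine's parser is known to work.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(page_url, html):
    """Return the canonical URL as an absolute URL, or None if no tag is present."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # urljoin leaves absolute hrefs alone and resolves relative ones.
    return urljoin(page_url, finder.canonical)


# Hypothetical page: a sorted view that declares a relative canonical.
page_url = "http://www.example.com/product.php?item=swedish-fish&sort=price"
html = ('<html><head><link rel="canonical" '
        'href="/product.php?item=swedish-fish"/></head></html>')
print(find_canonical(page_url, html))
```

A relative href only means what you intend if it is resolved against the URL you expect, which is one plausible reason the engines recommend absolute links.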

This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.

  • Links to all URLs will be consolidated to the one specified as canonical.
  • Search engines will consider this URL a “strong hint” as to the one to crawl and index.
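The consolidation idea can be pictured with a toy tally in Python. The URLs with `sort` and `trackingid` parameters and the link counts below are invented for illustration, and real link credit is of course not a simple count; this only shows links to several variants being credited to one canonical URL.

```python
from collections import Counter


def consolidate(link_counts, canonical_of):
    """Sum inbound link counts under each URL's declared canonical.

    link_counts:  {url: number of inbound links}   (made-up figures)
    canonical_of: {url: canonical url}; URLs without an entry count as themselves.
    """
    totals = Counter()
    for url, count in link_counts.items():
        totals[canonical_of.get(url, url)] += count
    return dict(totals)


canonical = "http://www.example.com/product.php?item=swedish-fish"
# Hypothetical variants of the same product page.
inbound = {
    canonical: 4,
    canonical + "&sort=price": 3,
    canonical + "&trackingid=1234": 2,
}
canonical_of = {url: canonical for url in inbound}

print(consolidate(inbound, canonical_of))
```

Without the mapping, the three URLs would carry 4, 3, and 2 links separately; with it, the single canonical URL is credited with all 9.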

Canonical URL best practices

The search engines use this as a hint, not as a directive (Google calls it a “suggestion that we honor strongly”), but are more likely to use it if the URLs follow best practices, such as:

  • The content rendered for each URL is very similar or identical
  • The canonical URL is the shortest version
  • The URL uses easy to understand parameter patterns (such as using ? and %)

Can this be abused by spammers? They might try, but Matt Cutts of Google told me that the same safeguards that prevent abuse by other methods (such as redirects) are in place here as well, and that Google reserves the right to take action on sites that are using the tag to manipulate search engines and violate search engine guidelines.

For instance, this tag will only work with very similar or identical content, so you can’t use it to send all of the link value from the less important pages of your site to the more important ones.

If tags conflict (such as when pages point to each other as canonical, the URL specified as canonical redirects to a non-canonical version, or the page specified as canonical doesn’t exist), search engines will sort things out just as they do now, and will determine which URL they think is the best canonical version.
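Site owners may prefer to catch such conflicts themselves rather than leave the decision to the engines. The Python sketch below is a hypothetical audit helper, not a description of what any search engine does: it follows a site's own canonical declarations and reports loops and targets that do not exist. The `canonical_of` mapping and `known_urls` set are assumed to come from your own crawl, and the redirect case is not covered.

```python
def resolve_canonical(url, canonical_of, known_urls):
    """Follow canonical declarations from url.

    Returns (final_url, "ok"), (None, "loop"), or (None, "missing").
    canonical_of: {url: declared canonical url}  (from your own crawl)
    known_urls:   set of URLs that actually exist on the site
    """
    seen = [url]
    current = url
    # Stop when a page has no declaration or declares itself canonical.
    while current in canonical_of and canonical_of[current] != current:
        target = canonical_of[current]
        if target in seen:
            return None, "loop"      # pages point at each other
        if target not in known_urls:
            return None, "missing"   # canonical page does not exist
        seen.append(target)
        current = target
    return current, "ok"


# Hypothetical crawl data.
canonical_of = {"/a": "/b", "/b": "/a", "/c": "/gone", "/d": "/e"}
known_urls = {"/a", "/b", "/c", "/d", "/e"}

for page in ["/a", "/c", "/d"]:
    print(page, resolve_canonical(page, canonical_of, known_urls))
```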

The tag in action

This tag will most often be useful in the case of multiple URLs pointing at the same page, but might also be used when multiple versions of a page exist. For instance, wikia.com is using the tag for previous revisions of a page. Both http://watchmen.wikia.com/index.php?title=Comedian%27s_badge&diff=4901&oldid=4819 and http://watchmen.wikia.com/index.php?title=Comedian%27s_badge&diff=5401&oldid=4901 reference the latest version of the article (http://watchmen.wikia.com/wiki/Comedian%27s_badge) as the canonical.

The search engines stress that it’s still important to build good URL structure and also note that if you aren’t able to implement this tag, they’ll still keep the processes they have now to determine the canonical. For instance, at SMX West on Tuesday, Maile Ohye of Google explained how Google can detect patterns in URLs if they use standard parameters. Take these URLs:

  • http://www.example.com/buffy?cat=spike
  • http://www.example.com/buffy?cat=spike&sort=evil
  • http://www.example.com/buffy?cat=spike&sort=good

Maile explained that Google can detect (particularly when looking at patterns across the site) that the sort parameter may order the page differently, but that the URLs with the sort parameter display the same content as the shorter URL (http://www.example.com/buffy?cat=spike).
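Standard parameters are also what make this kind of clean-up easy on the site owner's side. The Python sketch below drops a parameter that is known to affect only ordering and shows the three URLs collapsing to the shorter one. The `ORDER_ONLY_PARAMS` set is an assumption you would define for your own site; Google's actual pattern detection was not described in detail and is not what this code implements.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this site, "sort" only reorders the page and never changes content.
ORDER_ONLY_PARAMS = {"sort"}


def strip_order_params(url):
    """Remove ordering-only query parameters, keeping everything else in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ORDER_ONLY_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


urls = [
    "http://www.example.com/buffy?cat=spike",
    "http://www.example.com/buffy?cat=spike&sort=evil",
    "http://www.example.com/buffy?cat=spike&sort=good",
]
print({strip_order_params(u) for u in urls})
```

The output of a helper like this is a natural candidate for the href of the canonical tag on each sorted view.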

While it’s rare for the search engines to join forces, this isn’t the first time they’ve come together on a standard. In November 2006, they came together to support sitemaps.org. And in June 2008 they announced a standard set of robots.txt directives. Matt Cutts of Google and Nathan Buggia of Microsoft told me that they want to help reduce the clutter on the web, and make things easier for searchers as well as site owners.

This new tag won’t completely solve duplicate issues on the web, but it should help make things quite a bit easier, particularly for ecommerce sites, which likely need all the help they can get in the current economic conditions. Site owners have been asking for help with these issues for a really long time, so this should be a greatly welcomed addition.

source: Search Engine Land

About Author

Christopher Costa is the President of Lawyers Court, an Internet Marketing and Web Design firm for Lawyers.