Yahoo! Search Marketing Blog
March 11th, 2010

Getting There With Search

Six steps you can take to help search engines find and index your content

You’re trying to get your website noticed, and search engines are an excellent channel for visibility. In this article, we’ll look at some basic things you can do to make sure your content is being indexed by search engines. The more of the items below you can implement, the more visible your site will be to search engines.

1. Check your “robots.txt” file
One little line in a simple text file can be very costly if it’s blocking your site from search engines. A robots.txt file lets you tell search engines which directories or files on your site they may or may not crawl.

For example, these two lines disallow all crawlers (a.k.a. robots) from the entire site:

User-agent: *
Disallow: /
To find your robots.txt file, simply type your domain followed by /robots.txt into your browser’s address bar. For example, here is the robots.txt file for the W3C: www.w3.org/robots.txt. For more details on what this file can contain and how search engines treat it, visit robotstxt.org.
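You may want to test a rule before you publish it. Python’s standard urllib.robotparser module can evaluate robots.txt text against a URL. This is a minimal sketch. The helper name is_allowed, the example.com URLs, and the choice of “Slurp” (Yahoo!’s crawler) as the user-agent are all assumptions for illustration, and individual search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if robots_txt lets user_agent crawl url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rule from the article: every crawler is blocked from the whole site.
BLOCK_EVERYTHING = """User-agent: *
Disallow: /
"""

# A narrower rule: only the /private/ directory is blocked.
BLOCK_PRIVATE_ONLY = """User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_allowed(BLOCK_EVERYTHING, "Slurp", "http://www.example.com/"))  # False
    print(is_allowed(BLOCK_PRIVATE_ONLY, "Slurp", "http://www.example.com/about.html"))  # True
    print(is_allowed(BLOCK_PRIVATE_ONLY, "Slurp", "http://www.example.com/private/a.html"))  # False
```

If the first line prints False for your own homepage, your robots.txt is blocking the entire site.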

2. Make sure your content is indexable
Although search engines have come a long way over the years in indexing all kinds of content on the web, some types of content may still be only partly indexed or not indexed at all. If content is missing when you look at the search engine’s cached copy of your page, check whether it is presented in one of the hard-to-index ways below.

The same is true for people who use screen readers because of a disability. A screen reader “sees” a page much as a search engine crawler does: by reading through the content and interpreting its elements.

JavaScript
Search engines crawl some JavaScript today and will likely crawl more in the future. However, JavaScript can still present a problem, and so can AJAX, a technique that uses JavaScript to load content without reloading the page. Content delivered through JavaScript is usually not indexed, so search engines may not see navigation, on-page apps, or any other content presented with JavaScript. That content then cannot contribute to the context of the page, and links inside it may not be followed.
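To see why JavaScript-generated content can be missed, it helps to read the raw HTML the way a simple crawler that does not run scripts would. The sketch below is a deliberately simplified model, not how any particular search engine works. It uses Python’s html.parser to collect visible text and link targets while skipping everything inside <script> elements. The page, the URLs, and the class name CrawlerView are hypothetical.

```python
from html.parser import HTMLParser


class CrawlerView(HTMLParser):
    """Simplified text-only 'crawler': records text and links, never runs scripts."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script source is treated as code, not as page content.
        if not self._in_script and data.strip():
            self.text.append(data.strip())


# Hypothetical page: the "Our products" link exists only after the script runs.
PAGE = """<html><body>
<p>Welcome to our shop.</p>
<script>
var link = document.createElement("a");
link.href = "/products.html";
link.textContent = "Our products";
document.body.appendChild(link);
</script>
<a href="/contact.html">Contact us</a>
</body></html>
"""

if __name__ == "__main__":
    view = CrawlerView()
    view.feed(PAGE)
    view.close()
    print(view.text)   # ['Welcome to our shop.', 'Contact us']
    print(view.links)  # ['/contact.html']
```

A visitor whose browser runs the script sees an “Our products” link. A crawler that only reads the HTML finds neither that text nor the /products.html URL.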

Flash
Thanks to recent improvements in indexing, Flash sites are becoming more prevalent in SERPs (Search Engine Result Pages). Even so, a site built entirely in Flash is still probably not the best idea if you care about search engine traffic.

Today search engines primarily try to index the links and text in Flash files. While this is better than it used to be, not all of your content may be indexed, depending on how your Flash site is built. Navigation between “pages” in a Flash site all happens within a single .swf file that lives at one URL, so there are no separate pages carrying separate topical content. This can be a problem when you’re up against competitors with much more targeted topical and sub-topical content on distinct URLs, where links to each of those URLs provide even more context.

To minimize indexing difficulties, use Flash in smaller pieces. First make sure each topical page of your site has its own unique URL, then put Flash elements on individual pages if you like. Be aware, though, that the more of your content you put in Flash, the less content and context you may be providing to search engines.

Image Text
It’s rare to see the entire content of a page posted as a .jpg or other image these days, but it still happens. When it does, a crawler reading the code sees only an image, not the text, content, and link text that a visitor sees in it. Search engines simply cannot read textual content that you present in an image, whether it is the entire content of the page or just titles and headers. You will still see the images when you check the cached version of a page. That is because the cache displays the stored image itself, not because the text inside it has been read.

3. Strengthen your link structure
Links to and from your pages are very important for the “findability” of your pages.  If a page has no links connecting it with any other indexed pages on the web, it may not be found by search engines, since they follow links to discover new content. 

Internal links
Make sure your site has a sensible, crawlable link structure. It should link to both top-level and deeper-level pages, and each page should link to content relevant to that page.

Crawlable links are links that search engines can see, meaning they are not in JavaScript or buried as unindexable links in a Flash file. Link between pages throughout your site, not just from the home page. Deeper pages tend to be harder to find and index, since they are linked to less often, or only from more obscure pages. On each page, try to include links to the pages most relevant to its content. This gives search engines better context and provides a good mix of deeper links.

You can also include a sitemap page on the site. It is not the same thing as the similarly named XML sitemap files discussed in step 4. Link to the sitemap page from your home page and/or from a header or footer on every page.
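Crawlers discover pages by following links, so you can get a rough sense of how findable each page is by treating your site as a graph and walking it from the home page. The Python sketch below runs a breadth-first search over a hypothetical, hand-written link map (SITE). It reports each page’s click depth and lists any orphan pages that no crawlable link reaches. Real crawlers also find pages through external links and XML sitemaps, so treat this only as a rough internal check.

```python
from collections import deque

# Hypothetical internal link map: page -> pages it links to with crawlable links.
SITE = {
    "/": ["/products/", "/about.html"],
    "/products/": ["/", "/products/widgets.html"],
    "/products/widgets.html": ["/products/widgets/blue.html"],
    "/products/widgets/blue.html": [],
    "/about.html": ["/"],
    "/old-promo.html": ["/"],  # links out, but nothing links in
}


def click_depths(site, start="/"):
    """Breadth-first search: minimum number of clicks from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in site.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


if __name__ == "__main__":
    depths = click_depths(SITE)
    orphans = sorted(set(SITE) - set(depths))
    for page, depth in sorted(depths.items(), key=lambda item: item[1]):
        print(depth, page)
    print("Unreachable from home page:", orphans)  # ['/old-promo.html']
```

Pages with a high click depth, or none at all, are good candidates for extra internal links or a place on the sitemap page.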

External links
If you provide worthwhile content, your site and the pages within it will attract links naturally. These links from external sites help search engines find and classify your site, especially if it is newly published. To kick-start your visibility, you can add your site to trusted directories like The Open Directory Project and Yahoo! Directory. If relevant, you can also add your site to online local listings such as Yelp, Yahoo! Local, or CitySearch.

Promote your website in your advertising campaigns, put it on your business card, and use any other channel for visibility you have. If people find your site interesting and useful, they will link to it.

To see what your inlinks look like, go to https://siteexplorer.search.yahoo.com/, type in your URL, and click the Inlinks button. The drop-down menus let you view links to a single page or to the entire site. They also let you count links from all pages, from all pages except those on the same subdomain, or from all pages except those on the same domain.

4. Create an XML sitemap file
The major search engines we’re addressing here all support XML sitemap files. These are different from the onsite sitemap pages described above. An XML sitemap is a file that you place on your server for search engines to crawl. It lists the URLs on your site along with a small amount of information about each one. This lets you tell search engines about your URLs even if they haven’t found them naturally by following links on the Web.

Visit sitemaps.org for more information, or see the sitemap documentation from Yahoo!, Google, and Bing.
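For illustration, here is a minimal Python sketch that writes a sitemap in the format described at sitemaps.org. That format is a <urlset> element in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace, holding one <url> entry per page; each entry has a required <loc> and an optional <lastmod>. The helper build_sitemap, the URLs, and the date are hypothetical. Check the protocol at sitemaps.org for size limits and the other optional tags before relying on this.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Hypothetical helper. pages: (url, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & automatically.
        ET.SubElement(entry, "loc").text = url
        if lastmod:
            ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2010-03-11"),
        ("http://www.example.com/catalog?item=12&desc=vacation", None),
    ]))
```

The output would be saved as a file on your server, for example sitemap.xml, and then submitted to each search engine as described in its documentation.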

5. Verify your “nofollow” and “noindex” tags
Noindex and nofollow tags can be used to stop search engines from indexing specific pages or following specific links.

Noindex
The noindex meta tag tells search engines not to index a page.  It looks like this:

<meta name="robots" content="noindex" />

To check for noindex tags on any of your pages, right-click on the page in your browser, choose “View Source”, and search the source for noindex.

For more information on the search engines’ support of noindex, see the Yahoo!, Google, and Bing pages that cover it.
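Checking by hand gets tedious when you have many pages. The hypothetical Python sketch below uses the standard html.parser module to read the content of any <meta name="robots"> tag and return its directives. A noindex, or a page-level nofollow, then stands out. It only examines the HTML string you pass in, and it only recognizes the generic name="robots". The class and function names are illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # content holds a comma-separated list such as "noindex, nofollow".
        for directive in (attributes.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex" /></head></html>'
    print("noindex" in robots_directives(page))  # True
```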

Nofollow
Nofollow can appear at the page level, in a robots meta tag, or at the link level, as an attribute of an <a> tag.

Nofollow at the page level tells search engine robots not to follow any of the links in the body of the page that the nofollow meta tag is on.  It looks like this:

<meta name="robots" content="nofollow" />

Nofollow at the link level tells search engine robots not to follow the particular link that carries the nofollow attribute. It looks like this:

<a href="http://www.example.com/" rel="nofollow">link text</a>

To check for nofollows on any page, right-click on the page, choose “View Source”, and search the source code for the word nofollow.

For more information on nofollow, see this Wikipedia article, or see Yahoo!, Google, and Bing’s documentation on their support of nofollow.
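The link-level check can also be scripted. In this hypothetical Python sketch, html.parser lists the href of every <a> tag whose rel attribute includes nofollow. It splits rel on whitespace because the attribute can hold several space-separated values. The names NofollowLinkFinder and find_nofollow_links are illustrative only.

```python
from html.parser import HTMLParser


class NofollowLinkFinder(HTMLParser):
    """Records the href of every <a> tag whose rel attribute includes nofollow."""

    def __init__(self):
        super().__init__()
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. rel="nofollow external".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow_links.append(attributes.get("href"))


def find_nofollow_links(html):
    """Return the hrefs of all nofollow links in an HTML string."""
    finder = NofollowLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.nofollow_links


if __name__ == "__main__":
    page = (
        '<p><a href="http://www.example.com/" rel="nofollow">link text</a> '
        '<a href="/about.html">About us</a></p>'
    )
    print(find_nofollow_links(page))  # ['http://www.example.com/']
```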

6. Specify your site’s language
You can also help search engines by specifying what language your site is written in. This is a simple meta tag that looks like this:

<meta http-equiv="content-language" content="en">

For the full list of language codes, see the ISO 639 code list on the Library of Congress site.

To check for a language meta tag on any page, right-click on the page, choose “View Source”, and search the source code for the word language (or content-language).
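As with the other tags, this check can be automated. The hypothetical Python sketch below uses html.parser to report the content value of any <meta http-equiv="content-language"> tag in an HTML string. It compares the http-equiv value without regard to case, so “Content-Language” also matches. It does not validate the value against the ISO code list, and the names used are illustrative.

```python
from html.parser import HTMLParser


class LanguageMetaFinder(HTMLParser):
    """Collects content values from <meta http-equiv="content-language"> tags."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Normalize case so "Content-Language" and "content-language" both match.
        if (attributes.get("http-equiv") or "").lower() == "content-language":
            content = (attributes.get("content") or "").strip()
            if content:
                self.languages.append(content)


def declared_languages(html):
    """Return every content-language value declared in an HTML string."""
    finder = LanguageMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.languages


if __name__ == "__main__":
    print(declared_languages('<meta http-equiv="content-language" content="en">'))  # ['en']
```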

Still having problems?
If you’ve tried everything above and believe you still have indexing issues, browse the webmaster guidelines below for more information, troubleshooting, and contact information for the search engines. 

Search Engine Guidelines for webmasters

For more on getting your site noticed, refer to Laura Lippay’s previous post “Is Your Site Invisible?”

— Laura Lippay, Director of Technical Marketing

(Image by Kapungo via Flickr, CC 2.0)

Posted by Administrator


8 Comments

  • 1. 0845 numbers  |  March 25th, 2010 at 2:18 am

    0800 Numbers, 0845 Numbers, 0844 Numbers are our specialty. If you are after a professional non geographic number for your business, you’re in the right place.

  • 2. corian  |  April 27th, 2010 at 6:53 am

    Thanks for the advice. I’m hoping our SEO company is doing all of this for us. Don’t know if there is a way for a lay person to check.

  • 3. Lexus Parts  |  April 27th, 2010 at 6:58 am

    Just ask your SEO company to show you.. if they are doing these things they will gladly show it.

  • 4. CuringHerbs.com  |  April 27th, 2010 at 7:16 am

    Here robot.txt is written both as robot.txt and Robot.txt. Which spelling is right? Does it matter?
    Chinese herbs and supplements,

  • 5. Mathews  |  April 27th, 2010 at 5:57 pm

    I knew about robots and nofollow tags but the noindex tag was a totally new concept to me. Thanks

  • 6. Daniel  |  April 27th, 2010 at 9:20 pm

    Some very interesting information in your post, I will be doing a review of our site soon and looking at the family crest images

  • 7. Peter Coughlin  |  April 30th, 2010 at 5:27 am

    Interesting that you list the robots.txt at number one. I wonder does that mean it’s the most important?

    Anyway, I just wanted to mention my wordpress plugin which automatically creates a robots.txt file with the right code in it to allowing search engine crawlers, but not spambots.

    http://petercoughlin.com/robotstxt-wordpress-plugin/

  • 8. Simon  |  June 23rd, 2010 at 2:43 am

    Hmm… I am fairly certain that AJAX is not a ‘Scriping’ technique…

Copyright © 2008 Yahoo! Inc. All Rights Reserved