A Robot (also known as a Spider or Crawler) is a program that traverses the Web automatically. It inspects the websites it finds, evaluates them, and reports to the Search Engine. The Search Engine updates its index with the data the Robot reports about your website. And as soon as the Robot has made a world trip, it’s ready to start yet another one, checking the old spots and exploring new ones to keep the Search Engine’s index up to date.
And what’s a Search Engine’s index? It’s a huge database that keeps a record of everything robots find on the Web: web pages and any available information about them. In terms of SEO, it’s crucial for a web page that everything good on it is recorded in the Search Engine’s index (that is, indexed). If it’s not there, the page won’t be found through search, or won’t bring you the results you want.
This all means that you want the Robot to visit all your important pages, and look at every detail that can be found there. Now, one by one, let’s see the things that are important for the Robot, and even more important for you.
Find fast and reliable hosting
Robots love fast and reliable websites, so get fast and reliable hosting. Good hosting makes it far less likely that your web server is down when a Search Engine spider tries to index it, and keeps the site fast enough for both the Robot and your users.
Why is this important? We all know that a site can be down sometimes. If this only happens on very rare occasions, it’s not that bad. But if you’ve got problems with hosting and your site often fails to respond, the Robot may leave the site unchecked. It’s even possible that the website gets removed from the Search Engine’s database. Not to mention that you may simply be losing sales, because users can’t reach you.
Therefore, host your site on reliable servers that are very seldom down and that are fast. By the way, your users will like speed as much as the Robot does. The faster your hosting, the sooner your site loads, the more visitors like it, and the faster they give you their money.
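If you want to keep an eye on this yourself, here is a minimal Python sketch of a single availability and speed check. The function name `check_site` and the `fetch` hook are assumptions made for illustration (the hook only exists so the check can be tried without a network); a real monitoring setup would run something like this on a schedule.

```python
import time
import urllib.request


def check_site(url, fetch=None, timeout=10.0):
    """Return (is_up, seconds_taken) for one request to `url`.

    `fetch` is a hypothetical hook that takes a URL and returns an HTTP
    status code; by default a real HTTP GET is performed.
    """
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=timeout) as response:
                return response.status

    started = time.monotonic()
    try:
        status = fetch(url)
    except OSError:
        # Connection errors, timeouts, and HTTP error responses raised by
        # urlopen are all OSError subclasses: treat them as "down".
        return False, time.monotonic() - started
    return 200 <= status < 400, time.monotonic() - started
```

Calling `check_site("http://example.com/")` every few minutes and logging the result gives you a rough picture of how often the site is down and how slowly it answers.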
Create a sitemap
In its simplest terms, a sitemap is a list of the pages on your website. Generally, there are two types of sitemaps.
An HTML sitemap is made both for human users and for Search Engines and helps them easily find the information they need. An XML Sitemap (it’s normally called a Sitemap, with a capital S) is for Search Engines only.
Create and submit a Sitemap, and thus you’ll make sure that Search Engines know about all the pages on your site, including URLs that can’t be naturally discovered by Search Engines’ crawlers.
Now what do you need, so that the Robot can visit all your pages, fast? Right, an accurate overall sitemap. You can either let a sitemap generator go and do its thing or you can tweak settings to generate the Sitemap that shows the engines exactly how you want your site crawled.
Things to tweak in your Sitemap:
- Sitemap tags
- Sitemaps segmentation — divvy up individual Sitemaps by type and by a structure that will best help you diagnose indexation shortcomings. Give them descriptive names as well.
- Exclude URLs that should NOT be indexed
- Exclude URLs disallowed in robots.txt (a good time to make sure you’re disallowing the right URLs)
- Exclude URLs disallowed via meta noindex tags
- Exclude duplicate URLs
- Exclude private pages
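The exclusions above can be sketched in a few lines of Python. This is only an illustration, not how any particular sitemap generator works: the function `build_sitemap`, the `(url, noindex)` input pairs, and the `excluded_prefixes` parameter (standing in for robots.txt rules and private areas) are all assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages, excluded_prefixes=()):
    """Build Sitemap XML from (url, noindex) pairs.

    Pages flagged noindex, pages under an excluded prefix, and duplicate
    URLs are left out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url, noindex in pages:
        if noindex:
            continue  # disallowed via a meta noindex tag
        if any(url.startswith(prefix) for prefix in excluded_prefixes):
            continue  # disallowed in robots.txt, or private
        if url in seen:
            continue  # duplicate URL
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For example, `build_sitemap([("https://example.com/", False), ("https://example.com/cart", False)], excluded_prefixes=("https://example.com/cart",))` would produce a Sitemap that lists only the home page.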
After you create the sitemap, you will upload it to your site, ideally at the root directory like so: example.com/sitemap.xml
Then you need to submit your XML Sitemap directly to Google and Bing (which powers Yahoo) so that search engines know it’s available. It’s relatively simple to do through Google Search Console and Bing Webmaster Tools. Here’s a quick overview of the steps to take:
In Google Search Console (formerly Google Webmaster Tools):
- Select your site on your Google Search Console home page.
- Click Crawl.
- Click Sitemaps.
- Click ADD/TEST SITEMAP.
- Type sitemap.xml (or the name of your sitemap file).
- Click Submit Sitemap.
In Bing Webmaster Tools:
- Go to your website dashboard.
- Click “sitemaps” in the “configure my sitemap” drop-down on the left.
Finally, you should list your sitemap in your robots.txt file so that search engines other than Google, Bing, and Yahoo can find it too.
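Listing the sitemap means adding one `Sitemap:` line with the full URL of the file to robots.txt. Here is a small Python sketch of that step, with a check that uses the standard library’s robots.txt parser (its `site_maps()` method needs Python 3.8 or later). The helper names `add_sitemap_line` and `listed_sitemaps` are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def add_sitemap_line(robots_txt, sitemap_url):
    """Return the robots.txt text with a `Sitemap:` line, added only if missing."""
    line = f"Sitemap: {sitemap_url}"
    lines = robots_txt.splitlines()
    if line not in (existing.strip() for existing in lines):
        lines.append(line)
    return "\n".join(lines) + "\n"


def listed_sitemaps(robots_txt):
    """Return the sitemap URLs a standard parser finds in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []
```

Running `add_sitemap_line` on your current robots.txt text with `https://example.com/sitemap.xml` and saving the result is all this step requires.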
A sitemap has links to all the pages on your site. So when you make a new page, don’t forget to add it to your sitemap, too. You won’t need to submit it to Search Engines again; just update the sitemap itself.
Rewrite dynamic URLs
A common problem for online stores, forums, blogs, and other database-driven sites is that pages often have unclear URLs like weddinggift.com/?item=32554, and you cannot tell which product or article they lead to. Instead, they could have URLs like weddinggift.com/silk-linen.html or weddinggift.com/pots.html, where you can easily see what’s on the page.
The problem with a URL like weddinggift.com/?item=32554 is that no one, neither users nor the Robot, can tell what product is behind it. URLs with parameters (here, item=32554) are called dynamic URLs, while URLs like weddinggift.com/silk-linen.html are static. First of all, static URLs are much more user-friendly: URLs with too many “?”, “&”, and “=” characters are hard to understand and pretty inconvenient. Secondly, search engines like static URLs much better than dynamic ones. I probably wouldn’t have believed this myself, but one of the biggest players in the SEO industry confirmed that their search traffic jumped 20% after switching from dynamic URLs to static ones.
It’s possible that you also need static URLs but have dynamic ones instead. I wouldn’t talk so much about this problem if it couldn’t be solved. There’s a nice trick to make URLs look good to Search Engines.
An .htaccess file is a plain-text configuration file for the Apache web server, and with it you can do amazing tricks. Just one example is rewriting dynamic URLs. When a user (or a robot) requests a user- and crawler-friendly URL, the rules in this file tell the server which dynamic page to serve behind it.
This is, basically, hiding dynamic URLs behind Search Engine-friendly URLs. I’ll give you an example for an online store. As a rule, a product page URL looks like this: http://www.myshop.com/showgood.php?category=34&good=146, where there are two parameters: category, the group of products, and good, the product itself. On the same website, you may be offering Dove soap in the category of beauty products under the URL http://www.myshop.com/showgood.php?category=34&good=146, and a bra by Victoria’s Secret under the URL http://www.myshop.com/showgood.php?category=56&good=54146.
To Search Engines, both pages appear like showgood.php. They just can’t understand that these are two different pages offering two different products. You can rewrite the URLs so that, for Dove soap, the Robot sees http://www.myshop.com/beauty-products/dove-soap.html instead of http://www.myshop.com/showgood.php?category=34&good=146 and, for the Victoria’s Secret bra, http://www.myshop.com/victorias-secret-underwear/bra.html instead of http://www.myshop.com/showgood.php?category=56&good=54146. Now you have “speaking URLs” that are understood by the Robot and easy to check.
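To make the idea concrete, here is a Python sketch of the mapping that a rewrite rule performs for the two example URLs above. It is not .htaccess syntax, only an illustration of the logic, and the lookup tables `CATEGORY_IDS` and `GOOD_IDS` and the function `to_dynamic` are hypothetical (a real shop would take the IDs from its database).

```python
import re
from urllib.parse import urlencode

# Hypothetical lookup tables built from the two example products.
CATEGORY_IDS = {"beauty-products": 34, "victorias-secret-underwear": 56}
GOOD_IDS = {"dove-soap": 146, "bra": 54146}

# Friendly paths have the shape /<category-slug>/<good-slug>.html
FRIENDLY_PATH = re.compile(
    r"^/(?P<category>[a-z0-9-]+)/(?P<good>[a-z0-9-]+)\.html$"
)


def to_dynamic(path):
    """Translate a friendly path into the showgood.php URL it stands for.

    Returns None when the path does not match or a slug is unknown.
    """
    match = FRIENDLY_PATH.match(path)
    if match is None:
        return None
    category = CATEGORY_IDS.get(match["category"])
    good = GOOD_IDS.get(match["good"])
    if category is None or good is None:
        return None
    return "/showgood.php?" + urlencode({"category": category, "good": good})
```

The visitor and the Robot only ever see the friendly path; the server quietly looks up the dynamic page behind it.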
Writing an .htaccess file is not an easy task; it requires special knowledge. Moreover, it’s your webmaster’s business. I personally never do this myself. So if you have a database-driven site, search the web for an SEO service that will write an .htaccess file for you. Or, if you’re using a fairly well-known third-party engine, you can write the .htaccess file yourself, using scripts that you can find on the Internet. To do the search, you can type in the_name_of_your_site’s_engine “URL Rewrite” htaccess or something like that.
Now, the idea is: it’s of great use to rewrite URLs. So find URL rewrite tools if you need them, or just find your webmaster. Then, one more thing: the old URLs that have parameters should be “hidden” from Search Engines. The next step helps you do that.
Make a robots.txt file
A robots.txt file tells the Robot not to go to certain pages: pages with sensitive material, web pages that you don’t want to be found through Google search (for instance, the “shopping cart”), and pages that are unimportant or could hurt your rankings. That way, you steer the Robot toward other, keyword-rich pages instead.
So, if there’s something to hide, a robots.txt file is a must for your website. It helps you keep the Robot away from anything that’s not good for your Search Engine rankings.
Remember, if you rewrote the dynamic URLs we talked about above, use robots.txt to forbid the old URLs like this: http://www.myshop.com/showgood.php?category=56&good=54146
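As a rough sketch, a robots.txt with a single `Disallow: /showgood.php` rule would cover every old parameterized URL of the example shop. The Python below checks such a file with the standard library’s parser; the exact rule and the helper `is_allowed` are assumptions for illustration, and the check only models how a rule-following crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for the example shop: block the old dynamic script
# for every crawler and leave everything else open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /showgood.php
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a rule-following crawler may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With this file, `is_allowed` reports the old URL http://www.myshop.com/showgood.php?category=56&good=54146 as blocked, while the rewritten URL http://www.myshop.com/beauty-products/dove-soap.html stays open to the Robot.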
After you have the robots.txt file, run it through a validator to ensure it’s written correctly. Hundreds of robots.txt validators can be found on the web. As soon as the robots.txt file’s correct, you needn’t worry, as it will only do you a lot of good, and no harm.
For now, that’s it.
- Search engines know about your site
- Crawlers can now check your site fast and without any problem
- Your chances to be found on the web have greatly increased
Taking all this together, you did great things to make Search Engines love your web pages and show them to Internet users.