This post explains how a search engine works and how it uses bots and other core programs to validate and rank sites in the SERP (Search Engine Results Page).
Spider or Crawler
A spider or crawler is a bot program written specifically to crawl a complete web page or post; "spider" is simply another name for a crawler. Generally, every search engine has its own crawling software (bots) to crawl sites on the internet. The crawler fetches the data on each page or post.
Robots (crawlers) first check the robots.txt file on a website, where the commands and rules that tell them which pages they may crawl are clearly defined in a plain-text, line-based format.
Example: robots.txt File in www.esearchadvisors.com
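The site's actual file is not reproduced here; an illustrative robots.txt of the kind described below might look like this (the sitemap filename is an assumption for the example, not taken from the real site):

```
User-agent: *
Disallow: /cgi-bin/
# Hypothetical sitemap location for illustration
Sitemap: https://www.esearchadvisors.com/sitemap.xml
```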
Robot command descriptions
User-agent: * allows all search bots to crawl the site.
Disallow: /cgi-bin/ blocks the /cgi-bin/ folder from being crawled. It contains files that should not be indexed, so bots do not crawl anything inside /cgi-bin/.
The Sitemap: directive points the crawler to the site map. Bots fetch all the links listed in the sitemap and index the site as frequently as possible.
Code to block all search engines except Google
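A common robots.txt pattern for this (a sketch of the standard technique, since the original snippet is not shown) gives Googlebot an empty Disallow rule while blocking every other user agent:

```
# Googlebot may crawl everything (empty Disallow = no restriction)
User-agent: Googlebot
Disallow:

# All other bots are blocked from the entire site
User-agent: *
Disallow: /
```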
The above directives allow only Googlebot to fetch pages from the site; other search engines such as Yahoo, Bing, Ask, Yandex, and Baidu cannot fetch its data.
Code to block specific file types
Code used to block .gif images from being crawled
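A directive of this kind, using the `*` and `$` wildcard extensions honored by major crawlers such as Googlebot (these wildcards are a widely supported extension, not part of the original robots.txt standard), might look like:

```
User-agent: *
# Block any URL ending in .gif
Disallow: /*.gif$
```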
Code used to block .jpg images from being crawled
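The same wildcard pattern applies; a sketch for .jpg files:

```
User-agent: *
# Block any URL ending in .jpg
Disallow: /*.jpg$
```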
Code used to block .png images from being crawled
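Likewise for .png files, assuming the same wildcard support:

```
User-agent: *
# Block any URL ending in .png
Disallow: /*.png$
```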
Code used to block .mp4 videos from being crawled
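And the same form for .mp4 videos:

```
User-agent: *
# Block any URL ending in .mp4
Disallow: /*.mp4$
```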