What is a robots.txt file?
Robots.txt is a plain-text file that webmasters create to tell web robots (usually search engine crawlers) how to crawl pages on their websites. The file is part of the robots exclusion protocol (REP), a group of web standards that govern how robots crawl the web, access and index content, and serve that content to users. The REP also includes directives such as meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links (such as "follow" or "nofollow"). In practice, a robots.txt file indicates whether particular user agents (web crawling software) may crawl parts of a website. These instructions are written as "Disallow" or "Allow" rules for certain (or all) user agents: an empty "Disallow:" blocks nothing for that user agent, while "Disallow: /" blocks the entire site.
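To make the allow/disallow idea concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser module. The rules are a simplified sample modeled on the listings in this post, not a verbatim copy, and the host www.example.com, the bot name SomeOtherBot, and the page paths are placeholders chosen only for illustration. Real crawlers may resolve overlapping rules differently from this parser, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Simplified sample rules (an assumption for illustration):
# Googlebot may crawl everything except /s72-c/,
# every other robot is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /s72-c/

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
BASE = "http://www.example.com"  # placeholder host, not from the post

# Googlebot has its own group, so only the /s72-c/ rule applies to it.
print(parser.can_fetch("Googlebot", BASE + "/2014/10/post.html"))
print(parser.can_fetch("Googlebot", BASE + "/s72-c/image.jpg"))

# Any robot without its own group falls back to "User-agent: *".
print(parser.can_fetch("SomeOtherBot", BASE + "/2014/10/post.html"))
```

With these sample rules the three checks should report allowed, blocked, and blocked, in that order.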
User-agent: Googlebot
Disallow:
User-agent: msnbot
Disallow:
User-agent: Yahoo-slurp
Disallow:
User-agent: Slurp
Disallow:
User-agent: Mediapartners-Google
Disallow:
User-agent: AdsBot-Google
Disallow:
User-agent: Googlebot-Mobile
Disallow:
User-agent: Googlebot-Image
Disallow:
User-agent: Yahoo-MMCrawler
Disallow:
User-agent: *
Disallow: /
Disallow: /2014_06_01_archive.html?m=1
Disallow: /2014_07_01_archive.html?m=1
Disallow: /2015_02_01_archive.html?m=1
Disallow: /2014_05_01_archive.html?m=1
Disallow: /2014_11_01_archive.html?m=1
Disallow: /2014_08_01_archive.html?m=1
Disallow: /2015_03_01_archive.html?m=1
Disallow: /2014_04_01_archive.html?m=1
Disallow: /2014/09/mengenal-leica-hds-untuk-forensik-dan-investigasi.html
Disallow: /2014/10/bagaimanakah-proses-fabrikasi.html
Disallow: /2014/11/how-the-work-flow-3d-laser-scanning-to-be-applied-at-pertamina.html
Sitemap: http://www.gatewan.com/feeds/posts/default?orderby=UPDATED
A snapshot of the Gatewan blog's robots.txt at that time:
User-agent: Googlebot
Disallow:
Disallow: /2014_06_01_archive.html?m=1
Disallow: /2014_07_01_archive.html?m=1
Disallow: /2015_02_01_archive.html?m=1
Disallow: /2014_05_01_archive.html?m=1
Disallow: /2014_11_01_archive.html?m=1
Disallow: /2014_08_01_archive.html?m=1
Disallow: /2015_03_01_archive.html?m=1
Disallow: /2014_04_01_archive.html?m=1
Disallow: /2014/09/mengenal-leica-hds-untuk-forensik-dan-investigasi.html
Disallow: /2014/10/bagaimanakah-proses-fabrikasi.html
Disallow: /2014/11/how-the-work-flow-3d-laser-scanning-to-be-applied-at-pertamina.html
User-agent: msnbot
Disallow:
User-agent: Bingbot
Disallow:
User-agent: Yahoo-slurp
Disallow:
User-agent: Slurp
Disallow:
User-agent: Mediapartners-Google
Disallow:
User-agent: Googlebot-Mobile
Disallow:
User-agent: AdsBot-Google
Disallow:
User-agent: Yahoo-MMCrawler
Disallow:
User-agent: *
Disallow: /
Sitemap: http://www.gatewan.com/feeds/posts/default?orderby=UPDATED
A snapshot of the santaiarea blog's robots.txt at that time:
User-agent: Googlebot
Disallow:
Disallow: /s72-c/
Disallow: /delete-comment.g?blogID&m=1
Disallow: /delete-comment.g?blogID=
Disallow: /s
Disallow: /p/about-me.html?m=1
Disallow: /s?m=1
Disallow: /p/memuat.html?m=1
Disallow: /2015_02_01_archive.html
Disallow: /2014/06?m=1
User-agent: msnbot
Disallow:
User-agent: Bingbot
Disallow:
User-agent: Yahoo-slurp
Disallow:
User-agent: Slurp
Disallow:
User-agent: Mediapartners-Google
Disallow:
User-agent: Googlebot-Mobile
Disallow:
User-agent: AdsBot-Google
Disallow:
User-agent: Yahoo-MMCrawler
Disallow:
User-agent: *
Disallow: /
Sitemap: http://www.santaiarea.com/feeds/posts/default?orderby=UPDATED
A snapshot of the Totaltren blog's robots.txt at that time:
User-agent: Googlebot
Disallow:
Disallow: /s
Disallow: /s72-c/
User-agent: msnbot
Disallow:
User-agent: Bingbot
Disallow:
User-agent: Yahoo-slurp
Disallow:
User-agent: Slurp
Disallow:
User-agent: Mediapartners-Google
Disallow:
User-agent: Googlebot-Mobile
Disallow:
User-agent: AdsBot-Google
Disallow:
User-agent: Yahoo-MMCrawler
Disallow:
User-agent: *
Disallow: /
Sitemap: http://www.totaltren.com/feeds/posts/default?orderby=UPDATED