hardrock posted on 2014-12-29 15:35:48

Bounty: a complete set of robots.txt rules for an English-language WordPress site

Last edited by hardrock on 2014-12-29 15:45

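I'm offering a bounty for a complete set of robots.txt rules for an English-language WordPress site.
Please write it to meet these requirements:
1. Only the crawlers of Alexa, AOL, Ask, Google, Yahoo, and Bing may access the site; all others are blocked.
2. Block specific directories and file types, written generically so that it works for English WordPress versions 3.6 through 4.1.
3. Include the Sitemap file.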

yzhvps posted on 2014-12-29 15:35:49

Last edited by yzhvps on 2014-12-29 22:09

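# User-agent values must be each crawler's own token; a trailing * is not a wildcard here.
# ia_archiver = Alexa, Teoma = Ask, Slurp = Yahoo. "Aol" is kept as posted (token unverified).
User-agent: ia_archiver
User-agent: Aol
User-agent: Teoma
User-agent: Slurp
User-agent: bingbot
User-agent: Googlebot
Allow: /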

User-agent: *
Disallow: /
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*

Sitemap: http://www.eefaq.com/sitemap.xml
That's quite rough; once it's in place, run it through a robots.txt testing tool. Or, for a more laborious version:

# One group shared by all allowed crawlers (several User-agent lines may share one rule set).
# ia_archiver = Alexa, Teoma = Ask, Slurp = Yahoo. "Aol" is kept as posted (token unverified).
User-agent: ia_archiver
User-agent: Aol
User-agent: Teoma
User-agent: Slurp
User-agent: bingbot
User-agent: Googlebot
Disallow: /*.php$
Disallow: /search/
Disallow: /tag/*?*
Disallow: /*.html?*
Disallow: /page/*?*
Disallow: /wp-admin
Disallow: /wp-*.php
Disallow: /comments/
Disallow: /trackback
Disallow: /*.html/*?*
Disallow: /*/trackback
Disallow: /comments/feed
Disallow: /*?replytocom=*
Disallow: /*/comment-page-*

# Every other crawler is blocked from the whole site.
User-agent: *
Disallow: /

Sitemap: http://www.eefaq.com/sitemap.xml
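
For anyone unsure what path patterns such as "/*.php$" or "/*?replytocom=*" in the rules above actually match, here is a minimal Python sketch. It assumes the wildcard semantics documented by Google and Bing ("*" matches any run of characters, a trailing "$" anchors the end of the URL, anything else is a prefix match). The function names are made up for illustration, and this is not how any particular crawler is implemented.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """Return True if a Disallow rule with this pattern would cover the path."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for rule, path in [
        ("/*.php$", "/wp-login.php"),
        ("/*.php$", "/index.php?p=1"),
        ("/*?replytocom=*", "/post/?replytocom=5"),
        ("/tag/*?*", "/tag/seo/"),
    ]:
        print(rule, path, is_blocked(rule, path))
```

Note that under these semantics "/*.php$" does not cover "/index.php?p=1", because the "$" requires the URL to end in ".php".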

foxconndmd posted on 2014-12-29 16:10:49

1. Learn the robots.txt syntax; it's actually very simple.
2. Google each crawler's user-agent (UA) token.

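work4seo posted on 2014-12-29 17:39:32

Last edited by work4seo on 2014-12-29 17:40

I've only written the rules for Google. For any other crawler you want to let in, just copy the block and swap in that crawler's UA.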
User-agent: Googlebot
Allow: /
Disallow: /mulu/
Disallow: /*.css$


User-agent: Googlebot-Mobile
Allow: /
Disallow: /mulu/
Disallow: /*.css$

User-agent: *
Disallow: /

Sitemap: http://www.xxx.com/sitemap.xml
Actually, Baidu Baike explains all of this in a lot of detail...
http://baike.baidu.com/view/9274458.htm#3_1
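
The "copy the block for each UA" approach described above can also be scripted. The following is only a sketch: the crawler tokens (Googlebot, bingbot, Slurp) are examples that should be checked against each search engine's own documentation, and the function name and rule list are hypothetical, reusing the placeholder paths from this post.

```python
# Assumed crawler tokens; verify each one against the search engine's documentation.
ALLOWED_AGENTS = ["Googlebot", "bingbot", "Slurp"]

# Placeholder rules copied from the post above.
SHARED_RULES = ["Disallow: /mulu/", "Disallow: /*.css$"]


def build_robots_txt(agents, rules, sitemap_url):
    """Emit one identical rule group per allowed crawler, then block the rest."""
    blocks = []
    for agent in agents:
        lines = ["User-agent: " + agent, "Allow: /"] + list(rules)
        blocks.append("\n".join(lines))
    # Catch-all group: any crawler without its own group is denied everything.
    blocks.append("User-agent: *\nDisallow: /")
    blocks.append("Sitemap: " + sitemap_url)
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(ALLOWED_AGENTS, SHARED_RULES,
                           "http://www.xxx.com/sitemap.xml"))
```

Generating the file this way keeps the per-crawler groups identical, so a rule change only has to be made in one place.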

hardrock posted on 2014-12-29 19:57:12

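Quoting work4seo (posted on 2014-12-29 17:39):
I've only written the rules for Google. For any other crawler you want to let in, just copy the block and swap in that crawler's UA.
User-agent: Googlebot
Allow: /

What about Alexa, AOL, Ask, Yahoo, and Bing??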

cnwebmasters201 posted on 2014-12-29 23:48:51

Shouldn't
User-agent: *
Disallow: /

go at the very top? Don't later settings override earlier ones?
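
On the ordering question: under the Robots Exclusion Protocol a crawler obeys the group whose User-agent token matches it and falls back to the "*" group only when no named group applies, so the position of the catch-all group should not matter. Below is a small check using Python's standard urllib.robotparser. It is only one parser implementation, and the example.com URLs and the "SomeOtherBot" name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

CATCH_ALL = ["User-agent: *", "Disallow: /", ""]
GOOGLE = ["User-agent: Googlebot", "Disallow: /wp-admin/", ""]


def can_fetch(lines, agent, url):
    """Parse robots.txt lines and ask whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, url)


# The same two groups, with the catch-all placed first and then last.
catch_all_first = CATCH_ALL + GOOGLE
catch_all_last = GOOGLE + CATCH_ALL

if __name__ == "__main__":
    for name, lines in [("catch-all first", catch_all_first),
                        ("catch-all last", catch_all_last)]:
        print(name,
              can_fetch(lines, "Googlebot", "http://example.com/hello/"),
              can_fetch(lines, "Googlebot", "http://example.com/wp-admin/x"),
              can_fetch(lines, "SomeOtherBot", "http://example.com/hello/"))
```

Both orderings give the same answers with this parser: Googlebot follows its own group, and every other crawler falls through to the catch-all.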

cnwebmasters201 posted on 2014-12-30 00:10:04


User-agent: *
Disallow: /

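# A User-agent line takes one token, so each crawler gets its own line (no commas, no trailing *).
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
# ..... (one User-agent line per additional crawler)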
Allow: /
Disallow: /mulu/
Disallow: /*.css$

Sitemap: http://www.eefaq.com/sitemap.xml

euguene posted on 2014-12-30 10:29:14

The answers above are all pretty good; just copy one of them.

gger posted on 2014-12-30 20:50:17

You really have to write all of that...?

盛老师 posted on 2014-12-30 23:48:03

OP, if you find one, could you share it? :'(:'(:'(