What Does 'Blocked by robots.txt' Mean?
Learn how robots.txt can prevent Google from crawling your pages and how to fix this common indexing issue.
Understanding 'Blocked by robots.txt' Status
When Google reports a URL as "blocked by robots.txt," it means Google has discovered the URL but cannot crawl it because your site's robots.txt file contains directives that prevent Googlebot from accessing that page. This is a technical issue that can be easily fixed once you understand how robots.txt works.
How robots.txt Affects Indexing
The robots.txt file is a text file located at the root of your website that tells search engine crawlers which pages or sections of your site they can or cannot access:
- Direct Blocking: A "Disallow" directive explicitly prevents Googlebot from crawling specific URLs or URL patterns
- Partial Indexing Possible: In some cases, Google may still index a blocked URL based on information from other sources, but without seeing the actual content
- Common Mistakes: Important pages are often blocked accidentally through overly broad patterns like "Disallow: /blog" when you only meant to block certain blog categories
- User-Agent Specific Rules: Different directives can apply to different crawlers, so a page might be blocked for Googlebot but not for other search engines; the sketch after this list shows how the same URL can be evaluated differently per user agent
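To see how these rules play out in practice, here is a minimal sketch using Python's standard urllib.robotparser module. The rule set, domain, and URLs are hypothetical; the point is that the same path can be blocked for one user agent and allowed for another. Note that this standard-library parser does not implement Google's wildcard extensions (* and $ inside paths).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler may fetch everything,
# except Googlebot is kept out of /blog/drafts/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /blog/drafts/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Bingbot"):
    for url in ("https://example.com/blog/",
                "https://example.com/blog/drafts/post-1"):
        allowed = parser.can_fetch(agent, url)
        status = "allowed" if allowed else "blocked by robots.txt"
        print(f"{agent:<9} {url} -> {status}")
```

Running this prints that /blog/drafts/post-1 is blocked for Googlebot but allowed for Bingbot, which is exactly the kind of user-agent-specific surprise that shows up as "blocked by robots.txt" in Search Console.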
How to Fix 'Blocked by robots.txt' Issues
Follow these steps to resolve robots.txt blocking issues and get your content indexed:
- Check Your robots.txt File: Visit yourdomain.com/robots.txt to see the current directives and identify which rules match the blocked URLs
- Check in Google Search Console: The URL Inspection tool confirms whether a specific URL is blocked by robots.txt, and the robots.txt report (which replaced the standalone robots.txt Tester) shows which robots.txt files Google has found and flags errors in them
- Modify Your robots.txt File: Edit the file to remove or narrow the blocking directives, keeping Disallow rules only for sections you genuinely don't want crawled; a quick way to verify the change locally is sketched after this list
- Monitor with MyURLMonitor: After fixing your robots.txt file, use MyURLMonitor to track when Google recrawls and indexes the previously blocked pages
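As a quick local check before and after editing, the sketch below (hypothetical domain and URL list, Python standard library only) downloads a site's live robots.txt and reports which of your important URLs Googlebot is currently allowed to crawl. It is not a substitute for Search Console, since Google applies its own parser and wildcard extensions, but it catches obvious mistakes such as a stray "Disallow: /".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical values: replace with your own domain and the pages
# you expect Google to crawl and index.
ROBOTS_URL = "https://yourdomain.com/robots.txt"
IMPORTANT_URLS = [
    "https://yourdomain.com/",
    "https://yourdomain.com/blog/",
    "https://yourdomain.com/products/widget",
]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses the live robots.txt

for url in IMPORTANT_URLS:
    if parser.can_fetch("Googlebot", url):
        print(f"OK      {url}")
    else:
        print(f"BLOCKED {url}  <- find the matching Disallow rule")
```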
Common robots.txt Patterns and Their Effects
Understanding these common robots.txt directives will help you avoid accidental blocking; a short script for trying them out locally follows the list:
- Block Everything
  User-agent: *
  Disallow: /
  Blocks all crawlers from the entire site (very dangerous for SEO)
- Block a Directory
  Disallow: /admin/
  Blocks crawling of all URLs in the /admin/ directory
- Block File Types
  Disallow: /*.pdf$
  Blocks crawling of all PDF files
- Allow Everything
  User-agent: *
  Allow: /
  Explicitly allows crawling of the entire site (the default behavior even without a robots.txt file)
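If you want to experiment with rules like these before deploying them, you can feed candidate directives to Python's urllib.robotparser, as in this rough sketch (example.com and the sample paths are placeholders). One caveat: the standard-library parser does not understand Google's * and $ wildcard extensions, so a rule like "Disallow: /*.pdf$" has to be verified in Search Console or with a parser that implements Google's robots.txt specification (the protego package is one option).

```python
from urllib.robotparser import RobotFileParser

def check(rules: str, path: str, agent: str = "Googlebot") -> str:
    """Return 'allowed' or 'blocked' for a path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    url = "https://example.com" + path
    return "allowed" if parser.can_fetch(agent, url) else "blocked"

block_everything = "User-agent: *\nDisallow: /\n"
block_directory  = "User-agent: *\nDisallow: /admin/\n"
allow_everything = "User-agent: *\nAllow: /\n"

print(check(block_everything, "/any/page"))    # blocked
print(check(block_directory, "/admin/users"))  # blocked
print(check(block_directory, "/blog/post"))    # allowed
print(check(allow_everything, "/any/page"))    # allowed
# Wildcard rules such as "Disallow: /*.pdf$" are Google extensions that
# urllib.robotparser does not evaluate; test those in Search Console instead.
```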
Quick Fix Potential
Unlike other indexing issues, 'Blocked by robots.txt' problems can often be resolved quickly by editing a single file, with results visible in days rather than weeks.