19 intentional issues

#8 All redirects in place (301/302 correct)

#9 No broken internal links

#10 www vs non-www redirects correct

#11 No redirect chains or loops

#14 Crawl depth shallow (every page within 3 clicks of the homepage)

#17 No orphan pages (all linked somewhere)

#50 External links use noopener noreferrer

#157 Image sitemap created (separate or <image:image> entries in main sitemap)

#165 Crawl-delay directive set where server needs rate limiting

#175 Canonical tag not blocked by robots.txt (Google must be able to access canonical)

#180 Video sitemap created with <video:video> entries (duration, thumbnail, URL)

#187 GPTBot (OpenAI) explicitly allowed or disallowed in robots.txt

#188 ClaudeBot (Anthropic) explicitly allowed or disallowed in robots.txt

#189 PerplexityBot explicitly allowed or disallowed in robots.txt

#190 Google-Extended bot rule set in robots.txt (used for Gemini/Vertex AI training)

#191 CCBot (Common Crawl) rule set in robots.txt

#199 llms-full.txt published for AI systems requiring full-content access

#220 Moved/deleted pages return 301 redirect (not instant 404) for at least 1 year

#186 llms.txt file created at /llms.txt (AI model content manifest)
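Checklist items #187 to #191 ask for an explicit rule per AI crawler. A minimal sketch of how that could be verified, assuming the robots.txt body has already been downloaded as a string; the function name `explicit_agents` and the `AI_AGENTS` tuple are illustrative, not part of this site's tooling:

```python
# User-agent tokens named in checklist items #187 to #191.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot")


def explicit_agents(robots_txt, agents=AI_AGENTS):
    """Report which agents have their own User-agent line in robots.txt.

    A wildcard group (User-agent: *) does not count as explicit.
    """
    named = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    # User-agent tokens are matched case-insensitively.
    return {agent: agent.lower() in named for agent in agents}
```

Any agent mapped to `False` is covered only by the wildcard group, if one exists, and so fails the "explicitly allowed or disallowed" check.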

Crawl Issues — Redirects, Depth & Canonicals

This page demonstrates redirect, crawl depth, canonical, and link-related SEO issues. Broken internal links and redirect chains are intentionally present.

Redirect Configuration & Loop Detection

Issue #8: Redirect types (301/302) not correctly configured

Issue #11: Redirect chains and loops present, hurting SEO performance

Issue #220: Moved pages should return 301 for at least 1 year, not instant 404

This site uses 302 (temporary) redirects where 301 (permanent) redirects should be used. It also contains redirect chains and potential redirect loops (e.g., A → B → A or A → A) that waste crawl budget and cause browser errors.

Redirect Loop Checker Tool:

Run locally: node redirect-tracker.js --test

Tracked redirect loop examples detected programmatically: A → B → A, A → A.
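The two loop shapes above (A → B → A and A → A) can be detected by walking a redirect map and remembering which URLs have been visited. The sketch below is not the logic of redirect-tracker.js, whose source is not shown here; it assumes the redirects have already been collected into a hypothetical `{source: target}` dictionary:

```python
def trace_redirects(start, redirects, max_hops=10):
    """Follow `start` through a {source: target} redirect map.

    Returns (path, status), where status is one of:
      "ok"       zero or one hop
      "chain"    two or more hops before reaching a final URL
      "loop"     a URL was visited twice (A -> B -> A, or A -> A)
      "too_long" gave up after max_hops without reaching a final URL
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:
            return path, "loop"
        seen.add(current)
        if len(path) - 1 >= max_hops:
            return path, "too_long"
    hops = len(path) - 1
    status = "chain" if hops > 1 else "ok"
    return path, status
```

For example, `trace_redirects("A", {"A": "B", "B": "A"})` returns `(["A", "B", "A"], "loop")`, and a self-redirect `{"A": "A"}` is reported as a loop after one hop.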

Broken internal link (404) — Issue #9

www vs non-www

Issue #10: www vs non-www redirects not correctly set up

Both www.example.com and example.com serve content, and neither redirects to a single preferred host, creating duplicate content.
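One way to reason about the fix is as a function that maps any request URL to the 301 target on the preferred host. This is a sketch only: it assumes www.example.com is the preferred host (the page does not say which variant should win), and `canonical_redirect` is an illustrative name:

```python
from urllib.parse import urlsplit, urlunsplit


def _strip_www(host):
    """Return the host without a leading 'www.' label."""
    return host[4:] if host.startswith("www.") else host


def canonical_redirect(url, preferred_host="www.example.com"):
    """Return the 301 target for `url`, or None if no redirect is needed.

    Only the www / non-www variant of the preferred host is redirected;
    unrelated hosts are left alone. Ports are ignored in this sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == preferred_host:
        return None  # already canonical
    if _strip_www(host) != _strip_www(preferred_host):
        return None  # a different site entirely
    return urlunsplit(
        (parts.scheme, preferred_host, parts.path or "/", parts.query, parts.fragment)
    )
```

Path and query string are preserved, so each non-preferred URL redirects in a single hop to its exact counterpart rather than to the homepage.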

Crawl Depth

Issue #14: Pages more than 3 clicks from homepage, too deep for crawlers

Some pages on this site are buried more than 3 clicks from the homepage, making them harder for search engine crawlers to discover within their crawl budget.
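Click depth is the shortest-path distance from the homepage in the internal link graph, so a breadth-first search finds it directly. The sketch below assumes the link graph has already been crawled into a hypothetical `{page: [linked pages]}` dictionary:

```python
from collections import deque


def click_depths(links, home="/"):
    """Breadth-first search: minimum number of clicks from `home` to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home="/", max_clicks=3):
    """Pages reachable from `home` only in more than `max_clicks` clicks."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)
```

Pages that never appear in the result of `click_depths` are unreachable by links from the homepage, which is a separate problem from depth.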

Orphan Pages

Issue #17: Pages with no internal links pointing to them

Several pages on this site have zero inbound internal links, making them invisible to search engine crawlers following links.
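Finding orphans requires a list of pages from a source other than link-following, because a crawler cannot reach them. A minimal sketch, assuming `all_pages` comes from the sitemap or server logs and `links` is a hypothetical `{page: [linked pages]}` dictionary:

```python
def find_orphans(all_pages, links, home="/"):
    """Pages in `all_pages` that no other page links to.

    Self-links do not count as inbound links, and the homepage is exempt
    because it is the crawl entry point.
    """
    linked = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source
    }
    return sorted(page for page in all_pages if page not in linked and page != home)
```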

External Link Security

Issue #50: External links missing rel=noopener noreferrer

External link without noopener noreferrer (intentional) →
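Links like the one above can be found by parsing the page and checking the `rel` attribute of every anchor that points at another host. The sketch below uses only the standard library; `missing_rel` and `ExternalLinkAuditor` are illustrative names, and the site host is passed in as an assumption:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

REQUIRED_REL = {"noopener", "noreferrer"}


class ExternalLinkAuditor(HTMLParser):
    """Collects hrefs of external <a> tags lacking noopener and noreferrer."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        host = urlsplit(href).hostname
        if host is None or host == self.site_host:
            return  # relative, mailto:, or same-host link
        rel = set((attr.get("rel") or "").lower().split())
        if not REQUIRED_REL <= rel:
            self.flagged.append(href)


def missing_rel(html, site_host):
    """Return external hrefs in `html` that lack either required rel token."""
    auditor = ExternalLinkAuditor(site_host)
    auditor.feed(html)
    return auditor.flagged
```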

Canonical Tag Issues

Issue #172: Canonical URL does not match URL in XML sitemap

Issue #175: Canonical page blocked by robots.txt

SEO Issue #172 — Canonical vs Sitemap Mismatch:

This page canonical: https://acmeanalytics.example.com/crawl-issues

Sitemap URL: http://acmeanalytics.example.com/crawl-issues

Protocol mismatch (https vs http): the canonical tag and the sitemap send conflicting signals, so Google may not honor the declared canonical
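A mismatch like this is easy to miss by eye, so it helps to compare the two URLs component by component. A small sketch, with `url_mismatches` as an illustrative name:

```python
from urllib.parse import urlsplit


def url_mismatches(canonical, sitemap_url):
    """Return the names of URL components that differ between the two URLs."""
    a, b = urlsplit(canonical), urlsplit(sitemap_url)
    fields = ("scheme", "netloc", "path", "query")
    return [field for field in fields if getattr(a, field) != getattr(b, field)]
```

For the two URLs shown above, `url_mismatches("https://acmeanalytics.example.com/crawl-issues", "http://acmeanalytics.example.com/crawl-issues")` returns `["scheme"]`; an empty list means the canonical and sitemap entries agree.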

SEO Issue #175 — Canonical Blocked by robots.txt:

Canonical href: https://acmeanalytics.example.com/blocked-canonical/about

robots.txt rule: Disallow: /blocked-canonical/

Google can see the canonical tag but cannot fetch or validate the canonical URL
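This condition can be checked offline with Python's standard `urllib.robotparser`. The sketch assumes the robots.txt body is already available as a string; `canonical_is_crawlable` is an illustrative name, and the `User-agent: *` line in the usage note is an assumption, since only the Disallow rule is shown above:

```python
from urllib.robotparser import RobotFileParser


def canonical_is_crawlable(robots_txt, canonical_url, user_agent="Googlebot"):
    """True if robots.txt allows `user_agent` to fetch the canonical URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, canonical_url)
```

With `robots_txt = "User-agent: *\nDisallow: /blocked-canonical/"`, the canonical href above is reported as not crawlable. Python's parser applies the first matching rule in file order, which can differ from Google's most-specific-match behavior when Allow and Disallow rules overlap, so treat the result as an approximation.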

Image & Video Sitemaps

Issue #157: No image sitemap entries in XML sitemap

Issue #180: No video sitemap with required video:video entries

The sitemap declares image: and video: namespaces but contains ZERO <image:image> or <video:video> entries, reducing image/video discovery by Google.
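Declared-but-unused namespaces can be detected by counting the media elements in the sitemap. The sketch below assumes the sitemap XML has been loaded as a string and uses the standard sitemap, image, and video namespace URIs; `count_media_entries` is an illustrative name:

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by sitemap, image-sitemap, and video-sitemap elements.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
}


def count_media_entries(sitemap_xml):
    """Count <url>, <image:image>, and <video:video> elements in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {
        "urls": len(root.findall("sm:url", NS)),
        "images": len(root.findall(".//image:image", NS)),
        "videos": len(root.findall(".//video:video", NS)),
    }
```

A result with a non-zero `urls` count and zero `images` and `videos` matches the condition described above.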

AI Discoverability Files Missing

Issue #186: /llms.txt file missing, so AI crawlers cannot discover the site content manifest

Issue #199: /llms-full.txt file missing, so AI systems cannot access full content

Files intentionally absent (each returns 404):

GET /llms.txt → 404 Not Found (Issue #186)

GET /llms-full.txt → 404 Not Found (Issue #199)

llms.txt (emerging standard) tells LLMs what site content is available for citation. llms-full.txt provides extended full-content access for AI ingestion. Neither file exists on this site — intentional test case.

Page Indexability Audit

Issue #227: Critical pages not confirmed indexable via audit

No indexability audit has been performed. Some pages may inadvertently carry a noindex directive or be blocked in robots.txt.
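One part of such an audit is checking each critical page for a noindex directive in either the robots meta tag or the X-Robots-Tag response header. The sketch below assumes the page HTML and header value have already been fetched; `is_noindex` is an illustrative name, and user-agent-scoped header values (for example `googlebot: noindex`) are not handled:

```python
from html.parser import HTMLParser

# "none" is shorthand for "noindex, nofollow".
BLOCKING = {"noindex", "none"}


class _RobotsMeta(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            content = (attr.get("content") or "").lower()
            self.directives.update(token.strip() for token in content.split(","))


def is_noindex(html, x_robots_tag=""):
    """True if the meta tag or the X-Robots-Tag header value blocks indexing."""
    parser = _RobotsMeta()
    parser.feed(html)
    header = {token.strip() for token in x_robots_tag.lower().split(",")}
    return bool(BLOCKING & (parser.directives | header))
```

A full audit would pair this with a robots.txt check, since a page blocked from crawling never has its meta tag read.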