How SiteGuard Works

Technical documentation of our 14 scan modules

No Lighthouse — Custom Analysis Engine

SiteGuard deliberately does NOT use Google Lighthouse. Instead, we use a custom fetch-based analysis engine that runs on Vercel Serverless. This means: consistent results, no browser dependency, and significantly faster scans (2-3 seconds per page instead of 15-30 seconds with Lighthouse).

Important Legal & Accessibility Notice

Automated legal and accessibility scans provide technical signals and prioritization. They do not replace legal advice, a legal review, or a manual WCAG/EAA/BFSG certification by qualified experts.

How a Scan Works

  1. Website URL is fetched: HTML, response headers, and cookies are captured
  2. Internal links + sitemap are analyzed: up to 5 pages are scanned (root + 4 subpages)
  3. Depending on the plan, up to 12 core modules run, plus Cookie Audit and Discoverability as companion scans. Checks use HTML, headers, cookies, DNS/RDAP, and targeted HEAD requests depending on the module
  4. Results are aggregated: scores 0-100 per module, issues sorted by severity
  5. AI report (optional): Claude AI creates a management summary with the top 5 action items
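
The aggregation in step 4 could look roughly like the sketch below. The `Issue` and `ModuleResult` shapes and the four severity names are assumptions made for illustration; the text above only states that each module yields a score from 0 to 100 and that issues are sorted by severity.

```typescript
// Hypothetical shapes; SiteGuard's real types are not documented here.
type Severity = "critical" | "high" | "medium" | "low";

interface Issue {
  module: string;
  severity: Severity;
  message: string;
}

interface ModuleResult {
  module: string;
  score: number;
  issues: Issue[];
}

// Lower rank sorts first, so the most severe issues lead the report.
const SEVERITY_RANK: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

function aggregateResults(results: ModuleResult[]): {
  scores: Record<string, number>;
  issues: Issue[];
} {
  const scores: Record<string, number> = {};
  const issues: Issue[] = [];
  for (const result of results) {
    // Clamp defensively so every module reports a value in 0-100.
    scores[result.module] = Math.max(0, Math.min(100, Math.round(result.score)));
    issues.push(...result.issues);
  }
  issues.sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
  return { scores, issues };
}
```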

Methodology, Limits, and Confidence

SiteGuard intentionally separates measured technical signals, derived prioritization, and areas that require manual or legal review. The breakdown below shows, for each result category, what is measured, what is not automatically assessed, and how reliable the result is.

Accessibility

  • Measured: HTML signals such as lang attribute, title, alt text, H1, labels, link text, and selected WCAG mapping.
  • Not automatically assessed: Keyboard navigation, screen reader behavior, focus order, rendered UI contrast, and full WCAG/BFSG/EAA conformance.
  • Confidence: High for technical presence signals; medium for the derived priority.

Performance

  • Measured: Response time, page size, resource count, broken links, redirects, SSL status, and HEAD request results.
  • Not automatically assessed: Real Core Web Vitals such as LCP, CLS, and INP without browser or CrUX data.
  • Confidence: High for fetch, header, and link signals; no statement on real Core Web Vitals.

Privacy & Legal

  • Measured: Cookie and tracker patterns, CMP detection, imprint/privacy links, contact details, and HTTPS signals.
  • Not automatically assessed: Legal completeness, individual legal bases, contractual context, data flows, and sector-specific obligations.
  • Confidence: Medium; technical evidence is reliable, legal assessment remains case-specific.

Cookie Audit

  • Measured: Cookies, storage, third-party requests, and CMP declarations before consent, after rejection, and after acceptance.
  • Not automatically assessed: Legal approval of consent design, banner wording, and complete review of all privacy texts.
  • Confidence: Medium to high for observed browser states; no legal clearance.

Security

  • Measured: Publicly visible headers, SSL/TLS, CSP, HSTS, mixed content, CORS signals, and known frontend library patterns.
  • Not automatically assessed: Penetration testing, auth/business-logic vulnerabilities, server internals, and non-public infrastructure.
  • Confidence: High for observable web signals; no statement on hidden vulnerabilities.

SEO & Discoverability

  • Measured: Meta tags, robots.txt, sitemap, canonical, hreflang, structured data, crawl coverage, and noindex conflicts.
  • Not automatically assessed: Actual Google ranking, guaranteed indexing, search demand, backlink quality, and competitive analysis.
  • Confidence: High for technical discoverability signals; no ranking or indexing guarantee.

The 14 Modules and Companion Scans in Detail

Privacy Scanner

HTML pattern matching + cookie headers

  • Cookie detection from Set-Cookie headers
  • 13 third-party tracker patterns (Google, Meta, TikTok, LinkedIn, etc.)
  • 10 consent management platforms (Cookiebot, Usercentrics, OneTrust, etc.)
  • Cookie classification (necessary/analytics/advertising)
  • GDPR/ePrivacy assessment

Scoring: Start 100. No consent banner: -50. Trackers without consent: -10 each (max -30). Non-essential cookies: -5 each (max -20).
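
The scoring rule above is plain capped arithmetic, which the following sketch spells out. The `PrivacyFindings` shape and the function names are hypothetical; only the point values come from the rule itself.

```typescript
// Hypothetical input shape for illustration.
interface PrivacyFindings {
  hasConsentBanner: boolean;
  trackersWithoutConsent: number;
  nonEssentialCookies: number;
}

// Deduct `perItem` points per finding, but never more than `cap` in total.
function cappedDeduction(count: number, perItem: number, cap: number): number {
  return Math.min(Math.max(0, count) * perItem, cap);
}

function privacyScore(findings: PrivacyFindings): number {
  let score = 100;
  if (!findings.hasConsentBanner) score -= 50;
  score -= cappedDeduction(findings.trackersWithoutConsent, 10, 30);
  score -= cappedDeduction(findings.nonEssentialCookies, 5, 20);
  return Math.max(0, score);
}
```

For example, a page with no consent banner, four trackers, and two non-essential cookies scores 100 - 50 - 30 - 10 = 10. With all three deductions at their caps the score is exactly 0.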

Accessibility Audit

HTML regex analysis (no axe-core browser needed)

  • Images without alt text (WCAG 1.1.1)
  • Missing HTML lang attribute (WCAG 3.1.1)
  • Missing page title (WCAG 2.4.2)
  • H1 presence and heading hierarchy (WCAG 1.3.1)
  • Inputs without labels (WCAG 4.1.2)
  • Empty links (WCAG 2.4.4)
  • EAA prioritization signal

Scoring: Start 100. Missing lang: -15. Missing title: -10. Images without alt: -3 each (max -20). Inputs without label: -5 each (max -15). No H1: -10.
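
To illustrate what regex-based HTML analysis means in practice, here is a minimal sketch of four of the checks plus the scoring rule. The regular expressions are illustrative guesses, not SiteGuard's actual patterns, and they assume that an empty `alt=""` counts as present (it is valid for decorative images). Label detection is left out because matching inputs to labels reliably needs more than a single regex, so the count is passed in separately.

```typescript
interface A11yFindings {
  missingLang: boolean;
  missingTitle: boolean;
  missingH1: boolean;
  imagesWithoutAlt: number;
}

function scanAccessibilitySignals(html: string): A11yFindings {
  const imgTags: string[] = html.match(/<img\b[^>]*>/gi) ?? [];
  return {
    // <html ... lang="xx"> with a value that starts with a letter
    missingLang: !/<html\b[^>]*\slang\s*=\s*["']?[a-z]/i.test(html),
    // <title> containing at least one non-whitespace character
    missingTitle: !/<title\b[^>]*>\s*[^<\s][^<]*<\/title>/i.test(html),
    missingH1: !/<h1\b/i.test(html),
    // <img> tags that carry no alt attribute at all
    imagesWithoutAlt: imgTags.filter((tag) => !/\salt\s*=/i.test(tag)).length,
  };
}

function accessibilityScore(findings: A11yFindings, inputsWithoutLabel: number): number {
  let score = 100;
  if (findings.missingLang) score -= 15;
  if (findings.missingTitle) score -= 10;
  if (findings.missingH1) score -= 10;
  score -= Math.min(findings.imagesWithoutAlt * 3, 20);
  score -= Math.min(inputsWithoutLabel * 5, 15);
  return Math.max(0, score);
}
```

Regex checks like these only see the served markup. Attributes added by client-side JavaScript after load are invisible to them, which is consistent with the limits listed under Methodology.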

SEO + GEO Audit

HTML analysis + HEAD requests + JSON-LD parsing

  • Title, meta description, viewport, canonical
  • Open Graph (8 tags) + Twitter Card (4 tags)
  • Structured data: 13 Schema.org types with required field validation
  • GEO score: content structure, entity signals, AI discoverability, citation readiness
  • Sitemap.xml, robots.txt, hreflang tags
  • Favicon completeness, social preview quality
  • Image optimization: dimensions, lazy loading, WebP/AVIF, file size
  • RSS/Atom feed detection, web manifest, resource hints

Scoring: 28+ individual checks. Missing title: -15. Missing meta description: -15. No Open Graph tags: -10. No structured data: -10. The GEO score (0-100) is reported separately.

Security Scanner

fetch() + node:https for SSL inspection

  • 10 HTTP security headers (HSTS, CSP, X-Frame-Options, etc.)
  • SSL/TLS certificate validation + expiry
  • CSP deep analysis (unsafe-inline, unsafe-eval, wildcards, frame-ancestors)
  • HTTPS redirect check
  • Mixed content detection
  • Subresource Integrity (SRI)
  • CORS configuration
  • Server information leakage (version disclosure)
  • Outdated JS libraries (jQuery <3.5, Bootstrap <5, etc.)
  • Grading A+ to F (like SecurityHeaders.com)

Scoring: Start 100. Missing HSTS: -15. Missing/weak CSP: -15. SSL issues: up to -30. CORS wildcard: -10. Mixed content: -3 each. Server leak: -3 each.
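
The CSP deep analysis listed above can be sketched as a small parser over the header value. This is an illustrative approximation rather than SiteGuard's implementation: the function names and finding strings are hypothetical, and the checks cover only the four items named in the list (unsafe-inline, unsafe-eval, wildcards, frame-ancestors).

```typescript
// Split a Content-Security-Policy value into directive -> source list.
function parseCsp(csp: string): Map<string, string[]> {
  const directives = new Map<string, string[]>();
  for (const part of csp.split(";")) {
    const tokens = part.trim().split(/\s+/).filter((t) => t.length > 0);
    const directiveName = tokens[0]?.toLowerCase();
    // Per the CSP spec, a repeated directive is ignored; the first one wins.
    if (directiveName === undefined || directives.has(directiveName)) continue;
    directives.set(directiveName, tokens.slice(1).map((s) => s.toLowerCase()));
  }
  return directives;
}

function analyzeCsp(csp: string): string[] {
  const findings: string[] = [];
  const directives = parseCsp(csp);
  for (const [directive, sources] of directives) {
    if (sources.includes("'unsafe-inline'")) findings.push(`${directive}: 'unsafe-inline'`);
    if (sources.includes("'unsafe-eval'")) findings.push(`${directive}: 'unsafe-eval'`);
    if (sources.includes("*")) findings.push(`${directive}: wildcard source`);
  }
  if (!directives.has("frame-ancestors")) findings.push("frame-ancestors: missing");
  return findings;
}
```

A policy such as `default-src 'self'; frame-ancestors 'none'` produces no findings, while adding `script-src 'unsafe-inline' *` would produce two.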

Performance Check

fetch() with timing + HEAD requests for links

  • Response time (via fetch timing)
  • Page size (Content-Length)
  • Broken links: all resource URLs (a, img, script, link, video, iframe)
  • Redirect chains (manual following, hop counting)
  • SSL validation
  • Resource count (scripts, stylesheets, images)
  • Broken images (HEAD request check)
  • Oversized images (>500KB)

Scoring: Start 100. Response >3s: -20, >5s: -30. Broken internal: -5 each. Broken external: -2 each. Broken image: -3 each. Redirect chains: -2 each.
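
The rule above leaves open whether the two response-time penalties stack. The sketch below assumes they are alternatives (a 6-second response costs 30 points, not 50) and that the per-item deductions are uncapped, since no maximum is stated. The `PerformanceFindings` shape is hypothetical.

```typescript
// Hypothetical input shape for illustration.
interface PerformanceFindings {
  responseTimeMs: number;
  brokenInternalLinks: number;
  brokenExternalLinks: number;
  brokenImages: number;
  redirectChains: number;
}

// Assumption: the tiers are alternatives, so only the highest one applies.
function responseTimePenalty(ms: number): number {
  if (ms > 5000) return 30;
  if (ms > 3000) return 20;
  return 0;
}

function performanceScore(findings: PerformanceFindings): number {
  const deductions =
    responseTimePenalty(findings.responseTimeMs) +
    findings.brokenInternalLinks * 5 +
    findings.brokenExternalLinks * 2 +
    findings.brokenImages * 3 +
    findings.redirectChains * 2;
  // No per-item caps are stated, so the floor at 0 is the only limit here.
  return Math.max(0, 100 - deductions);
}
```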

Tag Validator

HTML pattern matching across all scanned pages

  • 12 tag types: GA4, GTM, Meta Pixel, LinkedIn, TikTok, Hotjar, Matomo, etc.
  • Tag ID extraction (G-XXXXX, GTM-XXXXX, pixel IDs)
  • DataLayer detection
  • Cross-page consistency check (tag on homepage but missing on subpages?)
  • Duplicate detection

Scoring: Start 100. No analytics: -20. GA without GTM: -10. No dataLayer with GTM: -15. Inconsistent tags: -3 each (max -9).
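
Tag ID extraction and the cross-page consistency check might be sketched as follows. Only two of the twelve tag types are covered, and the ID length ranges in the patterns are assumptions chosen for illustration, not documented formats.

```typescript
// Illustrative patterns; the allowed ID lengths are assumptions.
const TAG_PATTERNS: Record<string, RegExp> = {
  ga4: /\bG-[A-Z0-9]{6,12}\b/g,
  gtm: /\bGTM-[A-Z0-9]{4,10}\b/g,
};

function extractTagIds(html: string): Set<string> {
  const ids = new Set<string>();
  for (const pattern of Object.values(TAG_PATTERNS)) {
    const matches: string[] = html.match(pattern) ?? [];
    for (const id of matches) ids.add(id);
  }
  return ids;
}

// For each subpage, list the homepage tag IDs that do not appear on it.
function findMissingTags(
  pages: Record<string, string>,
  homepageUrl: string,
): Record<string, string[]> {
  const homepageIds = Array.from(extractTagIds(pages[homepageUrl] ?? ""));
  const missing: Record<string, string[]> = {};
  for (const [url, html] of Object.entries(pages)) {
    if (url === homepageUrl) continue;
    const pageIds = extractTagIds(html);
    const absent = homepageIds.filter((id) => !pageIds.has(id));
    if (absent.length > 0) missing[url] = absent;
  }
  return missing;
}
```

Because `extractTagIds` returns a set, a tag that appears twice on one page is collapsed here. Duplicate detection would need the raw match list instead.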

Legal Compliance

HTML pattern matching for DACH law

  • Imprint/Impressum link present
  • Privacy policy/Datenschutz link present
  • Cookie banner detection (20+ CMP platforms)
  • Terms/AGB link present
  • Contact information (email, phone)
  • HTTPS active

Scoring: Start 100. No imprint: -25. No privacy: -25. No cookie banner: -15. No terms: -10. No contact: -10.

Content Changes

Text fingerprinting + comparison

  • Text extraction (HTML tags stripped)
  • Word, link, and image counts
  • Content hash (fingerprint)
  • Comparison with previous scan
  • Change detection: none/minor/significant/major

Scoring: 100 = no change. 80 = minor change (<10%). 50 = significant. 30 = major.
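
The sketch below shows one way the pipeline above could fit together. Several parts are assumptions: the fingerprint uses FNV-1a over UTF-16 code units as a dependency-free stand-in (a production system would more likely use a cryptographic hash such as SHA-256), the change ratio is a simple word-multiset comparison, and the 30% boundary between significant and major is invented because only the 10% threshold is given.

```typescript
type ChangeLevel = "none" | "minor" | "significant" | "major";

function extractText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop code blocks
    .replace(/<[^>]+>/g, " ") // strip remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

// FNV-1a (32-bit) over UTF-16 code units; a stand-in for a real content hash.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Share of words that differ, counting repeated words individually.
function changeRatio(previous: string, current: string): number {
  const prevWords = previous.split(" ").filter(Boolean);
  const currWords = current.split(" ").filter(Boolean);
  const remaining = new Map<string, number>();
  for (const w of prevWords) remaining.set(w, (remaining.get(w) ?? 0) + 1);
  let shared = 0;
  for (const w of currWords) {
    const n = remaining.get(w) ?? 0;
    if (n > 0) {
      shared++;
      remaining.set(w, n - 1);
    }
  }
  const total = Math.max(prevWords.length, currWords.length);
  return total === 0 ? 0 : 1 - shared / total;
}

function contentChange(previousHtml: string, currentHtml: string): { level: ChangeLevel; score: number } {
  const prev = extractText(previousHtml);
  const curr = extractText(currentHtml);
  if (fingerprint(prev) === fingerprint(curr)) return { level: "none", score: 100 };
  const ratio = changeRatio(prev, curr);
  if (ratio < 0.1) return { level: "minor", score: 80 };
  if (ratio < 0.3) return { level: "significant", score: 50 }; // assumed boundary
  return { level: "major", score: 30 };
}
```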

SSL & Domain

node:https + node:dns + RDAP API

  • SSL certificate: validity, issuer, expiry, protocol
  • DNS records: A, AAAA, MX, NS, TXT
  • DMARC record
  • SPF record
  • Domain registration data via RDAP (expiry date, registrar)

Scoring: Start 100. SSL certificate expired: -40. SSL certificate expires in <7 days: -25. No DMARC: -10. No SPF: -10. Domain expires in <30 days: -15.
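
As a rough sketch, the scoring rule reduces to date arithmetic plus two TXT-record checks. The `DomainFindings` shape and function names are hypothetical. The record prefixes are the standard ones: an SPF record begins with `v=spf1`, and a DMARC record (published at `_dmarc.<domain>`) begins with `v=DMARC1`. The sketch assumes the two certificate penalties do not stack.

```typescript
// Hypothetical input shape for illustration.
interface DomainFindings {
  certExpiresAt: Date;
  domainExpiresAt: Date | null; // null when RDAP returns no expiry date
  hasDmarc: boolean;
  hasSpf: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function daysUntil(target: Date, now: Date): number {
  return (target.getTime() - now.getTime()) / MS_PER_DAY;
}

// TXT records of the domain itself.
function hasSpfRecord(txtRecords: string[]): boolean {
  return txtRecords.some((r) => r.trim().toLowerCase().startsWith("v=spf1"));
}

// TXT records of _dmarc.<domain>.
function hasDmarcRecord(txtRecords: string[]): boolean {
  return txtRecords.some((r) => r.trim().toLowerCase().startsWith("v=dmarc1"));
}

function sslDomainScore(findings: DomainFindings, now: Date): number {
  let score = 100;
  const certDays = daysUntil(findings.certExpiresAt, now);
  if (certDays <= 0) score -= 40; // already expired
  else if (certDays < 7) score -= 25; // expires within a week
  if (!findings.hasDmarc) score -= 10;
  if (!findings.hasSpf) score -= 10;
  if (findings.domainExpiresAt !== null && daysUntil(findings.domainExpiresAt, now) < 30) {
    score -= 15;
  }
  return Math.max(0, score);
}
```

Passing `now` explicitly keeps the function deterministic, which makes the expiry thresholds easy to test.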

CO₂ Footprint

fetch() + page size measurement

  • Transfer size (KB)
  • Resource count (scripts, styles, images, fonts)
  • CO₂ estimate: 0.2g per MB transferred
  • Rating: A+ (<0.5g), A (<1g), B (<1.5g), C (<2g), D (≥2g)
  • Comparison with global average (1.76g per page view)

Scoring: A+ = 100. A = 85. B = 70. C = 50. D = 30.
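
The estimate and rating amount to a short calculation, sketched below. It assumes 1 MB = 1024 KB and treats a value of exactly 2g as a D; the function names are hypothetical.

```typescript
type Co2Rating = "A+" | "A" | "B" | "C" | "D";

const CO2_GRAMS_PER_MB = 0.2;
const GLOBAL_AVERAGE_GRAMS = 1.76;
const CO2_RATING_SCORES: Record<Co2Rating, number> = {
  "A+": 100,
  A: 85,
  B: 70,
  C: 50,
  D: 30,
};

function co2Rating(grams: number): Co2Rating {
  if (grams < 0.5) return "A+";
  if (grams < 1) return "A";
  if (grams < 1.5) return "B";
  if (grams < 2) return "C";
  return "D";
}

function co2Footprint(transferKb: number) {
  // Assumption: 1 MB = 1024 KB.
  const grams = (transferKb / 1024) * CO2_GRAMS_PER_MB;
  const rating = co2Rating(grams);
  return {
    grams,
    rating,
    score: CO2_RATING_SCORES[rating],
    belowGlobalAverage: grams < GLOBAL_AVERAGE_GRAMS,
  };
}
```

Under these numbers a 2 MB page yields about 0.4g and an A+, and a page only reaches the global average of 1.76g at roughly 8.8 MB transferred.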

Tech Stack

HTML + response header analysis

  • 17 CMS with version detection (WordPress, TYPO3, Drupal, Shopify, etc.)
  • 15 frontend frameworks with version (React, Next.js, Vue, Angular, jQuery, etc.)
  • 10 JS libraries (Lodash, GSAP, Three.js, D3.js, etc.)
  • 8 CDN providers (Cloudflare, Vercel, AWS CloudFront, etc.)
  • 14 analytics tools
  • 9 CSS frameworks
  • Font providers (Google Fonts, Adobe Fonts)
  • Server + hosting detection
  • Programming language hints (X-Powered-By)

Scoring: Informational — always score 100. No deductions.

Third-Party Risk

HTML src/href extraction + domain classification

  • External domains from all resource URLs
  • Categorization: analytics, advertising, social, CDN, fonts, maps, video, payment
  • Risk assessment: known tracker (high), CDN (low), unknown (medium)
  • Count: total third-parties, high-risk percentage

Scoring: Start 100. >10 third-parties: -5. >20: -15. High risk: -10 each (max -30). Unknown: -3 each (max -15).
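
A minimal sketch of the extraction and scoring steps follows. The three-entry lookup table is a made-up stand-in for a real classification list, any host other than the page's own counts as third-party (so a subdomain of the same site would too), and the two count penalties are assumed to be alternatives rather than cumulative.

```typescript
type Risk = "high" | "medium" | "low";

// Tiny illustrative table; a real classifier would need a far larger list.
const EXAMPLE_DOMAIN_RISK = new Map<string, Risk>([
  ["www.google-analytics.com", "high"],
  ["connect.facebook.net", "high"],
  ["cdn.jsdelivr.net", "low"],
]);

function thirdPartyHosts(html: string, pageUrl: string): Set<string> {
  const pageHost = new URL(pageUrl).hostname;
  const hosts = new Set<string>();
  for (const m of html.matchAll(/\s(?:src|href)\s*=\s*["']([^"']+)["']/gi)) {
    try {
      // Resolve relative URLs against the page URL.
      const host = new URL(m[1] ?? "", pageUrl).hostname;
      if (host !== "" && host !== pageHost) hosts.add(host);
    } catch {
      // Ignore attribute values that are not parseable URLs.
    }
  }
  return hosts;
}

function thirdPartyScore(hosts: Set<string>): number {
  let high = 0;
  let unknown = 0;
  for (const host of hosts) {
    const risk = EXAMPLE_DOMAIN_RISK.get(host) ?? "medium"; // unknown => medium
    if (risk === "high") high++;
    if (risk === "medium") unknown++;
  }
  let score = 100;
  if (hosts.size > 20) score -= 15;
  else if (hosts.size > 10) score -= 5;
  score -= Math.min(high * 10, 30);
  score -= Math.min(unknown * 3, 15);
  return Math.max(0, score);
}
```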

Cookie Audit

Browser/fetch-based companion scan across three consent states

  • Consent banner detection and provider
  • Cookies, Local Storage, and Session Storage before consent, after rejection, and after acceptance
  • Third-party requests by consent state
  • CMP declarations
  • Findings for non-essential cookies and tracking before consent

Scoring: Dedicated score from consent findings. Cookie Audit runs as a website-scoped companion scan and is not stored as a regular scan_result module.

Discoverability

robots.txt + sitemap fetching + crawl comparison

  • robots.txt status and sitemap directives
  • Sitemap URLs and lastmod validation
  • Crawl coverage
  • Noindex conflicts
  • Orphan URLs and missing sitemap entries
  • IndexNow key and submission

Scoring: Dedicated score from discoverability findings. Discoverability runs as a website-scoped companion scan and is stored separately.
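
The crawl comparison can be pictured as set arithmetic over normalized URLs. The sketch below assumes the usual reading of the terms: an orphan URL is listed in the sitemap but was not reached by the crawl, and a missing sitemap entry is a crawled URL absent from the sitemap. All names are hypothetical, the sitemap is parsed with a simple regex instead of an XML parser, and every URL is assumed to be absolute and valid.

```typescript
interface DiscoverabilityReport {
  coverage: number; // share of sitemap URLs reached by the crawl
  orphanUrls: string[];
  missingFromSitemap: string[];
  noindexConflicts: string[];
}

function parseSitemapLocs(xml: string): string[] {
  return Array.from(xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/gi), (m) =>
    (m[1] ?? "").replace(/&amp;/g, "&"),
  );
}

// Drop fragments and trailing slashes so equivalent URLs compare equal.
function normalizeUrl(url: string): string {
  const parsed = new URL(url);
  const path = parsed.pathname.replace(/\/+$/, "") || "/";
  return `${parsed.origin}${path}${parsed.search}`;
}

function compareSitemapWithCrawl(
  sitemapXml: string,
  crawledUrls: string[],
  noindexUrls: string[],
): DiscoverabilityReport {
  const inSitemap = new Set(parseSitemapLocs(sitemapXml).map((u) => normalizeUrl(u)));
  const crawled = new Set(crawledUrls.map((u) => normalizeUrl(u)));
  const noindex = new Set(noindexUrls.map((u) => normalizeUrl(u)));
  const sitemapList = Array.from(inSitemap);
  const reached = sitemapList.filter((u) => crawled.has(u)).length;
  return {
    coverage: sitemapList.length === 0 ? 0 : reached / sitemapList.length,
    orphanUrls: sitemapList.filter((u) => !crawled.has(u)),
    missingFromSitemap: Array.from(crawled).filter((u) => !inSitemap.has(u)),
    // A URL advertised in the sitemap but marked noindex sends mixed signals.
    noindexConflicts: sitemapList.filter((u) => noindex.has(u)),
  };
}
```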

Additional Features

Multi-Page Scanning

Each scan analyzes up to 5 pages (root + 4 subpages from sitemap and internal links).

Uptime Monitoring

Ping check every 5 minutes with status history, response time, and downtime alerts.

AI Reports

Claude AI generates management summaries with top 5 action items.

PDF Export

Branded PDF report with scores, charts, and issues for download.

CSV/JSON Export

Scan results as CSV or JSON for further analysis.

Score Trends

Trend chart showing score development across all scans.

Scheduled Scans

Automatic scans daily, weekly, or monthly via Inngest cron.

Why Not Lighthouse?

Google Lighthouse is unreliable in serverless environments, produces inconsistent results, and requires a full Chrome browser. SiteGuard uses a custom fetch-based engine instead: consistent results, 2-3 seconds per page (vs 15-30s), and runs on Vercel Serverless without Chromium.

Note: Real Core Web Vitals (LCP, CLS, INP) require a browser. These can be added later via the CrUX API, which reports Chrome real-user field data and therefore reflects actual visitor experience more closely than Lighthouse lab data.