Technical documentation of our 14 scan modules
SiteGuard deliberately does NOT use Google Lighthouse. Instead, we use a custom fetch-based analysis engine that runs on Vercel Serverless. This means: consistent results, no browser dependency, and significantly faster scans (2-3 seconds per page instead of 15-30 seconds with Lighthouse).
Automated legal and accessibility scans provide technical signals and prioritization. They do not replace legal advice, a legal review, or a manual WCAG/EAA/BFSG certification by qualified experts.
SiteGuard intentionally separates measured technical signals, derived prioritization, and areas that require manual or legal review. This table shows how reliable each result category is.
Measured
HTML signals such as lang attribute, title, alt text, H1, labels, link text, and selected WCAG mapping.
Not Automatically Assessed
Keyboard navigation, screen reader behavior, focus order, rendered UI contrast, and full WCAG/BFSG/EAA conformance.
Confidence
High for technical presence signals; medium for the derived priority.
Measured
Response time, page size, resource count, broken links, redirects, SSL status, and HEAD request results.
Not Automatically Assessed
Real Core Web Vitals such as LCP, CLS, and INP without browser or CrUX data.
Confidence
High for fetch, header, and link signals; no statement on real Core Web Vitals.
Measured
Cookie and tracker patterns, CMP detection, imprint/privacy links, contact details, and HTTPS signals.
Not Automatically Assessed
Legal completeness, individual legal bases, contractual context, data flows, and sector-specific obligations.
Confidence
Medium; technical evidence is reliable, legal assessment remains case-specific.
Measured
Cookies, storage, third-party requests, and CMP declarations before consent, after rejection, and after acceptance.
Not Automatically Assessed
Legal approval of consent design, banner wording, and complete review of all privacy texts.
Confidence
Medium to high for observed browser states; no legal clearance.
Measured
Publicly visible headers, SSL/TLS, CSP, HSTS, mixed content, CORS signals, and known frontend library patterns.
Not Automatically Assessed
Penetration testing, auth/business-logic vulnerabilities, server internals, and non-public infrastructure.
Confidence
High for observable web signals; no statement on hidden vulnerabilities.
Measured
Meta tags, robots.txt, sitemap, canonical, hreflang, structured data, crawl coverage, and noindex conflicts.
Not Automatically Assessed
Actual Google ranking, guaranteed indexing, search demand, backlink quality, and competitive analysis.
Confidence
High for technical discoverability signals; no ranking or indexing guarantee.
HTML pattern matching + cookie headers
Scoring: Start 100. No consent banner: -50. Trackers without consent: -10 each (max -30). Non-essential cookies: -5 each (max -20).
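The capped deductions in the rule above can be sketched as a small pure function. This is a minimal illustration, not SiteGuard's actual code: the `ConsentFindings` shape and the function names are hypothetical, and only the point values come from the scoring line.

```typescript
// Hypothetical input shape; field names are illustrative only.
interface ConsentFindings {
  hasConsentBanner: boolean;
  trackersWithoutConsent: number;
  nonEssentialCookies: number;
}

// Deduct `perItem` points per finding, but never more than `cap` in total.
function cappedDeduction(count: number, perItem: number, cap: number): number {
  return Math.min(Math.max(count, 0) * perItem, cap);
}

function consentScore(f: ConsentFindings): number {
  let score = 100;
  if (!f.hasConsentBanner) score -= 50;
  score -= cappedDeduction(f.trackersWithoutConsent, 10, 30);
  score -= cappedDeduction(f.nonEssentialCookies, 5, 20);
  // The maximum total deduction is 50 + 30 + 20 = 100, so the floor is 0.
  return Math.max(score, 0);
}
```

With these values, a page with a banner, two trackers firing without consent, and one non-essential cookie would score 100 - 20 - 5 = 75.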
HTML regex analysis (no axe-core browser needed)
Scoring: Start 100. Missing lang: -15. Missing title: -10. Images without alt: -3 each (max -20). Inputs without label: -5 each (max -15). No H1: -10.
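To illustrate what regex-based HTML analysis without a browser can look like, here is a rough sketch. The patterns and names are assumptions made for this example and are much simpler than a production check. The input/label rule is left out because associating labels with inputs needs more than a single pattern.

```typescript
// Hypothetical result shape for this sketch.
interface A11ySignals {
  missingLang: boolean;
  missingTitle: boolean;
  imagesWithoutAlt: number;
  missingH1: boolean;
}

function analyzeHtml(html: string): A11ySignals {
  // Look for a lang attribute on the opening <html> tag only.
  const htmlTag = html.match(/<html\b[^>]*>/i)?.[0] ?? "";
  const missingLang = !/\slang\s*=\s*["']?[a-z]/i.test(htmlTag);

  const title = html.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i)?.[1] ?? "";
  const missingTitle = title.trim().length === 0;

  // An empty alt="" counts as present: it is valid for decorative images.
  const imgTags = html.match(/<img\b[^>]*>/gi) ?? [];
  const imagesWithoutAlt = imgTags.filter((tag) => !/\salt\s*=/i.test(tag)).length;

  const missingH1 = !/<h1\b/i.test(html);
  return { missingLang, missingTitle, imagesWithoutAlt, missingH1 };
}

// Applies the point values from the scoring line (label rule omitted here).
function a11yScore(s: A11ySignals): number {
  let score = 100;
  if (s.missingLang) score -= 15;
  if (s.missingTitle) score -= 10;
  score -= Math.min(s.imagesWithoutAlt * 3, 20);
  if (s.missingH1) score -= 10;
  return Math.max(score, 0);
}
```

Pattern matching of this kind only sees the served markup, which is why rendered contrast, focus order, and screen reader behavior stay outside the automated result.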
HTML analysis + HEAD requests + JSON-LD parsing
Scoring: 28+ individual checks. Missing title: -15. Missing meta: -15. No OG: -10. No structured data: -10. Plus GEO score 0-100 separately.
fetch() + node:https for SSL inspection
Scoring: Start 100. Missing HSTS: -15. Missing/weak CSP: -15. SSL issues: up to -30. CORS wildcard: -10. Mixed content: -3 each. Server leak: -3 each.
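The header-related part of this rule could be sketched as follows. The definition of a "weak" CSP and of a "server leak" is not spelled out above, so both are assumptions in this example (marked in the comments). SSL issues and mixed content are left out because they need inputs beyond the response headers.

```typescript
interface Deduction {
  rule: string;
  points: number;
}

function securityHeaderDeductions(headers: Record<string, string>): Deduction[] {
  // Header names are case-insensitive, so normalize them first.
  const h: Record<string, string | undefined> = {};
  for (const name of Object.keys(headers)) h[name.toLowerCase()] = headers[name];

  const out: Deduction[] = [];
  if (!h["strict-transport-security"]) out.push({ rule: "missing-hsts", points: 15 });

  // Assumption: "weak" means the policy allows 'unsafe-inline' or 'unsafe-eval'.
  const csp = h["content-security-policy"];
  if (!csp || /'unsafe-(inline|eval)'/.test(csp)) {
    out.push({ rule: "missing-or-weak-csp", points: 15 });
  }

  if ((h["access-control-allow-origin"] ?? "").trim() === "*") {
    out.push({ rule: "cors-wildcard", points: 10 });
  }

  // Assumption: a leak is X-Powered-By, or a Server header exposing a version number.
  if (h["x-powered-by"]) out.push({ rule: "server-leak:x-powered-by", points: 3 });
  if (/\d/.test(h["server"] ?? "")) out.push({ rule: "server-leak:server", points: 3 });
  return out;
}

function securityHeaderScore(headers: Record<string, string>): number {
  const total = securityHeaderDeductions(headers).reduce((sum, d) => sum + d.points, 0);
  return Math.max(100 - total, 0);
}
```

Returning the individual deductions rather than only the final number keeps each finding traceable to the header that caused it.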
fetch() with timing + HEAD requests for links
Scoring: Start 100. Response >3s: -20, >5s: -30. Broken internal: -5 each. Broken external: -2 each. Broken image: -3 each. Redirect chains: -2 each.
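A timed fetch and the deductions above might look roughly like this. Two points are assumptions rather than documented behavior: the 5-second deduction is treated as replacing the 3-second one (not adding to it), and the score is clamped at 0. All names are hypothetical.

```typescript
// Measures the time until the body has been fully downloaded.
async function timedFetch(url: string): Promise<{ status: number; ms: number }> {
  const start = Date.now();
  const res = await fetch(url, { redirect: "follow" });
  await res.arrayBuffer(); // include the body transfer in the measurement
  return { status: res.status, ms: Date.now() - start };
}

// Assumption: the tiers are exclusive, so a 6 s response costs 30 points, not 50.
function responseTimeDeduction(ms: number): number {
  if (ms > 5000) return 30;
  if (ms > 3000) return 20;
  return 0;
}

// Hypothetical input shape for the remaining counters.
interface PerfFindings {
  responseMs: number;
  brokenInternal: number;
  brokenExternal: number;
  brokenImages: number;
  redirectChains: number;
}

function performanceScore(f: PerfFindings): number {
  const deductions =
    responseTimeDeduction(f.responseMs) +
    f.brokenInternal * 5 +
    f.brokenExternal * 2 +
    f.brokenImages * 3 +
    f.redirectChains * 2;
  // Assumption: no per-rule caps are stated, so only the total is clamped.
  return Math.max(100 - deductions, 0);
}
```

For example, a 3.5 s response with one broken internal link, two broken external links, and one redirect chain would score 100 - 20 - 5 - 4 - 2 = 69 under these assumptions.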
HTML pattern matching across all scanned pages
Scoring: Start 100. No analytics: -20. GA without GTM: -10. No dataLayer with GTM: -15. Inconsistent tags: -3 each (max -9).
HTML pattern matching for DACH law
Scoring: Start 100. No imprint: -25. No privacy: -25. No cookie banner: -15. No terms: -10. No contact: -10.
Text fingerprinting + comparison
Scoring: 100 = no change. 80 = minor change (<10%). 50 = significant. 30 = major.
node:https + node:dns + RDAP API
Scoring: Start 100. SSL expired: -40. SSL <7 days: -25. No DMARC: -10. No SPF: -10. Domain <30 days: -15.
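As a sketch of how these deductions could be computed once the certificate, DNS, and RDAP data have been collected: the TXT record arrays below follow the shape returned by Node's `dns.resolveTxt` (one array of string chunks per record). "Domain <30 days" is read here as "registration expires in fewer than 30 days", which is an assumption. The input type and function names are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

function daysUntil(date: Date, now: Date): number {
  return Math.floor((date.getTime() - now.getTime()) / DAY_MS);
}

// A TXT record is delivered as chunks; join them before checking the prefix.
function hasTxtRecord(records: string[][], prefix: string): boolean {
  const wanted = prefix.toLowerCase();
  return records.some((chunks) => chunks.join("").toLowerCase().indexOf(wanted) === 0);
}

// Hypothetical input shape for this sketch.
interface DomainInput {
  certValidTo: Date;
  domainExpiresAt: Date | null; // null when RDAP returned no expiry date
  rootTxt: string[][];          // TXT records of the domain itself (SPF)
  dmarcTxt: string[][];         // TXT records of _dmarc.<domain>
}

function domainScore(input: DomainInput, now: Date): number {
  let score = 100;
  const certDays = daysUntil(input.certValidTo, now);
  if (certDays < 0) score -= 40;
  else if (certDays < 7) score -= 25;

  if (!hasTxtRecord(input.dmarcTxt, "v=DMARC1")) score -= 10;
  if (!hasTxtRecord(input.rootTxt, "v=spf1")) score -= 10;

  // Assumption: the rule refers to days left until the registration expires.
  if (input.domainExpiresAt && daysUntil(input.domainExpiresAt, now) < 30) score -= 15;
  return Math.max(score, 0);
}
```

Passing `now` explicitly keeps the function deterministic, which makes the thresholds easy to verify in tests.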
fetch() + page size measurement
Scoring: A+ = 100. A = 85. B = 70. C = 50. D = 30.
HTML + response header analysis
Scoring: Informational only. The score is always 100, with no deductions.
HTML src/href extraction + domain classification
Scoring: Start 100. >10 third-parties: -5. >20: -15. High risk: -10 each (max -30). Unknown: -3 each (max -15).
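The extraction and classification step can be illustrated with a small sketch. The classification table is a made-up placeholder, the count tiers are assumed to be exclusive, and hosts are compared by exact hostname, so a sibling subdomain would count as third-party here. None of this is confirmed as SiteGuard's actual behavior.

```typescript
type Risk = "low" | "high" | "unknown";

// Hypothetical classification table; a real one would be far larger.
const KNOWN_DOMAINS: { [host: string]: Risk | undefined } = {
  "fonts.example-cdn.com": "low",
  "tracker.example-ads.com": "high",
};

// Collects distinct hostnames from src/href attributes that differ from the page host.
function thirdPartyHosts(html: string, pageUrl: string): string[] {
  const pageHost = new URL(pageUrl).hostname;
  const pattern = /\s(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  const hosts: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      const host = new URL(match[1], pageUrl).hostname;
      // mailto: and similar schemes have an empty hostname and are skipped.
      if (host && host !== pageHost && hosts.indexOf(host) === -1) hosts.push(host);
    } catch {
      // Ignore values that are not valid URLs.
    }
  }
  return hosts;
}

function thirdPartyScore(hosts: string[]): number {
  let high = 0;
  let unknown = 0;
  for (const host of hosts) {
    const risk: Risk = KNOWN_DOMAINS[host] ?? "unknown";
    if (risk === "high") high++;
    if (risk === "unknown") unknown++;
  }
  let score = 100;
  // Assumption: more than 20 third parties costs 15 points in total, not 5 + 15.
  if (hosts.length > 20) score -= 15;
  else if (hosts.length > 10) score -= 5;
  score -= Math.min(high * 10, 30);
  score -= Math.min(unknown * 3, 15);
  return Math.max(score, 0);
}
```

Relative URLs are resolved against the page URL first, so `/style.css` is correctly treated as first-party.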
Browser/fetch-based companion scan across three consent states
Scoring: Dedicated score from consent findings. Cookie Audit runs as a website-scoped companion scan and is not stored as a regular scan_result module.
robots.txt + sitemap fetching + crawl comparison
Scoring: Dedicated score from discoverability findings. Discoverability runs as a website-scoped companion scan and is stored separately.
Each scan analyzes up to 5 pages (root + 4 subpages from sitemap and internal links).
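A possible way to pick those pages is sketched below. The ordering (sitemap entries first, then internal links), the same-origin filter, and the fragment stripping are assumptions made for this example. `selectPages` is a hypothetical name.

```typescript
// Returns the root URL plus up to (maxPages - 1) distinct same-origin subpages.
function selectPages(
  rootUrl: string,
  sitemapUrls: string[],
  internalLinks: string[],
  maxPages = 5
): string[] {
  const root = new URL(rootUrl);
  root.hash = "";
  const selected: string[] = [root.href];

  // Assumption: sitemap entries take priority over links found in the HTML.
  for (const candidate of sitemapUrls.concat(internalLinks)) {
    if (selected.length >= maxPages) break;
    let url: URL;
    try {
      url = new URL(candidate, root.href); // resolves relative links
    } catch {
      continue; // skip malformed entries
    }
    if (url.origin !== root.origin) continue; // stay on the scanned site
    url.hash = ""; // /a and /a#top are the same page
    if (selected.indexOf(url.href) === -1) selected.push(url.href);
  }
  return selected;
}
```

For instance, `selectPages("https://example.com", ["https://example.com/a"], ["/b", "/a#top", "/c", "/d", "/e"])` would return the root plus `/a`, `/b`, `/c`, and `/d`, and stop before `/e`.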
Ping check every 5 minutes with status history, response time, and downtime alerts.
Claude AI generates management summaries with the top 5 action items.
Branded PDF report with scores, charts, and issues for download.
Scan results as CSV or JSON for further analysis.
Trend chart showing score development across all scans.
Automatic scans daily, weekly, or monthly via Inngest cron.
Google Lighthouse is unreliable in serverless environments, produces inconsistent results, and requires a full Chrome browser. SiteGuard uses a custom fetch-based engine instead: it delivers consistent results, takes 2-3 seconds per page (versus 15-30 seconds with Lighthouse), and runs on Vercel Serverless without Chromium.
Note: Real Core Web Vitals (LCP, CLS, INP) require a browser. These can be added later via the CrUX API, which reports Chrome real-user data and is more representative than Lighthouse lab data.
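If CrUX data were added, a query could look roughly like the sketch below. It targets the public Chrome UX Report API `records:queryRecord` endpoint and reads the 75th percentile of each metric. The function names and the trimmed-down response types are assumptions for illustration, and an API key is required.

```typescript
const CRUX_ENDPOINT = "https://chromeuxreport.googleapis.com/v1/records:queryRecord";

const CWV_METRICS = [
  "largest_contentful_paint",
  "cumulative_layout_shift",
  "interaction_to_next_paint",
];

// Only the fields this sketch reads; the real response contains more.
interface CruxMetric {
  percentiles?: { p75?: number | string };
}
interface CruxResponse {
  record?: { metrics?: { [name: string]: CruxMetric | undefined } };
}

// p75 may be a number (milliseconds) or a string (CLS), so normalize to number.
function extractP75(res: CruxResponse, metric: string): number | null {
  const raw = res.record?.metrics?.[metric]?.percentiles?.p75;
  if (raw === undefined) return null;
  const value = Number(raw);
  return isFinite(value) ? value : null;
}

async function queryCrux(origin: string, apiKey: string): Promise<CruxResponse> {
  const res = await fetch(`${CRUX_ENDPOINT}?key=${encodeURIComponent(apiKey)}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ origin, metrics: CWV_METRICS }),
  });
  // Origins with too little Chrome traffic have no record and return an error status.
  if (!res.ok) throw new Error(`CrUX request failed with status ${res.status}`);
  return (await res.json()) as CruxResponse;
}
```

Because this is a plain HTTPS call, it would fit the existing fetch-based approach without adding a browser dependency.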