Paste a URL. Get a compliance score in under two minutes. That is the promise — but the question we hear most from teams evaluating the tool is: what is actually happening under the hood? This page explains the full pipeline, from crawl to badge, so you can judge whether the results are trustworthy enough to act on.
Short answer: yes. The scanner uses ClimateBERT, a transformer model fine-tuned specifically on climate and sustainability language, combined with a rule-based ECGT layer that maps detected claims to the directive's banned terms and substantiation requirements. The result is not a keyword filter: claims are evaluated in context.
Step 1: Crawl and Content Extraction
When you submit a URL, the scanner fetches the page and strips non-content elements — navigation, footers, scripts, and ads. What remains is the substantive text that a visitor (and a regulator) would read: product descriptions, marketing copy, blog content, landing page claims.
The crawler respects robots.txt and does not index or store page content beyond the analysis session. For paid plans, full-site crawls follow internal links up to the configured depth, so every product page and category description gets scanned, not just the homepage.
Dynamic content rendered by JavaScript is handled via headless browser rendering. This matters for e-commerce sites where sustainability claims often live in product data loaded asynchronously — a static HTML fetch would miss them entirely.
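Content extraction in Step 1 can be sketched with Python's standard-library HTML parser. This is an illustrative minimal version, not the production crawler: the set of skipped elements and the whitespace handling are assumptions, and real pages (after headless rendering) would need more robust handling.

```python
from html.parser import HTMLParser

# Elements whose text is boilerplate rather than substantive content.
# The exact skip list here is an assumption for illustration.
SKIP = {"nav", "footer", "header", "aside", "script", "style"}

class ContentExtractor(HTMLParser):
    """Collect visible text while skipping non-content elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside skipped elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = ContentExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

Given `<html><nav><a href="/">Home</a></nav><p>Our bottles use 30% recycled plastic.</p><footer>Contact</footer></html>`, `extract_text` keeps only the product claim and drops the navigation and footer text.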
Step 2: ClimateBERT Analysis
ClimateBERT is a BERT-based language model pre-trained on a large corpus of climate-related text: IPCC reports, sustainability disclosures, academic papers, regulatory documents, and corporate ESG reports. Standard language models struggle with sustainability claims because the vocabulary is highly domain-specific and context-dependent. "Carbon neutral" means something precise in a regulatory context and something vague in a marketing context — ClimateBERT understands that distinction.
What ClimateBERT Detects
The model classifies text segments across several dimensions:
- Claim type: Is this an environmental claim? Is it generic, comparative, or future-oriented?
- Specificity: Is the claim quantified, time-bound, and scoped? Or vague and absolute?
- Verifiability: Does the claim reference a standard, certification, or methodology that would allow independent verification?
- Greenwashing patterns: Does the text exhibit known greenwashing signals — hidden trade-offs, irrelevant claims, lesser-of-two-evils framing, false labelling?
The output for each detected claim is a classification label plus a confidence score. Claims below a confidence threshold are flagged as "review required" rather than automatically marked non-compliant, which keeps ambiguous text from producing hard false positives.
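The confidence-threshold triage described above can be sketched as follows. The threshold value, label names, and status strings here are illustrative assumptions; the model's actual label set and production cutoff are not published.

```python
from dataclasses import dataclass

# Illustrative cutoff; the production threshold is an internal setting.
REVIEW_THRESHOLD = 0.75

@dataclass
class ClaimResult:
    text: str
    label: str          # e.g. "generic_environmental_claim" (assumed name)
    confidence: float   # model confidence in [0, 1]

def triage(claim: ClaimResult) -> str:
    """Map a model output to a report status."""
    if claim.label == "no_claim":
        return "ok"
    if claim.confidence < REVIEW_THRESHOLD:
        # Ambiguous text is routed to a human, not auto-flagged.
        return "review_required"
    return "non_compliant"
```

A high-confidence generic claim (say, confidence 0.92) is marked non-compliant, while the same label at confidence 0.50 is routed to human review.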
Step 3: ECGT Rule Layer
ClimateBERT provides the semantic understanding. A separate rule-based layer maps its output to the specific requirements of Directive (EU) 2024/825, the ECGT directive.
This layer checks:
- Banned terms: Does the page use any of the terms explicitly prohibited under ECGT Annex I without a qualifying EU ecolabel? The banned terms list includes "eco-friendly", "environmentally friendly", "green", "sustainable", "natural", "climate neutral", and several dozen others.
- Comparative claim requirements: If a comparative claim is detected, does the page reference the methodology and data used? ECGT Article 3 requires comparative claims to specify the comparison basis.
- Future claim requirements: If a future commitment claim is detected ("we will be carbon neutral by X"), is there a linked public plan with interim milestones?
- Scope limitations: Are partial claims clearly scoped? "Our packaging is 100% recycled" must not imply the whole product is sustainable.
The ECGT layer is updated as member states publish implementing legislation and as enforcement bodies (ACM, DGCCRF, CMA, etc.) publish guidance notes. Your site is always checked against the current state of the law, not against a snapshot from 18 months ago.
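The banned-terms check in the rule layer amounts to whole-word matching against the Annex I list. A minimal sketch, using only a subset of the terms quoted above (the full list is much longer, and the real layer also checks for qualifying ecolabels before flagging):

```python
import re

# Subset of the Annex I banned terms named in this article;
# the production list contains several dozen entries.
BANNED_TERMS = [
    "eco-friendly", "environmentally friendly", "green",
    "sustainable", "natural", "climate neutral",
]

def find_banned_terms(text: str) -> list[str]:
    """Return banned terms appearing as whole words/phrases in text."""
    hits = []
    for term in BANNED_TERMS:
        # \b boundaries prevent matching e.g. "green" inside "greenhouse".
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            hits.append(term)
    return hits
```

Scanning "Our green, climate neutral packaging" yields hits for "green" and "climate neutral", while "Recycled PET bottle" yields none.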
Step 4: Scoring
Each scanned page receives a compliance score from 0 to 100. The score is a weighted composite:
| Factor | Weight | Notes |
|---|---|---|
| Banned term usage | 40% | Each instance reduces score; severity weighted by prominence (H1 > body text) |
| Unsubstantiated claims | 30% | Claims without linked evidence or certification |
| Comparative claim compliance | 15% | Methodology and data referenced or not |
| Future claim compliance | 10% | Public plan linked or not |
| Positive signals | 5% | Recognised ecolabels, third-party certifications, LCA references |
A score of 80+ is considered compliant. Scores from 60 to 79 indicate issues that should be addressed before September 2026. Scores below 60 indicate material compliance risk requiring immediate attention.
The score is not a legal opinion — it is a risk signal. Use it to prioritise remediation, not to certify compliance in legal proceedings.
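The weighting and banding above can be expressed directly in code. This sketch assumes each factor has already been normalised to a sub-score in [0, 1]; how those sub-scores are derived from individual findings is internal to the scanner and not shown here.

```python
# Weights mirror the table above; factor keys are illustrative names.
WEIGHTS = {
    "banned_terms": 0.40,
    "unsubstantiated": 0.30,
    "comparative": 0.15,
    "future": 0.10,
    "positive_signals": 0.05,
}

def compliance_score(factors: dict[str, float]) -> int:
    """Weighted composite on a 0-100 scale; missing factors score 0."""
    raw = sum(WEIGHTS[k] * factors.get(k, 0.0) for k in WEIGHTS)
    return round(raw * 100)

def band(score: int) -> str:
    """Map a score to the risk bands described above."""
    if score >= 80:
        return "compliant"
    if score >= 60:
        return "address before September 2026"
    return "material risk"
```

A site with perfect sub-scores reaches 100; one that halves its banned-terms sub-score and links no public plan for future claims lands at 70, in the "address before September 2026" band.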
Step 5: The Compliance Report
The scan produces a structured report with three sections:
Issue List
Every flagged claim, with its location on the page (URL, HTML element, surrounding context), the ECGT rule it violates, the confidence score, and a suggested remediation. The remediation suggestions are generated by the same AI layer and follow the guidelines in our sustainability claims do/don't reference.
Risk Summary
A categorised breakdown: critical issues (immediate compliance risk), warnings (likely issues depending on supporting documentation), and informational flags (patterns worth reviewing). This structure helps legal and marketing teams triage work efficiently.
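The three-bucket triage can be sketched as a simple severity mapping over the issue list. The issue-type names and the type-to-severity mapping here are assumptions for illustration; the real report derives severity from the ECGT rule and confidence score.

```python
from collections import defaultdict

# Illustrative mapping; the scanner's actual severity logic is richer.
SEVERITY = {
    "banned_term": "critical",
    "unsubstantiated_claim": "warning",
    "pattern_flag": "info",
}

def summarise(issues: list[dict]) -> dict[str, int]:
    """Count issues per severity bucket for the risk summary."""
    counts = defaultdict(int)
    for issue in issues:
        counts[SEVERITY.get(issue["type"], "info")] += 1
    return dict(counts)
```

Two banned-term hits and one pattern flag, for example, summarise to two critical issues and one informational flag, giving legal and marketing teams an immediate triage order.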
Benchmark
On paid plans, your score is benchmarked against other sites in the same industry vertical and against the verified sites in our database. This contextualises your score: a 72 in a sector where the average is 58 reads differently from a 72 in one where the average is 85.
Step 6: The Compliance Badge
Sites that score 80+ on a full scan receive a GreenClaims compliance badge. The badge is a dynamic embed that links to a public verification page showing the scan date, score, and scope. Copying it to another site fails verification, because the embed script checks the serving domain on each page load.
The badge communicates three things to visitors: the site's environmental claims have been independently reviewed, the review used a standardised methodology, and the result is current (badges expire after 90 days unless renewed). For B2B sites, this is increasingly relevant in procurement — sustainability claim verification is appearing in vendor qualification questionnaires.
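The 90-day currency rule stated above reduces to simple date arithmetic. This sketch covers only the expiry check; the domain verification happens server-side in the embed script and is not shown.

```python
from datetime import date, timedelta

# Badges expire 90 days after the last passing scan unless renewed.
BADGE_VALIDITY = timedelta(days=90)

def badge_valid(scan_date: date, today: date) -> bool:
    """True while the last passing scan is within the validity window."""
    return timedelta(0) <= today - scan_date <= BADGE_VALIDITY
```

A badge issued on 1 January is still current on 1 February, but has lapsed by 1 June unless the site has been rescanned.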
What the Scanner Does Not Do
Transparency requires being clear about limitations:
- The scanner analyses text. It does not verify the underlying facts — if a company claims "30% recycled content" and that figure is false, the scanner will not catch the factual inaccuracy. It checks whether the claim is structured compliantly, not whether the data behind it is accurate.
- It does not scan images, video, or audio content. Claims in image alt text are scanned; claims embedded in image graphics are not.
- PDF documents linked from pages are not scanned in the current version.
- Legal compliance determination requires human legal review. The scanner is a risk identification tool, not a substitute for legal advice.
Running Your First Scan
The free scan covers your homepage and up to five linked pages. It takes under two minutes and requires no account creation. Scan your website free to see your current compliance score.
For agencies and teams managing multiple sites, the API integration guide covers bulk scanning, webhook alerts, and report automation. The paid plans include full-site crawls, historical tracking, white-label reports, and the compliance badge.
Questions about methodology, false positives, or specific ECGT interpretation edge cases can be submitted through the contact form. The model is continuously updated — feedback on edge cases improves accuracy for everyone.