Schema markup vocabulary AI engines look for

By Rankply · 21 May 2026 · 7 min read

## Why structured data matters more than ever

AI engines like ChatGPT and Perplexity prefer pages with explicit machine-readable structure. JSON-LD schema markup tells the engine *what* a page is about without forcing it to guess from prose.

Two pages with identical visible content can get wildly different AI citation rates if one has rich schema and the other doesn't. The schema page gets quoted verbatim; the unstructured page gets summarised loosely. We've measured the gap at 2-5x more citations for schema-rich pages versus identical pages without schema, controlled for word count and inbound links.

## The five schema types that move the needle

**FAQPage.** Your most-asked questions, tagged so AI can lift them verbatim into "how does X" / "why does Y" queries. Single highest-impact schema type for most B2B sites. Aim for 6-12 Q&A pairs per page; pages with 3 or fewer get treated as low-density and skipped.
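As a rough illustration, here is a minimal Python sketch that turns question/answer pairs into a FAQPage block. The `build_faq_page` helper and the sample pairs are invented for this example; only the `FAQPage`, `Question`, `Answer`, `mainEntity`, and `acceptedAnswer` names come from the schema.org vocabulary.

```python
import json

def build_faq_page(pairs):
    """Return a schema.org FAQPage dict from (question, answer) tuples."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Placeholder Q&A pairs, invented for illustration only.
example_pairs = [
    ("How does schema markup work?",
     "Schema markup labels page content in a machine-readable format."),
    ("Why does JSON-LD matter?",
     "JSON-LD states what a page is about without relying on prose."),
]
faq_page = build_faq_page(example_pairs)
print(json.dumps(faq_page, indent=2))
```

A real page would carry more pairs than this two-item sample, and each answer should also appear in the visible copy.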

**Product / Service.** Pricing, ratings, availability, currency. AI engines tabulate these when comparing alternatives. Without Product schema, your pricing page is invisible to comparison queries. Include `priceCurrency`, `priceValidUntil`, and `availability` explicitly on the nested `offers` object (in schema.org these are `Offer` properties, not `Product` properties); missing fields are silently dropped from comparison answers.
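One way to keep those three fields from going missing is to build the block in code and check it before publishing. The sketch below is an assumption-laden example: `build_product`, `missing_offer_fields`, and the sample plan are hypothetical, while the field names and the `https://schema.org/InStock` value follow the schema.org vocabulary.

```python
REQUIRED_OFFER_FIELDS = ("priceCurrency", "priceValidUntil", "availability")

def build_product(name, price, currency, valid_until, in_stock=True):
    """Return a schema.org Product dict with a nested Offer."""
    availability = "InStock" if in_stock else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "priceValidUntil": valid_until,
            "availability": f"https://schema.org/{availability}",
        },
    }

def missing_offer_fields(product):
    """List the fields named in the text that are absent or empty on the offer."""
    offer = product.get("offers", {})
    return [field for field in REQUIRED_OFFER_FIELDS if not offer.get(field)]

# Hypothetical plan and dates, for illustration only.
product = build_product("Example Plan", 49, "GBP", "2026-12-31")
print(missing_offer_fields(product))
```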

**Organization / LocalBusiness.** Who you are, what you do, where you operate, when you were founded. Foundational for direct-recall queries about your brand. Include `sameAs` links to your LinkedIn, Crunchbase, Wikipedia (if you have one) — these resolve entity-disambiguation problems that otherwise suppress your visibility.

**HowTo.** Step-by-step instructions that AI loves to quote. Particularly powerful for SaaS / DIY / process-driven categories. Each step needs a `name`, `text`, and ideally an `image` — strip down to under 8 steps for best lift.
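A minimal sketch of a HowTo builder, assuming each step arrives as a small dict. The `build_how_to` helper, the `max_steps` parameter, and the sample steps are made up for this example; the step cap simply encodes the "under 8 steps" guidance above.

```python
def build_how_to(title, steps, max_steps=8):
    """Build a schema.org HowTo dict.

    steps: list of dicts with 'name' and 'text', plus an optional 'image' URL.
    max_steps: exclusive upper limit on the number of steps.
    """
    if len(steps) >= max_steps:
        raise ValueError(f"{len(steps)} steps; keep it under {max_steps}")
    items = []
    for position, step in enumerate(steps, start=1):
        item = {
            "@type": "HowToStep",
            "position": position,
            "name": step["name"],
            "text": step["text"],
        }
        if step.get("image"):
            item["image"] = step["image"]
        items.append(item)
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": title,
        "step": items,
    }

# Placeholder steps and image URL, for illustration only.
how_to = build_how_to("Add FAQ schema to a page", [
    {"name": "Draft the questions", "text": "List the questions buyers actually ask."},
    {"name": "Write the JSON-LD", "text": "Encode each pair as a Question.",
     "image": "https://example.com/step-2.png"},
])
```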

**Review.** Third-party reviews, aggregated. This is a trust signal at the citation layer: an aggregate rating (schema.org's `AggregateRating` type, attached through the `aggregateRating` property) is one of the few signals that compresses cleanly into AI answers. Make sure the underlying reviews are real and verifiable; engines are increasingly cross-checking aggregate ratings against the listed source domains, and synthetic review fraud now causes silent visibility penalties.
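The safest way to keep an aggregate rating honest is to compute it from the reviews themselves rather than typing it in by hand. A minimal sketch, assuming ratings on a 1 to 5 scale; `build_aggregate_rating` and the sample scores are hypothetical.

```python
def build_aggregate_rating(ratings, best=5):
    """Return a schema.org AggregateRating dict computed from real review scores."""
    if not ratings:
        raise ValueError("no reviews to aggregate")
    return {
        "@type": "AggregateRating",
        "ratingValue": round(sum(ratings) / len(ratings), 1),
        "reviewCount": len(ratings),
        "bestRating": best,
    }

# Placeholder scores standing in for verified third-party reviews.
rating = build_aggregate_rating([5, 4, 4, 5, 3])
print(rating)
```

The resulting dict would sit under the `aggregateRating` property of the Product or Organization it describes.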

## What good schema looks like

A FAQPage block isn't just "we put some Q&A on the page". It's:

- Real `<script type="application/ld+json">` tags in the page head
- Specific question phrasing that matches buyer queries verbatim
- Answers that lead with the answer (not setup) in the first 50 words
- Cross-referenced with the page's visible H2 questions so prose and schema reinforce each other
- A single canonical entity reference per page (don't mix Product and Service schema on one page without explicit `mainEntity` declarations)
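To make the first point in that list concrete, here is a minimal sketch of rendering a schema dict into the script tag that goes in the page head. `render_json_ld` is a name invented for this example, and the escaping step is one common precaution rather than the only way to do it.

```python
import json

def render_json_ld(data):
    """Serialise a schema dict into a <script type="application/ld+json"> tag."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so text inside the JSON cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so the data parses back unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

# Placeholder block, for illustration only.
tag = render_json_ld({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [],
})
print(tag)
```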

Validate every implementation against both Google's Rich Results Test and the Schema Markup Validator on Schema.org (they catch different errors). Pages with broken schema lose the citation signal entirely; a malformed FAQPage block is worse than no schema at all because it signals carelessness.
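Before a page goes to either validator, a quick local pass can catch outright malformed JSON. The sketch below uses only Python's standard library; `JsonLdExtractor`, `audit_raw_html`, and the returned keys are made up for this example, and it checks only that each block parses, not that its required fields are present.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.raw_blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.raw_blocks.append("".join(self._buffer))
            self._in_json_ld = False

def audit_raw_html(raw_html):
    """Report declared schema types and count blocks that fail to parse."""
    parser = JsonLdExtractor()
    parser.feed(raw_html)
    parser.close()
    types, malformed = [], 0
    for raw in parser.raw_blocks:
        try:
            block = json.loads(raw)
        except json.JSONDecodeError:
            malformed += 1
            continue
        if isinstance(block, dict) and "@type" in block:
            types.append(block["@type"])
    return {"types": types, "malformed_blocks": malformed}

# Sample markup: one valid block, one with a trailing comma.
sample_html = (
    '<head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}'
    '</script><script type="application/ld+json">{"@type": "Product",}</script></head>'
)
report = audit_raw_html(sample_html)
```

Running this against the HTML as served, before any JavaScript executes, also shows whether the schema is present for crawlers that do not render scripts.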

## Common mistakes that quietly kill the signal

**Hidden content in schema that isn't on the page.** Engines penalise this hard. If the FAQ answer isn't visible to a human reader, don't put it in the schema.
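A rough way to catch this before an engine does is to compare each schema answer with the page's visible text. The sketch below is a simplification with invented names (`VisibleText`, `hidden_answers`): it matches exact text after whitespace and case normalisation, and it does not detect content hidden with CSS.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text outside script, style, template, and noscript elements."""
    SKIP = {"script", "style", "template", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def normalise(text):
    """Collapse whitespace and lowercase for a forgiving comparison."""
    return " ".join(text.split()).lower()

def hidden_answers(html, faq_page):
    """Return the questions whose schema answer is not in the visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    visible = normalise(" ".join(parser.parts))
    return [
        question["name"]
        for question in faq_page["mainEntity"]
        if normalise(question["acceptedAnswer"]["text"]) not in visible
    ]
```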

**Stale `dateModified` fields.** A `dateModified` of 2021 on a 2026 page makes the engine treat it as outdated. Update the field on every meaningful edit.
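A staleness check is easy to automate. This is a minimal sketch assuming `dateModified` is an ISO 8601 string; the `is_stale` helper and its 365-day default are choices made for this example, not a threshold any engine has published.

```python
from datetime import date

def is_stale(date_modified, today, max_age_days=365):
    """Return True when dateModified is older than max_age_days.

    Only the leading YYYY-MM-DD part is read, so full timestamps also work.
    """
    modified = date.fromisoformat(date_modified[:10])
    return (today - modified).days > max_age_days

print(is_stale("2021-03-04", date(2026, 5, 21)))                 # True
print(is_stale("2026-01-10T09:30:00+00:00", date(2026, 5, 21)))  # False
```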

**Multiple conflicting Organization blocks.** Common when a site has both a global layout schema and per-page overrides. Pick one source of truth.
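To find the conflict, compare the Organization blocks a page actually emits, field by field. A minimal sketch, assuming the blocks have already been parsed into dicts; `conflicting_org_fields` and the sample blocks are hypothetical.

```python
import json

def conflicting_org_fields(blocks):
    """Map each field to its distinct values when Organization blocks disagree."""
    orgs = [b for b in blocks if b.get("@type") in ("Organization", "LocalBusiness")]
    conflicts = {}
    for key in sorted({k for org in orgs for k in org}):
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @type and @context
        # Serialise values so nested dicts and lists can be compared in a set.
        values = {json.dumps(org[key], sort_keys=True) for org in orgs if key in org}
        if len(values) > 1:
            conflicts[key] = sorted(values)
    return conflicts

# Placeholder blocks: a global layout schema and a per-page override.
layout_block = {"@type": "Organization", "name": "Example Ltd", "url": "https://example.com"}
page_block = {"@type": "Organization", "name": "Example Limited", "url": "https://example.com"}
conflicts = conflicting_org_fields([layout_block, page_block])
print(conflicts)
```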

**Schema on pages that 404 or redirect.** The schema gets crawled, the page doesn't resolve, and the entity gets flagged. Audit quarterly.

## How Rankply uses this

Our platform-scan component checks every page on your site for missing schema, incorrectly implemented schema, and high-leverage opportunities. The recommendations panel surfaces these as "Fix this" cards with one-click purchase if you want our team to implement the JSON-LD blocks for you. Each scan also flags pages where prose and schema disagree (e.g. your visible price is £49 but your Product schema still says £39) — silent drift that nobody catches manually.
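To show what a prose-versus-schema check involves, here is a simplified sketch of the price comparison. It is not the platform-scan implementation: `has_price_drift` is a hypothetical helper that reads only the first price in the visible text and assumes a single currency symbol.

```python
import re
from decimal import Decimal

def has_price_drift(visible_text, product, symbol="£"):
    """Return True when the first visible price differs from offers.price."""
    pattern = re.escape(symbol) + r"\s*(\d+(?:\.\d{1,2})?)"
    match = re.search(pattern, visible_text)
    if match is None:
        raise ValueError("no visible price found")
    visible_price = Decimal(match.group(1))
    declared_price = Decimal(str(product["offers"]["price"]))
    return visible_price != declared_price

# Placeholder page copy and Product block mirroring the example above.
page_copy = "Plans start at £49 per month."
stale_product = {"@type": "Product", "offers": {"@type": "Offer", "price": "39.00"}}
print(has_price_drift(page_copy, stale_product))
```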

## The compounding effect

Schema is one of the rare GEO levers where the work is finite (implement once per page) but the payoff is continuous. Once your top 10 pages have proper schema, you stop losing the structural-signal layer of AI citation, and citation share rises month over month. Unlike content updates or PR placements, schema doesn't decay; a well-structured Product block from 2026 will still be lifting citations in 2030 as long as its price and availability fields are kept current.

## A two-week project that pays for years

For most B2B SaaS, the schema upgrade is the single highest-ROI two-week project we can run. The work breaks down to:

- Week 1: audit top 15 pages, identify schema gaps, draft JSON-LD blocks
- Week 2: implement, validate, monitor for crawl errors

Citation lift typically becomes measurable in your tracked-prompts dashboard within 30-45 days. Pair the schema upgrade with the heading-hierarchy rewrite covered in our lesson on H1/H2/H3 structure — together they reshape every signal AI engines use to decide whether your page is worth quoting.

## Beyond the five: secondary schema types worth using

After you've shipped the core five, a handful of secondary types repay the effort:

**BreadcrumbList.** Helps engines understand site structure and entity hierarchy. Small lift on its own, meaningful lift when paired with deep site architecture (lots of category and subcategory pages).

**VideoObject.** If you publish demos, tutorials, or talks, VideoObject schema with `transcript` and `duration` fields makes the video discoverable in AI answers about its topic. Most teams skip this because the video lives on YouTube; embedding with proper schema captures the citation on your domain too.

**Article / BlogPosting.** For long-form content. Include `author` (with linked Person schema), `dateModified`, and `headline` fields explicitly. The `author` field with a real Person entity attached is one of the strongest authorship signals available, and it matters for the E-E-A-T-style weighting that AI engines now use.

**Course / LearningResource.** For sites that publish educational content (Rankply is one; note the Learn GEO section you're reading). Marks content as instructional, which boosts citation rates for "how to" queries.

**Event.** For webinars, launches, conferences. Underused, but the date-bounded nature makes it perfect for time-sensitive AI queries about upcoming or recent industry events.

Skip: types that don't match your page's actual purpose. Bloating a service page with Recipe schema doesn't help; it actively hurts trust.

## A schema-quality checklist for content reviewers

Use this when commissioning or reviewing new pages:

1. Does every page have at least one schema type declared?
2. Is the schema visible to crawlers (not injected client-side after JS)?
3. Do all required fields for that schema type validate against Schema.org?
4. Does the visible content on the page match the schema (no hidden-content tricks)?
5. Is `dateModified` current within the last 12 months?
6. Are entity references (`sameAs`, `author`, `publisher`) cross-linked to real, indexable profiles?
7. Does the schema reinforce the page's H1/H2 structure rather than contradict it?
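If reviewers record their answers in a structured form, the score falls out directly. A minimal sketch, assuming each check has already been answered yes or no; the check names and the `score_page` helper are invented for this example.

```python
# One short key per checklist question, in the same order as the list above.
CHECKLIST = (
    "schema_declared",
    "visible_to_crawlers",
    "required_fields_valid",
    "matches_visible_content",
    "date_modified_current",
    "entity_refs_resolve",
    "reinforces_headings",
)

def score_page(results):
    """Return (score, failed_checks); unanswered checks count as failures."""
    failed = [check for check in CHECKLIST if not results.get(check, False)]
    return len(CHECKLIST) - len(failed), failed

# Placeholder review of one page, for illustration only.
review = dict.fromkeys(CHECKLIST, True)
review["date_modified_current"] = False
review["entity_refs_resolve"] = False
score, failed = score_page(review)
print(f"{score}/{len(CHECKLIST)}", failed)
```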

Pages that score 7/7 outperform pages that score 4/7 by a wide margin. The platform-scan component checks all seven on every page; failures land in your recommendations panel ranked by impact.