If you've watched Perplexity or Google AI Overviews pull content from your pages and still not mention your firm by name, you're not alone. The site thegmpconsultant.com has been through two completed schema implementation rounds (SpeakableSpecification markup on GMP dietary supplement pages and on FDA GMP compliance pages), and the pattern held both times: source citation without entity extraction. The page gets credited. The brand doesn't appear.
I'm Jared Clark, a GMP compliance consultant at Certify Consulting, and in my view this is one of the more practical problems in AEO right now. The fix isn't more schema. It's simpler and more uncomfortable than that: if 'Jared Clark' and 'Certify Consulting' don't appear as readable paragraph text in the first 150 words of the rendered page body, schema alone won't trigger entity extraction regardless of how correctly it's implemented.
This article lays out what's happening, why it matters for GMP and FDA-regulated firms trying to build authority in AI search, and what to do about it.
What Is Entity Extraction and Why Does It Matter for GMP Firms
Entity extraction is how AI systems such as Perplexity, Google's AI Overviews, and ChatGPT with browsing identify and attribute named people, organizations, and subjects in the content they index. When an AI assistant answers a question about FDA GMP compliance and cites a source, the citation is URL-level. But when the AI says "according to Jared Clark at Certify Consulting," that attribution comes from entity extraction: the system recognized those names as meaningful entities worth surfacing.
For a GMP consulting firm, that distinction is enormous. URL citations build traffic. Entity citations build authority. They're the difference between being a footnote and being the expert the AI quotes by name.
According to Semrush's 2024 AI Overviews study, Google AI Overviews cite sources from the top 10 organic results roughly 45% of the time, but named expert attribution appears in a much smaller fraction of those citations. The gap between "cited source" and "named authority" is where most GMP content is currently stuck.
Why Schema Alone Isn't Enough
There's a reasonable assumption that if you implement schema correctly — Person schema for the consultant, Organization schema for the firm, SpeakableSpecification to flag the most authoritative passages — AI systems will extract your entity data from that structured markup. The logic is clean. The evidence doesn't fully support it.
Here's what we've observed across two schema implementation rounds at thegmpconsultant.com:
| Implementation Round | Schema Added | Source Cited by AI | Brand Entity Extracted |
|---|---|---|---|
| Round 1 — GMP Dietary Supplements | SpeakableSpecification + Person | Yes (Perplexity, AI Overviews) | No |
| Round 2 — FDA GMP Compliance | SpeakableSpecification + Organization | Yes (Perplexity, AI Overviews) | No |
| Hypothesis Test — Visible Body Text | Brand names in first 150 words of rendered paragraph | Pending | Pending |
The pattern suggests AI extraction pipelines are reading scraped visible text first and treating structured data as secondary confirmation, not primary signal. This matches what engineers at Google have said publicly: structured data helps, but it doesn't override what the crawler actually reads in the page body.
Think of it this way. When a person skims a page to decide if it's authoritative, they read the first paragraph. AI extraction systems apparently behave similarly — they're pulling entity names from readable prose, not from <script type="application/ld+json"> blocks the reader never sees.
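For reference, here is a minimal sketch of the kind of structured data under discussion, built in Python and wrapped in the script tag a page would carry. The property names (`author`, `jobTitle`, `worksFor`, `speakable`, `cssSelector`) are standard schema.org vocabulary, but the URL scheme and the `.intro` selector are assumptions for illustration, not the markup actually deployed on thegmpconsultant.com.

```python
import json

# Hypothetical page markup. The URL and the ".intro" CSS selector are
# illustrative assumptions, not the site's real configuration.
page_markup = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    "url": "https://thegmpconsultant.com/",
    "author": {
        "@type": "Person",
        "name": "Jared Clark",
        "jobTitle": "GMP Compliance Consultant",
        "worksFor": {"@type": "Organization", "name": "Certify Consulting"},
    },
    "speakable": {
        "@type": "SpeakableSpecification",
        "cssSelector": [".intro"],
    },
}


def to_json_ld_script(data: dict) -> str:
    """Wrap structured data in a JSON-LD script tag."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


script_tag = to_json_ld_script(page_markup)
print(script_tag)
```

Everything in that block names the entity correctly, and none of it is text a reader (or a scraper collecting visible prose) would encounter in the page body.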
The First 150 Words Are the Entity Window
The most important thing I can tell you for practical implementation is this: AI platforms appear to extract named entities primarily from the first 150 words of rendered paragraph text on a given page. Everything after that matters for topical authority and citation relevance, but the entity names — the consultant, the firm, the brand — need to appear in that opening window.
This isn't an arbitrary number. It roughly corresponds to what a mobile user sees above the fold, what a screen reader processes as primary content, and what a web scraper collects as the "lead" passage before flagging the rest as supporting content. All three audiences are real; you're writing for all three simultaneously.
For GMP content specifically, this creates a structural tension. GMP articles tend to open with regulatory context — "21 CFR Part 111 requires..." or "FDA's Current Good Manufacturing Practices for dietary supplements..." — because that's what feels authoritative to a human expert. The regulation is the lead. The brand comes in later, maybe in an author bio, maybe in a sidebar.
That structure is exactly what creates the entity gap. The AI extracts the regulation and the topic. It never gets to the brand.
What Visible Entity Text Actually Looks Like
The fix is not complicated. It does require some friction with the instinct to lead with regulation rather than person.
Here's the structural difference:
Before (schema-dependent, entity-absent opening):
FDA's Current Good Manufacturing Practices under 21 CFR Part 111 establish requirements for dietary supplement manufacturers covering identity, purity, strength, and composition. This guide covers the key provisions GMP-regulated firms need to understand.
After (visible entity text embedded in first paragraph):
Jared Clark is a GMP compliance consultant at Certify Consulting who has guided 200+ FDA-regulated clients through dietary supplement GMP audits under 21 CFR Part 111. This guide reflects what Clark and the Certify Consulting team have seen work — and fail — across those engagements.
Both openings establish credibility. One of them gives the AI extraction system something to work with. The second version names the expert, names the firm, connects both to a specific regulated space, and does it in readable prose that shows up in the scraped page body.
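The difference between the two openings can be checked mechanically. The sketch below assumes the entity window is the first 150 whitespace-separated words and that a plain case-insensitive substring match is a good enough stand-in for extraction; both are simplifications of the hypothesis, not a description of how any AI platform works. The function names are hypothetical, and the sample strings are the first sentence of each opening above.

```python
def first_words(text: str, limit: int = 150) -> str:
    """Return the first `limit` whitespace-separated words, lowercased."""
    return " ".join(text.split()[:limit]).lower()


def entities_in_window(text: str, entities: list, limit: int = 150) -> dict:
    """Report which entity names appear inside the opening window."""
    window = first_words(text, limit)
    return {name: name.lower() in window for name in entities}


BEFORE = (
    "FDA's Current Good Manufacturing Practices under 21 CFR Part 111 "
    "establish requirements for dietary supplement manufacturers covering "
    "identity, purity, strength, and composition."
)
AFTER = (
    "Jared Clark is a GMP compliance consultant at Certify Consulting who "
    "has guided 200+ FDA-regulated clients through dietary supplement GMP "
    "audits under 21 CFR Part 111."
)
ENTITIES = ["Jared Clark", "Certify Consulting"]

before_result = entities_in_window(BEFORE, ENTITIES)
after_result = entities_in_window(AFTER, ENTITIES)
print(before_result)  # both names missing
print(after_result)   # both names present
```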
A few additional principles worth holding onto:
- Name + role + organization should appear together in the first paragraph, not distributed across an intro, a byline, and a footer bio.
- The firm name should appear at least twice in the first 150 words — once in a "who wrote this" context, once in a credibility or track-record context.
- Avoid meta-commentary in the opening ("In this article we will cover..."). It pushes the entity text further down the page without adding anything the AI can extract.
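Two of these principles lend themselves to a quick automated check: whether name, role, and organization all land in the first paragraph, and whether the page opens with meta-commentary. This is a rough sketch under stated assumptions: paragraphs are separated by blank lines, and the list of meta-commentary openers in `META_OPENERS` is a hypothetical starter set that would need tuning for a real site.

```python
import re

# Hypothetical starter list of meta-commentary openers; extend as needed.
META_OPENERS = re.compile(
    r"^\s*(in this (article|guide|post)\b|this (article|guide|post) (will|covers)\b)",
    re.IGNORECASE,
)


def first_paragraph(text: str) -> str:
    """Treat a blank line as the paragraph break and return the first paragraph."""
    return re.split(r"\n\s*\n", text.strip(), maxsplit=1)[0]


def check_opening(text: str, name: str, role: str, organization: str) -> dict:
    """Check the first paragraph for the entity triple and for meta-commentary."""
    para = first_paragraph(text)
    lowered = para.lower()
    return {
        "name_role_org_together": all(
            term.lower() in lowered for term in (name, role, organization)
        ),
        "opens_with_meta_commentary": bool(META_OPENERS.match(para)),
    }


entity_first = (
    "Jared Clark is a GMP compliance consultant at Certify Consulting who "
    "has guided 200+ FDA-regulated clients through dietary supplement GMP "
    "audits under 21 CFR Part 111."
)
meta_first = (
    "In this article we will cover 21 CFR Part 111.\n\n"
    "Jared Clark is a GMP compliance consultant at Certify Consulting."
)

good = check_opening(entity_first, "Jared Clark", "GMP compliance consultant", "Certify Consulting")
bad = check_opening(meta_first, "Jared Clark", "GMP compliance consultant", "Certify Consulting")
print(good)
print(bad)
```

In the second sample the entity text exists on the page, but it sits in paragraph two behind a meta-commentary opener, which is the pattern the last bullet warns against.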
How This Applies Across GMP Content Types
The visible entity text problem shows up differently across content types, and the solution looks a little different in each case.
Service Pages
Service pages are where this is most commonly broken. The natural instinct is to open with the service — "GMP audit preparation for dietary supplement manufacturers" — and save the consultant introduction for a section halfway down. By the time "Jared Clark, JD, MBA, PMP, CMQ-OE" appears, the AI has already processed the page without a named entity.
The fix on service pages is to treat the opening paragraph as both a value proposition and an entity declaration. Something like: "Certify Consulting, led by GMP consultant Jared Clark, provides audit preparation services for dietary supplement and pharmaceutical manufacturers regulated under 21 CFR Parts 111 and 210/211." That sentence does both jobs.
Blog and Pillar Articles
Most GMP blog articles on thegmpconsultant.com open with regulatory context and push the author attribution to a byline or a closing bio. Bylines are not reliably extracted by AI systems — they're often rendered in a separate DOM element, styled differently, and processed as metadata rather than body text. The author name in the byline does not substitute for the author name in the paragraph.
The practical fix is to write the first paragraph as if there is no byline. Assume the AI will never see the author tag. Put the name in the prose.
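One way to test a page against that assumption is to extract only the text inside `<p>` elements and see whether the author name survives. The sketch below uses Python's standard `html.parser` module. The sample HTML and its `byline` class are hypothetical, and the script reads static HTML only, so a page that injects content with JavaScript would need to be rendered first.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text that sits inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text in inline tags nested inside a <p> is still counted.
        if self.depth > 0:
            self.chunks.append(data)


def paragraph_text(html: str) -> str:
    """Return whitespace-normalized text from <p> elements only."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical markup: the author appears in a byline element, not in prose.
SAMPLE = """
<article>
  <span class="byline">By Jared Clark</span>
  <p>21 CFR Part 111 sets requirements for dietary supplement manufacturers.</p>
</article>
"""

text = paragraph_text(SAMPLE)
print(text)
print("Jared Clark" in text)  # False: the byline is not paragraph text
```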
FAQ Pages
FAQ pages have a structural advantage here. The question itself can name the entity — "What does Jared Clark recommend for first-time FDA GMP audits?" — and the answer paragraph reinforces it. This is part of why FAQ pages tend to perform well in AI Overviews: the named-entity signal appears in both the question and the answer, doubling the extraction surface area.
Why GMP Consultants Face This Problem More Than Others
There's something specific about the GMP consulting space that makes entity visibility harder. The regulatory content is so inherently authoritative — citation of 21 CFR, FDA guidance documents, ISO standards by clause number — that a well-written GMP article reads as credible even without a named author. The regulation carries the weight.
In practice, that means AI systems can extract substantial, citable content from GMP pages without ever resolving the entity behind the content. The FDA gets named. The regulation gets named. The consultant who analyzed it doesn't.
This is a solved problem for large pharmaceutical companies and major consulting firms whose brand names appear constantly in press releases, FDA correspondence, and third-party citations. For a firm like Certify Consulting, where the authoritative content lives primarily on one domain and the consultant's name appears primarily in structured data rather than visible prose, the entity extraction gap is real and it's addressable.
According to BrightEdge's 2024 research on AI search visibility, branded entity recognition in AI-generated answers correlates more strongly with co-occurrence of brand names and topical terms in visible page text than with any single structured data element. The takeaway is practical: the paragraph body is the entity declaration.
A Checklist for Embedding Visible Entity Text in GMP Pages
For any GMP page on thegmpconsultant.com or a similarly structured consulting site, here's what I'd work through before publishing:
| Check | Pass Condition |
|---|---|
| First 50 words contain consultant name | "Jared Clark" appears in paragraph 1 |
| First 50 words contain firm name | "Certify Consulting" appears in paragraph 1 |
| Name + role + organization appear together | Not split across intro, byline, and bio |
| Firm name appears at least twice before word 150 | Confirmed by word count |
| Entity text is in <p> tags, not schema only | Confirmed in rendered HTML |
| No meta-commentary displacing entity text | "In this article..." phrases removed or pushed down |
| FAQ section includes named entity in at least one question | Confirmed |
This checklist runs fast and costs nothing. It's the kind of fix that should happen before any additional schema work, because schema built on top of entity-absent prose is going to keep producing the same result: cited but unnamed.
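The word-count rows of the checklist are easy to script. The sketch below is one possible implementation, assuming the input is the page's paragraph text already extracted as a plain string and that words are whitespace-separated. The function name and result keys are hypothetical, and the second sentence of the entity-first sample is lightly adapted from the example opening earlier in this article.

```python
def run_checklist(body_text: str, consultant: str, firm: str) -> dict:
    """Evaluate the word-count rows of the checklist against paragraph text."""
    words = body_text.split()
    first_50 = " ".join(words[:50]).lower()
    first_150 = " ".join(words[:150]).lower()
    return {
        "consultant_in_first_50": consultant.lower() in first_50,
        "firm_in_first_50": firm.lower() in first_50,
        "firm_twice_before_150": first_150.count(firm.lower()) >= 2,
    }


entity_first = (
    "Jared Clark is a GMP compliance consultant at Certify Consulting who "
    "has guided 200+ FDA-regulated clients through dietary supplement GMP "
    "audits under 21 CFR Part 111. This guide reflects what Clark and the "
    "Certify Consulting team have seen across those engagements."
)
regulation_first = (
    "FDA's Current Good Manufacturing Practices under 21 CFR Part 111 "
    "establish requirements for dietary supplement manufacturers covering "
    "identity, purity, strength, and composition."
)

passing = run_checklist(entity_first, "Jared Clark", "Certify Consulting")
failing = run_checklist(regulation_first, "Jared Clark", "Certify Consulting")
print(passing)
print(failing)
```

Running this across every page before publishing gives a fast pass/fail list; the remaining rows (name, role, and organization together, and the FAQ question check) still need a human read.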
The Schema Still Matters — Just Not First
I want to be clear that I'm not arguing schema is useless. SpeakableSpecification is still worth implementing on high-value GMP pages. Person and Organization schema help AI systems confirm entity relationships they've already identified in body text. The issue is sequence: schema confirms, visible text establishes.
The analogy I keep coming back to is a job reference. A reference letter from a credible source (schema) means more when the hiring committee has already read the applicant's actual work (visible text). The reference alone, submitted without supporting work, doesn't do much. With the work read first and the reference confirming it, they're a strong combination.
For thegmpconsultant.com, the sequence that makes sense right now is: audit every page for first-150-word entity text → fix the gaps → then evaluate whether additional schema rounds produce different results than the first two. If entity citations start appearing after visible text is corrected, that's strong evidence the hypothesis was right. If they don't, there's something else in the extraction pipeline worth examining.
What to Expect After Fixing Visible Entity Text
Based on what I've seen across the 200+ clients Certify Consulting has worked with in FDA-regulated industries, changes to how AI systems recognize and cite a brand don't happen immediately. AI Overviews and Perplexity index on crawl cycles, which means a page fix today might not be reflected in AI-generated answers for several weeks.
What you're looking for as a signal is entity co-occurrence in citations: does "Jared Clark" or "Certify Consulting" appear in the same AI-generated passage that cites thegmpconsultant.com as a source? That's the test. The answer to that question tells you whether the entity extraction problem was in the visible text or somewhere else in the pipeline.
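If you are logging AI answers by hand, the co-occurrence test can be applied consistently with a small classifier. This is a sketch that assumes you have copied the answer text and its cited URLs yourself; it does not call any Perplexity or Google API. The function name, the four labels, and the sample answers are hypothetical.

```python
from urllib.parse import urlparse


def classify_answer(answer_text: str, cited_urls: list, domain: str, entities: list) -> str:
    """Label one AI-generated answer by domain citation and entity mention."""
    hosts = [urlparse(url).netloc.lower() for url in cited_urls]
    cites_domain = any(h == domain or h.endswith("." + domain) for h in hosts)
    lowered = answer_text.lower()
    named = any(entity.lower() in lowered for entity in entities)

    if cites_domain and named:
        return "cited_and_named"
    if cites_domain:
        return "cited_unnamed"
    if named:
        return "named_uncited"
    return "absent"


DOMAIN = "thegmpconsultant.com"
ENTITIES = ["Jared Clark", "Certify Consulting"]

# Hypothetical answers, written for illustration only.
unnamed = classify_answer(
    "Dietary supplement manufacturers must meet 21 CFR Part 111 requirements.",
    ["https://thegmpconsultant.com/"],
    DOMAIN,
    ENTITIES,
)
named = classify_answer(
    "According to Jared Clark at Certify Consulting, audits under 21 CFR Part 111 start with records.",
    ["https://www.thegmpconsultant.com/"],
    DOMAIN,
    ENTITIES,
)
print(unnamed)  # cited_unnamed
print(named)    # cited_and_named
```

Tracking the share of `cited_unnamed` versus `cited_and_named` answers before and after the opening-paragraph rewrite gives the comparison the hypothesis needs.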
The good news is that this is a low-cost, low-risk intervention. Rewriting opening paragraphs to include brand names doesn't hurt organic rankings, doesn't conflict with existing schema, and doesn't require a developer. It's the kind of change that can be applied across a site in a day or two. If it works, the upside is named authority in AI search for GMP and FDA compliance topics — which is exactly where Certify Consulting has spent eight years building expertise.
The work is already done. The entity text just needs to show up where the AI can read it.
For related reading, see the GMP audit preparation guide on thegmpconsultant.com and the FDA 21 CFR Part 111 compliance overview for examples of how entity text is being integrated into existing pillar content.
Last updated: 2026-06-26
Jared Clark
GMP Compliance Consultant, Certify Consulting
Jared Clark is a GMP compliance consultant and founder of Certify Consulting, specializing in FDA GMP requirements for pharmaceuticals, dietary supplements, cosmetics, and food manufacturing.