A specification for integrating structured data

This document provides an overview of the core principles behind our approach to constructing scalable, interoperable schema.org markup. It provides guidelines for how plugin vendors, theme developers and third parties can extend our approach to ensure that they output (or contribute to) a cohesive, unified knowledge graph, and how they can integrate with the Yoast SEO plugin.

NOTE: This specification is a continual work in progress, subject to ongoing change and extension. While the core principles are well-defined, some specific scenarios and edge-cases are yet to be explored or codified.

There's a lot of information here, so you may wish to skip ahead and read about the implementation mechanics, or see some example output.

Background information

This section provides information about our approach, rationale, and considerations. It describes the key underlying principles, design decisions, and methodologies.

Understanding the role of structured markup

Structured markup allows search engines, users and systems to understand the content, context and relationships of entities. It adds meta information about the properties of things on web pages and their relationships, which may not be immediately available or easy to parse from the ‘human’ version of the page.

For our purposes, we’re particularly interested in how search engines understand the relationship between a website’s pages, the organization which operates the website, the products they offer, and other related concepts. Structured markup allows us to describe how these entities are connected, and to define their properties.

Including this kind of markup may result in search engines providing additional or enhanced coverage in their results (such as ‘rich listings’), and eligibility for new/emerging features (e.g., ‘Knowledge graph panels’).

Beyond the immediate marketing applications, structured markup will enable future systems, processes and software to understand the relationship between entities, and to utilize this understanding to deliver new types of services.

Platforms such as Google, Bing and Facebook are continually rolling out support for new features and formats, which rely on structured markup.

Why have we created a new standard/approach?

Many content management systems, platforms, themes and plugins already provide some level of structured markup. However, the implementations are frequently incomplete, inconsistent, conflicting, or incorrect.

While there’s extensive documentation on the possible structures and properties of entities and their markup (from schema.org and others), advice and documentation on the implementation of schema markup (particularly of the kinds of complex, structured data we need) is limited and inconsistent.

While the documentation that does exist effectively describes how to represent individual things, it provides little direction on how to create and maintain a graph of entities and their relationships.

As the capabilities of structured markup continue to grow, the lack of standardized approaches to implementation means that most websites are increasingly likely to make mistakes. In the worst cases, this leads to Google ignoring those sites or blacklisting them from eligibility for rich listings.

The challenges of integration and linking

In particular, systems which rely on multiple moving parts (such as a WordPress website with a third-party theme and/or multiple plugins) already struggle with interoperability - without a shared way of working, developers don’t have an easy way of consistently linking data, cross-referencing entities, or sharing code.

For example, one plugin may add product schema, and another may add local business schema. These blobs of markup have no way of reliably communicating or integrating their data. There’s no universal or easy way for these plugins to declare their schema, or to specify that the product in our example is explicitly sold by that particular local business, or is manufactured by the same organization that operates the website.

In most cases, current implementations are limited to declaring the existence of each individual piece, but have no way of declaring their relationships. They have no shared mechanism to connect their entities.

This fragmentation results in markup errors, and/or limits the scalability of a website's structured data. The ‘linked’ part of JSON-LD is hard to achieve without standardization!

Building on schema.org

Note that this document doesn’t aim to define, recreate, replace or extend the schema.org markup. Rather, it outlines the underlying principles and conceptual models which should be used when designing and implementing solutions.

The examples herein represent recommended approaches, rather than examples of every possible scenario, with every single variation.

Implementations should build upon these examples, and make liberal use of schema.org’s documentation to define or add additional object properties as desired.

The technology & approach

JSON-LD as a preferred format

Structured markup can be implemented in a number of ways, and via a number of different standards. We’re particularly interested in the standards defined by schema.org, given Google’s close adherence to their specifications.

Schema.org markup can be added to web pages in a number of ways. Of all of the available approaches, we believe that including JSON-LD in the HTML source code of the page is the best (current) approach.

See the appendix, “JSON-LD vs other standards”, for further details on why we prefer JSON-LD over other approaches.

JSON-LD provides the flexibility, scalability, and standardization we require to achieve a consistent and extensible foundation. While it lacks some of the potential benefits of ‘inlining’ (where structured markup is implemented directly as part of the HTML code representing those entities), its strengths, flexibility and scalability far outweigh these limitations.

IDs, relationships and nesting

JSON-LD allows properties to reference other pieces by their ID. For example, a product page needn’t include (or repeat) all of the markup for the organization which sells that product, if it can just reference the ID of a piece which represents that organization.

In theory, this allows us to avoid having to duplicate or repeat shared properties, and to reduce the amount of code/processing/overhead required to represent a page’s content.

Unfortunately, the level of cross-page support for this technique is limited. Google’s documentation is vague, and there’s ambiguity around the relationship between IDs and URIs (anecdotally, they “can’t extract structured data from other pages” [verbatim]). Given this constraint, we require that every page output all of the relevant pieces, and cross-reference these through hasPart, isPartOf and similar lookup mechanisms.

E.g., the following (simplified) JSON snippet defines an Organization, and references that Organization as the Publisher of a Website:

{
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": "https://www.example.com/#organization",
            "url": "https://www.example.com/",
            "name": "{{Organization Name}}"
        },
        {
            "@type": "WebSite",
            "@id": "https://www.example.com/#website",
            "url": "https://www.example.com/",
            "name": "{{Website Name}}",
            "publisher": {
                "@id": "https://www.example.com/#organization"
            }
        }
    ]
}

In this example, we know that the publisher of the website is the same as the Organization. And because we already have a top-level piece in the graph representing that Organization, we can simply reference it by its @id in the WebSite piece. This provides enormous flexibility, and prevents us from repeating ourselves.
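To make the mechanics concrete, the following minimal Python sketch indexes the top-level pieces of the graph above by @id. It then follows the WebSite's publisher reference back to the Organization. The sketch models a hypothetical consumer and is not part of Yoast SEO. The names index_graph and resolve are illustrative only.

```python
from __future__ import annotations

from typing import Any


def index_graph(document: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each top-level piece's @id to the piece itself."""
    return {node["@id"]: node for node in document["@graph"] if "@id" in node}


def resolve(index: dict[str, dict[str, Any]], value: Any) -> Any:
    """Follow an {"@id": ...} reference; return any other value unchanged."""
    if isinstance(value, dict) and set(value) == {"@id"}:
        # A bare reference: look up the full piece (None if it is missing).
        return index.get(value["@id"])
    return value


document = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": "https://www.example.com/#organization",
            "url": "https://www.example.com/",
            "name": "{{Organization Name}}",
        },
        {
            "@type": "WebSite",
            "@id": "https://www.example.com/#website",
            "url": "https://www.example.com/",
            "name": "{{Website Name}}",
            "publisher": {"@id": "https://www.example.com/#organization"},
        },
    ],
}

index = index_graph(document)
publisher = resolve(index, index["https://www.example.com/#website"]["publisher"])
print(publisher["@type"])  # Organization
```

The WebSite piece never repeats the Organization's properties. A consumer only needs the shared @id string to join the two pieces.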

NOTE: ID parameters shouldn't use escaped characters; escaped/unescaped strings may be treated differently by systems interpreting the graph, resulting in 'unstitching' of entity relationships.
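The following Python sketch illustrates that risk. It assumes percent-encoding is the kind of escaping in question, which the note above does not specify. The has_escapes helper is hypothetical. It only flags such IDs and does not decide which spelling is canonical.

```python
from urllib.parse import unquote

# Two spellings of what a human would read as the same page URL.
unescaped = "https://example.com/café/#webpage"
escaped = "https://example.com/caf%C3%A9/#webpage"


def has_escapes(node_id: str) -> bool:
    """True if the ID contains percent-escapes that decode to something else."""
    return unquote(node_id) != node_id


# As plain strings the IDs differ, so a consumer that matches IDs literally
# would treat them as two separate entities.
print(unescaped == escaped)           # False
print(has_escapes(escaped))           # True
print(unquote(escaped) == unescaped)  # True
```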

Given this capability, we prefer to avoid deep nesting of properties whenever possible and prefer to break out individual pieces, as in the examples which follow. This keeps the code readable, modularized, and extensible.

This isn’t a strict rule, however, and some common sense should apply around determining a suitable approach which maintains a clean structure, without resorting to extremes.

Examples of appropriate nesting

The code examples throughout this document aim to define best practice around which kinds of properties should/shouldn’t be nested.

The reduction of nesting is critical to ensure compatibility between multiple systems (where we wish to share isolated pieces via their IDs).

The following scenarios provide guidelines and examples for deciding whether or not properties should be nested:

  • DO nest the offers associated with a product within that product.
    Although offers can technically be defined as independent pieces (and associated with a product via an itemOffered connector), offers have no agency when not part of a product (or similar), and are unlikely to be shared/referenced by other pieces on the same page.
  • DO nest the logo and address of an Organization as properties of that Organization. These are direct properties of the Organization entity, and relatively shallow pieces.
  • DO NOT nest the breadcrumbs of a page within that page.
    Breadcrumb markup is extremely verbose and cumbersome, and creates ‘code smell’ when nested deeply within multiple pieces.
  • DO NOT nest the author (person) of a blog post within that blog post.
    The Person piece is an independent entity, and exists separately from the blogPost object (and may be referenced by, or be a property of, other pieces).

More generally, whenever a property might be referenced or re-used by another piece on a page, it should (usually) not be nested within another entity.
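One way to apply these rules mechanically is sketched below in Python. The sketch walks a graph and lifts certain nested nodes into top-level pieces, leaving an {"@id": ...} reference behind. The BREAK_OUT_TYPES set is a hypothetical policy mirroring the DO NOT cases above, and the function names are illustrative. Nodes without an @id, such as the nested offers of a product, are left in place.

```python
from __future__ import annotations

from typing import Any

# Hypothetical policy for this sketch: node types that are always broken out
# into top-level pieces (per the DO NOT rules above); all others stay nested.
BREAK_OUT_TYPES = {"Person", "BreadcrumbList"}


def _types(node: dict[str, Any]) -> set[str]:
    """Return the node's @type(s) as a set, whether a string or a list."""
    value = node.get("@type", [])
    return {value} if isinstance(value, str) else set(value)


def _lift(value: Any, pieces: dict[str, dict[str, Any]]) -> Any:
    """Replace nested break-out nodes with {"@id": ...} references."""
    if isinstance(value, list):
        return [_lift(item, pieces) for item in value]
    if not isinstance(value, dict):
        return value
    # Lift grandchildren first, so deeply nested pieces are handled too.
    node = {key: _lift(child, pieces) for key, child in value.items()}
    if "@id" in node and _types(node) & BREAK_OUT_TYPES:
        pieces.setdefault(node["@id"], node)  # keep the first copy of each ID
        return {"@id": node["@id"]}
    return node


def flatten_graph(graph: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new @graph list in which break-out nodes are top-level pieces."""
    pieces: dict[str, dict[str, Any]] = {}
    # Top-level pieces stay put; only their property values are lifted.
    top_level = [{key: _lift(child, pieces) for key, child in node.items()} for node in graph]
    existing = {node.get("@id") for node in top_level}
    # Skip lifted pieces that already exist at the top level.
    return top_level + [piece for node_id, piece in pieces.items() if node_id not in existing]


article_graph = [{
    "@type": "Article",
    "@id": "https://example.com/schema-example/#article",
    "author": {"@type": "Person", "@id": "https://example.com/author/joost/#author",
               "name": "Joost de Valk"},
}]
for piece in flatten_graph(article_graph):
    print(piece)
```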

Constructing ID parameters

ID values should always extend the most logical URL for the place where that entity ‘lives’, followed by a unique fragment.

Examples throughout this document reflect this approach. In particular:

  • Properties of the website, organization or homepage should extend from the root domain; e.g., the logo of an Organization should have an ID of example.com/#logo.
  • Properties of an individual webPage should extend from that page’s URL; e.g., an event on a specific page should have an ID of example.com/event-page/#event.
  • Properties of a Person should extend from that person’s bio page; e.g., an author of an article at example.com/article/ should have an ID of example.com/person-name/#person.
  • Properties of entities which exist ‘outside of a page’, or which may appear on multiple different pages, should use a unique identifier which persists between pages/contexts (which mimics, but doesn't need to represent, a valid/real URL structure). E.g.:
    • A review which may be present on multiple pages, or be on a page with other reviews, should have a system-generated ID which extends from, e.g., a reviews pseudo-URL (e.g., example.com/reviews/#review-1234).
    • Media assets, such as images or scripts which may be used in multiple scenarios should use a system-generated ID which extends from, e.g., a media pseudo-URL (e.g., example.com/media/#image-4567).
  • The fragment name should always reflect the entity being represented, and should always follow a trailing slash (regardless of whether the platform enforces a trailing slash on/at the end of a URL).
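The rules above can be sketched as a small Python helper. The make_id function is hypothetical. It adds two assumptions the rules don't state: any query string or existing fragment on the input URL is dropped, and an empty fragment is rejected.

```python
from urllib.parse import urlsplit, urlunsplit


def make_id(url: str, fragment: str) -> str:
    """Build an @id from the URL where an entity 'lives' plus a fragment.

    Assumptions of this sketch: any query string or existing fragment on the
    input URL is dropped, and a trailing slash is always added before '#'.
    """
    if not fragment:
        raise ValueError("an @id needs a non-empty fragment naming the entity")
    parts = urlsplit(url)
    # The fragment always follows a trailing slash, whatever the platform does.
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", fragment))


print(make_id("https://www.example.com", "organization"))      # https://www.example.com/#organization
print(make_id("https://www.example.com/event-page", "event"))   # https://www.example.com/event-page/#event
print(make_id("https://www.example.com/reviews/", "review-1234"))  # https://www.example.com/reviews/#review-1234
```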

Primary entities

Our model assumes that every URL should represent a primary entity - be it an organization, a product, a blog post (or collection of blog posts), a person, or some other thing.

We always aim for that 'primary entity' to be at the centre of the network graph on each page. This mental model aligns closely with how we want search engines to understand our networks. It allows us to articulate our content in ways such as, "This URL represents a Recipe, which is part of an Article, which was written by a Person, on a WebPage, which is part of a WebSite, which is operated by an Organization".

The code examples throughout this document reflect this approach, and construct directional relationships between entities with the use of hasPart, isPartOf, mainEntityOfPage and similar connections.
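This sketch shows one way a consumer might locate the primary entity from these connections. It is hypothetical Python, not a Yoast SEO function. It looks for the piece whose mainEntityOfPage points at the page, as the Product does in the WooCommerce example below. If no piece makes that claim, it falls back to the WebPage itself.

```python
from __future__ import annotations

from typing import Any


def _ref_id(value: Any) -> str | None:
    """Accept either {"@id": ...} or a bare URL string as a reference."""
    if isinstance(value, dict):
        return value.get("@id")
    return value if isinstance(value, str) else None


def main_entity(graph: list[dict[str, Any]], webpage_id: str) -> dict[str, Any] | None:
    """Find the piece whose mainEntityOfPage points at the given WebPage.

    Falls back to the WebPage piece itself when nothing claims to be its main
    entity (as with the base script, where the WebPage is the focus).
    """
    for node in graph:
        if _ref_id(node.get("mainEntityOfPage")) == webpage_id:
            return node
    return next((node for node in graph if node.get("@id") == webpage_id), None)


page_id = "https://example.com/product/vneck-tee/#webpage"
graph = [
    {"@type": "ItemPage", "@id": page_id},
    {"@type": "Product", "@id": "https://example.com/product/vneck-tee/#product",
     "mainEntityOfPage": {"@id": page_id}},
]
print(main_entity(graph, page_id)["@type"])  # Product
```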

Code fragmentation & placement

Our markup is normally output in the page's <head>. Sometimes, however, placement in the <head> is impossible. For example, architectural constraints may prevent a process from accessing the necessary context while constructing the <head>. Because the <body> may contain content or logic which should be reflected in the schema, we also allow code to be output at the end of the page, before the closing </body> tag.

Because we’re using @id attributes to join pieces, it’s technically possible to split and distribute the code throughout the page, through multiple <script> tags and graph structures, and to simply cross-reference entities via their IDs as per the approach outlined in this document.

We generally recommend that system authors avoid this kind of fragmentation when possible (as it introduces fragility and obfuscation into an already complex system), but recognize that it's sometimes necessary.

In fact, we use this approach in some of our own solutions when it's not possible to compute and output everything in the <head>. For example, our Yoast WooCommerce SEO plugin relies on parsing product information which isn't available during initialization. It therefore outputs a secondary <script> block in the page's footer, containing additions to the page's graph (specifically, product and review information). This additional graph stitches seamlessly with the first to create a cohesive whole.
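The Python sketch below shows roughly what stitching means for a consumer that reads both <script> blocks. It is a simplification, not how Yoast SEO or any search engine is implemented. Pieces sharing an @id are merged into one node, and where two blocks give the same property, the sketch keeps the first value. A real JSON-LD processor would keep both. The function stitch and the two trimmed-down blocks are hypothetical.

```python
from __future__ import annotations

from typing import Any


def stitch(*blocks: dict[str, Any]) -> dict[str, Any]:
    """Combine several JSON-LD @graph blocks into one graph.

    Pieces that share an @id are merged into a single node. Simplification of
    this sketch: if two blocks give the same property, the first value wins.
    """
    by_id: dict[str, dict[str, Any]] = {}
    anonymous: list[dict[str, Any]] = []
    for block in blocks:
        for node in block.get("@graph", []):
            node_id = node.get("@id")
            if node_id is None:
                anonymous.append(dict(node))  # no ID, so nothing to stitch to
                continue
            merged = by_id.setdefault(node_id, {})
            for key, value in node.items():
                merged.setdefault(key, value)
    return {"@context": "https://schema.org", "@graph": [*by_id.values(), *anonymous]}


head = {"@context": "https://schema.org", "@graph": [
    {"@type": "ItemPage", "@id": "https://example.com/product/vneck-tee/#webpage",
     "url": "https://example.com/product/vneck-tee/"},
]}
footer = {"@context": "https://schema.org", "@graph": [
    {"@type": "Product", "@id": "https://example.com/product/vneck-tee/#product",
     "mainEntityOfPage": {"@id": "https://example.com/product/vneck-tee/#webpage"}},
]}
combined = stitch(head, footer)
print([node["@type"] for node in combined["@graph"]])  # ['ItemPage', 'Product']
```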

Hybrid types

Sometimes, an object might be two different things. A book, for example, can be a book and a product, and have properties of both. It may have an illustrator and a price.

Adding multiple types allows for greater flexibility, and more precise descriptions of objects.

We prefer to use hybrid types sparingly throughout our approach, however, as they can blur the 'focus' of a specific page. If the main entity of a page (or a URL) is a complex, compound type, we risk deviating from our "One URL, one thing" model.
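In JSON-LD, a hybrid type is expressed by giving @type an array rather than a single string. Code that consumes a graph therefore can't assume @type is always a string. The Python sketch below uses a hypothetical has_type helper and an illustrative Book/Product piece. The URLs and placeholder values are assumptions.

```python
from __future__ import annotations

from typing import Any


def has_type(node: dict[str, Any], type_name: str) -> bool:
    """True if the node declares type_name, whether @type is a string or a list."""
    declared = node.get("@type", [])
    if isinstance(declared, str):
        declared = [declared]  # avoid substring matches on a plain string
    return type_name in declared


# A hypothetical hybrid Book/Product piece: it can carry properties of both.
book = {
    "@type": ["Book", "Product"],
    "@id": "https://example.com/books/example-book/#book",
    "name": "{{Book Title}}",
    "illustrator": {"@id": "https://example.com/people/example-illustrator/#person"},
    "offers": {"@type": "Offer", "price": "18.00", "priceCurrency": "EUR"},
}

print(has_type(book, "Book"), has_type(book, "Product"))  # True True
```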

The specification

The core of our approach is to output a “base script” - a @graph object rendered in JSON-LD - which describes the WebPage, the WebSite, and the Organization (or Person, in the case of a website which represents an individual). This is included on every page of a website running the Yoast SEO plugin.

On any given page, the graph may be altered and/or extended to reflect the specific type of web page and its attributes. For any given scenario, we aim to identify the 'main entity' of the page, and to develop the graph to represent this entity (see 'Primary entities').

All markup, properties and attributes are drawn directly from schema.org, and (other than the base script below) all code provided is for demonstrative purposes only.

The base script

The following is a simplified representation of the graph which we construct on each page.

{
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": "https://www.example.com/#organization",
            "url": "https://www.example.com/",
            "name": "{{Organization Name}}",
            "logo": {
                "@type": "ImageObject",
                "@id": "https://www.example.com/#logo",
                "url": "https://www.example.com/logo.jpg"
            }
        },
        {
            "@type": "WebSite",
            "@id": "https://www.example.com/#website",
            "url": "https://www.example.com/",
            "name": "{{Website Name}}",
            "publisher": {
                "@id": "https://www.example.com/#organization"
            }
        },
        {
            "@type": "WebPage",
            "@id": "https://www.example.com/example-page/#webpage",
            "url": "https://www.example.com/example-page/",
            "inLanguage": "{{Language Code}}",
            "name": "{{Page Title}}",
            "description": "{{Page Description}}",
            "isPartOf": {
                "@id": "https://www.example.com/#website"
            }
        }
    ]
}

What makes this different from most approaches to schema markup is that we don't try to construct a complex 'tree' (i.e., an array of nested properties). Instead, we output a distinct, top-level 'piece' (a 'node', in technical terms) for each distinct entity. These pieces are contained in one or more @graph objects, which enables us to cross-reference pieces by ID. See 'IDs, relationships and nesting' for more information on this approach.
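To show how few inputs the base script depends on, here is a Python sketch that assembles the same three pieces. The function build_base_graph and its parameters are hypothetical and are not Yoast SEO's API. The sketch assumes plain URLs without query strings, so a trailing slash can simply be appended before each fragment.

```python
from __future__ import annotations

import json
from typing import Any


def build_base_graph(site_url: str, page_url: str, organization_name: str,
                     website_name: str, page_title: str, page_description: str,
                     language: str, logo_url: str) -> dict[str, Any]:
    """Assemble the three-piece base script as a Python dict (sketch only)."""
    site = site_url.rstrip("/") + "/"  # IDs always follow a trailing slash
    page = page_url.rstrip("/") + "/"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": site + "#organization",
                "url": site,
                "name": organization_name,
                # The logo stays nested: it is a direct, shallow property.
                "logo": {"@type": "ImageObject", "@id": site + "#logo", "url": logo_url},
            },
            {
                "@type": "WebSite",
                "@id": site + "#website",
                "url": site,
                "name": website_name,
                "publisher": {"@id": site + "#organization"},  # reference, not a copy
            },
            {
                "@type": "WebPage",
                "@id": page + "#webpage",
                "url": page,
                "inLanguage": language,
                "name": page_title,
                "description": page_description,
                "isPartOf": {"@id": site + "#website"},
            },
        ],
    }


graph = build_base_graph(
    "https://www.example.com", "https://www.example.com/example-page",
    "{{Organization Name}}", "{{Website Name}}", "{{Page Title}}",
    "{{Page Description}}", "{{Language Code}}", "https://www.example.com/logo.jpg",
)
print(json.dumps(graph, indent=4))
```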

Observe that, when testing in Google's Structured Data Testing Tool, the result is a single, cohesive graph which features the main entity (in this case, the WebPage) as the primary focus. Conventional approaches will display each individual piece, but don't 'stitch' these together into a single graph.
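You can check stitching locally as well as in a testing tool. This hypothetical Python check lists IDs that are referenced via {"@id": ...} but never defined anywhere in the graph. It treats a dict holding only @id as a reference, and a dict with @id plus other properties as a definition. It won't detect two different spellings of the same ID (see the note on escaped characters), only dangling references.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any


def _walk(value: Any) -> Iterator[dict[str, Any]]:
    """Yield every dict nested anywhere inside value (including value itself)."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from _walk(child)
    elif isinstance(value, list):
        for item in value:
            yield from _walk(item)


def unresolved_references(document: dict[str, Any]) -> set[str]:
    """Return IDs that are referenced via {"@id": ...} but never defined."""
    defined: set[str] = set()
    referenced: set[str] = set()
    for node in _walk(document.get("@graph", [])):
        if "@id" not in node:
            continue
        if len(node) == 1:
            referenced.add(node["@id"])  # a bare reference
        else:
            defined.add(node["@id"])     # a piece with its own properties
    return referenced - defined


document = {"@graph": [{
    "@type": "WebSite",
    "@id": "https://www.example.com/#website",
    "publisher": {"@id": "https://www.example.com/#organization"},
}]}
print(unresolved_references(document))  # {'https://www.example.com/#organization'}
```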

Example graphs

The following examples demonstrate how our base script may be extended and altered to support different scenarios. Regardless of circumstance, we always aim to represent the 'main entity' of the web page in question (and its attributes), and, its relationship to the website and organization (or person) who published that page.

A company homepage

{
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": "https://example.com/#organization",
            "name": "Yoast",
            "url": "https://example.com/",
            "sameAs": [
                "https://www.facebook.com/yoast",
                "https://www.linkedin.com/company/yoast-com/",
                "https://en.wikipedia.org/wiki/Yoast",
                "https://twitter.com/yoast"
            ],
            "logo": {
                "@type": "ImageObject",
                "@id": "https://example.com/#logo",
                "url": "https://example.com/wp-content/uploads/2019/03/Yoast_Logo_tagline_Large_RGB.png",
                "caption": "Yoast"
            },
            "image": {
                "@id": "https://example.com/#logo"
            }
        },
        {
            "@type": "WebSite",
            "@id": "https://example.com/#website",
            "url": "https://example.com/",
            "name": "one.wordpress.test",
            "publisher": {
                "@id": "https://example.com/#organization"
            },
            "potentialAction": {
                "@type": "SearchAction",
                "target": "https://example.com/?s={search_term_string}",
                "query-input": "required name=search_term_string"
            }
        },
        {
            "@type": "WebPage",
            "@id": "https://example.com/#webpage",
            "url": "https://example.com/",
            "inLanguage": "en-US",
            "name": "one.wordpress.test - Just another WordPress site",
            "isPartOf": {
                "@id": "https://example.com/#website"
            },
            "about": {
                "@id": "https://example.com/#organization"
            },
            "description": "Just another WordPress site"
        }
    ]
}

An article, with an author, on a company website

{
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": "https://example.com/#organization",
            "name": "Yoast",
            "url": "https://example.com/",
            "sameAs": [
                "https://www.facebook.com/yoast",
                "https://www.linkedin.com/company/yoast-com/",
                "https://en.wikipedia.org/wiki/Yoast",
                "https://twitter.com/yoast"
            ],
            "logo": {
                "@type": "ImageObject",
                "@id": "https://example.com/#logo",
                "url": "https://example.com/wp-content/uploads/2019/03/Yoast_Logo_tagline_Large_RGB.png",
                "caption": "Yoast"
            },
            "image": {
                "@id": "https://example.com/#logo"
            }
        },
        {
            "@type": "WebSite",
            "@id": "https://example.com/#website",
            "url": "https://example.com/",
            "name": "one.wordpress.test",
            "publisher": {
                "@id": "https://example.com/#organization"
            },
            "potentialAction": {
                "@type": "SearchAction",
                "target": "https://example.com/?s={search_term_string}",
                "query-input": "required name=search_term_string"
            }
        },
        {
            "@type": "WebPage",
            "@id": "https://example.com/schema-example/#webpage",
            "url": "https://example.com/schema-example/",
            "inLanguage": "en-US",
            "name": "Schema example - one.wordpress.test",
            "isPartOf": {
                "@id": "https://example.com/#website"
            },
            "image": {
                "@type": "ImageObject",
                "@id": "https://example.com/schema-example/#primaryimage",
                "url": "https://example.com/wp-content/uploads/2019/02/10_tips_marieke_FB.jpg",
                "caption": ""
            },
            "primaryImageOfPage": {
                "@id": "https://example.com/schema-example/#primaryimage"
            },
            "datePublished": "2019-04-09T12:47:32+00:00",
            "dateModified": "2019-04-10T13:03:09+00:00",
            "breadcrumb": {
                "@id": "https://example.com/schema-example/#breadcrumb"
            }
        },
        {
            "@type": "BreadcrumbList",
            "@id": "https://example.com/schema-example/#breadcrumb",
            "itemListElement": [
                {
                    "@type": "ListItem",
                    "position": 1,
                    "item": {
                        "@type": "WebPage",
                        "@id": "https://example.com/",
                        "url": "https://example.com/",
                        "name": "Home"
                    }
                },
                {
                    "@type": "ListItem",
                    "position": 2,
                    "item": {
                        "@type": "WebPage",
                        "@id": "https://example.com/schema-example/",
                        "url": "https://example.com/schema-example/",
                        "name": "Schema example"
                    }
                }
            ]
        },
        {
            "@type": "Article",
            "@id": "https://example.com/schema-example/#article",
            "isPartOf": {
                "@id": "https://example.com/schema-example/#webpage"
            },
            "author": {
                "@id": "https://example.com/author/joost/#author",
                "name": "Joost de Valk"
            },
            "publisher": {
                "@id": "https://example.com/#organization"
            },
            "headline": "Schema example",
            "datePublished": "2019-04-09T12:47:32+00:00",
            "dateModified": "2019-04-10T13:03:09+00:00",
            "commentCount": 0,
            "mainEntityOfPage": {
                "@id": "https://example.com/schema-example/#webpage"
            },
            "image": {
                "@id": "https://example.com/schema-example/#primaryimage"
            },
            "articleSection": "Schema.org"
        },
        {
            "@type": "Person",
            "@id": "https://example.com/author/joost/#author",
            "name": "Joost de Valk",
            "image": {
                "@type": "ImageObject",
                "@id": "https://example.com/author/joost/#authorlogo",
                "url": "http://0.gravatar.com/avatar/f08c3c3253bf14b5616b4db53cea6b78?s=96&d=mm&r=g",
                "caption": "Joost de Valk"
            },
            "description": "This is Joost's bio",
            "sameAs": [
                "https://twitter.com/jdevalk"
            ]
        }
    ]
}

A product in a WooCommerce store

Note that, in reality, this output consists of two separate graph blocks which are stitched together. The parsed result, however, is the same.

{
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": "https://example.com/#organization",
            "name": "Yoast",
            "url": "https://example.com/",
            "sameAs": [
                "https://www.facebook.com/yoast",
                "https://www.linkedin.com/company/yoast-com/",
                "https://en.wikipedia.org/wiki/Yoast",
                "https://twitter.com/yoast"
            ],
            "logo": {
                "@type": "ImageObject",
                "@id": "https://example.com/#logo",
                "url": "https://example.com/wp-content/uploads/2019/03/Yoast_Logo_tagline_Large_RGB.png",
                "caption": "Yoast"
            },
            "image": {
                "@id": "https://example.com/#logo"
            }
        },
        {
            "@type": "WebSite",
            "@id": "https://example.com/#website",
            "url": "https://example.com/",
            "name": "one.wordpress.test",
            "publisher": {
                "@id": "https://example.com/#organization"
            },
            "potentialAction": {
                "@type": "SearchAction",
                "target": "https://example.com/?s={search_term_string}",
                "query-input": "required name=search_term_string"
            }
        },
        {
            "@type": "ItemPage",
            "@id": "https://example.com/product/vneck-tee/#webpage",
            "url": "https://example.com/product/vneck-tee/",
            "inLanguage": "en-US",
            "name": "Vneck Tshirt - one.wordpress.test",
            "isPartOf": {
                "@id": "https://example.com/#website"
            },
            "image": {
                "@type": "ImageObject",
                "@id": "https://example.com/product/vneck-tee/#primaryimage",
                "url": "https://example.com/wp-content/uploads/2019/03/vneck-tee.jpg",
                "caption": ""
            },
            "primaryImageOfPage": {
                "@id": "https://example.com/product/vneck-tee/#primaryimage"
            },
            "datePublished": "2019-03-27T15:16:56+00:00",
            "dateModified": "2019-04-09T09:08:35+00:00",
            "description": "Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor",
            "breadcrumb": {
                "@id": "https://example.com/product/vneck-tee/#breadcrumb"
            }
        },
        {
            "@type": "BreadcrumbList",
            "@id": "https://example.com/product/vneck-tee/#breadcrumb",
            "itemListElement": [
                {
                    "@type": "ListItem",
                    "position": 1,
                    "item": {
                        "@type": "WebPage",
                        "@id": "https://example.com/",
                        "url": "https://example.com/",
                        "name": "Home"
                    }
                },
                {
                    "@type": "ListItem",
                    "position": 2,
                    "item": {
                        "@type": "WebPage",
                        "@id": "https://example.com/shop/",
                        "url": "https://example.com/shop/",
                        "name": "Products"
                    }
                },
                {
                    "@type": "ListItem",
                    "position": 3,
                    "item": {
                        "@type": "WebPage",
                        "@id": "https://example.com/product/vneck-tee/",
                        "url": "https://example.com/product/vneck-tee/",
                        "name": "Vneck Tshirt"
                    }
                }
            ]
        },
        {
            "@type": "Product",
            "@id": "https://example.com/product/vneck-tee/#product",
            "name": "Vneck Tshirt",
            "url": "https://example.com/product/vneck-tee/",
            "image": {
                "@id": "https://example.com/product/vneck-tee/#primaryimage"
            },
            "description": "Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo.",
            "sku": "83",
            "offers": [
                {
                    "@type": "Offer",
                    "price": "18.00",
                    "priceValidUntil": "2020-12-31",
                    "priceSpecification": {
                        "@type": "PriceSpecification",
                        "price": "18.00",
                        "priceCurrency": "EUR",
                        "valueAddedTaxIncluded": false
                    },
                    "priceCurrency": "EUR",
                    "availability": "https://schema.org/InStock",
                    "url": "https://example.com/product/vneck-tee/",
                    "seller": {
                        "@id": "https://example.com/#organization"
                    }
                }
            ],
            "aggregateRating": {
                "@type": "AggregateRating",
                "ratingValue": "4.50",
                "reviewCount": 2
            },
            "review": [
                {
                    "@type": "Review",
                    "@id": "https://example.com/product/vneck-tee/#comment-5",
                    "datePublished": "2019-04-09T09:10:12+00:00",
                    "description": "What a nice turtle shirt.",
                    "reviewRating": {
                        "@type": "Rating",
                        "ratingValue": "4"
                    },
                    "author": {
                        "@type": "Person",
                        "name": "Tim"
                    }
                },
                {
                    "@type": "Review",
                    "@id": "https://example.com/product/vneck-tee/#comment-6",
                    "datePublished": "2019-04-09T09:10:29+00:00",
                    "description": "Awesome shirt",
                    "reviewRating": {
                        "@type": "Rating",
                        "ratingValue": "5"
                    },
                    "author": {
                        "@type": "Person",
                        "name": "Patrice"
                    }
                }
            ],
            "brand": {
                "@type": "Organization",
                "name": "Turtle"
            },
            "mainEntityOfPage": {
                "@id": "https://example.com/product/vneck-tee/#webpage"
            },
            "manufacturer": {
                "@type": "Organization",
                "name": "Turtle"
            }
        }
    ]
}

Other examples

Our technical documentation contains more extensive and varied examples, as well as details on how Yoast SEO software determines what to output in various scenarios.

Altering or extending our graphs

All of our output can be altered, extended or disabled (by piece or in totality) via a full API. The documentation for this is available here.

In scenarios where third-party plugins, themes or systems result in 'un-stitching' of the graph, duplicate/conflated properties, or shared ID spaces, we recommend adopting our framework and utilizing our APIs (or encouraging the relevant solution authors to do so).

I'm a developer - what do I need to do to integrate?

Use in Yoast software

This specification forms the basis of our schema.org / JSON-LD output from Yoast SEO version 11.0 onwards. Specific information on how our software utilizes and extends this specification can be found here.


Support and feedback

This spec is a continual work in progress, and we're always keen to help others adopt, extend or refine it. If you have questions about the mechanics described here, or if you'd like to apply the spec to your own theme/plugin(s), feel free to leave a comment below, or create an issue on our GitHub repository.

Don't forget that we also have technical documentation, and an API for modifying our schema output!


Appendix

1. JSON-LD vs other standards

On the surface, JSON-LD’s requirement to output a single, static snippet of code in the template may seem limiting compared with the flexibility of inline methods (e.g., RDFa or Microdata). However, as use-cases grow in complexity, it quickly becomes apparent that inline approaches lack the flexibility and scalability which we require. Specifically:

  • They’re tied to the structure of the page’s HTML markup. That means that it’s hard to represent structures and entities whose properties don’t align perfectly to the page’s layout, and often, representing meta properties or linked data requires inlining of hidden properties and meta tags.
  • They do a poor job of handling the complexity of relationships. If entity A shares or inherits a property with entity B, or is a child of entity C, there’s no easy way to represent this. There’s also no easy way to reverse the direction of relationship declarations.
  • Not all entities can be easily represented by a neat hierarchy which aligns with the HTML markup of a template. For example, when working with linear, nested markup, it isn’t always easy to position the main content of a WebPage as the explicit centre of a network graph when that page contains an Article which is published by an Organization on a WebSite, and which has an Author and other attributes. Abstracting away from the page's code structure requires a certain amount of cross-linking and relationship referencing.
  • Even if you can overcome all of this, it’s hard to universally integrate into all of the individual templates of websites where the schema is required - businesses and users will utilize a variety of templates, themes, markups, plugins, coding techniques and processes - making it incredibly difficult to rely on maintaining valid markup. In the WordPress world, at least, we can't guarantee that themes and plugins will be able to reliably and cleanly communicate, integrate and structure inline markup.

Because we’re aiming to maximize flexibility and interoperability, JSON-LD is the clear winner in this context.
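To illustrate the cross-linking point, here is a small Python sketch that builds one @graph in which a single Organization is defined once and referenced by @id from both the WebSite and the Article, independently of where either would appear in an HTML template. The URLs, the name and the fragment identifiers (#organization, #website, #article) are assumptions chosen for the example.

```python
import json

# Hypothetical site and page URLs, for illustration only.
SITE = "https://www.example.com/"
PAGE = SITE + "blog/hello-world/"

organization = {"@type": "Organization", "@id": SITE + "#organization", "name": "Example Co"}
website = {
    "@type": "WebSite",
    "@id": SITE + "#website",
    "url": SITE,
    "publisher": {"@id": organization["@id"]},  # reference, not a copy
}
webpage = {
    "@type": "WebPage",
    "@id": PAGE + "#webpage",
    "url": PAGE,
    "isPartOf": {"@id": website["@id"]},
}
article = {
    "@type": "Article",
    "@id": PAGE + "#article",
    "isPartOf": {"@id": webpage["@id"]},
    "mainEntityOfPage": {"@id": webpage["@id"]},
    "publisher": {"@id": organization["@id"]},  # same entity as the site's publisher
}

graph = {"@context": "https://schema.org", "@graph": [organization, website, webpage, article]}
print(json.dumps(graph, indent=2))
```

Expressing the shared publisher with inline markup would require either repeating the Organization's properties or nesting the Article inside an element that also wraps the site-level data.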

1a. Avoiding content duplication via CSS/XPath selection

One of the main objections to JSON-LD as a format is that it can result in duplication of (large chunks of) content. For example, if I wish to represent a review, I must include the entire review content in the page’s ‘human’ readable content, as well as in the JSON markup (at least, according to Google's structured data requirements).

This is often perceived as a performance concern, and rightly so. Even with GZIP enabled and configured (which we can’t assume will always be the case), duplication adds (albeit minor) overhead to server responses and browser processing.

In the future, we hope to bypass this issue by using schema.org’s CssSelectorType or XPathType markup. These would allow us to target specific containers (or composites/arrays of containers) which represent the content in question, rather than duplicating the text. At the moment, however, these selectors are only supported for WebPageElement and Speakable content areas.
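As a rough sketch of what selector-based targeting looks like today, the Python function below builds a WebPage node whose speakable property (a SpeakableSpecification) and hasPart property (a WebPageElement) point at on-page containers by CSS selector instead of copying their text. The function name and the selectors (.product-summary, #reviews) are hypothetical, and consumers are not guaranteed to treat a WebPageElement as a substitute for duplicated review content.

```python
import json


def page_with_selectors(page_url: str, speakable_css: list[str], element_css: str) -> dict:
    """Build a WebPage node that locates content by CSS selector
    rather than repeating that content inside the JSON-LD."""
    return {
        "@type": "WebPage",
        "@id": page_url + "#webpage",
        "url": page_url,
        # SpeakableSpecification accepts cssSelector (or xpath) values.
        "speakable": {"@type": "SpeakableSpecification", "cssSelector": speakable_css},
        # A WebPageElement can likewise be located via cssSelector.
        "hasPart": {"@type": "WebPageElement", "cssSelector": element_css},
    }


node = page_with_selectors(
    "https://example.com/product/vneck-tee/",
    speakable_css=[".product-summary"],
    element_css="#reviews",
)
print(json.dumps(node, indent=2))
```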

Until these targeting methods achieve greater support, content should only be duplicated when explicitly required by search engines and external agents. These scenarios are reflected in this document.

2. Use of tag management solutions and JavaScript injection

We prefer the JSON-LD markup to be included and rendered server-side, rather than being injected via a tag management solution or JavaScript file/function. Additionally, whenever possible, all attribute values should be ‘hard-coded’ into the graph, rather than referencing/calling functions or variables from elsewhere.

Crawling, processing and utilizing values injected by JavaScript is significantly more computationally expensive and complex than simply scraping HTML; as a result, many platforms offer only partial support (if any) for approaches which rely heavily on JavaScript.

Approaches which rely on external scripts and platforms - such as tag management systems - can also introduce fragility in the form of conflicts, race conditions, latency and dependency management.
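A minimal Python sketch of the server-side alternative: the graph, with all values already resolved, is serialized once into a script tag that is written straight into the HTML response. Escaping every "<" as a JSON unicode escape is a common general technique (assumed here, not necessarily what Yoast SEO does) that stops a value containing "</script>" from closing the tag early, while JSON parsers still decode the original text.

```python
import json
from typing import Any


def render_json_ld(graph: dict[str, Any]) -> str:
    """Serialize a graph into a <script> tag for server-rendered HTML.

    Values are written out literally, so nothing is computed in the browser.
    """
    # "<" only occurs inside JSON strings, so replacing it with the JSON
    # escape \u003c keeps the JSON valid and the surrounding HTML intact.
    body = json.dumps(graph, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{body}</script>'


html = render_json_ld({
    "@context": "https://schema.org",
    "@type": "WebSite",
    "name": "A </script> in a name",  # hypothetical hostile value
})
print(html)
```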

3. Handling images

Throughout the examples in this document, we generally make a few assumptions about images:

  • Even though the core schema.org definitions don’t always list image as a required attribute of a piece, Google frequently requires an image for almost all piece types (i.e., eligibility for their ‘rich snippets’ and similar experiences almost always requires pages, blog posts, products and other piece entities to have at least one image). Wherever we’ve included an image parameter, assume that it is mandatory.
  • All image properties should be registered as arrays of ImageObject entities, so that advanced properties (like caption) can be set where feasible, and so that images can be inherited/shared across pieces via ID. This makes it easy to share images between related pieces (e.g., the main/featured image of a BlogPosting is likely to be the same entity as the primaryImageOfPage of the page where the blog post resides).
  • Size and format constraints vary by agent, but common sense should apply.
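The image assumptions above can be sketched in Python as follows: one ImageObject is defined once, the BlogPosting's image property is an array that references it by @id, and the WebPage's primaryImageOfPage points at the same entity. The URLs, caption and the #primaryimage fragment are hypothetical.

```python
import json

PAGE = "https://example.com/blog/hello-world/"  # hypothetical page URL

primary_image = {
    "@type": "ImageObject",
    "@id": PAGE + "#primaryimage",
    "url": "https://example.com/uploads/hello-world.jpg",  # hypothetical file
    "caption": "An example caption",
}

webpage = {
    "@type": "WebPage",
    "@id": PAGE + "#webpage",
    "url": PAGE,
    "primaryImageOfPage": {"@id": primary_image["@id"]},
}

blog_posting = {
    "@type": "BlogPosting",
    "@id": PAGE + "#article",
    "mainEntityOfPage": {"@id": webpage["@id"]},
    # An array of ImageObject references; the featured image is the same
    # entity as the page's primary image, so it is referenced, not repeated.
    "image": [{"@id": primary_image["@id"]}],
}

graph = {"@context": "https://schema.org", "@graph": [primary_image, webpage, blog_posting]}
print(json.dumps(graph, indent=2))
```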

4. Using canonical URLs

url attributes should always inherit from the 'true canonical' value of the page where the JSON snippet resides (i.e., if the canonical URL tag has been manually set to reference a different page/URL, the original 'true' canonical value should be used).

For example, if Page A has a URL of https://www.example.com/page-a/, but includes a canonical URL tag which references Page B (at https://www.example.com/page-b/), then all url parameters which would normally relate to Page A should continue to use Page A’s 'true' canonical URL.
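The Page A / Page B rule can be expressed as a small Python sketch. The Page class and its method names are hypothetical: the manually set canonical is used only for the HTML canonical tag, while the url and @id of the WebPage piece keep using the page's own 'true' canonical URL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    permalink: str                       # the page's own 'true' canonical URL
    canonical_tag: Optional[str] = None  # manually set canonical, if any

    def canonical_link(self) -> str:
        """URL for the HTML <link rel="canonical"> tag (honours the override)."""
        return self.canonical_tag or self.permalink

    def webpage_node(self) -> dict:
        """WebPage piece: url and @id always use the true canonical URL."""
        return {
            "@type": "WebPage",
            "@id": self.permalink + "#webpage",
            "url": self.permalink,
        }


page_a = Page("https://www.example.com/page-a/",
              canonical_tag="https://www.example.com/page-b/")
print(page_a.canonical_link())       # https://www.example.com/page-b/
print(page_a.webpage_node()["url"])  # https://www.example.com/page-a/
```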

5. Validation tools

We rely heavily on Google's Structured Data Testing Tool (or 'SDTT') to evaluate and debug our approach. While other tools exist, we've found that this one consistently provides the most sophisticated parsing, error handling and feedback. Also, as Google is the primary consumer of our structured markup, it makes sense to treat feedback from Google's own tool as our single version of the truth.

That's not to say, however, that the tool is without issues, or that we agree completely with Google's interpretation of schema.org's standards. Read on to explore our known issues.

6. Known issues

There are a number of scenarios where the SDTT deviates from the schema.org definitions. In some cases, we've adapted or compromised our approach to find a solution which satisfies both; in others, we've favored one or the other, depending on the context.

For example, the SDTT requires that a recipe has an image. This isn't a mandatory attribute according to the schema.org recipe specification, but Google requires it. There are many scenarios like this, where the SDTT reveals Google-specific idiosyncrasies and requirements which are either the product of deliberate 'bending' of the standards to fit their needs, or of somewhat arbitrary decision-making.

The following are specific scenarios where our approach causes conflicts and issues, and where we're petitioning Google to alter how it interprets and processes our markup.

A Person cannot be the Publisher of an Article

This is a particularly challenging issue, as a WebSite which represents a Person (i.e., a personal website, as opposed to one which represents an Organization) will naturally 'publish' articles where that Person should be considered to be the 'publisher'. This is an extremely common use-case, but one which the SDTT flags as invalid.

Additionally, a critical piece of our graph approach relies on identifying the connection between a WebPage (or an Article) and the WebSite on which it resides. The Publisher is the key connection between these entities, so we've chosen to ignore the error in this case (but have alerted Google to it).

To work around this, we merge the Person with an Organization to create a hybrid type ([Person, Organization]), which then expects/accepts a logo and other 'Publisher' properties. This validates in the SDTT, but is an acknowledged 'hack'.
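A Python sketch of the hybrid workaround, assuming a personal site: the publisher node declares both types in a @type array, so Organization properties such as logo are accepted, and the same node serves as both the author and the publisher of an Article. The name, URLs and the #person, #personlogo and #article fragments are hypothetical.

```python
import json

SITE = "https://example.com/"  # hypothetical personal website

person_publisher = {
    # Two types on one node: a Person that also validates as an Organization,
    # so Organization-only properties such as logo are accepted.
    "@type": ["Person", "Organization"],
    "@id": SITE + "#person",
    "name": "Jane Doe",
    "logo": {"@type": "ImageObject", "@id": SITE + "#personlogo",
             "url": SITE + "uploads/jane.jpg"},
}

website = {"@type": "WebSite", "@id": SITE + "#website", "url": SITE,
           "publisher": {"@id": person_publisher["@id"]}}

article = {"@type": "Article", "@id": SITE + "hello-world/#article",
           "author": {"@id": person_publisher["@id"]},
           "publisher": {"@id": person_publisher["@id"]}}

print(json.dumps({"@context": "https://schema.org",
                  "@graph": [person_publisher, website, article]}, indent=2))
```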

The SDTT often prefers an array of properties, rather than an ItemList

E.g., an array of individual reviews must occupy a review property, rather than being contained in a reviews property. This pattern of 'an array of multiple things in a singular property' is common throughout Google's requirements. We believe this is less semantically rich, and less flexible, than alternative approaches.

We'd typically prefer to use a container as a parent to these items, as this can then be cross-referenced via ID elsewhere. Placing arrays like this into an ItemList is valid schema, but the SDTT returns validation errors in many cases where this is used. We vary our approach in these types of scenarios, based on how critical it is for us to have a 'parent' element with an addressable ID.
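The two shapes can be compared in a short Python sketch. The product URL comes from the example earlier in this document; the review IDs (#review-1, #review-2) and the #reviews container ID are hypothetical. The first shape places several Review objects in the singular review property, as the SDTT expects; the second wraps references to them in an ItemList with its own addressable @id.

```python
PRODUCT = "https://example.com/product/vneck-tee/"

reviews = [
    {"@type": "Review", "@id": PRODUCT + "#review-1",
     "reviewRating": {"@type": "Rating", "ratingValue": "4"}},
    {"@type": "Review", "@id": PRODUCT + "#review-2",
     "reviewRating": {"@type": "Rating", "ratingValue": "5"}},
]

# Shape the SDTT accepts: an array of Review objects in the singular property.
product_flat = {"@type": "Product", "@id": PRODUCT + "#product", "review": reviews}

# Alternative: an ItemList container which other pieces could reference by @id.
review_list = {
    "@type": "ItemList",
    "@id": PRODUCT + "#reviews",
    "numberOfItems": len(reviews),
    "itemListElement": [
        {"@type": "ListItem", "position": i, "item": {"@id": r["@id"]}}
        for i, r in enumerate(reviews, start=1)
    ],
}
```

The ItemList version gives the collection its own identity, which is what makes it attractive for cross-referencing; the flat version is what currently validates without errors.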

7. Other consumers

At the time of publishing, it appears that Bing does not support this approach; their 'Markup Validator' tool (part of Bing Webmaster Tools) does not detect (and/or parse) markup contained within a @graph structure (which forms the backbone of our approach). We're seeking to engage in dialogue with Bing to determine their stance on support.

Social platforms like Facebook, Twitter, Pinterest, etc., have varying levels of support for this markup. Most rely on Open Graph markup ('OG tags') and similar, but may use components of schema.org markup when OG tags are missing or invalid.

The support of other search engines (e.g., Baidu, Yandex, others) is unknown; our assumption is that support will generally be limited, or not exist. We hope that broad adoption of our approach will encourage these and other consumers to expand their support.