The future of structured data in HTML: RDFa, Microdata and microformats

scor's picture

The recent launch of schema.org has sparked heated debate within the Web community about structured data syntaxes. Over the past decade, three major syntaxes for expressing structured data in HTML have been developed in parallel: microformats, RDFa and microdata. Each has formed its own community, and the older ones (microformats and RDFa) have enjoyed steady adoption on the Web. The most recent syntax, microdata, started off as a stripped-down version of RDFa as part of the development of HTML5.

So far, search engines have generally been tolerant and have recognized all of these syntaxes for SEO purposes. However, having multiple syntaxes for structured data is not sustainable in the long term. It confuses web developers, who have to evaluate these syntaxes and choose one of the three. Implementers who want to make use of this structured data have to support several different algorithms for extracting it. What worried some members of the Web community about the schema.org announcement is the sudden push by the major search engine companies for one of these syntaxes (microdata) without any community consensus, dismissing the existing deployments of the other syntaxes (RDFa and microformats) and ignoring the needs of the Web community at large.
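To make the implementer burden concrete, here is a minimal sketch in Python of the same fact (a person's name) marked up once in microdata and once in RDFa 1.1 style. The sample markup, the PropertyCollector class and the extract helper are illustrative assumptions, not taken from any specification or library, and the extractor is deliberately naive: it only collects the text of elements carrying a given attribute and ignores nesting, types, prefixes and everything else the real extraction algorithms define.

```python
from html.parser import HTMLParser

# Hypothetical sample markup: the same statement in two syntaxes.
MICRODATA = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Alice</span></div>'
)
RDFA = (
    '<div vocab="http://schema.org/" typeof="Person">'
    '<span property="name">Alice</span></div>'
)


class PropertyCollector(HTMLParser):
    """Naive collector: maps each value of `attr_name` to the element's text.

    This is NOT the microdata or RDFa extraction algorithm, only a toy
    stand-in that shows each syntax needs its own attribute handling.
    """

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.current = None  # property name of the element being read
        self.found = {}

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(self.attr_name)
        if value is not None:
            self.current = value
            self.found[value] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] += data

    def handle_endtag(self, tag):
        # Flat documents only: any closing tag ends the current property.
        self.current = None


def extract(html, attr_name):
    """Return {property: text} for elements that carry `attr_name`."""
    collector = PropertyCollector(attr_name)
    collector.feed(html)
    collector.close()
    return collector.found


print(extract(MICRODATA, "itemprop"))
print(extract(RDFA, "property"))
```

Even in this toy form, a consumer has to know which attribute to look for in each syntax, and microformats, which encode properties in class names, would need yet another pass.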

This morning, the W3C Technical Architecture Group (TAG) sent a note to the HTML working group and the RDFa working group requesting that a new group be created to agree on a unified language for structured data on the Web. Quoting from the announcement of the TAG issue on the HTML+RDFa and Microdata last call drafts:

Specifically, our opinion is that the W3C should not publish two specifications that meet such similar requirements in incompatible ways. We think doing so would cause confusion for users and implementers, promote lock-in, and fragment the web. We request that the W3C Director set up a task force to find agreement on a way forward.

What does this mean for Drupal?

Drupal 7 was released last January with native support for RDFa 1.0 (a W3C Recommendation). It is one of the many platforms that chose RDFa as their main structured data format a couple of years ago, along with Facebook, Best Buy, the White House, the BBC and Newsweek. This choice was motivated by the fact that RDFa was already a W3C Recommendation and was supported by Yahoo! and Google. In the schema.org FAQ, Google and Bing clarify that they will continue to support RDFa, which is reassuring for the Drupal community given that Drupal 7 will be around for many years to come. However, the question of choosing a structured data syntax still applies to the upcoming Drupal 8, which is currently under active development. Should it stick with RDFa (in version 1.1) or switch to microdata? It is premature to speculate about what will come out of this announcement, whether a merge of the existing syntaxes or a whole new unified syntax, but hopefully it will lead to a clear way forward for Drupal and for the Web in general.

Comments

Task Force launched

scor's picture

The Task Force has been launched. Discussions are public and anyone is welcome to join. See all the details including how to subscribe to the mailing list at http://www.w3.org/wiki/Html-data-tf
