A roadmap for RDFa in Drupal 7

scor's picture

RDFa is on its way into Drupal 7 core! It is now time to start implementing it. We need to agree on a roadmap so that we all head in the same direction before rushing to our keyboards and posting patches on drupal.org. I met up with John and Alexandre from the SIOC project and we came up with the following approach.

What is RDFa?

RDFa provides a set of attributes to annotate XHTML documents with machine-readable semantics. It is then possible to extract RDF data from these pages. Read more on RDFa and check its primer for examples.

RDF namespace registry

To use Compact URIs (CURIEs) when referring to RDF vocabulary terms, prefixes should be defined in the <html> tag using the XML Namespace mechanism. A simple registry can collect the namespaces defined by modules and serialize them in the header of the generated XHTML output. This allows greater flexibility than hardcoding them in page.tpl.php: contributed modules can define extra namespaces which might not come with Drupal core, and it is also less work for theme maintainers.
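A minimal sketch of such a registry, written as standalone PHP rather than against any Drupal API. The function names, the "first definition wins" rule and the module names in the sample data are assumptions made for illustration only.

```php
<?php
// Hypothetical registry: each module contributes prefix => namespace URI pairs.
function example_rdf_collect_namespaces(array $module_namespaces) {
  $registry = array();
  foreach ($module_namespaces as $module => $namespaces) {
    foreach ($namespaces as $prefix => $uri) {
      // Assumption: the first definition of a prefix wins. A real registry
      // would need an agreed policy for conflicting prefixes.
      if (!isset($registry[$prefix])) {
        $registry[$prefix] = $uri;
      }
    }
  }
  // Sort by prefix so the serialized output is stable.
  ksort($registry);
  return $registry;
}

// Serialize the registry as xmlns:* attributes for the <html> tag.
function example_rdf_serialize_namespaces(array $registry) {
  $output = '';
  foreach ($registry as $prefix => $uri) {
    $output .= ' xmlns:' . $prefix . '="' . htmlspecialchars($uri, ENT_QUOTES, 'UTF-8') . '"';
  }
  return $output;
}

// Sample data: module names are made up for this example.
$registry = example_rdf_collect_namespaces(array(
  'core' => array(
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'dc' => 'http://purl.org/dc/elements/1.1/',
  ),
  'sioc_module' => array('sioc' => 'http://rdfs.org/sioc/ns#'),
));
print '<html xmlns="http://www.w3.org/1999/xhtml"' . example_rdf_serialize_namespaces($registry) . ">\n";
```

The theme would then only print the serialized string, so a contributed module adding a vocabulary never has to touch a template.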

Template files

The following changes concern the page.tpl.php file.

  1. The doctype would need to be changed to
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
  2. The <html> tag would contain a list of namespaces used in the document serialized from the namespace registry such as
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr"
        xmlns:foaf="http://xmlns.com/foaf/0.1/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
  3. The template files should be updated to include a link to a GRDDL transformation:
    <head profile="http://ns.inria.fr/grddl/rdfa/">
  4. At present the tag <h1 id="site-name"> is hardcoded in the default page.tpl.php file. While it would be possible to hardcode property="dc:title" directly in the template, a better approach would be to provide the template with a variable, similar to the existing $title, which would contain the correct RDFa property. Other variables might need to be created in the same fashion.
  5. The tag <div id="content"> is most likely the best place to insert the about attribute defining the URI of the current entity (product, person, book...) represented by the page. A default value can be #self or #it:
    <div id="content" about="http://example.com/node/123#self">

    To ensure the URI is unique, it is important to choose a fragment which does not exist in the DOM tree of the page.
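The fragment choice in item 5 can be sketched as follows. This is a standalone illustration, not proposed core code: the function name, the list of existing ids and the numbered fallback are all assumptions.

```php
<?php
// Hypothetical helper: build the entity URI for the "about" attribute using a
// fragment that is not already used as an id in the page.
function example_rdf_entity_uri($page_url, array $existing_ids, array $candidates = array('self', 'it')) {
  foreach ($candidates as $fragment) {
    if (!in_array($fragment, $existing_ids, TRUE)) {
      return $page_url . '#' . $fragment;
    }
  }
  // Assumption: if every candidate is taken, fall back to a numbered variant
  // of the first candidate (self-1, self-2, ...).
  $i = 1;
  while (in_array($candidates[0] . '-' . $i, $existing_ids, TRUE)) {
    $i++;
  }
  return $page_url . '#' . $candidates[0] . '-' . $i;
}

// Ids assumed to be present in the rendered page.
$ids = array('content', 'site-name');
$about = example_rdf_entity_uri('http://example.com/node/123', $ids);
print '<div id="content" about="' . htmlspecialchars($about, ENT_QUOTES, 'UTF-8') . '">' . "\n";
```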

RDFa and the theme layer

RDFa requires XHTML which is already generated by Drupal.

Each module should be able to tag its data with RDFa. Since RDFa operates on the XHTML level, modules can specify their RDFa attributes via the theme functions along with the XHTML code.

Using the helper function l(), we can write a link to Bob's homepage:

<?php
// l() returns the markup, so it is printed here.
print l('Bob', 'http://example.com/bob', array('attributes' => array('property' => 'foaf:name', 'rel' => 'foaf:homepage')));
?>

which will output:

    <a href="http://example.com/bob" property="foaf:name" rel="foaf:homepage">Bob</a>
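To make the mechanism concrete outside of Drupal, here is a small standalone sketch of how an attributes array can be turned into markup. The functions example_attributes() and example_link() are hypothetical stand-ins, not Drupal's own implementation of l().

```php
<?php
// Hypothetical stand-in for the attribute handling behind l().
function example_attributes(array $attributes) {
  $output = '';
  foreach ($attributes as $name => $value) {
    $output .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
  }
  return $output;
}

// Hypothetical link builder: href comes first, then any extra attributes,
// which is where the RDFa attributes travel.
function example_link($text, $url, array $attributes = array()) {
  $attributes = array('href' => $url) + $attributes;
  return '<a' . example_attributes($attributes) . '>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</a>';
}

print example_link('Bob', 'http://example.com/bob', array('property' => 'foaf:name', 'rel' => 'foaf:homepage')) . "\n";
```

Because RDFa attributes are ordinary XHTML attributes, nothing RDF-specific is needed in the serialization step.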

theme_item_list() would need to be modified so that the title <h3> can be tagged via an $attributes argument, following the same model as l() above. Apart from that, RDFa can already be embedded in the list elements <ul> and <li> via the existing $attributes arguments as follows:

<?php
$items = array();
// The 'data' key holds the item text; every other key becomes an <li> attribute.
$items[] = array('data' => 'Gift of Silence', 'property' => 'dc:title');
$items[] = array('data' => 'Joe', 'property' => 'dc:creator');
$items[] = array('data' => '2006-10-01', 'property' => 'dc:date');
print theme('item_list', $items, 'the book', 'ul', array('about' => 'http://example.com/book#gift_of_silence'));
?>

which generates:

    <div class="item-list">
      <h3>the book</h3>
      <ul about="http://example.com/book#gift_of_silence">
        <li property="dc:title" class="first">Gift of Silence</li>
        <li property="dc:creator">Joe</li>
        <li property="dc:date" class="last">2006-10-01</li>
      </ul>
    </div>
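The proposed change to the list title could look roughly like the standalone sketch below. The function example_item_list() and its $title_attributes argument are hypothetical, the first/last classes are left out for brevity, and rdfs:label is only an illustrative choice of property.

```php
<?php
// Hypothetical variant of an item-list theme function that accepts
// attributes for the <h3> title as well as for the <ul>.
function example_item_list(array $items, $title, array $list_attributes = array(), array $title_attributes = array()) {
  // Local helper turning name => value pairs into XHTML attributes.
  $attr = function (array $attributes) {
    $out = '';
    foreach ($attributes as $name => $value) {
      $out .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '"';
    }
    return $out;
  };
  $output = '<div class="item-list">' . "\n";
  $output .= '  <h3' . $attr($title_attributes) . '>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</h3>\n";
  $output .= '  <ul' . $attr($list_attributes) . ">\n";
  foreach ($items as $item) {
    // 'data' is the item text; the remaining keys are <li> attributes.
    $data = $item['data'];
    unset($item['data']);
    $output .= '    <li' . $attr($item) . '>' . htmlspecialchars($data, ENT_QUOTES, 'UTF-8') . "</li>\n";
  }
  return $output . "  </ul>\n</div>\n";
}

$book = 'http://example.com/book#gift_of_silence';
$items = array(
  array('data' => 'Gift of Silence', 'property' => 'dc:title'),
  array('data' => 'Joe', 'property' => 'dc:creator'),
);
// The title sits outside the <ul>, so it carries its own "about".
print example_item_list($items, 'the book', array('about' => $book), array('about' => $book, 'property' => 'rdfs:label'));
```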

RDFaification of basic Drupal pages

While the above is being implemented, the XHTML output of the basic Drupal page types (blog, book, forum, user profile...), including the blocks, could be tagged with the most relevant RDF terms. This in itself could require some slight vocabulary alignments, which will be possible at least in the case of SIOC. For instance, an OnlineBook class was recently added in the SIOC Types module to fit the needs of a particular Drupal-enabled project, and a class such as 'ProfilePage' might be added in the same module if needed.

The Drupal RDF Schema I posted earlier this year will be one of the things we are planning to work on at the next VoCamp in Galway (Nov 25th - 26th). VoCamp is a series of informal events where people can spend some dedicated time creating lightweight vocabularies/ontologies for the Semantic Web/Web of Data. One of the goals will be to update the Drupal core schema with the most suitable terms for describing Drupal data and tag its HTML output. The event is free and there are still a few places left.

RDFa in content types and fields

Since fields will be present in Drupal 7 along with content types, it would be a good idea to give site architects basic control over the RDF terms used to describe the content of their site. A simplified version of the RDF CCK module I presented in Szeged would suffice, where the RDF terms can be specified in a textfield or chosen from a short list of predefined terms.

  • Each content type is described by an RDF class
  • Each field is described by an RDF property

Modules implementing their content type and fields could predefine these terms, but site administrators could change these default terms to better match their application.
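One way to picture module defaults combined with administrator overrides is the standalone sketch below. The array layout, the function name and the specific terms chosen are assumptions for illustration; they are not taken from the RDF CCK module.

```php
<?php
// Hypothetical default mapping a module might declare for its content type:
// one RDF class for the type, one RDF property per field.
$module_defaults = array(
  'rdf_class' => 'sioc:Post',
  'fields' => array(
    'title' => 'dc:title',
    'author' => 'dc:creator',
    'created' => 'dc:date',
  ),
);

// Hypothetical overrides entered by a site administrator.
$admin_overrides = array(
  'rdf_class' => 'foaf:Document',
  'fields' => array('author' => 'foaf:maker'),
);

// Merge the two: administrator terms replace module defaults one by one,
// and anything the administrator left alone keeps its default.
function example_rdf_mapping(array $defaults, array $overrides) {
  $mapping = $defaults + array('rdf_class' => NULL, 'fields' => array());
  if (!empty($overrides['rdf_class'])) {
    $mapping['rdf_class'] = $overrides['rdf_class'];
  }
  if (!empty($overrides['fields'])) {
    $mapping['fields'] = $overrides['fields'] + $mapping['fields'];
  }
  return $mapping;
}

$mapping = example_rdf_mapping($module_defaults, $admin_overrides);
print $mapping['rdf_class'] . "\n";
foreach ($mapping['fields'] as $field => $property) {
  print $field . ' => ' . $property . "\n";
}
```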

Documentation

There will be some need to document RDF/RDFa best practices in order to educate both module maintainers and site architects.

Open questions

Should we force RDFa or should there be an option to turn it off?
In some rare situations, site owners might not want to expose RDF data, for example about their users. In the past, some sites had to turn off their FOAF exports after complaints. A mechanism should be implemented to let site administrators opt out. It could be done at the permission level, the content type/field level, or the node level (similar to the way comments work). That would be easy to do in the l() function. For the theme functions, another mechanism would need to be put in place.
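As a rough illustration of an opt-out at the attribute level, the standalone sketch below strips RDFa attributes from an attributes array before it is rendered. The function name and the $rdfa_enabled flag are hypothetical, and treating any rel/rev value containing a colon as a CURIE is a simplifying assumption.

```php
<?php
// Hypothetical filter applied to an attributes array before rendering.
function example_rdfa_filter_attributes(array $attributes, $rdfa_enabled) {
  if ($rdfa_enabled) {
    return $attributes;
  }
  // Attributes that only carry RDFa in this context.
  $rdfa_only = array('about', 'property', 'typeof', 'resource', 'datatype', 'content');
  $filtered = array_diff_key($attributes, array_flip($rdfa_only));
  // rel and rev are shared with plain XHTML: keep values such as "nofollow",
  // drop values that look like CURIEs such as "foaf:homepage".
  foreach (array('rel', 'rev') as $name) {
    if (isset($filtered[$name])) {
      $kept = array();
      foreach (preg_split('/\s+/', trim($filtered[$name])) as $token) {
        if ($token !== '' && strpos($token, ':') === FALSE) {
          $kept[] = $token;
        }
      }
      if ($kept) {
        $filtered[$name] = implode(' ', $kept);
      }
      else {
        unset($filtered[$name]);
      }
    }
  }
  return $filtered;
}

$attributes = array('property' => 'foaf:name', 'rel' => 'foaf:homepage nofollow', 'class' => 'user');
print_r(example_rdfa_filter_attributes($attributes, FALSE));
```

The same flag could be resolved from a permission, a content type setting or a per-node setting, which are the three levels mentioned above.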
RDFa in the node body
This would be more the role of a WYSIWYG editor, or the markup could be typed by hand. This is a separate issue, and we should focus first on implementing RDFa in the code Drupal outputs as described above.

Comments

Yes, please. :)

Dries's picture

Yes, please. :)

Hello Stephane, Great

davidseth's picture

Hello Stephane,

Great roadmap, thanks. I have a few things I was writing up in a Google doc which I will add to this conversation.

Can't wait for the first new code to land!

Force or Not

shelleyp's picture

"Should we force RDFa or should there be an option to turn it off?"

We shouldn't force RDFa, the same as we shouldn't force syndication feeds. People use Drupal for different purposes, and not all of the purposes have to do with "participating in a broader community".

As for implementing the automated stuff, there's a lot of metadata that can be generated automatically. Adding in post-specific metadata annotation can be done via optional modules.

A real issue is going to be the XHTML specific content type...

Opt-out

comfycat's picture

I also agree that there has to be an option to turn off RDFa.

For some user data I'd argue that an opt-in would be more appropriate - especially things like the sha1 of people's email addresses (which is currently often used in FOAF files) shouldn't be exported by default, as many people want to decide for themselves whether they can be uniquely identified or not.
I'm not sure whether other inverse functional properties like foaf:homepage should be exported by default.

Anyway: great project!

WYSIWYG

kvantomme's picture

Great write up!

Erno, a colleague, is working on a microformat-RDFa WYSIWYG editor for his thesis project. At this point it's still in the research phase but I have great expectations for the project.


Drupalcon Szeged was a blast! Check out the video's on http://szeged2008.drupalcon.org/program/sessions

--

Check out more of my writing on our blog and my Twitter account.

Keeping it decoupled...

dahacouk's picture

+1 to whole concept. I really hope this gets all the way into D7 core.

+1 to being able to opt-out.

+1 to "site administrators could change these default terms to better match their application". However, I'd like to see the ability for site administrators to change default terms not only for "RDFa in content types and fields" but for "RDFaification of basic Drupal pages" too and anywhere else that RDF/RDFa is being output by core or modules.

Site administrators should have the last say for what their site means and how it relates to the rest of the Semantic Web. I think it important that we provide some educated default suggestions but nothing should be hard coded in. See a bit more here and here.

I'd also like to see the same interface whether you are mapping RDF terms from attributes in core, contributed modules, CCK fields or whatever. That would be sweet. And the site administrator should get progressively more intense warnings the further into core they go to change these terms.

Cheers Daniel

:-)

dman's picture

OK. This looks like a great start to me.
I'm in!
Very neat.

Thanks for your efforts

Wim Leers's picture

Thanks for your efforts Stéphane! The first piece of RDF in Drupal is now a fact, yay! :)

Wooottt!! Drupalcon Szeged

kvantomme's picture

Wooottt!!



RDF mappings in Drupal 6.x

Arto's picture

Note that the development version of the RDF module for Drupal 6.x includes support for defining RDF mappings for Drupal content types, CCK fields, taxonomies as well as profile fields; mappings can be defined both on the API level (third-party modules) and by the administrator via the Drupal administration interface.

From a quick look at Stéphane's patch for Drupal 7.x, our APIs are very similar, and anybody writing code for 6.x using the RDF API should have an easy time converting to 7.x.

Interface for mapping RDF terms...

dahacouk's picture

Arto and Stéphane:

I'd also like to see the same interface whether you are mapping RDF terms from attributes in core, contributed modules, CCK fields or whatever.

Is this your current thinking too?

Cheers Daniel

It's a core question

Arto's picture

Those aspects depend on the Drupal core providing a unified and consistent data model and API. The increased field-centricity in Drupal 7.x will no doubt contribute towards this end.
