Drupal RDF Schema proposal

scor

I'd like to share some quick thoughts on how Drupal Data could be described in RDF. The attached schema represents the mappings between the current Drupal data structure and the proposed RDF Schema, reusing existing ontologies such as Dublin Core, FOAF, SIOC and SKOS.

The green circles represent the Drupal objects (node, revision, user, role, term), with their equivalent RDF class. The rectangles are the values used in Drupal. It's important to differentiate a class from its actual instances (resources) which are each defined by a unique URI, see the examples below. This schema is meant to be simple, incomplete, and to show the main core features.
Comment and Node are two different elements in Drupal, but they can be combined in the same class using the recursive property sioc:has_reply (Comment as Node). Node and Revision objects are kept separate here, as they are in the Drupal data structure, but they could fundamentally be merged as well.

I presented SIOC at DrupalCon Barcelona, and showed how it can be used to describe online communities. The SIOC sioc:Item class which I used here as equivalent of a Node is a broad Class with many sub-types: AddressBook, AnnotationSet, AudioChannel, BookmarkFolder, MailingList, MessageBoard, BlogPost, BoardPost, WikiArticle... See the SIOC Types Module for more details.

This is an example of how this schema can be used in the case of a role:
User 5 has the 'authenticated user' role. This role has 3 permissions: 'create book content', 'view revisions' and 'upload files'.
User 5 is an instance of sioc:User and its URI is http://example.com/user/5.
The 'authenticated user' role is an instance of sioc:Role and its URI is http://example.com/admin/user/permissions/2.
All this information can be expressed in Turtle:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix drupal: <http://drupal.org/ns/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .

<http://example.com/user/5>    sioc:has_function    <http://example.com/admin/user/permissions/2> .
<http://example.com/admin/user/permissions/2>    dc:title    "authenticated user"@en .
<http://example.com/admin/user/permissions/2>    drupal:has_permission    "create book content"@en .
<http://example.com/admin/user/permissions/2>    drupal:has_permission    "view revisions"@en .
<http://example.com/admin/user/permissions/2>    drupal:has_permission    "upload files"@en .

Another example in RDF/XML describing the Drupal 6.0 release post:

<sioc:Post rdf:about="http://drupal.org/drupal-6.0">
    <dc:title>Drupal 6.0 released</dc:title>
    <dcterms:created>2008-02-13T14:42:00Z</dcterms:created>
    <sioc:has_creator>
        <sioc:User rdf:about="http://drupal.org/user/4166" rdfs:label="Gábor Hojtsy" />
    </sioc:has_creator>
    <sioc:content>After one year of development we are ready to release Drupal 6.0 to the world. Thanks to the tireless work of the Drupal community, over 1,600 issues have been resolved during the Drupal 6.0 release cycle. These changes are...
    </sioc:content>
    <sioc:topic rdfs:label="News and announcements" rdf:resource="http://drupal.org/forum/8"/>
    <sioc:topic rdfs:label="Drupal 6.x" rdf:resource="http://drupal.org/taxonomy/term/102"/>
    <sioc:has_reply>
        <sioc:Post rdf:about="http://drupal.org/drupal-6.0#comment-728240" />
    </sioc:has_reply>
    <sioc:has_reply>
        <sioc:Post rdf:about="http://drupal.org/drupal-6.0#comment-728246" />
    </sioc:has_reply>
    <sioc:has_reply>
        <sioc:Post rdf:about="http://drupal.org/drupal-6.0#comment-729369" />
    </sioc:has_reply>
</sioc:Post>

Thanks to Sergio Fernández, Thomas Schandl and Uldis Bojars for their feedback.

Attachment: drupalRDFschema.png (74.93 KB)

Comments

This RDF in Drupal movement

moshe weitzman

This RDF in Drupal movement is really coming together. Thanks for helping us along. I'd love some comments from others knowledgeable in this area.

Well done

hendler

+1

I think this incorporates a lot of existing standards, as it should, and SIOC is working with the W3C to make this an accepted standard. What is above is a good core schema for the core data of Drupal. Basically, having Drupal buy-in to SIOC, and SIOC into Drupal can only be good for moving all of this forward.

The one concern I had was about expanding this to include CCK. While CCK is an extension, it is part of core and widely used. There is a lot of metadata out there that would become available if the RDF library dealt successfully with CCK.

There are two approaches:

  1. let the user create arbitrary URIs
  2. let the RDF store automatically map from CCK field/node types to a core RDF schema

I think the second approach is best, and best if the second approach is done in a way that is not Drupal specific. I think this is the hard part of mapping (deciding the level of constraints to provide), so perhaps this would be out of scope for a first round.

Scor also suggested blending the above approaches: when you create a CCK field, you could choose from a pulldown list, or see a tooltip, so the user can decide on a mapping to an existing RDF property.

Thanks to Scor for the very

evgeny

Thanks to Scor for the very comprehensive schema proposal. Very helpful.

As far as the implementation goes, I think the blended approach (if I understood the above correctly) would be optimal in the longer term. The mapping proposed above could perhaps constitute the best practice, which would work for a good majority of installs. At the same time, the data architect for each particular install would be able to define or adjust the mappings to reflect the semantics of the problem domain that install supports.

The simple example that comes to mind is group membership, which, depending on the supported domain, may mean COI membership, social connection, project participation, etc. Depending on that, one or another vocabulary mapping may be more suitable. The same will apply to CCK fields, etc.

In other words, defining the RDF schema for a particular install (data architect's job) would be decoupled from the mechanism of mapping Drupal data objects to the schema.
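
The decoupling could be sketched roughly as follows, in Python purely for illustration. The field names, the `resolve_property` helper, the `ex:` and `drupal:` prefixes, and the particular default choices are all assumptions made for the example, not part of any existing module: defaults ship as one table, each install keeps its own override table, and the export mechanism only ever calls the lookup.

```python
# Default Drupal-field -> RDF-property mappings (illustrative choices only).
DEFAULT_MAPPINGS = {
    "node.title": "dc:title",
    "node.created": "dcterms:created",
    "og.membership": "sioc:member_of",
}


def resolve_property(field, site_overrides=None):
    """Return the RDF property for a Drupal field, preferring site overrides."""
    overrides = site_overrides or {}
    if field in overrides:
        return overrides[field]
    if field in DEFAULT_MAPPINGS:
        return DEFAULT_MAPPINGS[field]
    # No mapping known: fall back to a Drupal-local term (hypothetical prefix).
    return "drupal:" + field.replace(".", "_")


# A project-tracking install reinterprets group membership (hypothetical ex: term);
# every other field keeps its default.
project_site = {"og.membership": "ex:participates_in"}

print(resolve_property("og.membership"))                # default mapping
print(resolve_property("og.membership", project_site))  # install-specific override
print(resolve_property("cck.field_bpm", project_site))  # unmapped CCK field
```

The data architect edits only the override table; the code that walks Drupal objects and emits triples never needs to change.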

Just my $.02.

drupal:type?

fgiasson

Hi here,

Really nice mapping you have here. You certainly got the point of re-using existing ontologies instead of starting a new one. This makes it much easier to integrate that stuff into the linked data space :)

Just one little thing that I noticed: drupal:type. Is it really needed?

First of all, I would suggest you take a look at this: http://rdfs.org/sioc/types#

Then check how they produced new types (such as a BlogPost). The idea is the following: you have a sioc:Item. However, if you want to create more precise types for Drupal's needs, then you create new classes and make them subClassOf sioc:Item, sioc:Post, sioc:BlogPost, or whatever.

The resource (item) should have its own "type" without creating a new type property (one is enough: rdf:type :) )

There you will create the drupal "extension" ontology by defining these new classes such as:

<rdfs:Class rdf:about="http://drupal.org/ontology/WhateverTheNewType">
<rdfs:label xml:lang="en">Label (name) of the new type</rdfs:label>
<rdfs:comment xml:lang="en">Extensive description of what is this new type about.</rdfs:comment>
<rdfs:subClassOf rdf:resource="http://rdfs.org/sioc/ns#Post"/>
<rdfs:isDefinedBy rdf:resource="http://drupal.org/ontology/"/>
</rdfs:Class>

And so on. From there, you can create unions of subtypes, etc. You have the leisure to define things as you want.

This comment would also apply to sioc:Role.

I would be pleased to revise your efforts on this project, you only have to contact me directly or by other means.

Thanks,

Take care,

Fred

thanks Fred for this

scor

Thanks, Fred, for these constructive comments. I'll take a closer look and update my graph.

By the way, I just came out of the demo of Twine that Nova Spivack gave here at DERI. It's quite impressive what they've been able to do with the Semantic Web! Let's see how far we can get with Drupal.

vocabulary defines property relating node to term

thomas_green

A Drupal taxonomy vocabulary would theoretically be the equivalent of a specific RDF property relating a node to terms in that vocabulary. For example, suppose I create a "species" vocabulary containing the term "gorilla" and associate the vocabulary with a property such as "xyz:hasSpecies" (where xyz: is a prefix for some appropriate species namespace). Then tagging node X with that term translates to a triple like:

<node x> , xyz:hasSpecies , xyz:gorilla

Perhaps this is just saying that with specific vocabularies you'd be in effect defining subproperties of sioc:topic in your picture.
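
As a rough illustration (Python, with a made-up node URI and a hypothetical `tag_triples` helper), tagging a node with a term from a mapped vocabulary could yield both the specific triple and the declaration that ties the vocabulary's property back to sioc:topic:

```python
def tag_triples(node_uri, vocab_property, term):
    """Triples produced when a node is tagged with a term from a mapped vocabulary."""
    return [
        # The specific statement: node X has species gorilla.
        (f"<{node_uri}>", vocab_property, term),
        # The vocabulary's property is declared a specialisation of sioc:topic.
        (vocab_property, "rdfs:subPropertyOf", "sioc:topic"),
    ]


# http://example.com/node/42 and the xyz: prefix are placeholders.
for s, p, o in tag_triples("http://example.com/node/42", "xyz:hasSpecies", "xyz:gorilla"):
    print(f"{s} {p} {o} .")
```

A consumer that only understands sioc:topic can still recover the generic "node is about gorilla" statement by following the subproperty link.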

Drupal 6 vs Drupal 7

Arto

This is a very good diagram, Stéphane, and the mapping you outline makes perfect sense for a semantic CMS such as what Drupal hopefully will evolve into.

I've however currently taken a bit of a different approach to the RDF mapping as provided by the RDF Schema submodule of the RDF API for Drupal 6.x. Essentially, I map everything by default to Drupal-specific properties which are then defined to be subproperties of well-known external vocabularies such as Dublin Core and SIOC; for example, the node:title property would be defined as a subproperty of dc:title.

There are several reasons why I favor this indirect mapping.

First and foremost, it makes everything rather straightforward and simple for developers (including less technical Drupal users who may be defining SPARQL queries): there is no need to remember which particular external vocabularies and terms a given Drupal entity's fields are mapped to. That is, the RDF API provides a "logical" reification of the implicit current Drupal vocabulary, with easily-memorable terms such as user:mail, node:created, term:description, cck:my_arbitrary_field and so on. I think this will significantly ease the RDF learning curve for Drupal developers.

Second, I don't think that we will ever be able to map every Drupal field to pre-existing terms in external vocabularies (this is true by definition, since CCK exists), but we might still want to make the data in question available as RDF. When working with RDF-based data interchange in an all-Drupal context (e.g. importing and exporting data, or synchronizing several RDF-enabled Drupal 6.x instances as I'm looking to do with RDFbus), the indirect mapping provides for unambiguous exchange of potentially all data in a given Drupal instance without reference to any external vocabularies. Again, this means simplicity and ease of use for Drupal users.

Third, the indirect approach unblocks development of both the RDF API itself and modules dependent on it; instead of constantly interrupting development with costly microdecisions regarding which particular external term to map something onto in order to be able to use it in the code, we can just go full steam ahead. Essentially, the indirect mapping removes the need for developers to figure out things which are really more the purview of data architects (and, as Evgeny mentions above, may often even be case-specific - certainly for CCK fields).

So, in summary, I've presently adopted the indirect mapping because it enables two parallel tracks (developers and data architects) to proceed relatively independent of each other, instead of a single one that would have a critical path lined up with constant roadblocks.

While I believe what I've described above to be the right solution for Drupal 6.x, I've explicitly left the door wide open to alternative approaches in the design of the RDF API by factoring out the RDF Schema submodule from the core RDF API itself. Anyone wanting to explore alternative schema mappings is thus encouraged to fork the RDF Schema module and to do a better job at it :-)

For Drupal 7.x, I'd definitely like to see RDF integrated (as Boris would say) into the DNA of Drupal to such an extent that this whole question could be revisited from the ground up.

For the time being, I'm going to adopt Stéphane's suggested terms into the RDF vocabulary defined by the existing RDF Schema module, in the form of providing default mappings to external vocabularies for the terms in question. I will also work to provide mechanisms for overriding these on an instance-specific basis (for CCK fields and for taxonomy vocabularies, per Thomas's idea above).

I think what we need right now is a healthy dose of experimentation and coding, from which convergence and standardization will eventually follow.

Keeping it decoupled...

dahacouk

A beautiful and useful diagram, Stéphane...

I strongly support arto's suggestion of indirect mapping for RDF (and other Semantic Web markup languages) in Drupal. I may even suggest taking it a little further.

Every module, theme, data type, Drupal add-on, etc should bring in its own semantic vocabulary as arto suggested:

user:mail
node:created
term:description
cck:my_arbitrary_field

But then these must map to absolute URIs and the only choice is to map them initially to the local website:

www.mydomain.co.uk/drupal/user:mail
www.mydomain.co.uk/drupal/node:created
www.mydomain.co.uk/drupal/term:description
www.mydomain.co.uk/drupal/cck:my_arbitrary_field

Then the webmaster is free to link/map terms in their own namespace to terms in any other namespace they wish - such as:

drupal.org/user:mail
drupal.org/node:created
drupal.org/term:description
drupal.org/cck:my_arbitrary_field (perhaps not but you get the picture)

Or:

sioc:email
dcterms:created
dc:description
mpeg:song_beats_per_minute

Or you can map your local namespace to both sets shown here and/or many, many others around the web. Each local term like "www.mydomain.co.uk/drupal/user:mail" could map to multiple other terms in external semantic ontologies. The point is that it is up to the webmaster to decide. There is no one right way.

But just as fresh Drupal installs come with pre-loaded themes and modules so too should Drupal come with pre-loaded links/mappings to external semantic ontologies. That's where I think Stéphane's diagram will be very useful to us.

But I would strongly recommend that each Drupal install comes with at least one set of pre-loaded external ontology mappings: drupal.org. That should be the first one. Again, each Drupal module will come with its own mini-namespace (as it comes with its own mini-translation and mini-CSS). It's most likely that many modules will share the same terms. But all Drupal project/theme/whatever namespaces should be normalised on drupal.org. Then people will be free to offer up suggested mappings of this drupal.org normalised namespace to external sources (like DC or MPEG) for webmasters. And webmasters can use these suggestions or roll their own.

Just as webmasters can currently modify themes, or even make up their own completely from scratch, so should they be able to do the same with the external mappings of their local RDF vocabulary.

I'm really emphasising user control here. In Boston Dries said "Eliminate designer" among others. Well, I'm following that theme.

It's also politically incorrect to define hardwired external RDF linkages. If you track Tim Berners-Lee you'll notice a common theme running through all his talks: decoupling. Such as "Separation of form and content" and "Independence of the communication from the provider of software".

I hope I haven't laboured the point (too much) nor tried to teach you to suck eggs.

I'm so much looking forward to the day that Drupal has the kind of semantic/RDF configuration tools that we currently have for building themes and translations. Being able to pull in external ontologies and incorporate them into our local systems is going to be so much fun. And we should be able to do this without having to see any RDF. Or we could view the graph as N3 or bubbles (!) or whichever way we wanted. I'll stop there. I could go on though...

Cheers Daniel

my 2 cents

dorgon

Hi,

Your graph looks great, Stéphane. Always reuse existing vocabs. It's great that for a community platform like Drupal there are many popular vocabs around which fit perfectly.
I would also vote for Jonathan's 2nd approach with the possibility of manually specifying URIs overriding the default URI mapping.

@arto:
Do you assert instances with the sub-classed Drupal-specific concepts into your store? Or do you just allow application developers to use the concepts from the Drupal namespace instead of DC/SIOC/etc. when accessing the graph, for convenience? At the least, the indirect mapping requires subsumption reasoning all the time. When exporting data or providing public SPARQL endpoints it would be better to directly assert existing concepts and add custom classes and properties only if necessary. It's not a problem to introduce new Drupal-specific concepts in a "Drupal" namespace and intermingle these with DC/SIOC/SKOS/DOAP/etc., but only if there is no existing concept already.
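
One way to reconcile the two positions is to keep the Drupal-specific properties internally and materialize the external ones at export time, so consumers need no reasoner. A minimal Python sketch follows. The `materialize` helper and the sample data are hypothetical; node:title to dc:title is the pairing Arto gives above, while node:created to dcterms:created is an assumed pairing:

```python
# Drupal-specific property -> external superproperty.
SUBPROPERTY_OF = {
    "node:title": "dc:title",
    "node:created": "dcterms:created",
}


def materialize(triples, subproperty_of=SUBPROPERTY_OF):
    """Add the triple implied by each subproperty link (one level deep only)."""
    result = list(triples)
    for s, p, o in triples:
        parent = subproperty_of.get(p)
        if parent is not None:
            result.append((s, parent, o))
    return result


stored = [
    ("<http://example.com/node/1>", "node:title", '"Hello"'),
    ("<http://example.com/node/1>", "cck:my_arbitrary_field", '"x"'),
]

# The export carries dc:title directly; the CCK field has no external
# equivalent, so it stays in the Drupal namespace only.
for triple in materialize(stored):
    print(" ".join(triple), ".")
```

This only follows a single subproperty step; a chain of subproperties would need the lookup repeated until no parent is found.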

@dahacouk:
Please note that there is a difference between ontological concepts, like classes and properties, and instances of these (and literals). For instance, www.mydomain.co.uk/drupal/user:mail is a concept and should have a globally unique namespace URI like http://drupal.org/vocab/2008-03-14/user#mail. Instances like users should have installation- or domain-specific URIs like http://www.mydomain.co.uk/drupal/mike.

Regarding URI schemes I'd like to point you to these resources:
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
http://www.w3.org/TR/swbp-vocab-pub/

I would appreciate it if a Semantic Drupal also complied with these specs and supported 303 redirects, which are used to point to descriptions of non-information resources.

Great what's going on in this community. I've just dropped in after our "Semantic Web" Drupal meetup in Vienna this week!

Best regards,
Andy

Keeping it decoupled...

dahacouk

@dorgon
Do you mean www.mydomain.co.uk/drupal/user:mail is a "concept" or a class or property? I think you mean property, yes? mail as a property of user?

You seem to be supporting the idea of a drupal.org semantic "vocab" namespace? Cool.

So each module project will have to specify their own namespace and map this onto a normalised drupal.org namespace. I guess, in some ways, this is already part of the way there as variables are declared in the database.

And supporting the idea of "manually specifying URIs overriding the default URI mapping"? Double cool.

If that happens at every level within a given Drupal installation then I'm really happy. I should add that giving webmasters the power to modify or add semantic links to all classes, properties and instances within their own installation should be managed carefully. So, instances are easy to make links for whereas access to properties and classes are progressively more "difficult" - hidden behind "advanced user" and "are you sure?" buttons.

Cheers Daniel

vocabs vs. instances

dorgon

Basically, I said "concept" for class or property.

Let me explain some terms. While "ontology" is a rather universal term, used to denote BOTH
* concepts (classes, properties) + restrictions/rules/etc. AND
* instances (of classes and props, i.e. "data"),
it is common to use the term
* "vocabulary" for all the intensional concepts like classes, properties, etc. (also called TBox for terminological box) and
* "instances" or "individuals" (in OWL terminology) for the data that make up the knowledge base (also called ABox for assertional box).

Right, a user's mail address is a property of the class User. You should never use ":" inside URIs, this can cause confusion when using namespace prefixes, better write: www.mydomain.co.uk/drupal/user#mail or www.mydomain.co.uk/drupal/user/mail - note there is a difference when publishing vocabularies either with anchor URIs or slashes [1].

And it's always a good practice to include a "version date" into your URI! Read this [2] and that [3].

> So each module project will have to specify their own namespace and map this onto a normalised drupal.org namespace
Again, distinguish between vocabularies and data. Somebody providing a module or creating a new content type will possibly create a new vocabulary to describe the data (aka DB schema), and so would introduce a new namespace. Yes, there should be some convention for that, like "http://" + website domain + "/vocabulary/" + YYYY-MM-DD + "/" + vocab-name + "/" + concept URI. Here again, [1] and [2] should be consulted.
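
Spelled out as code, the convention might look like this (a Python sketch; `vocabulary_uri` is a hypothetical helper, and it assumes the final "concept URI" segment is simply the concept's local name):

```python
from datetime import date


def vocabulary_uri(domain, version, vocab_name, concept):
    """Build a versioned vocabulary URI from the parts of the convention."""
    # date.isoformat() yields YYYY-MM-DD, matching the version-date segment.
    return f"http://{domain}/vocabulary/{version.isoformat()}/{vocab_name}/{concept}"


print(vocabulary_uri("www.mydomain.co.uk", date(2008, 3, 14), "user", "mail"))
# http://www.mydomain.co.uk/vocabulary/2008-03-14/user/mail
```

This produces slash-style URIs; a hash-style variant would end in "/user#mail" instead, and [1] discusses the trade-offs between the two.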

> And supporting the idea of "manually specifying URIs overriding the default URI mapping"? Double cool.
Hmm... Additionally, I would allow users to add properties like "owl:sameAs". There is actually no need to completely "override" the default URI; that would make things more complicated, I think.

And on the other hand, creating nodes means creating "data" which is aligned to a vocabulary. I think each URI of a node should remain equal to the URI scheme currently used: "http://" + website domain + "/node/" + NID - this is already a good approach!

Read this [4]! It would be great if Drupal used content negotiation to choose between delivering HTML pages and plain RDF data, serving RDF when the client sends "Accept: application/rdf+xml".
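
The decision itself is small. Here is a deliberately simplified Python sketch (`choose_representation` is a hypothetical name, and it ignores q-values and wildcards, which a real implementation would need to honour):

```python
def choose_representation(accept_header):
    """Pick a response media type from an Accept header (q-values are ignored)."""
    # Split "type/subtype;q=0.9, other/type" into bare media types.
    offered = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    if "application/rdf+xml" in offered:
        return "application/rdf+xml"
    # Browsers and unknown clients get the normal HTML page.
    return "text/html"


print(choose_representation("application/rdf+xml"))
print(choose_representation("text/html,application/xhtml+xml;q=0.9"))
```

The same node URL ("http://" + website domain + "/node/" + NID) would then serve either representation, so existing links keep working.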

Regards
Andy

[1] http://www.w3.org/TR/swbp-vocab-pub/
[2] http://www.w3.org/TR/cooluris/
[3] http://www.w3.org/Provider/Style/URI
[4] http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/

on Dublin Core

bdarcus

Great to see this work!

Keep in mind that the Dublin Core group is cleaning up their RDF by moving the core terms (in the namespace you use here) over to the dcterms namespace, and giving those (generally) explicit domains and ranges. So, for example, no more wondering whether a dc:creator or dc:publisher is a literal or an agent resource; it's now the latter. See here for more.

Given the timing, it probably would make sense to use these new properties.

roles and permissions

comfycat

Hi scor,

I am just going through your posts on Drupal and SemWeb, very helpful stuff!

Did you notice that SIOC now has a module for roles and permissions? I think this module still needs some polishing, so maybe you can talk about your use case to Uldis (and possibly include Cosmin and Deirdre, who are also interested in that).