Potential RDF use cases for Drupal

Arto

Following up on my previous post introducing the in-progress RDF API for Drupal 6.x, I'm going to post, bit by bit, some of the materials that I earlier put together internally for the project this is being developed for. Development of this project is supported through M.C. Dean and MakaluMedia on behalf of their clients.

First, some requisite background. Our use cases for RDF evolved out of several years of building a communications and collaboration platform serving the needs of an international community of intergovernmental agencies and organizations. The project as well as the potential applications described below reflect Ivan's and Chris's ideas and vision.

This environment presented us with a variety of challenges, such as deploying the application across multiple discrete organizations that operate under little or no centralized authority, supporting a wide range of operational scenarios, and finally managing or brokering large volumes of context-sensitive information delivered via a variety of protocols from a multitude of sources, in formats ranging from web-native to rich media files to device specific data streams.

While tackling these challenges, this project has funded the development of many well-known Drupal modules; examples include OpenID, OpenSearch, LDAP provisioning, Timeline and Boost. More modules are yet to be released as open source, including a WebDAV server module and embedded instant messaging and webmail for Drupal.

Our platform is currently based on Drupal 4.7.x, and is probably one of the most complex Drupal systems around: we're making use of over 130 contributed modules and have a total code base size of nearly 600 KLOC (not a point of pride). Moving onwards to Drupal 6.x, our goal is to radically simplify the system by standardizing on RDF to allow for more precise expression and efficient sharing of information, and to utilize Exhibit as the central technology driving the user interface.


I mention all this to make clear the context of our use of and interest in Drupal and to illustrate why the use cases that follow are necessarily of an "enterprisey" nature, dealing with reducing the complexity of our system design, increasing its interoperability with external data sources, and improving the usability of our user interface for dealing with large quantities of multi-faceted information.

Two slides from a presentation at our recent developer meeting illustrate this (the full-sized versions are available on Flickr).

They depict (on a high level) a system architecture enabled by the RDF support for Drupal, tying several Drupal instances together into a federated system via RDFbus (publish/subscribe messaging for transmitting RDF payloads over various transports including XMPP and Stomp) and SPARQL.

Most significantly, this loosely-coupled architecture provides a measure of technology neutrality which allows individual system components (i.e. shards of the datagraph) to be replaced with other interoperable systems, ranging, for example, from high-productivity web frameworks such as Rails or Django to legacy J2EE systems. Because, frankly, Drupal isn't always the right answer regardless of the question.


I will write up more public details on RDFbus later, but here follow some initial quick thoughts on potential specific RDF API-based use cases for Drupal 6.x - the low-hanging fruit, if you will. The list ranges from the obvious to the speculative, in ascending order.

(Note that if any of the preceding was all Greek to you, there may be something more concrete in what follows; generally, though, I should warn that if you haven't previously heard of RDF, or know RDF only from the context of RSS, then reading at least this 10-minute introduction to RDF is pretty much a requirement for making any sense of this stuff... other introductory materials, including videos introducing these concepts, are linked to from the RDF API's developer documentation.)

Generic metadata storage

Many (most?) Drupal modules need to store metadata in some form or another. Until now, each Drupal module has needed to implement its own metadata storage in an independent, redundant and incompatible manner.

In practice, Drupal's configuration variables have also come under widespread abuse for metadata storage in situations where developers needed to describe information but felt that creating (and then maintaining) a whole set of custom SQL table structures was too heavyweight a solution.

Drupal clearly would benefit from a generic solution here, and indeed implementing unified metadata storage in Drupal has been discussed on the development mailing list several times in the past. The RDF API provides a solution to this with sufficient generality to cover any specific use case.

Faceted browsing

Exhibit displays structured data in the form of rich visualizations that can be searched, filtered and sorted using faceted browsing. Exhibit is the perfect user interface complement to the flexibility that RDF brings to play on a systems level.

While the possibilities are virtually endless, some of the more obvious applications of Exhibit include navigating user profiles by organization membership, expertise, and social relationships; filtering event data by subject, audience, and timeframe, and graphically representing the events on both a map and a time grid; and combining full-text search with taxonomy-, group-, and user-based filtering to locate content. All of the preceding would use a common, consistent set of data manipulation and visualization tools.

This is one of those cases where a demo is worth a thousand words. If you haven't seen Exhibit or Potluck in action, go do so now. The Exhibit module will bring these abilities to Drupal by means of an Exhibit-SPARQL bridge.

Some sneak peeks of the module performing its magic are available on Flickr.

Utilizing open, linked data

Utilizing the RDF API, openly available big datasets can be used to enhance mashups built on top of Drupal.

For instance, any factual information which Wikipedia has can be retrieved or copied from DBpedia. As a trivial use case: instead of manually importing, say, the list of countries into a Drupal instance from a CSV or XML file, just pull the equivalent triples from DBpedia or the CIA World Factbook using a single SPARQL query. Along similar lines, the GeoNames dataset has information on 6.5 million geographical locations worldwide, including facts on every city and town; there are sure to be some intriguing possibilities here for spatial mashups.

From modules to widgets

The RDF API for Drupal will include a jQuery API for querying and changing the metadata stored in the Drupal instance. This opens up a plethora of new possibilities.

Even today, many UI-oriented Drupal modules live almost wholly on the browser side, maintaining only a thin server-side implementation responsible for the persistent storage needed by the module's client-side functionality. With the introduction of unified metadata storage for Drupal, it is conceivable to cut this server-side cord altogether; this is in line with post-Web-2.0 trends which are seeing more and more UI logic being transitioned from the server into the browser.

A perfect case example is the Image Notes module. The sole purpose of this module is to dynamically attach an annotation widget to photos displayed on a Drupal site, allowing users to provide descriptive text for bounded regions of the image (e.g. marking up the names of the people depicted in the photo). Using the jQuery API for storing annotation metadata, this component needs no further customized server-side code, meaning that it can become either an extremely lightweight Drupal module, or perhaps even be directly incorporated into the Drupal site's theming layer without any need for an actual Drupal module per se.

In general, widgets like this could conceivably be quicker to develop than Drupal modules, be more amenable to quickly changing requirements, and not require any Drupal-specific expertise on the part of the developer - and many widgets could indeed work with any backend application as long as a compatible jQuery metadata API was provided.

RDF export functionality

With the introduction of the RDF API, every Drupal resource now has an associated machine-readable RDF description available via auto-discovery. If desired by the administrator, all the data stored in a Drupal instance can be exposed this way. This supplants previous solutions such as the Import / Export API.

(Implementing RESTful POST, PUT and DELETE semantics - perhaps interoperable with AtomPub - for uploading changes in RDF format is beyond the current scope, though; RDFbus covers much the same ground, but using an asynchronous messaging paradigm.)

Publish/subscribe aggregation

RDFbus obsoletes the existing XML-RPC-based Publish/Subscribe solution for Drupal, goes one step further by not being tied to specific content types or data, and potentially enables notification and synchronization with any client application or device that can talk XMPP and understands triples.

Drupal vocabularies, categories, user accounts, groups, pages, blog posts, events, files: RDFbus is content-agnostic and will transport anything that can be described in RDF.

Distributed identity and trust

On the one hand, RDF allows describing user/group equivalences between two Drupal instances; on the other, RDFbus allows the same physical user account to be synchronized between these instances for purposes of Drupal modules that aren't RDF-aware. In addition, FOAF relationships between user accounts, groups and sites (user-to-user, user-to-group, group-to-site, and so on) provide the basis for a distributed network of trust.

This might provide functionality similar to LDAP, but in a manner that is decentralized, happens over the cloud, and is FOAF-compatible. (Note that OpenID is orthogonal to this setup; each user merely needs a URI, and that can equally well be provided by OpenID or something else.)

Distributed search

RDF allows mashing up any information about anything. SPARQL has built-in support for federated queries over any number of RDF data sources. OpenID + FOAF provide a global URI-based identity with trust metrics for access control and personalization purposes.

Something might perhaps be built on top of SPARQL that would not only obsolete OpenSearch, but make it seem crude in comparison.

Aggregated views/visualizations

Think of the benefits that the Views module provided for individual Drupal instances, and imagine something that takes this concept and runs with it a little further: aggregated for any number of Drupal instances or other RDF-aware data sources, and based on SPARQL instead of SQL.

Imagine what you'd want to see, in a tabular format or otherwise, and if the source data for it exists on the local RDF cloud, a SPARQL query can be written to traverse the graph, transform the results, and pull up the data for visualization purposes.

Given the aforementioned paradigm shift to widgets, this yet-to-be-named component (Sparkling Views?) could support richer, higher-level output than the Views module does. Timelines, sparklines, pie charts; you name it - if a widget exists, it could be plugged in.

Global hub/dashboard

Given all the facilities detailed above, a next step would be building a dashboard-like central hub interface for monitoring real-time activity in all instances (over the RDFbus) and performing aggregate queries (using SPARQL) over the full datagraph, e.g. for obtaining metrics and analytics while respecting local privacy concerns. This would be useful both for management and system administration purposes in an organization relying on a distributed Drupal platform.


This is an amazing writeup

bonobo

This is an amazing writeup -- I'll be coming back here periodically to re-read this in order to get my head fully around this.

Thank you for sharing this out -- both the modules, and this background.



Tools for Teachers

I can think of a lot of uses

SamRose

I can think of a lot of uses for this. I think it's really going to help Drupal to incorporate semantic web functionality, because it's my opinion that this is the direction in which more people are going to be collaborating online in the future.

I think end users are going to start realizing that this is one way to work around information overload, and make data and information more portable across sites and communities.

Sam Rose
Social Synergy

Spanning 2.0 and 3.0

cameron.hunt

Arto, I believe you have presented the most workable, feasible, and valid approach to integrating standards-based semantic technology into existing collaboration and social systems. I believe this work (and user-oriented work like it) will drive practical standards and applications.


A pure RDF Backend API to support precisely this sort of APPs

jccq

Hi Guys,

we at DERI (National University of Ireland, Galway) have a team of approximately 10 people fully dedicated to creating and improving a backend service that allows applications to locate RDF data on the web.

The engine is called SINDICE ( http://www.sindice.com ) and currently contains almost 27 million well-formed RDF files, which can be looked up by URI, keywords, or other criteria. A pinging service allows your RDF (produced by Drupal or anything else) to be indexed in less than 20 minutes, and there is also a crawler focused on structured data.

Many use cases are already possible, e.g. looking up people by using the hash of their email address to find messages they have been posting around. More will be possible soon with the new forthcoming API.

We highly welcome your comments and requests on our mailing list to help us focus on best supporting YOUR new cool "web 3.0" plugin :-)



RobLoach

I guess RDF could be thought of as a protocol for web services. Does it overlap with the Services module? The Services module implements REST, SOAP, XML-RPC, and some other protocols so far for interacting with different systems.

Services API for RDF

Arto

No, RDF is a data model, not a protocol. Incidentally, the RDF API module already implements the Services API, providing support for RDF queries and operations.

dorgon

Yes, RDF is "just" a very powerful, yet simple data model. It can be used for more "intelligent" applications because these usually require carefully described data.
You certainly can send RDF graphs inside web service calls, and there was also a big (maybe too big) research initiative towards semantic web services over the last couple of years (search for WSML or OWL-S). They use SW concepts for service discovery and service orchestration - but I'm a bit skeptical about the latter use case for RDF & co.

The use cases described by Arto are exactly those examples where you can really benefit from RDF. You can work much more closely with your data. And today, data is really the nucleus of any application...


ChrisBryant

Thanks for sharing this writeup. It's great to see this being worked on, and the possibilities it will enable are exciting, especially for pushing and pulling data between systems and formats.

Gravitek Labs

Nice Job Arto!

cglusky

Great article! I have seen parts of it in different places, but this really glues it all together.

Content Staging

kbahey

Great direction Arto.

With this being the replacement for Publish/Subscribe, I hope we finally have a way to create content on a staging server first and then, after approval, push it to the live site.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.


Not so fast

moshe weitzman

The Feed API and Feed Element Mapper are a compelling replacement for Pub/Sub. See this Development Seed screencast. As for deploying content, we have a third promising project - Deployment plans and Services.


Boris Mann

I love FeedAPI and FEMP. Sticking stuff into a feed and then pulling it back out again is not ideal. Pub/Sub used XML-RPC and worked by transporting Drupal node objects "natively".

If we hash out a Drupal RDF schema, then this will also be "native" mapping. Although, yes, this could be "embedded" in feeds with a Drupal schema / namespace as well.

The extra transports that Arto has built are the big win here. The poll model of feeds isn't that great (push!).

Anyway, these two will need to keep an eye on each other. I'm looking at this RDF to potentially be a really good import/export, with an extensible schema that supports all the Drupal constructs.

It is a trivial change to

moshe weitzman

It is a trivial change to let feeds push rather than pull. I would expect it to be a few lines that hook into Services ... I am really interested in RDF as an import framework. Perhaps someone could elaborate on that a bit. How are nodes represented in RDF? Do we have any of this functionality yet?

RDF Import and Export, plus RDF Schema

Boris Mann

See http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/rdf/

The RDF Schema is what represents the Drupal data, and is one of the main things I think we should discuss in the Drupal BoF. If we can get agreement on the schema and a way to extend it, then everything else is cake :P

Other folks do this really well

hendler

You should really work with DERI and the W3C.

For example, DERI has dozens of people working on standards in alignment with the W3C. Development can be an individual kind of mission, but the standards thing really needs to fit into a "bigger than Drupal" world.

You don't need to just wait and see what the next-generation standards are; Drupal is in a position to adopt SIOC, OPML, FOAF, DC, and many others. What other CMSs/blogs choose to do is important, but by taking a lead, and perhaps even creating a cross-CMS consortium on this topic, a lot more could happen. Drupal, I think, has the strongest community to make this happen.

To recap:

  1. Bring in the standards experts like DERI
  2. Open the dialog up with other CMSs/blogs, since this effort should not be just a Drupal-internal effort.

Any more schema discussions going on?

joemoraca

While all of the development that is going on is truly awesome (I can't wait to play with SPARQL in Drupal), I think there is lots of work to be done just creating best practices for how to map your "fields."

I have been trying to understand RDFa and the various ontologies/microformats like FOAF, DC, GEO, vCard, hCard, etc., and how to implement them in Drupal (themes, modules, views... xrefs from your CCK fields to ??). I think working on examples now will benefit those of us who will try to use Semantic Drupal next year.

Mapping a complex user profile to RDFa is no simple matter for "non experts" like me.

A good resource I found was http://www.w3.org/2006/07/SWD/RDFa/primer/ with links to a few more on my website

Joe Moraca


RDF Importing

patrickgmj

Along the lines of using open/linked data, I can see applications that pull more localized info into Drupal. For example, use something like Solvent to scrape data from existing web pages, possibly combine it with additional data on the way in (using SPARQL CONSTRUCTs), and import that entire dataset into Drupal.

A more real-life example: I've used Solvent to scrape data about the course schedule for the university I work at, and data for all the faculty. Those are just separate web pages the university produces, but RDFizing them and stuffing them into Drupal would give students a way to see more info about the faculty right alongside the courses they teach. Then make a Timeline or Exhibit out of it. That could also turn into a built-in approach to blogging in classes... TONS of possibilities expand from there.

Thanks for this project--it opens up a lot of interesting doors!

RDFbus and agents

Danny Ayers

Nice work!

The federated systems/RDFbus material caught my eye in particular; it's very similar to something I've been pondering for a while (e.g. see Graph Farming, and some of the later OpenOffice slides here), so I'm eagerly awaiting more details :-)

For what it's worth, although the underlying code would probably look the same, I've been thinking more in terms of an agent-oriented rather than a bus-oriented abstraction. A typical 'complete' agent will contain an HTTP client, (access from) an HTTP server, a local RDF model, and (typically hardcoded) behaviour.

All participating agents support HTTP (the bus is simply the Web), and through that they can also advertise the fact they support XMPP (or any other protocols as well, including direct code calls where the agents were in the same machine/VM) - the idea being that comms over HTTP are always possible, but agents with support for protocols more suited to e.g. push can communicate more efficiently whenever possible, and any performance advantages of shared hosts/VMs can be exploited.

The reason I favour the agent-oriented abstraction is to make system composition easier - I imagined most agents being very, very simple (e.g. you might have one devoted to OpenID resolution). Although complex monolithic agents could still interoperate, most complex systems would be constructed by gluing lots of simpler ones together; i.e. simple things should be easy, complex things are possible (but might be slow).