Following up on my previous post introducing the in-progress RDF API for Drupal 6.x, I'll be posting, bit by bit, some of the materials I put together earlier for internal use on the project this API is being developed for. Development of this project is supported through M.C. Dean and MakaluMedia on behalf of their clients.
First, some requisite background. Our use cases for RDF evolved out of several years of building a communications and collaboration platform serving the needs of an international community of intergovernmental agencies and organizations. Both the project and the potential applications described below reflect Ivan's and Chris's ideas and vision.
This environment presented us with a variety of challenges: deploying the application across multiple discrete organizations that operate under little or no centralized authority; supporting a wide range of operational scenarios; and managing or brokering large volumes of context-sensitive information delivered via a variety of protocols from a multitude of sources, in formats ranging from web-native content to rich media files to device-specific data streams.
While tackling these challenges, this project has funded the development of many well-known Drupal modules, including OpenID, OpenSearch, LDAP provisioning, Timeline and Boost. More modules are yet to be released as open source, such as a WebDAV server module and embedded instant messaging and webmail for Drupal.
Our platform is currently based on Drupal 4.7.x and is probably one of the most complex Drupal systems around: we use over 130 contributed modules and have a total code base of nearly 600 KLOC (not a point of pride). In moving to Drupal 6.x, our goal is to radically simplify the system by standardizing on RDF, which allows for more precise expression and more efficient sharing of information, and by using Exhibit as the central technology driving the user interface.
I mention all this to clarify the context of our use of, and interest in, Drupal, and to explain why the use cases that follow are necessarily "enterprisey" in nature: they deal with reducing the complexity of our system design, increasing its interoperability with external data sources, and improving the usability of our user interface for handling large quantities of multi-faceted information.
Two slides from a presentation at our recent developer meeting (full-size versions are on Flickr) depict, at a high level, a system architecture enabled by the RDF support for Drupal. It ties several Drupal instances together into a federated system via SPARQL and RDFbus (publish/subscribe messaging for transmitting RDF payloads over various transports, including XMPP and Stomp).
Most significantly, this loosely coupled architecture provides a measure of technology neutrality, allowing individual system components (i.e. shards of the datagraph) to be replaced with other interoperable systems, ranging, for example, from high-productivity web frameworks such as Rails or Django to legacy J2EE systems. Because, frankly, Drupal isn't always the right answer, regardless of the question.
I'll need to publish more details on RDFbus later, but here are some initial thoughts on specific potential use cases for the RDF API in Drupal 6.x: the low-hanging fruit, if you will. The list is in ascending order, from the obvious to the speculative.
(Note that if any of the preceding was all Greek to you, what follows may be more concrete. Be warned, though: if you haven't previously heard of RDF, or know it only from the context of RSS, then reading at least a 10-minute introduction to RDF is pretty much a requirement for making any sense of this stuff. Other introductory materials, including videos introducing these concepts, are linked from the RDF API's developer documentation.)
Generic metadata storage
Many (most?) Drupal modules need to store metadata in some form or another. Until now, each Drupal module has needed to implement its own metadata storage in an independent, redundant and incompatible manner.
In practice, Drupal's configuration variables have also been widely abused to store metadata in situations where developers needed to describe information but felt that creating (and then maintaining) a whole set of custom SQL tables was too heavyweight a solution.
Drupal would clearly benefit from a generic solution here, and indeed, unified metadata storage for Drupal has been discussed on the development mailing list several times in the past. The RDF API provides such a solution, general enough to cover virtually any specific use case.
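To make "generic" concrete, here is a minimal sketch of the idea behind triple-based storage: every piece of metadata becomes a (subject, predicate, object) statement in one shared store, queried by pattern. The `TripleStore` class, the `example.org` URIs, and the `promoted` property are hypothetical illustrations, not the RDF API's actual interface; the Dublin Core and FOAF property URIs are real vocabulary terms.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """A toy in-memory store of (subject, predicate, object) triples (hypothetical)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def remove(self, s: str, p: str, o: str) -> None:
        self._triples.discard((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield matching triples in sorted order; None acts as a wildcard."""
        for triple in sorted(self._triples):
            if ((s is None or triple[0] == s)
                    and (p is None or triple[1] == p)
                    and (o is None or triple[2] == o)):
                yield triple


DC_SUBJECT = "http://purl.org/dc/terms/subject"
FOAF_NICK = "http://xmlns.com/foaf/0.1/nick"
PROMOTED = "http://example.org/vocab#promoted"  # hypothetical property

store = TripleStore()
# Metadata from two unrelated modules lands in one store instead of two custom tables.
store.add("http://example.org/node/42", DC_SUBJECT, "RDF")
store.add("http://example.org/node/42", PROMOTED, "true")
store.add("http://example.org/user/7", FOAF_NICK, "alice")

node_metadata = list(store.match(s="http://example.org/node/42"))
nicknames = [o for _, _, o in store.match(p=FOAF_NICK)]
```

With this approach, no module needs its own table or configuration variable. A new kind of metadata is just a new predicate URI.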
Exhibit displays structured data as rich visualizations that can be searched, filtered and sorted using faceted browsing. It is the perfect user interface complement to the flexibility that RDF brings into play at the systems level.
While the possibilities are practically endless, some of the more obvious applications of Exhibit include navigating user profiles by organization membership, expertise, and social relationships; filtering event data by subject, audience, and timeframe, and displaying the events on both a map and a time grid; and combining full-text search with taxonomy-, group-, and user-based filtering to locate content, among others. All of this would use a common, consistent set of data manipulation and visualization tools.
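Exhibit itself is JavaScript. The following sketch only illustrates the faceted-browsing logic it relies on: filter items by the values selected in each facet, then recount the remaining values so the user sees how many items each further choice would leave. The item fields (`label`, `subject`, `audience`) and the sample events are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Union

Value = Union[str, List[str]]
Item = Dict[str, Value]


def values_of(item: Item, facet: str) -> List[str]:
    """Return an item's values for a facet as a list (properties may be multi-valued)."""
    v = item.get(facet)
    if v is None:
        return []
    return v if isinstance(v, list) else [v]


def filter_items(items: Iterable[Item], selections: Dict[str, Set[str]]) -> List[Item]:
    """Keep items matching at least one selected value in every constrained facet.

    A facet with an empty selection places no constraint on the items.
    """
    return [
        item for item in items
        if all(not chosen or set(values_of(item, f)) & chosen
               for f, chosen in selections.items())
    ]


def facet_counts(items: Iterable[Item], facet: str) -> Counter:
    """Count how many items carry each value of a facet."""
    counts: Counter = Counter()
    for item in items:
        counts.update(set(values_of(item, facet)))
    return counts


events = [
    {"label": "Workshop A", "subject": ["RDF", "Drupal"], "audience": "developers"},
    {"label": "Briefing B", "subject": "Security", "audience": "managers"},
    {"label": "Sprint C", "subject": ["Drupal"], "audience": "developers"},
]
visible = filter_items(events, {"audience": {"developers"}})
subject_counts = facet_counts(visible, "subject")
```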
This is one of those cases where a demo is worth more than a thousand words. If you haven't seen Exhibit or Potluck in action, go do so now. The Exhibit module will bring these abilities to Drupal by means of an Exhibit-SPARQL bridge.
Utilizing open, linked data
With the RDF API, large, openly available datasets can be used to enhance mashups built on top of Drupal.
For instance, much of the factual information in Wikipedia can be retrieved or copied from DBpedia. As a trivial use case: instead of manually importing, say, a list of countries into a Drupal instance from a CSV or XML file, just pull the equivalent triples from DBpedia or the CIA World Factbook using a single SPARQL query. Along similar lines, the GeoNames dataset has information on 6.5 million geographical locations worldwide, including facts on every city and town; there are sure to be some intriguing possibilities here for spatial mashups.
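Here is a sketch of what the "single SPARQL query" for countries might look like, encoded as a SPARQL Protocol GET request. It also shows how to flatten the standard SPARQL JSON results format into rows. The query assumes today's DBpedia ontology (`dbo:Country`) and the public endpoint at `https://dbpedia.org/sparql`. The `format` parameter is an assumption about DBpedia's Virtuoso server; the standard mechanism is an HTTP `Accept` header. The sample response is hand-written for illustration, not real output, and no network call is made.

```python
import json
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Assumed vocabulary: DBpedia ontology class dbo:Country plus rdfs:label.
COUNTRIES_QUERY = """
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?country ?name WHERE {
  ?country a dbo:Country ;
           rdfs:label ?name .
  FILTER (lang(?name) = "en")
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query in the 'query' parameter of a GET request."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def parse_bindings(results_json: str) -> list:
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}  # unbound variables are omitted
        for b in doc["results"]["bindings"]
    ]


# Hand-written sample in the W3C SPARQL JSON results shape (illustrative only).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["country", "name"]},
    "results": {"bindings": [{
        "country": {"type": "uri", "value": "http://dbpedia.org/resource/Finland"},
        "name": {"type": "literal", "xml:lang": "en", "value": "Finland"},
    }]},
})

url = build_query_url(DBPEDIA_ENDPOINT, COUNTRIES_QUERY)
rows = parse_bindings(SAMPLE_RESPONSE)
```

Each row could then be mapped onto a taxonomy term or node, replacing the hand-maintained CSV import.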
From modules to widgets
The RDF API for Drupal will include a jQuery API for querying and changing the metadata stored in the Drupal instance. This opens up a plethora of new possibilities.
Even today, many UI-oriented Drupal modules live almost wholly on the browser side, with only a thin server-side implementation that handles the persistent storage their client-side functionality needs. With the introduction of unified metadata storage for Drupal, it is conceivable to cut this server-side cord altogether. This is in line with post-Web-2.0 trends that see more and more UI logic moving from the server into the browser.
A perfect example is the Image Notes module. Its sole purpose is to dynamically attach an annotation widget to photos displayed on a Drupal site, allowing users to add descriptive text to bounded regions of an image (e.g. marking up the names of the people in a photo). If it used the jQuery API to store annotation metadata, this component would need no custom server-side code, meaning that it could become an extremely lightweight Drupal module, or even be incorporated directly into the Drupal site's theming layer without any need for an actual Drupal module per se.
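The jQuery API is not yet specified, so the following sketch only shows the kind of payload such a widget might hand to a generic metadata store. It expresses one image note as RDF statements, serialized in N-Triples. The `imagenote#` vocabulary, the `ImageNote` fields and the blank-node label are hypothetical choices. The `rdf:type` and `xsd:integer` URIs are the real W3C ones, and the escaping follows N-Triples string-literal rules for quotes, backslashes and line breaks.

```python
from dataclasses import dataclass

NOTE = "http://example.org/imagenote#"  # hypothetical annotation vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass
class ImageNote:
    image: str   # URI of the annotated image
    x: int       # bounding box, in pixels
    y: int
    width: int
    height: int
    text: str    # the user's descriptive text


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping the characters it requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(note: ImageNote, node: str = "_:note1") -> str:
    """Describe one note as N-Triples lines about a blank node."""
    lines = [
        f"{node} <{RDF_TYPE}> <{NOTE}Note> .",
        f"{node} <{NOTE}onImage> <{note.image}> .",
        f"{node} <{NOTE}text> {literal(note.text)} .",
    ]
    for name in ("x", "y", "width", "height"):
        lines.append(f'{node} <{NOTE}{name}> "{getattr(note, name)}"^^<{XSD_INT}> .')
    return "\n".join(lines) + "\n"


payload = to_ntriples(ImageNote("http://example.org/files/team.jpg",
                                10, 20, 80, 60, 'Speaker "A", far left'))
```

Because the payload is plain RDF, the server needs no knowledge of image notes. It only stores and returns triples.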
In general, widgets like this could conceivably be quicker to develop than Drupal modules, be more amenable to rapidly changing requirements, and require no Drupal-specific expertise on the part of the developer. Indeed, many such widgets could work with any backend application that provided a compatible jQuery metadata API.
RDF export functionality
With the introduction of the RDF API, every Drupal resource gains an associated machine-readable RDF description, available via auto-discovery. If the administrator so desires, all the data stored in a Drupal instance can be exposed this way. This supplants previous solutions such as the Import/Export API.
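Auto-discovery commonly works through `<link rel="alternate">` elements in a page's `<head>`, in the same way as RSS feed discovery. Here is a minimal client-side sketch that finds advertised RDF descriptions using only the standard library's `HTMLParser`. Which MIME types count as RDF, and the `/node/42/rdf` path in the sample page, are assumptions for illustration, not the RDF API's actual URL scheme.

```python
from html.parser import HTMLParser
from typing import List

RDF_TYPES = {"application/rdf+xml", "text/turtle"}  # assumed set of RDF media types


class RDFLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="alternate"> elements that advertise RDF."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)  # attribute values may be None for valueless attributes
        rels = (a.get("rel") or "").lower().split()
        media_type = (a.get("type") or "").lower()
        if "alternate" in rels and media_type in RDF_TYPES and a.get("href"):
            self.links.append(a["href"])


def find_rdf_links(html: str) -> List[str]:
    finder = RDFLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


page = """<html><head>
<link rel="alternate" type="application/rdf+xml" href="/node/42/rdf" />
<link rel="alternate" type="application/rss+xml" href="/rss.xml">
</head><body>...</body></html>"""
rdf_links = find_rdf_links(page)
```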
(Implementing RESTful POST, PUT and DELETE semantics - perhaps interoperable with AtomPub - for uploading changes in RDF format is beyond the current scope, though; RDFbus covers much the same ground, but using an asynchronous messaging paradigm.)
RDFbus makes the existing XML-RPC-based Publish/Subscribe solution for Drupal obsolete and goes one step further by not being tied to specific content types or data. It potentially enables notification and synchronization with any client application or device that can talk XMPP and understands triples.
Drupal vocabularies, categories, user accounts, groups, pages, blog posts, events, files; RDFbus is content-agnostic and will transport anything which can be described in RDF.
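RDFbus runs over real transports such as XMPP and Stomp. The in-process sketch below only illustrates the two properties described here. Delivery is driven purely by topic subscriptions, and the bus treats the RDF payload as opaque N-Triples text, so it never needs to know what kind of content it carries. The `Bus` class, the topic naming scheme and the glob-style patterns are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Callable, List, Tuple

Handler = Callable[[str, str], None]  # receives (topic, N-Triples payload)


class Bus:
    """A toy in-process publish/subscribe bus carrying RDF payloads as opaque text."""

    def __init__(self) -> None:
        self._subs: List[Tuple[str, Handler]] = []

    def subscribe(self, pattern: str, handler: Handler) -> None:
        """Register a handler for topics matching a glob pattern such as 'site-b/*'."""
        self._subs.append((pattern, handler))

    def publish(self, topic: str, payload: str) -> int:
        """Deliver the payload to every matching subscriber; return the delivery count."""
        delivered = 0
        for pattern, handler in self._subs:
            if fnmatchcase(topic, pattern):
                handler(topic, payload)
                delivered += 1
        return delivered


bus = Bus()
received: List[Tuple[str, str]] = []
bus.subscribe("site-b/*", lambda topic, payload: received.append((topic, payload)))

user_update = '<http://site-a.example/user/7> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'
to_site_b = bus.publish("site-b/user", user_update)   # matches the subscription
to_site_c = bus.publish("site-c/node", user_update)   # no subscriber
```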
Distributed identity and trust
On the one hand, RDF makes it possible to describe user and group equivalences between two Drupal instances; on the other, RDFbus allows the same physical user account to be synchronized between these instances for the benefit of Drupal modules that aren't RDF-aware. In addition, FOAF relationships between user accounts, groups and sites (user-to-user, user-to-group, group-to-site, and so on) provide the basis for a distributed network of trust.
This might provide functionality similar to LDAP's, but in a manner that is decentralized, happens over the cloud, and is FOAF-compatible. (Note that OpenID is orthogonal to this setup; each user merely needs a URI, which can equally well be provided by OpenID or something else.)
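Here is a sketch of how the two ideas combine, under simplifying assumptions. First, `owl:sameAs` statements merge a person's accounts on different instances into one identity, using union-find, since `owl:sameAs` is symmetric and transitive. Second, a naive trust metric treats anyone reachable through `foaf:knows` within a fixed number of hops as trusted. The hop-count metric and the `example` site URIs are assumptions for illustration. The OWL and FOAF property URIs are real.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
KNOWS = "http://xmlns.com/foaf/0.1/knows"
Triple = Tuple[str, str, str]


def canonical_map(triples: Iterable[Triple]) -> Dict[str, str]:
    """Union-find over owl:sameAs: map each URI to one representative of its class."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                parent[max(rs, ro)] = min(rs, ro)  # smallest URI represents the class
    return {x: find(x) for x in list(parent)}


def trusted(triples: Iterable[Triple], start: str, max_hops: int) -> Set[str]:
    """Identities reachable from start via foaf:knows within max_hops, aliases merged."""
    triples = list(triples)
    canon = canonical_map(triples)

    def c(uri: str) -> str:
        return canon.get(uri, uri)

    graph: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == KNOWS:
            graph.setdefault(c(s), set()).add(c(o))
    root = c(start)
    depth = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search, stopping at max_hops
        node = queue.popleft()
        if depth[node] == max_hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth) - {root}


ALICE_A = "http://site-a.example/user/1"  # Alice's account on instance A
ALICE_B = "http://site-b.example/user/9"  # the same person on instance B
BOB = "http://site-b.example/user/3"
CAROL = "http://site-b.example/user/5"
facts: List[Triple] = [(ALICE_A, SAME_AS, ALICE_B), (ALICE_B, KNOWS, BOB), (BOB, KNOWS, CAROL)]
```

Instance A can then trust Bob on Alice's behalf even though the `foaf:knows` link was recorded on instance B.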
RDF allows mashing up any information about anything. SPARQL queries can span any number of RDF data sources (a query's FROM and FROM NAMED clauses can name multiple graphs). OpenID + FOAF provide a global, URI-based identity, with trust metrics for access control and personalization purposes.
Something could be built on top of SPARQL that would not only make OpenSearch obsolete, but make it seem crude in comparison.
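Here is a sketch of the core of such a federated search. The same triple pattern is fanned out to several sources concurrently, and the merged results are de-duplicated, so content replicated across instances appears only once. Real sources would be SPARQL endpoints reached over HTTP. Here they are in-memory triple lists, and the site URIs and titles are made-up sample data. Variables are written as `?name`, as in SPARQL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]
DC_TITLE = "http://purl.org/dc/terms/title"


def match_pattern(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Match one triple against a pattern whose '?'-prefixed terms are variables."""
    binding: Binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term[1:], value) != value:
                return None  # the same variable would be bound to two different values
        elif term != value:
            return None
    return binding


def query_source(triples: List[Triple], pattern: Triple) -> List[Binding]:
    return [b for t in triples if (b := match_pattern(pattern, t)) is not None]


def federated_query(sources: Dict[str, List[Triple]], pattern: Triple) -> List[Binding]:
    """Send the pattern to every source concurrently, then merge and de-duplicate."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_source, triples, pattern)
                   for triples in sources.values()]
        merged = [b for f in futures for b in f.result()]
    unique = {tuple(sorted(b.items())): b for b in merged}
    return sorted(unique.values(), key=lambda b: sorted(b.items()))


sources = {
    "site-a": [("http://site-a.example/node/1", DC_TITLE, "Budget report"),
               ("http://site-a.example/node/2", DC_TITLE, "Travel policy")],
    "site-b": [("http://site-b.example/node/5", DC_TITLE, "Field manual"),
               ("http://site-a.example/node/1", DC_TITLE, "Budget report")],  # replica
}
results = federated_query(sources, ("?doc", DC_TITLE, "?title"))
```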
Think of the benefits that the Views module provided for individual Drupal instances, and imagine something that takes this concept and runs with it a little further: aggregated for any number of Drupal instances or other RDF-aware data sources, and based on SPARQL instead of SQL.
Imagine what you'd want to see, in a tabular format or otherwise, and if the source data for it exists on the local RDF cloud, a SPARQL query can be written to traverse the graph, transform the results, and pull up the data for visualization purposes.
Given the aforementioned paradigm shift to widgets, this yet-to-be-named component (Sparkling Views?) could support richer, higher-level output than the Views module does. Timelines, sparklines, pie charts: you name it, and if a widget exists, it could be plugged in.
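As an example of plugging a widget onto query results, here is a sketch that turns SPARQL-style result rows into a series and renders it as a one-line Unicode sparkline. The variable names `month` and `posts` and the sample rows are hypothetical. The rendering simply scales each value linearly onto eight block characters.

```python
from typing import Dict, List

BARS = "▁▂▃▄▅▆▇█"


def sparkline(values: List[float]) -> str:
    """Render numbers as a one-line sparkline: the minimum maps to ▁, the maximum to █."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BARS[0] * len(values)  # a flat series renders as a flat line
    return "".join(BARS[round((v - lo) / span * (len(BARS) - 1))] for v in values)


def series_from_bindings(bindings: List[Dict[str, str]], x: str, y: str) -> List[float]:
    """Order result rows by variable x and extract variable y as numbers."""
    return [float(b[y]) for b in sorted(bindings, key=lambda b: b[x])]


rows = [  # shaped like flattened SPARQL result bindings (sample data)
    {"month": "2008-03", "posts": "12"},
    {"month": "2008-01", "posts": "4"},
    {"month": "2008-02", "posts": "8"},
]
series = series_from_bindings(rows, "month", "posts")
chart = sparkline(series)
```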
Given all the facilities detailed above, a next step would be building a dashboard-like central hub interface for monitoring real-time activity in all instances (over RDFbus) and performing aggregate queries (using SPARQL) over the full datagraph, for example to obtain metrics and analytics while respecting local privacy concerns. This would be useful for both management and system administration in an organization relying on a distributed Drupal platform.
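One simple way to reconcile hub-level analytics with local privacy is sketched below. Each instance reduces its raw activity to counts before anything leaves the site. The hub sums those counts and suppresses any total below a minimum threshold. The threshold-suppression rule, the `kind` field and the sample events are assumptions for illustration, not a design from the project.

```python
from collections import Counter
from typing import Dict, Iterable


def local_summary(events: Iterable[Dict[str, str]], key: str) -> Counter:
    """Run on each instance: reduce raw activity to counts so raw records stay local."""
    return Counter(e[key] for e in events)


def hub_aggregate(summaries: Iterable[Counter], min_count: int = 5) -> Dict[str, int]:
    """Run on the hub: sum per-instance counts, suppressing totals below min_count."""
    total: Counter = Counter()
    for summary in summaries:
        total.update(summary)
    return {k: v for k, v in sorted(total.items()) if v >= min_count}


site_a = [{"kind": "login"}] * 4 + [{"kind": "post"}] * 2
site_b = [{"kind": "login"}] * 3 + [{"kind": "upload"}]
dashboard = hub_aggregate([local_summary(site_a, "kind"), local_summary(site_b, "kind")])
```

The hub can report seven logins across both sites. The rare "post" and "upload" counts are withheld because they could point to individual users.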