Drupalcon Boston 2008 mashup demo

Posted by Arto on March 4, 2008 at 5:00am

This is the 5-minute "video from the future" demo presented by Dries in his State of Drupal keynote presentation at Drupalcon Boston 2008. The video demonstrates some of the mashup capabilities of an RDF- and SPARQL-enabled Drupal as envisioned by Dries for the upcoming Drupal 7.x release. The version of the demo below includes the original narration by Ben Lavender (the audio from Dries's actual presentation is also available - the RDF material starts after the 52m:30s marker).

The production team consisted of Ben Lavender, Miglius Alaburda, and Dan Karran, who worked tirelessly and on short notice to put this together in time - great work, guys!

The work, both on this video and on my own ongoing development on the RDF, SPARQL and Exhibit modules for Drupal 6.x, was (and remains) generously sponsored by M.C. Dean and MakaluMedia.

A full hyperlinked transcript of the video follows:

This is the web as it was a few years ago - a web of pages. These pages are a lot easier to use these days, but making them work together is still difficult and time-consuming.

RDF can change all that. As more and more applications become RDF-aware, RDF can solve interoperability problems for your data and for everybody else's data, too. It does this by making that data accessible to machines in a way that they can not just parse, but actually understand.

This screencast is all about showing you how to make that happen, and how to make it happen without writing any code at all.

When we work with RDF in Drupal, we've got a lot of options for how our data can be formatted and where it can come from.

We'll start with a SPARQL query: here's what one looks like. It's kind of like SQL, but instead of asking for structured data from a database schema, it queries an RDF datagraph for facts. It can even do this across multiple datagraphs at the same time, which means that creating complex queries across disconnected data sources can be very simple. Facts are always stored as triples, so systems can always see the relationships, even when they've never seen these particular data structures before.
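The idea of facts stored as triples and queried across several graphs at once can be sketched in a few lines of Python. This is only an illustration and not the query shown in the video: the `ex:` identifiers, the two sample graphs, and the helper names `match` and `capitals_of_attendees` are all made up for the example, and a real SPARQL engine does far more than this.

```python
# Hypothetical data: two tiny "graphs", each a set of
# (subject, predicate, object) triples from unrelated sources.
countries = {
    ("ex:Germany", "ex:capital", "ex:Berlin"),
    ("ex:Lithuania", "ex:capital", "ex:Vilnius"),
}
attendees = {
    ("ex:alice", "ex:country", "ex:Germany"),
    ("ex:bob", "ex:country", "ex:Lithuania"),
}


def match(graphs, s=None, p=None, o=None):
    """Return all triples in any of the graphs that fit the pattern.

    None acts as a wildcard, much like a variable in a SPARQL pattern.
    """
    return sorted(
        t
        for g in graphs
        for t in g
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def capitals_of_attendees(graphs):
    """Join two patterns: ?person ex:country ?c and ?c ex:capital ?capital."""
    result = {}
    for person, _, country in match(graphs, p="ex:country"):
        for _, _, capital in match(graphs, s=country, p="ex:capital"):
            result[person] = capital
    return result


print(capitals_of_attendees([countries, attendees]))
```

Because every fact has the same three-part shape, the join works without either source knowing anything about the other's structure; they only need to agree on the vocabulary.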

Take a closer look at our query: it's pulling in schemas and data from multiple unaffiliated sources in the same query. Also note that we don't have to know how this data is structured: we only have to know what vocabulary a SPARQL endpoint is using to get any information it contains. There are already plenty of standards for RDF vocabularies for describing particular knowledge domains or relationships, so this isn't usually a problem.

After our SPARQL query is written, it's saved as a node. It acts a bit like a saved search. Here's the output of our query, which is pulling in some information about countries of the world from Wikipedia.

To make a mashup, we'll need another data source to combine our data with. We've got a lot of options for this. There are plenty of RDF feed types that can be turned into viewable data, for example. And many kinds of data, such as Google Spreadsheets, also export data as XML or JSON, which is a lightweight data format based on JavaScript syntax. All of these can be imported into Drupal RDF applications.

For our first mashup, let's see what's going on in Boston during Drupalcon. For that, we've pulled in an RSS feed of events from a local website and converted it into JSON. We've also got some friends in Boston we'd like to catch up with, whom we keep track of in a private spreadsheet on Google Docs.
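As a rough illustration of the RSS-to-JSON step, here is a minimal Python sketch using only the standard library. It is not the conversion used for the video: the sample feed, the `rss_to_items` helper, and the field names are invented for the example, and the `{"items": [...]}` layout with `label` and `type` fields is an assumption about the shape of JSON that Exhibit expects.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the local events website.
RSS = """<rss version="2.0"><channel><title>Events</title>
<item><title>Concert</title><link>http://example.org/1</link>
<pubDate>Tue, 04 Mar 2008 19:00:00 GMT</pubDate></item>
<item><title>Parade</title><link>http://example.org/2</link>
<pubDate>Wed, 05 Mar 2008 12:00:00 GMT</pubDate></item>
</channel></rss>"""


def rss_to_items(rss_text):
    """Turn each RSS <item> into a flat dictionary of properties."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "label": item.findtext("title", default=""),
                "type": "Event",  # assumed item type for the mashup
                "url": item.findtext("link", default=""),
                "date": item.findtext("pubDate", default=""),
            }
        )
    return {"items": items}


events_json = json.dumps(rss_to_items(RSS), indent=2)
print(events_json)
```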

To make our mashup, we'll create an Exhibit node. These nodes pull data into Exhibit, a free RDF visualizer from MIT's SIMILE project. Note that if we have private data from a Google spreadsheet or a custom JSON source, we can mix that data in with public data, all in one mashup. Exhibit would show us plenty of interesting data without any configuration at all, but asking it to display a few relationships will show the power of the system. These two lines, for example, provide a text-based search and a browse by date function.

Here's our mashup. On the front page, Exhibit will show us all of the contacts we had in our contact list, but the map is where it starts to get really interesting: here, our two heterogeneous datasets, events and contacts, are displayed on the same map at the same time, and we didn't have to do any sort of data normalization. We can easily see where the local events are, and where our friends are in relation to them.

These Royal Scots guards look interesting, and our friend Emelia is right here nearby: maybe she'd like to come, too.

Of course, if we only wanted to see what's happening tomorrow, we could do that with a click, too - it's pretty simple. It seems there's a lot of music to pick from in Boston, depending on our mood.

Here's another mashup, this time mixing three data sources: our original SPARQL query with country code and time zone information, some information about Drupalcon Boston attendees, and some information about Drupal.org module maintainers.

This mashup can tell us a lot about Drupalcon attendees. We see a list of companies that people work for, and we can see who they've sent to the conference. We can also see what countries the attendees come from. But Exhibit does another neat thing: it doesn't really care what we're mashing up here. To Exhibit, this is just as much a mashup about modules as it is about people. That means Exhibit can show us some interesting things; for example, here's a list of modules created by Germans that are attending Drupalcon.
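The "modules created by Germans attending Drupalcon" view boils down to a filter on one dataset followed by a lookup in another. A minimal Python sketch of that idea follows; the records, the field names, and the `modules_by_attendee_country` helper are all hypothetical, and Exhibit itself does this interactively in the browser rather than with code like this.

```python
# Hypothetical records from two of the data sources in the mashup.
attendees = [
    {"label": "Anna", "type": "Person", "country": "Germany"},
    {"label": "Ben", "type": "Person", "country": "United States"},
]
modules = [
    {"label": "example_module_a", "type": "Module", "maintainer": "Anna"},
    {"label": "example_module_b", "type": "Module", "maintainer": "Ben"},
    {"label": "example_module_c", "type": "Module", "maintainer": "Carla"},
]


def modules_by_attendee_country(attendees, modules, country):
    """List modules whose maintainer is an attendee from the given country."""
    people = {a["label"] for a in attendees if a["country"] == country}
    return sorted(m["label"] for m in modules if m["maintainer"] in people)


print(modules_by_attendee_country(attendees, modules, "Germany"))
```

Note that Carla maintains a module but is not in the attendee list, so her module never shows up, whichever country is selected.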

A quick look at this map can show us some other neat information. Our SPARQL query to the semantic version of Wikipedia has returned some attendees' home country information, such as capitals, geocoding information, and timezones. With Boston being at UTC-5, it's easy to find the most jet-lagged people at this conference: that would be these guys over here.
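Finding the most jet-lagged attendees is simple arithmetic on UTC offsets once the timezone data is available. The sketch below assumes whole-hour offsets and measures the clock shift the short way around the 24-hour dial; the attendee names, their offsets, and both helper functions are invented for illustration. Only Boston's UTC-5 comes from the narration.

```python
BOSTON_UTC_OFFSET = -5  # from the narration: Boston is at UTC-5

# Hypothetical home UTC offsets (in hours) for three attendees.
home_offsets = {"attendee_a": 1, "attendee_b": 9, "attendee_c": -8}


def clock_shift(home, venue=BOSTON_UTC_OFFSET):
    """Hours the body clock must shift, taking the shorter way around the dial."""
    diff = abs(home - venue) % 24
    return min(diff, 24 - diff)


def most_jet_lagged(offsets):
    """Return the names with the largest clock shift, sorted alphabetically."""
    worst = max(clock_shift(h) for h in offsets.values())
    return sorted(name for name, h in offsets.items() if clock_shift(h) == worst)


print(most_jet_lagged(home_offsets))
```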

Remember, this information is stored semantically, so the system understood these facts before we showed them to you on a map. The more data sources we have, the more complicated and interesting the results of our queries can be. And there's no shortage of publicly accessible data on the web, with more being added every single day.