Data Visualization for RDF data

linclark.research's picture

I know I'm late on this, but I just read "If She Can Do It, I Can Too" a few days ago and was inspired. Looking for suggestions (even if it is a module that already does this) and a mentor. Cheers!

Synopsis
With the inclusion of RDF in Drupal 7 core, Drupal is being widely hailed as the best upcoming tool for producing content for the Semantic Web. However, for people to understand the power of the Semantic Web, we need to help them use the data that all of these Drupal 7 sites will be producing. Proprietary tools such as Microsoft Pivot and Adobe Flex provide sleek visualizations of datasets. With this project, I aim to bring these kinds of advanced data visualization, with zoomable, data-rich user interfaces, to Drupal. With these kinds of interfaces, Drupal will not only be the best way to produce content for the Semantic Web, but also the best way to consume that content and turn the data into something really interesting and understandable.

Project

Just as Views allows the user to create queries on their database and then specify how they should appear (grouped by a certain field, navigated with attachment views, etc), I would like to allow users to create queries on the 'database' of the Web, the Linked Data Cloud, and then visualize those queries in a very dynamic manner.

This project will have two parts. The part that drives development of this project is the data visualizer. While there are data visualizers freely available, such as the Google Visualization API, for richer data visualization many developers turn to proprietary or quasi-open source tools, such as Adobe Flex or, more recently, Pivot. However, these solutions require developer knowledge and exclude many content producers from creating rich data visualizations. In some cases, these solutions even exclude some users from being able to view the data.

At the same time, jQuery and CSS3 properties such as transform and animation can be used to achieve much the same rich interactive effect. These technologies have the advantage of being truly open standards with which many hackers are familiar and which can be easily analyzed in the browser. And because they work directly with page content that is in the DOM, customary Drupal theming conventions can be followed to change the visualization.

But to get the data in the first place, users need a user interface for creating SPARQL queries. SPARQL is a query language similar to SQL that is used to query RDF. As a language, SPARQL can be challenging for users unfamiliar with the Semantic Web to understand. However, to actually do interesting things with the Semantic Web you have to use SPARQL queries, so it becomes a chicken and egg scenario; people don't know how to use SPARQL, so they don't grok the Semantic Web, which means they can't figure out SPARQL.
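
To make the comparison with SQL concrete, here is a minimal sketch of what a SPARQL query looks like and how it could be passed to an endpoint over HTTP. The FOAF pattern, the example.org endpoint, and the helper name build_request_url are assumptions chosen for illustration; they are not part of the proposal, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Illustrative query: select people and their names, roughly the
# counterpart of "SELECT person, name FROM people LIMIT 10" in SQL.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
LIMIT 10
""".strip()


def build_request_url(endpoint: str, query: str) -> str:
    """Build a GET URL that carries the query in the 'query' parameter,
    which is how the SPARQL protocol passes queries over HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint, used only to show the shape of the request.
url = build_request_url("http://example.org/sparql", QUERY)
print(url)
```

The WHERE block is a graph pattern rather than a table name, which is the part that tends to be unfamiliar to people coming from SQL.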

One of the challenges of SPARQL from a usability perspective is that you aren't pulling information from data that is structured in a fixed schema. The data can be structured in an infinite variety of ways. Because of this, any interface must allow the user to explore the data structure of the dataset(s) being used and help the user understand how queries can be constructed, how you can join datasets, etc.
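
As a rough illustration of why the structure has to be explored rather than looked up, the sketch below summarizes which predicates are actually used on instances of each class in a tiny in-memory dataset. The triples and the function name predicates_by_class are made up for this example; a real interface would ask the endpoint for this information.

```python
from collections import defaultdict

# Hypothetical triples: two resources of the same class described with
# different sets of predicates, which a fixed schema would not allow.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:homepage", "http://example.org/alice"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
]


def predicates_by_class(triples):
    """Map each class to the predicates used on its instances, with counts."""
    classes = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            classes[s].add(o)
    usage = defaultdict(lambda: defaultdict(int))
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        for cls in classes.get(s, ()):
            usage[cls][p] += 1
    return {cls: dict(counts) for cls, counts in usage.items()}


summary = predicates_by_class(TRIPLES)
print(summary)
```

A summary like this could be what the interface shows the user before any query is built.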

There are a number of existing query builders for SPARQL queries, or for other kinds of queries across federated data, including Yahoo! Pipes, Semantic Web Pipes, and MashQL. To generate the SPARQL query, I would either integrate Drupal with one of these existing interfaces or create a SPARQL query building UI in Drupal.

A good user interface for SPARQL queries would be helpful not only for a data visualization module, but could also be used for things such as the existing RDF Proxy module, which can be used to sync nodes with external datasources. With a good, intuitive SPARQL query builder, we could unlock the power of the Semantic Web for a lot of end-users.

Timeline

* May 24 - May 31 - Explore Views API and general best practices, explore Yahoo! Pipes and MashQL functionality
* June 1 - June 21 - Code basic visualization features, test with raw SPARQL query and/or Views on internal database
* June 21 - July 12 - Code SPARQL query building interface, or integrate with existing interface
* July 12 - Submit midterm
* July 16 - August 2 - Feed data from SPARQL query builder into database and expose to Views
* August 3 - August 16 - Test with example use cases, explore possibilities for further development
* August 17 - August 20 - Polish up and submit final report

Note: As a research Masters student, I have to maintain 20 hours per week of project time for my adviser over summer. I will also be presenting about Drupal at SemTech the week of June 21.

Mentors

Stella Power (stella) and Laura Scott (laura s)

Contact

lin.clark@deri.org
linclark on IRC

Difficulty

Medium

Biography

I am a Masters student at DERI Galway, the world's largest Semantic Web Research Institute. I started programming in high school, first learning Visual Basic and then teaching myself C++ and jumping into the Advanced Placement OO Programming course. I received my BFA from Carnegie Mellon University, majoring in Communication Design with a focus on interaction and exhibit design. While at Carnegie Mellon, I studied Java in the Intermediate/Advanced Programming course in the School of Computer Science.

I worked as a Web developer at the University of Pittsburgh, which is how I discovered Drupal. In the same position, I discovered RDF and the Semantic Web. I have become a passionate supporter of both and am actively involved in both communities.

I have contributed patches to Drupal 7 core and to a few contributed modules, as well as documentation edits and new docs pages. Most recently, I have been producing screencasts for Drupal 7. I strongly believe that the power of the Web is in the ability for non-developers to hack and create tools for managing, manipulating, and communicating their own information. Because the growth of the Web depends on hackability, I support open standards such as HTML, JavaScript, CSS, and RDF.

Comments

Just as Views allows the user

dawehner's picture

"Just as Views allows the user to create queries on their database"

Sure, that's true. But with Views 3 you can define a new query backend. There are existing ones for apachesolr, flickr and twitter.

But you could write a sparql implementation. How awesome is this! You can query the web with views. The interface you describe is views.

But it would definitely be quite a challenge to bring the extra UI for browsing the data into Views.

Once you have written a Views query backend, there is a GSoC project to bring any kind of Views data into Feeds, and Feeds can map and import it into any database table.

If you do it via Views, I could help to co-mentor.

Thanks for the suggestion...

linclark.research's picture

Thanks for the suggestion... I actually didn't realize that Views 3 allowed you to do that, I'm going to have to play with it more (and possibly screencast about it). Do you have any favorite blog posts or other resources on the topic?

I would be happy to explore whether you could create a sensible UI for SPARQL queries in Views. I would prefer to piggyback off of existing, widely used modules as much as possible. I wouldn't be ready to commit fully to Views, though, unless I could develop a solid user interaction in Views UI that really helps people understand the SPARQL query they are constructing. I really haven't played with the Views hooks, so I don't know the full possibilities yet.

The pluggable query part is

dawehner's picture

The pluggable query part is not really documented.

The only documentation is the available examples.
One example is http://drupalcode.org/viewvc/drupal/contributions/modules/twitter_views/...

It's not that easy to understand, but I would point to two central functions:
query and execute.

Query builds the query.
Execute executes the query, gets the data, and stores the data so that Views can continue.
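
The query/execute split described above can be sketched roughly as follows. This is not the Views plugin API (which is PHP); it is a language-neutral illustration written in Python, and the class name SparqlQueryBackend, its methods, and the injected fetch callable are all hypothetical.

```python
class SparqlQueryBackend:
    """Hypothetical backend: query() builds, execute() runs and stores rows."""

    def __init__(self, fetch):
        # fetch is any callable that takes a query string and returns rows;
        # it stands in for the HTTP call to a SPARQL endpoint.
        self.fetch = fetch
        self.fields = []
        self.limit = None
        self.query_string = None
        self.result = []

    def add_field(self, variable, predicate):
        """Remember a field as a (variable name, predicate URI) pair."""
        self.fields.append((variable, predicate))

    def query(self):
        """Build the SPARQL text from the collected fields."""
        select = " ".join(f"?{v}" for v, _ in self.fields)
        patterns = "\n".join(f"  ?s <{p}> ?{v} ." for v, p in self.fields)
        text = f"SELECT ?s {select}\nWHERE {{\n{patterns}\n}}"
        if self.limit is not None:
            text += f"\nLIMIT {self.limit}"
        self.query_string = text
        return text

    def execute(self):
        """Run the built query and keep the rows for the display layer."""
        if self.query_string is None:
            self.query()
        self.result = list(self.fetch(self.query_string))
        return self.result


# Usage with a stub in place of a real endpoint.
backend = SparqlQueryBackend(fetch=lambda q: [{"s": "ex:alice", "name": "Alice"}])
backend.add_field("name", "http://xmlns.com/foaf/0.1/name")
backend.limit = 5
rows = backend.execute()
print(backend.query_string)
```

Keeping the endpoint call behind a callable means the building step can be tested without any network access.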

But this part is all code based... you sadly need some basic views coding knowledge, which you can find in views/help/api-*.html

"I would be happy to explore whether you could create a sensible UI for SPARQL queries in Views. I would prefer to piggyback off of existing, widely used modules as much as possible. I wouldn't be ready to commit fully to Views, though, unless I could develop a solid user interaction in Views UI that really helps people understand the SPARQL query they are constructing. I really haven't played with the Views hooks, so I don't know the full possibilities yet."

I quickly looked at the existing RDF module, which has Views integration, but only for the RDF database tables. That's not what you want.
But it has some mechanism to query external websites. I'm not sure how it works.

Sadly the Views UI is currently not really extendable, but I think you would get stuff like

  • Styles
  • Caching
  • "Easy" Theming
  • The users would see something familiar
  • and so on.

Do you already have any ideas about how a UI for browsing data could look?
Perhaps it would be possible to use Views here, too.
The user starts by setting the URL that the SPARQL query should run against. Then they run the preview and get some fields.
After this they can add fields, filters, etc. and see the newly returned data. That's already the current workflow for many Views users.
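
One way the preview step could discover its fields is by reading the variable list from the endpoint's response. The sketch below parses a hand-written sample in the standard SPARQL JSON results layout; the sample data and the function name preview are assumptions for illustration only.

```python
import json

# Hand-written sample in the SPARQL JSON results layout:
# "head.vars" lists the variables, "results.bindings" holds the rows.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["person", "name"]},
  "results": {"bindings": [
    {"person": {"type": "uri", "value": "http://example.org/alice"},
     "name": {"type": "literal", "value": "Alice"}},
    {"person": {"type": "uri", "value": "http://example.org/bob"}}
  ]}
}
"""


def preview(response_text):
    """Return (fields, rows); variables left unbound in a row become None."""
    data = json.loads(response_text)
    fields = data["head"]["vars"]
    rows = [
        {f: binding[f]["value"] if f in binding else None for f in fields}
        for binding in data["results"]["bindings"]
    ]
    return fields, rows


fields, rows = preview(SAMPLE_RESPONSE)
print(fields)
print(rows)
```

Note that a row may omit a variable entirely, so a preview has to cope with missing values rather than assume a full table.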

The ideas I have for browsing

linclark.research's picture

The ideas I have for browsing the UI are influenced by Microsoft's Pivot (http://blog.ted.com/2010/02/demo_where_to_g.php) and other tools created in Flash. I wouldn't be able to create anything nearly that fully featured, but I think even some of the basic features like a zoomable UI and graphs that actually show images and text as datapoints would really add to Drupal's data visualization capability if it is implemented with open standards such as HTML, jQuery and CSS so it is hackable by Drupal devs.

For the SPARQL query building part, one of my advisers pointed out other tools in this space, including one created here at DERI called Semantic Web Pipes. It was based on Yahoo! Pipes, which I think has a good user interaction for this kind of query building. This got me thinking, it might make sense to do what the Yahoo! Pipes module suggests as a next step... to create a database table for the imported data (but not as a node, that kind of RDF import is handled by RDF Proxy) and then to use Views Schema or something similar to allow the user to work with the data in Views. I would need to figure out how to refresh the table, particularly if the structure of the imported data changed.
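
As a rough sketch of the refresh problem, the simplest approach is to rebuild the table whenever data is imported, so that its columns always match the latest result. The example uses an in-memory SQLite database rather than Drupal's database layer, and the table name sparql_import and the function refresh_table are hypothetical.

```python
import sqlite3


def refresh_table(conn, table, columns, rows):
    """Drop and recreate the table so it always matches the latest columns.

    Table and column names are interpolated into the SQL, so they must
    come from trusted code, not from user input.
    """
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(row.get(c) for c in columns) for row in rows],
        )


conn = sqlite3.connect(":memory:")
refresh_table(conn, "sparql_import", ["person", "name"],
              [{"person": "ex:alice", "name": "Alice"}])
# Later the query returns an extra column, so the table is rebuilt.
refresh_table(conn, "sparql_import", ["person", "name", "homepage"],
              [{"person": "ex:alice", "name": "Alice",
                "homepage": "http://example.org/alice"}])
columns_now = [row[1] for row in conn.execute('PRAGMA table_info("sparql_import")')]
print(columns_now)
```

Rebuilding is crude but avoids having to diff the old and new structure; anything that refers to a dropped column would still need to be handled separately.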

This would give me the benefit of the easy caching and the users seeing something familiar when arranging the layout of their items. It would also provide a unified interface for the data visualizer, so that you could provide the same kind of visualization of data in the database. I think the easy theming would come regardless of whether I integrate with Views or use/build a completely independent query builder, as would styles.

I'd love to see a sparql

fago's picture

I'd love to see a SPARQL query backend for Views too (i.e. without any interim DB table). That way you get all the awesome features of Views out of the box for querying SPARQL endpoints! One would probably need to work out the best way to map Views' SQL-centric paradigm to SPARQL, but I do think it would be a very powerful solution. Maybe that way it would even be possible to do such great data visualisation stuff as generic Views plugins?

Currently views fetches the

dawehner's picture

Currently Views fetches the "table" definition and caches it, so there is no real way of doing it dynamically.

You could remove the cache in your plugin, but we need to solve the exact same problem for the MongoDB Views backend (and I am trying to help on both), so I will figure something out :)

Indeed, that would be

fago's picture

Indeed, that would be cool.

For SPARQL, maybe one could just insert all known predicates as fields and join to different graphs via relations. That could work if the vocabularies used by a graph are configured somewhere. But of course the dynamic extension / addition of a vocabulary would be nice.
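
A minimal sketch of that idea, assuming the vocabularies used by each graph are configured by hand: every known predicate becomes a field, and each field knows which graph it belongs to. The configuration layout, the example.org graph URIs, and the function names are hypothetical.

```python
# Hypothetical configuration: for each graph, the vocabularies it uses,
# given as prefix -> (namespace, list of terms).
GRAPH_VOCABULARIES = {
    "http://example.org/graph/people": {
        "foaf": ("http://xmlns.com/foaf/0.1/", ["name", "homepage"]),
    },
    "http://example.org/graph/docs": {
        "dc": ("http://purl.org/dc/terms/", ["title", "creator"]),
    },
}


def field_definitions(config):
    """List one field per known predicate, tagged with its graph."""
    fields = {}
    for graph, vocabularies in config.items():
        for prefix, (namespace, terms) in vocabularies.items():
            for term in terms:
                fields[f"{prefix}:{term}"] = {
                    "graph": graph,
                    "predicate": namespace + term,
                }
    return fields


def graph_pattern(field_name, fields, subject="?s"):
    """Return a triple pattern for one field, scoped to its graph."""
    info = fields[field_name]
    variable = "?" + field_name.replace(":", "_")
    return f"GRAPH <{info['graph']}> {{ {subject} <{info['predicate']}> {variable} }}"


fields = field_definitions(GRAPH_VOCABULARIES)
print(sorted(fields))
print(graph_pattern("foaf:name", fields))
```

Joining across graphs would then amount to reusing the same subject or object variable in patterns from different graphs.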

I'm not sure how the user

linclark.research's picture

I'm not sure how the user interface for this would work in Views. For instance, if you are using DBpedia as one of your sources, there are 8,000 predicates to choose from. If you can't change the way that the Views UI works, then I'm not sure how to help the user make sense of 8,000 predicates.

If anyone does have any ideas for how the user interaction would work, I would be glad to hear them because it would be great to use Views.

daunting task

eMPee584's picture

now if only those predicates would have other dimensions of categorization besides rdf:type and rdfs:domain some sort of AJAX swarming could be done.. Maybe a kind of graph distance evaluation (examining a focused predicate and nearby ones, up to a distance of n hops) ?
...
What's your idea on managing this sort of complexity? This key question is not constrained to the views.module context, and surely one of the most difficult of your metier: how to not overkill the user with all available information ^^
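
The graph distance idea could be sketched as a breadth-first search that keeps only the predicates within n hops of the one in focus. The adjacency data below is invented for illustration (here two predicates count as neighbours when they share a domain or range class), and the function name within_hops is hypothetical.

```python
from collections import deque

# Hypothetical adjacency between predicates.
NEIGHBORS = {
    "foaf:name": ["foaf:knows", "foaf:homepage"],
    "foaf:knows": ["foaf:name", "foaf:member"],
    "foaf:homepage": ["foaf:name"],
    "foaf:member": ["foaf:knows", "org:hasUnit"],
    "org:hasUnit": ["foaf:member"],
}


def within_hops(start, neighbors, max_hops):
    """Breadth-first search returning {predicate: distance} up to max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(current, ()):
            if nxt not in distances:
                distances[nxt] = distances[current] + 1
                queue.append(nxt)
    return distances


nearby = within_hops("foaf:name", NEIGHBORS, 2)
print(nearby)
```

With a limit of two hops, org:hasUnit is left out, which is the kind of pruning that could keep a list of thousands of predicates manageable.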

"Obstacles are those frightful things you see when you take your eyes off your goal." -- Henry Ford (1863-1947)

I'm not sure, I would need to

linclark.research's picture

I'm not sure, I would need to explore different options.

I think it could be helpful as the user explores to show examples from the dataset that are prototypical of a class or property. This would be hard to do for DBpedia, but smaller datasets can use the VoID vocabulary to specify example resources.

There was also a prototype of a tool developed by scor and a fellow researcher at DERI, Renaud Delbru, that was an AJAX text field that would autocomplete a class or property, searching for the string in both the term label and in the description. That could be really helpful too.
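
A minimal sketch of that kind of autocomplete, assuming the terms are already loaded in memory: matches on the label are listed before matches found only in the description. The term list is illustrative sample data and the function name autocomplete is hypothetical; this is not the prototype mentioned above.

```python
# Illustrative sample data, not taken from a real vocabulary file.
TERMS = [
    {"id": "foaf:Person", "label": "Person", "description": "A person."},
    {"id": "foaf:name", "label": "name", "description": "A name for some thing."},
    {"id": "foaf:maker", "label": "maker", "description": "An agent that made this thing."},
    {"id": "foaf:knows", "label": "knows", "description": "A person known by this person."},
]


def autocomplete(text, terms, limit=10):
    """Case-insensitive substring match on label first, then description."""
    needle = text.strip().lower()
    if not needle:
        return []
    label_hits, description_hits = [], []
    for term in terms:
        if needle in term["label"].lower():
            label_hits.append(term["id"])
        elif needle in term["description"].lower():
            description_hits.append(term["id"])
    return (label_hits + description_hits)[:limit]


suggestions = autocomplete("person", TERMS)
print(suggestions)
```

Searching the description as well lets a user find foaf:knows by typing "person", even though the label does not contain that word.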

But I really do need to do more exploration into this, so any suggestions of successful examples of this kind of user interaction or related research would be more than welcome.

Update?

discursives's picture

How are things looking on this for you Lin?

I am a bit starry-eyed with the Views3 options being tossed around.

I've been consulting as a renegade information architect for clients who want to use Drupal and I've found the process of watching clients grab hold of their 'web operating system' to be one of the more fruitful exercises I've enjoyed.

I have seen that the ability to put the steering wheel into the hands of the user is the most important, but it always seems that creating a customized cockpit for each client is cost prohibitive.

As interfaces become more tactile and screen real estate is remapped to heavier tactile interaction it seems readily apparent to me that users will be more able to jump into complex query construction simply because they will have a more immediate familiarity with their environment.

Imagine that a user is introduced to the query space through a gradual tactile process. They can choose an icon to represent websites containing data they need, and another to show sites without that info (but still in the search set, for example.) In the process they are teaching themselves that they need to include and exclude, and all the while they are creating their own interface.

Swinging arms, personal icons, and custom spectrums that represent the types of ranges you see in the Pivot demo aren't so far away. At the moment, a step-by-step guide outlining some of the basics needed to perform a successful SPARQL query is a good start.

Alex Rollin
http://alexrollin.com

The project has been accepted

linclark.research's picture

The project has been accepted and work starts next week. I haven't yet figured out precisely which direction to go, but will post here in the next week or so with a URL where people can track the work and give input. Thanks for the interest :)

Cheers,
Lin

Congratulations

milesw's picture

I'm extremely excited to see what comes out of this project. There is a huge need for these kinds of tools, both inside and outside the Drupal domain, and while I probably can't contribute much in the way of programming, I'd like to offer any assistance I can here. Congrats!

Wiki page

linclark.research's picture

I have added my progress page.

http://groups.drupal.org/node/72888

Have you considered configurable HTML5 views

MehmetOrun's picture

HTML5 has great options to visualize RDF data similar to what Pivot does.

If you use the nested block option (I recommend not visualizing more than 2 levels deep), then you can look at showing related nodes as the next navigation path, as an attribute accessible from each block.
http://thejit.org/static/v20/Jit/Examples/Treemap/example1.html

Alternatively, you can use a navigable HTML5 graph view, which would provide nice options.
http://thejit.org/static/v20/Jit/Examples/Hypertree/example1.html

I found these to be nice visualization techniques that avoid overcrowding the field. By having navigation controls, you can limit the amount of information, I believe, similar to how Pivot does it with Silverlight. I haven't had time to dabble with making this a PHP-configurable/Drupal DB config add-on, but thought I would share it for your consideration.

In the meantime, I will follow the progress of your project with interest :)

Google Summer of Code 2010
