Importing and round-tripping RDF in Drupal

Neil A Harris's picture

Importing catalogues

We need to import arbitrary catalogues (spreadsheets) into RDFable Drupal nodes.

We see two possible ways forward, described below. Before implementing either, we would like to know whether it would be useful to the broader Drupal RDF community and whether it is compatible with existing proposals:

Option 1: Dynamically create content types

For each import format, dynamically create a node type that matches its field structure. Provide an RDF mapping for the created node type so that the imported items can be marked up with RDF.

Option 2: Define a generic content type for imports

Spreadsheets are imported into two node types. The 'header' node includes the spreadsheet file and information about its structure. A second content type is used to import each row. A row is a list of attributes taken from the column data for that row.
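As a rough illustration of Option 2, the sketch below splits an already-parsed spreadsheet into one 'header' record and one record per row. It is plain PHP rather than Drupal code: the function name catalogue_split() and the array keys are hypothetical, and the first row is assumed to hold the column names.

```php
<?php
// Hypothetical helper, not part of any Drupal module: splits a parsed
// spreadsheet into one 'header' record and one record per data row.
function catalogue_split($filename, array $table) {
  // Assumption: the first row of the table holds the column names.
  $columns = array_shift($table);
  if (!is_array($columns)) {
    $columns = array();
  }
  $header = array(
    'file' => $filename,
    'columns' => $columns,
  );
  $rows = array();
  foreach ($table as $index => $cells) {
    // Pad short rows with NULL and drop cells beyond the last column.
    $cells = array_slice(array_pad($cells, count($columns), NULL), 0, count($columns));
    $rows[] = array(
      'row' => $index + 1,
      'attributes' => array_combine($columns, $cells),
    );
  }
  return array('header' => $header, 'rows' => $rows);
}
```

In a real import, the 'header' record would become the header node and each entry in 'rows' would become a row node; how the attributes are stored on that node is left open here.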

We also need to consider round-tripping issues, as follows:

RDF round trip

We need to produce RDF for all the nodes in the database, and we also need to ingest RDF data. We want to query and browse this RDF data.

Drupal objects -> RDF

This is dealt with by the RDF API in D7, together with mappings created for our imported nodes.

RDF -> Drupal objects

Two cases:

1) We have enough RDF data about a particular RDF subject to map it to one of our Drupal node types. In that case we import it and create a node of that type.

2) We don't have enough RDF, or we have too much. We generate a generic RDF "placeholder" node for this. This is basically a property list (a list of properties and values) in a format that maps in a straightforward way to and from RDF.

The placeholder nodes exist to bring external RDF descriptions into the Drupal world, giving a common, consistent view of everything in the system from a programming viewpoint; they are not intended to be exposed to end users. When end users need them, a new Drupal node type can be created as necessary.
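To make the "property list" idea concrete, here is a minimal sketch of a placeholder record and its mapping to and from triples. The function names and the record layout ('uri' plus 'properties') are hypothetical, and objects are kept as plain strings; a real implementation would need to distinguish URIs from literals, and to carry language tags and datatypes.

```php
<?php
// Hypothetical helpers illustrating the placeholder "property list".
// A triple is array(subject, predicate, object), all plain strings here.

// Groups every triple about $subject into a placeholder record.
function placeholder_from_triples($subject, array $triples) {
  $properties = array();
  foreach ($triples as $triple) {
    list($s, $p, $o) = $triple;
    if ($s === $subject) {
      // A predicate may have several values, so keep a list per predicate.
      $properties[$p][] = $o;
    }
  }
  return array('uri' => $subject, 'properties' => $properties);
}

// Flattens a placeholder record back into triples.
function placeholder_to_triples(array $placeholder) {
  $triples = array();
  foreach ($placeholder['properties'] as $predicate => $objects) {
    foreach ($objects as $object) {
      $triples[] = array($placeholder['uri'], $predicate, $object);
    }
  }
  return $triples;
}
```

Because the record keeps every predicate and value it was given, nothing is lost on the way in, which is the property that makes the placeholder usable for case 2.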

Querying, browsing and editing

Views integration will allow us to define interfaces to query and browse the RDF data, both for our defined node types and for the RDF placeholder nodes. Nodes will be editable by the standard Drupal mechanisms, but editing of placeholder nodes is a developer/debugging feature only and is not intended as a feature for end users.

Details remaining to be worked out

Since we are using an object-oriented structure, the natural model for this structure is RDF quads, as opposed to triples. The mapping of RDF quads to and from CCK nodes needs to be considered carefully. The assignment of URIs for objects also needs careful consideration: we need to liaise with other Drupal RDF developers on this.

Comments

RDF -> Drupal objects

JeremyFrench's picture

I have been looking at this recently.

Placeholder nodes seem a sensible idea, depending on your needs. I have set up nodes with a title and one field which is the URI of the object we need.

I then make various queries in hook_node_view and add items to the $node->content array. This requires some serious caching to keep it performing well, as SPARQL queries can be slow, but it keeps the system very flexible. It also means the node data is only as old as the cache.

I have been working on a demonstrator app using this process and will hopefully be publishing this in the next couple of weeks.
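The caching pattern described in this comment might look roughly like the sketch below. It is plain PHP, not the commenter's code: cached_sparql_query() is a hypothetical name, $fetch stands in for whatever actually sends the query, and a PHP array stands in for a persistent cache (in Drupal 7, cache_get() and cache_set() would presumably play that role).

```php
<?php
// Hypothetical cache wrapper around a slow SPARQL query.
// $fetch is any callable taking ($endpoint, $query) and returning results;
// $cache is an array standing in for a persistent cache bin.
function cached_sparql_query($endpoint, $query, $ttl, $fetch, array &$cache, $now = NULL) {
  $now = ($now === NULL) ? time() : $now;
  // One cache entry per endpoint/query pair.
  $cid = 'sparql:' . hash('sha256', $endpoint . "\n" . $query);
  if (isset($cache[$cid]) && $cache[$cid]['expires'] > $now) {
    return $cache[$cid]['data'];
  }
  // Cache miss or stale entry: run the query and remember the result.
  $data = call_user_func($fetch, $endpoint, $query);
  $cache[$cid] = array('data' => $data, 'expires' => $now + $ttl);
  return $data;
}
```

With this shape, the displayed data is never older than $ttl seconds, which matches the "only as old as the cache" trade-off mentioned above.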

Sounds good

milesw's picture

Jeremy,

Looking forward to checking this out. I've done a little work on this kind of thing too. In my case I'm only running a single query to get all the properties for the URI saved with a node. Sounds like what you're doing may be more interesting.

Are you working with Drupal 6 or 7?

D7

JeremyFrench's picture

Working with D7. Generally without issue. The RDF hooks work well. I tend to reinstall my module to update the RDF mappings, which is a little annoying. I am sure there is a better way but I'm not quite sure what it is.

My biggest frustration with D7 is the lack of prior art. For example I know that I am doing something a little weird when putting fields into $node->content but can't find a better way to do it. I guess that is part and parcel of being an early adopter.

"the lack of prior art" :)

mitchell's picture

This sounds similar to what I'm working on. Would you please post your code?

This is somewhat similar to

scor's picture

This is somewhat similar to what Lin does in sparql_views (pulling data via SPARQL and displaying it as a view) and different from RDF SPARQL proxy, which stores data in nodes (and so requires a node structure matching the schema of the data you're importing). I was talking about their pros and cons with Lin recently, and we found that each has its advantages and disadvantages depending on the use case.

I tend to reinstall my module to update the RDF mappings, which is a little annoying.

rdfx should make this easier if you want to keep your mapping in files à la views or features, but for now you can run

<?php
// $rdf_mapping holds the RDF mapping array for the bundle.
$mapping_info = array(
  'type' => 'node',
  'bundle' => 'your_bundle_name',
  'mapping' => $rdf_mapping,
);
rdf_mapping_save($mapping_info);
?>

For example I know that I am doing something a little weird when putting fields into $node->content but can't find a better way to do it.

We have a nice new Field API in core, so why not create a URI field and hide it in the display settings?

D7 Field API

milesw's picture

I haven't worked with D7 yet, but I recall reading about the new Field API opening possibilities for remote data as fields. Looking briefly at the documentation for the CRUD API and Storage API it seems quite possible. Has anyone experimented with setting up fields to represent remote RDF properties?

For the SPARQL contrib module

scor's picture

For the SPARQL contrib module I'm working on a SPARQL 1.1 interface with drivers to query and update RDF endpoints over HTTP, which lets you store RDF in any store of your choice (because it's over HTTP, it does not matter whether it's local or remote). You could also swap out the sql_storage and rely entirely on RDF for storing your data if you like. The examiner.com folks are doing something similar with their NoSQL MongoDB DBTNG driver. I'm hoping to have some preliminary demos of the interface ready for DrupalCon CPH.

Sparql Fields

JeremyFrench's picture

We have a nice new Field API in core, so why not create a URI field and hide it in the display settings?

Having SPARQL for the field storage sounds very interesting. It was something I considered in the early days when looking at building my demo. As I wasn't too familiar with the Field API, I decided to forgo that in favour of getting something working quickly.

One thing you may find is that ARC2 doesn't understand some of the SPARQL 1.1 concepts. There is also some disparity between engines in some of the syntax: Virtuoso and 4store handle grouping in different ways, for example.

This is somewhat similar to what Lin does in sparql_views

SPARQL Views looks very exciting. It is on my list of things to look at.

I was talking about their pros and cons with Lin recently, and we found that each has its advantages and disadvantages depending on the use case.

That would be an interesting thing to hear. I can see arguments of speed over persistence.

One of the things I am looking at now with regard to speed is using Squid to cache outgoing SPARQL requests; this should help a little with speed and scalability. I have Varnish working for this using some URL rewriting, but it is not very flexible: I have to reconfigure the server every time I add a new endpoint.

Squid is designed to work as a proxy so should be easier to get working.

One thing you may find is

scor's picture

One thing you may find is that ARC2 doesn't understand some of the SPARQL 1.1 concepts. There is also some disparity between engines in some of the syntax: Virtuoso and 4store handle grouping in different ways, for example.

I'm well aware of that. That's why I started working on an abstraction layer with store-specific drivers to work around these disparities.

Very nice. I had thought that

JeremyFrench's picture

Very nice. I had thought that such a thing was needed. Will it work in a similar way to the D7 query api?

yes, a DBTNG for RDF. It

scor's picture

Yes, a DBTNG for RDF. It should be generic enough to be used outside Drupal too, as a PHP library.

mongodb_dbtng

manishgarg's picture

Hi scor,

Do you know what happened to mongodb_dbtng, which was being developed for the examiner website?
Is your DBTNG library similar to that?
Since you are way ahead of the curve, do you know about the performance and ease-of-use differences between 4store, ARC, and other graph databases that work with PHP?

thanks

I'm not familiar with the

scor's picture

I'm not familiar with the progress of mongodb_dbtng; you should ask chx, the maintainer of that module. RDFDB has a different API than the core SQL DBTNG, but it shares the same idea of abstracting the language driving the storage of data (SQL/SPARQL). Re ease of use, from easiest to most complex to install and use: ARC2, 4store, Virtuoso. There are more, but these are the ones I've tried the most.

Virtuoso integration

klokie's picture

rdfproxy looks like a much-needed module for working with triple stores. Unfortunately I don't see any support for creating new graphs or inserting data into them. Have I missed something?

Has anyone had success inserting data into a Virtuoso store via HTTP? Using a simple Drupal wrapper module to proxy the communication with the triple store, I can only insert up to 5 or 6 nodes at a time via HTTP GET requests (probably due to the excessively long URL query strings). WebDAV seemed to be the way to go, but in this case I receive an "HTTP/1.1 409 Conflict" response from Virtuoso. Hopefully someone else here has been more successful?
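One possible way around the URL-length limit is to send the update in the body of a POST request instead of the query string. The sketch below only builds such a request and does not send it. It assumes an endpoint that accepts form-encoded updates in an "update" parameter, as the SPARQL 1.1 Protocol describes; a given store or version may expect a different parameter name. The function name sparql_insert_request() is hypothetical, and the returned keys are chosen to resemble the options array of Drupal 7's drupal_http_request().

```php
<?php
// Hypothetical helper: builds (but does not send) a POST request for a
// SPARQL 1.1 "INSERT DATA" update. Each entry in $triple_lines must already
// be serialized, e.g. '<http://example.org/s> <http://example.org/p> "o" .'
function sparql_insert_request($graph_uri, array $triple_lines) {
  $update = "INSERT DATA { GRAPH <" . $graph_uri . "> {\n"
    . implode("\n", $triple_lines)
    . "\n} }";
  return array(
    'method' => 'POST',
    'headers' => array('Content-Type' => 'application/x-www-form-urlencoded'),
    // The update text travels in the request body, so URL length limits
    // on the query string no longer apply.
    'data' => http_build_query(array('update' => $update)),
  );
}
```

This does not address the WebDAV 409 response; it only sidesteps the long GET URLs.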

RDFDB

scor's picture

That's what the RDFDB library is for: storing RDF data in any RDF store (it currently supports Virtuoso, 4store and ARC2). The sparql_ep module will rely on RDFDB in its Drupal 7 version.

Great - it looks like just

klokie's picture

Great - it looks like just what I was looking for! Would it make sense to advertise the Drupal module at http://drupal.org/project/rdfdb so others like me could find it more easily?

I'll go check out rdfdb now, interested to see if it could be ported to Drupal 6.x for the site we're currently developing.

cheers

Importing events...

jp2020's picture

Using Drupal 6.19

Guys, I am not sure if I am posting this question properly or if this is the right place to write it. Any help would be greatly appreciated.

I already have events created by the site users.

Problem: I do not know how to "bring"/"import" events from 3rd party sites such as community organizations and so forth. Once in the database, I would map them just like the other events in the system. How do I bring in events from sites like http://www.cancer.org/?

Thanks,
JP
