Importing catalogues
We need to import arbitrary catalogues (spreadsheets) into RDFable Drupal nodes.
We suggest the following as a way forward. Before implementing this, we want to see whether it would be useful to the broader Drupal RDF community, and whether it is compatible with other existing proposals:
Option 1: Dynamically create content types
For each import format, dynamically create a node type that matches the field structure. Provide an RDF mapping for the created node type so that the imported items can be tagged with RDF.
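As a rough sketch of Option 1 in D7 (the type name 'catalogue_import', the field names, and the $columns array derived from the spreadsheet header are all hypothetical here), the dynamic creation could use node_type_save() plus the Field CRUD API:

<?php
// Sketch only: creates a node type for one import format, then one text
// field per spreadsheet column. All names below are placeholders.
$type = node_type_set_defaults(array(
  'type' => 'catalogue_import',
  'name' => t('Catalogue import'),
  'base' => 'node_content',
  'custom' => TRUE,
));
node_type_save($type);

foreach ($columns as $column) {
  $field_name = 'field_' . drupal_strtolower($column);
  field_create_field(array(
    'field_name' => $field_name,
    'type' => 'text',
  ));
  field_create_instance(array(
    'field_name' => $field_name,
    'entity_type' => 'node',
    'bundle' => 'catalogue_import',
    'label' => $column,
  ));
}
?>

An RDF mapping for the new bundle would then be saved with rdf_mapping_save(), as discussed further down the thread.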
Option 2: Define a generic content type for imports
Spreadsheets are imported into two node types. The 'header' node includes the spreadsheet file and information about its structure. A second content type is used to import each row; a row is a list of attributes taken from the column data for that row.
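A minimal sketch of Option 2, assuming hypothetical 'import_header' and 'import_row' node types already exist, with a multi-valued text field 'field_row_values' on the row type:

<?php
// Sketch only: save one header node for the spreadsheet, then one row node
// per data row, storing each column value as one item of a multi-valued field.
$header = new stdClass();
$header->type = 'import_header';
$header->title = $filename;
node_save($header);

foreach ($rows as $row) {
  $node = new stdClass();
  $node->type = 'import_row';
  $node->title = reset($row);
  foreach (array_values($row) as $delta => $value) {
    $node->field_row_values[LANGUAGE_NONE][$delta]['value'] = $value;
  }
  node_save($node);
}
?>

The trade-off versus Option 1 is that the generic row type needs no per-format setup, but the column semantics live only in the header node rather than in the field definitions.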
We also need to consider round-tripping issues, as follows:
RDF round trip
We need to produce RDF for all the nodes in the database, but we also need to ingest RDF data, and we want to query and browse that data.
Drupal objects -> RDF
This is handled by the RDF API in D7, together with the mappings created for our imported nodes.
RDF -> Drupal objects
Two cases:
1) If we have enough RDF data about a particular RDF subject to map it to one of our Drupal node types, we import it and create a node of that type.
2) We don't have enough RDF, or we have too much. In this case we generate a generic RDF "placeholder" node: essentially a property list (a list of properties and values) in a format that maps in a straightforward way to and from RDF.
The placeholder nodes exist to bring external RDF descriptions into the Drupal world, giving a common, consistent view of everything in the system from a programming viewpoint; they are not intended to be exposed to end users. When end users do need them, a new Drupal node type can be created as necessary.
Querying, browsing and editing
Views integration will allow us to define interfaces to query and browse the RDF data, both for our defined node types and for the RDF placeholder nodes. Nodes will be editable via the standard Drupal mechanisms, but editing placeholder nodes is a developer/debugging feature only, not intended for end users.
Details remaining to be worked out
Since we are using an object-oriented structure, the natural model for it is RDF quads, as opposed to triples. The mapping of RDF quads to and from CCK nodes needs to be considered carefully, as does the assignment of URIs for objects: we need to liaise with other Drupal RDF developers on this.
Comments
RDF -> Drupal objects
I have been looking at this recently.
Placeholder nodes seem a sensible idea, depending on your needs. I have set up nodes with a title and one field, which is the URI of the object we need.
I then run various queries in hook_node_view() and add items to the $node->content array. This requires some serious caching to perform well, as SPARQL queries can be slow, but it keeps the system very flexible. It also means the node data is only as stale as the cache.
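The pattern described above might look roughly like this in D7 (my_sparql_query(), the 'rdf_placeholder' type, and the field and cache names are all hypothetical):

<?php
/**
 * Implements hook_node_view().
 *
 * Sketch only: fetches remote properties for a placeholder node's URI,
 * caching the result because SPARQL queries can be slow.
 */
function mymodule_node_view($node, $view_mode, $langcode) {
  if ($node->type != 'rdf_placeholder') {
    return;
  }
  $uri = $node->field_resource_uri[LANGUAGE_NONE][0]['value'];
  $cid = 'mymodule:' . $uri;
  if ($cache = cache_get($cid)) {
    $properties = $cache->data;
  }
  else {
    $properties = my_sparql_query($uri);
    cache_set($cid, $properties, 'cache', CACHE_TEMPORARY);
  }
  $node->content['rdf_properties'] = array(
    '#markup' => theme('item_list', array('items' => $properties)),
    '#weight' => 10,
  );
}
?>

Using CACHE_TEMPORARY lets cron clear the entries, which is one way to keep the node data "only as old as the cache".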
I have been working on a demonstrator app using this process and will hopefully be publishing this in the next couple of weeks.
Sounds good
Jeremy,
Looking forward to checking this out. I've done a little work on this kind of thing too. In my case I'm only running a single query to get all the properties for the URI saved with a node. Sounds like what you're doing may be more interesting.
Are you working with Drupal 6 or 7?
D7
Working with D7, generally without issue. The RDF hooks work well. I tend to reinstall my module to update the RDF mappings, which is a little annoying; I am sure there is a better way, but I'm not quite sure what it is.
My biggest frustration with D7 is the lack of prior art. For example I know that I am doing something a little weird when putting fields into $node->content but can't find a better way to do it. I guess that is part and parcel of being an early adopter.
"the lack of prior art" :)
This sounds similar to what I'm working on. Would you please post your code?
This is somewhat similar to
This is somewhat similar to what Lin does in sparql_views (pulling data via SPARQL and displaying it as a view), and different from RDF SPARQL proxy, which stores data in nodes (requiring a node structure that matches the data schema you're importing). I was talking about their pros and cons with Lin recently; we found they each had advantages and disadvantages depending on the use case.
rdfx should make this easier if you want to keep your mapping in files à la views or features, but for now you can run:
<?php
$mapping_info = array('mapping' => $rdf_mapping, 'type' => 'node', 'bundle' => 'your_bundle_name');
rdf_mapping_save($mapping_info);
?>
We have a nice new Field API in core, so why not create a URI field and hide it in the display settings?
D7 Field API
I haven't worked with D7 yet, but I recall reading about the new Field API opening possibilities for remote data as fields. Looking briefly at the documentation for the CRUD API and Storage API it seems quite possible. Has anyone experimented with setting up fields to represent remote RDF properties?
For the SPARQL contrib module
For the SPARQL contrib module I'm working on a SPARQL 1.1 interface with drivers to query and update RDF endpoints over HTTP, letting you store RDF in any store of your choice (because it's over HTTP, it does not matter whether the store is local or remote). You could also swap out sql_storage and rely entirely on RDF for storing your data if you like. The examiner.com folks are doing something similar with their NoSQL MongoDB DBTNG driver. I'm hoping to have some preliminary demos of the interface ready for DrupalCon CPH.
Sparql Fields
Having SPARQL as the field storage sounds very interesting; it was something I considered in the early days when looking at building my demo. As I wasn't too familiar with the Field API, I decided to forgo that in favour of getting something working quickly.
One thing you may find is that ARC2 doesn't understand some of the SPARQL 1.1 concepts. There is also some disparity between engines on syntax: Virtuoso and 4store handle grouping in different ways, for example.
SPARQL views looks very exciting; it is on my list of things to look at.
That would be an interesting thing to hear. I can see arguments of speed over persistence.
One of the things I am looking at now with regard to speed is using Squid to cache outgoing SPARQL requests; this will help a little with speed and scalability. I have Varnish working for this using some URL rewriting, but it is not very flexible: I have to reconfigure the server every time I add a new endpoint.
Squid is designed to work as a proxy, so it should be easier to get working.
One thing you may find is
I'm well aware of that, that's why I started working on an abstraction layer with RDF store specific drivers to work around these disparities.
Very nice. I had thought that
Very nice. I had thought that such a thing was needed. Will it work in a similar way to the D7 query API?
yes, a DBTNG for RDF. It
Yes, a DBTNG for RDF. It should be generic enough to be used outside Drupal too, as a PHP library.
mongodb_dbtng
Hi scor,
Do you know what happened to mongodb_dbtng, which was being developed for the examiner website?
Is your DBTNG library similar to that?
Since you are ahead of the curve, do you know about the performance and ease-of-use differences between 4store, ARC, and other graph databases that work with PHP?
thanks
I'm not familiar with the
I'm not familiar with the progress of mongodb_dbtng; you should ask chx, the maintainer of that module. RDFDB has a different API from the core SQL DBTNG, but it shares the same idea of abstracting the language driving the storage of data (SQL/SPARQL). Re ease of use, from easiest to most complex to install and use: ARC2, 4store, Virtuoso. There are more, but these are the ones I've tried the most.
Virtuoso integration
rdfproxy looks like a much-needed module for working with triple stores. Unfortunately I don't see any support for creating new graphs or inserting data into them. Have I missed something?
Has anyone had success inserting data into a Virtuoso store via HTTP? Using a simple Drupal wrapper module to proxy the communication with the triple store, I can only insert up to 5 or 6 nodes at a time via HTTP GET requests (probably due to the excessively long URL query strings). WebDAV seemed to be the way to go, but in this case I receive a "HTTP/1.1 409 Conflict" response from Virtuoso. Hopefully someone else here has been more successful?
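One way around the URL-length limit is to POST the update instead of sending it as a GET query string. A hedged sketch using D7's drupal_http_request() (the endpoint URL, graph name, and triple are placeholders, and the form parameter name may vary between endpoints):

<?php
// Sketch only: POST a SPARQL insert to a Virtuoso-style endpoint so the
// update goes in the request body rather than the URL.
$query = 'INSERT INTO GRAPH <http://example.org/graph> { '
  . '<http://example.org/s> <http://example.org/p> "o" . }';
$response = drupal_http_request('http://localhost:8890/sparql', array(
  'method' => 'POST',
  'headers' => array('Content-Type' => 'application/x-www-form-urlencoded'),
  'data' => 'query=' . rawurlencode($query),
));
if ($response->code != 200) {
  watchdog('mymodule', 'SPARQL insert failed: @code',
    array('@code' => $response->code));
}
?>

Note the endpoint will also need update permissions enabled; a 409 from WebDAV is often a permissions or existing-resource issue rather than a payload problem.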
RDFDB
That's what the RDFDB library is for: storing RDF data in any RDF store (it currently supports Virtuoso, 4store and ARC2). The sparql_ep module will rely on RDFDB in its Drupal 7 version.
Great - it looks like just
Great - it looks like just what I was looking for! Would it make sense to advertise the Drupal module at http://drupal.org/project/rdfdb so others like me could find it more easily?
I'll go check out rdfdb now, interested to see if it could be ported to Drupal 6.x for the site we're currently developing.
cheers
Importing events...
Using Drupal 6.19
Guys, I am not sure if I am posting this question properly or if this is the right place to write it, any help would be greatly appreciated.
I already have events created by the site users.
Problem: I do not know how to "bring"/import events from third-party sites such as community organizations and so forth. Once they are in the database, I would map them just like the other events in the system. How do I bring in events from sites like http://www.cancer.org/?
Thanks,
JP