Importing and manipulating the (semantic) data in Drupal

Deno's picture

Dear members of the Semantic Web & Views development,

I have been playing with the idea of using Drupal as a rapid development environment for (semantic) web applications, rather than as a CMS. The idea is not as far-fetched as it may seem at first:

  1. CCK lets me define a data model (core functionality in D7).
  2. The Feeds parser lets me pre-fill the nodes defined in step 1 with data, including the possibility to fetch data at regular intervals, update nodes and drop old nodes.
  3. RDF-related functionality allows me to associate the data saved in these pre-defined nodes with ontology terms.
  4. Views allows mixing of data from different node types.
  5. It is also possible to define data programmatically (e.g. to fill in a field automatically), both in the node and in the view.
  6. Finally, there are plenty of modules that let me do just about anything with the data once I have it in nodes & views. Some examples include visualization on a map, and setting up a web service (SOAP, REST, ...) that exposes my data to the outside world.

My experiments were partly with D6 and partly with D7, mainly because some of the modules I wanted to try do not exist in a D7 version (yet).

The main issue I have is with feeding the data from a view into a node. Lin Clark has explained how to do this with a SPARQL Views => PHP array => Feeds => nodes workflow, but this unfortunately only works with some obscure alpha version of Feeds for D6. I am sure it is possible to expose view data as, e.g., an RDF feed, and then fill in the nodes using, e.g., the XPath or QueryPath Views parser, but this appears quite inefficient to me. Since nodes are the main Drupal data type ("entities" in D7?), not being able to save data to nodes is really annoying.
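
To make the "query results => array => nodes" idea concrete, here is a minimal sketch in Python, outside Drupal. It only shows the flattening step: a response in the standard SPARQL JSON results format is turned into one plain record per row, which an importer could then map onto node fields. The sample data and the function name are made up for illustration; this is not the code used by SPARQL Views or Feeds.

```python
import json

# Made-up sample response in the W3C SPARQL JSON results format.
SAMPLE = """
{
  "head": {"vars": ["animal", "name"]},
  "results": {"bindings": [
    {"animal": {"type": "uri", "value": "http://dbpedia.org/resource/Lion"},
     "name": {"type": "literal", "xml:lang": "en", "value": "Lion"}},
    {"animal": {"type": "uri", "value": "http://dbpedia.org/resource/Tiger"}}
  ]}
}
"""


def bindings_to_rows(response_text):
    """Flatten SPARQL JSON bindings into one plain dict per result row."""
    data = json.loads(response_text)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to None.
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows


rows = bindings_to_rows(SAMPLE)
for row in rows:
    print(row)
```

Each resulting dict plays the role of one row in the intermediate array; the mapping of its keys to node fields is left to the importer.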

What bothers me is that this (view => node) seems like a very elementary and often needed functionality to me - so why isn't it implemented? Am I missing something obvious, such as "hey buddy, nodes aren't needed at all, you can do it all with views", or "there is a simple & working way to store view content in nodes"?

A secondary issue is with foreign keys (node references), which I need in order to link two node types in a view. In D6 at least, I wasn't able to get this working automatically with Feeds, but it's quite easy to calculate the nid of the foreign key at node generation time, so this is a minor issue. It would be even better if I could link two node types in a view using a common taxonomy term, or better still using the RDF tags associated with node fields in addition to node references.
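
The "calculate the nid of the foreign key at node generation time" step could look roughly like the following Python sketch. It assumes both node types share a natural key (here a hypothetical `scientific_name` field); all field names, nids and data are invented for illustration.

```python
# Hypothetical, already-imported "animal description" nodes: nid plus a natural key.
descriptions = [
    {"nid": 101, "scientific_name": "Panthera leo"},
    {"nid": 102, "scientific_name": "Panthera tigris"},
]

# Incoming "animal observation" rows that refer to a description by name only.
observations = [
    {"scientific_name": "Panthera leo", "country": "KE"},
    {"scientific_name": "Felis catus", "country": "AT"},
]


def resolve_references(rows, targets, key):
    """Attach the referenced nid to each row, or None when no target matches."""
    nid_by_key = {target[key]: target["nid"] for target in targets}
    return [dict(row, description_nid=nid_by_key.get(row[key])) for row in rows]


resolved = resolve_references(observations, descriptions, "scientific_name")
for row in resolved:
    print(row)
```

Rows without a matching description keep a `None` reference, so they can be skipped or fixed up in a later import run.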

thx in advance for all ideas & suggestions. If this is a FAQ, I would be quite happy to RFD, but I did not find anything in this direction so far...

cheers
Denis

Comments

http://esciencenews.com/

ben.hamelin's picture

I ran across this site last year in the case studies. I have no real programmatic response to your post; I just thought you might be interested to see this site, which I believe was designed and implemented in a fashion similar to what you have described.

http://esciencenews.com/

It might at least be worth getting in touch with the developer to discuss it.

Good luck!
bh

Hey buddy, it's not clear

milesw's picture

Hey buddy, it's not clear what you're trying to do. ;)

It sounds like you're trying to import existing semantic content? Are you getting it from an endpoint using SPARQL views?

1: Nodes (or entities) are designed to store content, and Views is designed to display that content. Views is not meant to be a content creation tool, it's meant to output existing content in crazy ways. Because Views is so awesome, people have managed to bend it to do all kinds of neat tricks (SPARQL Views).

2: Not sure what you mean when you say "link two node types". Are you talking about a SPARQL view? A node view?

Just an experiment...

Deno's picture

Hi Miles,

I have been using Drupal as a CMS for years, and recently realized that this is just the tip of the iceberg. I'm quite active in research projects that deal with environmental informatics and informatics support for crisis management. Much of the software used in these projects consists of throwaway prototypes that basically act as "glue": they take data from one source, process it, and let the next service in the chain use the results. They don't need to be very reliable or performant, but we need to build them quickly. The ability to import existing content (semantic or not) is needed all the time, and so is the ability to access the information through various web service interfaces. Other things that are inherently present in Drupal (various ways to present data and interact with users, and especially the possibility to annotate the data with RDF) are very useful too.

I believe that one can easily produce this kind of prototype software using Drupal core plus a couple of exotic modules. If I can prove that this works well enough for our needs, it should be possible to put part of the project budget towards improving the modules I need in the future - for now this is just my hobby.

Therefore, I decided to set up two experimental sites (one D6 and one D7), and have played with them at home over the last month or so. Among other things, I have imported some content from DBpedia using SPARQL Views and stored it in nodes. I also imported some content from the GBIF data site using the XPath Feeds parser and stored it in nodes of another type. Then I combined the two node types using Views, and experimented with, as you put it, "outputting in crazy ways". You can see the basic ideas working on my D6 experimental site:

  • The first node on this page is of the "animal description" type, imported from DBpedia through SPARQL Views => PHP array => Feeds. The others are of the "animal observation" type, imported from the GBIF data site using the XPath Feeds parser: http://www.havlik.org/d6/?q=taxonomy/term/28

  • This page is a tabular presentation of the SPARQL view: http://www.havlik.org/d6/?q=cats_info
    The last two rows are calculated using "Customfield: PHP code" (I don't see how to get the "href" part of the HTML link without it).

  • This page shows the data from another view, which combines the data from both node types (the English name is from "animal description"):
    http://www.havlik.org/d6/?q=cats_table

I hope I'm a bit clearer now. :-)

Basically I'm very happy with the import functionality shown above, but I don't see how to do it on a D7 site. And even on the D6 site, the SPARQL import (in fact, the PHP array Feeds parser) only works with the alpha16 version of Feeds, whereas the XPath Feeds parser requires the latest version of the module.

Even if we forget the whole SPARQL part: the possibility to produce a custom data set (some parts imported from elsewhere, others calculated on the fly, and still others submitted by users to the site...) using Views and then save this data to a node is IMO very convenient, so I am curious whether there are any good alternatives to the PHP array approach. Exposing the view as XML and parsing it with the XPath Feeds parser does not look very elegant...
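
For comparison, the "expose as XML, parse with XPath" detour amounts to something like the following Python sketch. The XML layout, the field mapping and the function name are assumptions made up for this example; a real view export will look different, and this is not how the XPath Feeds parser is implemented.

```python
import xml.etree.ElementTree as ET

# Made-up XML as a view might expose it; the real structure depends on the view.
VIEW_XML = """
<nodes>
  <node><title>Lion</title><observations>12</observations></node>
  <node><title>Tiger</title><observations>7</observations></node>
</nodes>
"""

# Hypothetical mapping: target field name -> XPath relative to each <node>.
MAPPING = {"title": "title", "count": "observations"}


def parse_view(xml_text, item_xpath, mapping):
    """Return one dict per matched item, filled according to the field mapping."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for element in root.findall(item_xpath):
        items.append(
            {field: element.findtext(path) for field, path in mapping.items()}
        )
    return items


items = parse_view(VIEW_XML, "./node", MAPPING)
for item in items:
    print(item)
```

The data is serialized to XML only to be parsed back into records straight away, which is the round trip that makes this route feel inelegant.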

Sorry, I may be a little slow

milesw's picture

Sorry, I may be a little slow here. It's clear now what you've attempted so far, but I'm still not sure what you're trying to accomplish in the end. I think what's most confusing to me is the way you're talking about the role of Views.

"Finally, there are plenty of modules that let me do just about anything with the data once I have it in nodes & views."

"Even if we forget the whole SPARQL part: the possibility to produce a custom data set (some parts imported from elsewhere, others calculated on the fly, and still others submitted by users to the site...) using Views and then save this data to a node is IMO very convenient, so I am curious whether there are any good alternatives to the PHP array approach. Exposing the view as XML and parsing it with the XPath Feeds parser does not look very elegant..."

As I mentioned before, Views is really designed to display content already stored within the Drupal ecosystem (typically as nodes). Only recently, with Views 3, has the concept of pluggable backends come into the picture, which makes things like SPARQL Views possible. So to answer your question "What bothers me is that this (view => node) seems like a very elementary and often needed functionality to me - so why isn't it implemented?"...because that's the complete opposite of what it's designed to do.

It seems you're familiar with Feeds already. With Feeds you can write plugins to import from just about any type of data source. Once you have all your content saved as nodes, you can output it in any format you want using Views. Again, I'm not sure what end result you're aiming for.

A while back I wrote a Feeds-based D6 module for importing RDF and creating nodes. It's pretty rough, but might be worth a shot: http://drupal.org/sandbox/milesw/1085078. Screencast #2 gives a good outline of how it works.

RDF importer

Deno's picture

Dear Miles,

Your RDF importer could indeed be a part of the puzzle. I'll take a look at it, thx. Frankly I'm not convinced that SPARQL (or any other) import should be realized in views if we have Feeds for that, but this is how SPARQL views and VARQL work today.

I still believe that Views is (even if it wasn't initially meant or designed to be) more than just a convenient way to show data one already has in the system, and that saving a view to a node is very convenient. Thanks to your posts, I now also understand why this hasn't been done so far - the important part is that I don't need to look for existing solutions since none exist today.

As for the "end result", that's difficult to say. As mentioned, I'm not after building a particular web site, but rather an environment that allows me to quickly build service prototypes. An additional (web) GUI for humans is a nice-to-have and helps in interacting with the users.

regards
Denis

Miles, this is fantastic!

johngriffin's picture

Miles, this is fantastic! Just what I was looking for. I've found that the RDFImporter feeds plugins work well and allow me to import and sync nodes from RDF.

John Griffin
http://atchai.com

Glad to hear it. Feel free to

milesw's picture

Glad to hear it. Feel free to post any issues or ideas in the queue. After hardly looking at the module for a few months, I'm starting to get interested in developing it further. There's a lot of room for improvement.

Yesterday I discovered there's a patch for Feeds that adds a node reference mapper. Using this mapper you can effectively preserve the relationships between RDF resources when importing them as nodes. I'm hoping to put together a new screencast next week.

That sounds really cool,

linclark.research's picture

That sounds really cool, looking forward to checking this out.

New screencasts

milesw's picture

I uploaded a couple new screencasts. The first one outlines how to import from a SPARQL endpoint and how to optimize things for recurring imports.

The second one demonstrates how to map special types of content (dates, links, files etc.) using the mappers included with Feeds. It also demonstrates the nodereference mapper I mentioned before.

Links to all screencasts are available on the RDFimporter sandbox page.

RDF SPARQL Proxy

johngriffin's picture

Hi Denis,

I have a very similar need - to create nodes in Drupal that are to be synchronised with data in a triple store. We'd like to run a SPARQL query that will bulk create these nodes, and have some mechanism for keeping them synchronised. I thought I'd just point you in the direction of:

http://drupal.org/project/rdfproxy

Haven't tried it yet but I'm about to! Let me know how you get on and feel free to get in touch as I'll be working on this over the next week.

John

John Griffin
http://atchai.com

Excellent example tutorial

Jimmel's picture

Excellent work, Miles. I like what you're doing. It's only by giving clear, concise tutorials using tangible examples that the semantic web will be taken up by those not completely immersed in semantic web technology. It's good to see someone attempt to simplify the theory. Seeing this working example has encouraged me to think it is worth delving deeper into the subject and allocating precise time to it. Hopefully it will encourage more people to become interested. Keep up the good work.

rdfimporter

cimlvl's picture

Excellent tutorial from Miles. This really shows the value of the module.
The next step is real time integration (defining a query as data source which is executed on request from the user providing real time data).

Greets,

Luc

If you are looking to do a

linclark.research's picture

If you are looking to do a live query on a datasource, I'm not sure that Feeds is the sub-system to build upon. It does a great job of syncing content to a site, but AFAIK is really meant for just that—storing a local version of content from a feed or file. When it checks for updates, it will either delete all existing local content and reload or search in the database for the pieces of content it is updating. As such, it wouldn't have good performance for live querying without some modification.
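
The second update strategy (look up each incoming item locally and update it in place) can be sketched in Python as below. This is an illustration of the general idea only, under the assumption that every item carries a unique `guid`; the store, the hashing and the function names are invented here and are not Feeds' actual implementation.

```python
import hashlib
import json


def item_hash(item):
    """Stable fingerprint of an item, used to detect changes between imports."""
    encoded = json.dumps(item, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def sync(store, items, key="guid"):
    """Upsert items into store (a dict keyed by GUID); return counts per action."""
    stats = {"created": 0, "updated": 0, "unchanged": 0}
    for item in items:
        digest = item_hash(item)
        existing = store.get(item[key])
        if existing is None:
            stats["created"] += 1
        elif existing["hash"] != digest:
            stats["updated"] += 1
        else:
            # Nothing changed since the last import; leave the stored copy alone.
            stats["unchanged"] += 1
            continue
        store[item[key]] = {"data": item, "hash": digest}
    return stats


store = {}
first = sync(store, [{"guid": "a", "title": "Lion"}, {"guid": "b", "title": "Tiger"}])
second = sync(store, [{"guid": "a", "title": "Lion"}, {"guid": "b", "title": "Bengal tiger"}])
print(first)
print(second)
```

Every import has to walk through all incoming items and look each one up locally, which is fine for periodic syncing but is the overhead that makes this approach a poor fit for answering a live query.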

To run live queries, the Drupal sub-system most suited is Views. SPARQL Views enables you to do that. One small correction to the description of Miles's module: the Drupal 6 version of SPARQL Views could query RDF files and RDFa on Web sites as well. The Drupal 7 version can't yet, because it integrates with Remon's SPARQL endpoint registry, but it will be a very simple addition and is on my todo list.

RDFImporter seems to me (and I'm saying from the perspective of someone who hasn't tried it yet) to be a great solution for syncing. To get that effect with SPARQL Views, you need to join Views to Feeds using Feeds View Parser, which isn't really actively maintained... so this is a great addition to the suite of tools. But I think it is probably best to use it strictly for syncing, if I correctly understand the way Feeds works.

Yes, it's good that you make

milesw's picture

Yes, it's good that you make this distinction. Views is definitely more suited to doing real-time. For displaying up-to-date semantic content, SPARQL Views is much more appropriate. In fact, when importing the same set of data, I have a feeling that SPARQL Views + Feeds View Parser is going to be more performant than RDFimporter.

My interest is in content reuse -- one of the promising 'features' of the semantic web. Ideally someone managing content in Drupal should only have to make a few clicks to import some remote RDF with the option of keeping content in sync with its origin. Additionally, in some cases it would be a nice bonus if they could modify the content and send it back to the origin. This is the type of thing I'm interested in building. So far all I've done is taken some twine and strung together ARC2 and the Feeds module.

SPARQL Views is an extremely powerful tool. However, it is very much a developer-focused tool. Familiarity with both SPARQL and Views is not something your average content manager will have. If you're a developer needing to bring semantic content into Drupal, I would suggest you first try SPARQL Views (especially now that I'm reminded you can query RDF and RDFa files).

You're right about

linclark.research's picture

You're right about familiarity with SPARQL being a prereq for the D6 version, but it's actually not the case in the D7 version. I haven't had a chance to document it properly yet because I just started porting it a few weeks ago and just stabilized it on Friday, but the D7 version is much much more in line with the Views way of doing things.

Basically, for each endpoint you have field options (just like you do for node fields and field API fields in Drupal) and you just select the fields you want and apply filters and arguments. People can contribute different Views data definitions for different endpoints or people can make their own using the Field API.
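
Conceptually, selecting fields instead of writing SPARQL by hand means that the query is assembled from per-endpoint field definitions. The Python sketch below only illustrates that idea; the field-to-predicate table and the function are hypothetical and do not reflect the actual SPARQL Views code or its data definitions.

```python
# Hypothetical field definitions for one endpoint: field name -> RDF predicate.
FIELDS = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}


def build_query(selected, fields, limit=10):
    """Build a SELECT query that fetches the selected fields for each ?resource."""
    variables = " ".join("?" + name for name in selected)
    patterns = "\n".join(
        "  ?resource <{}> ?{} .".format(fields[name], name) for name in selected
    )
    return "SELECT ?resource {}\nWHERE {{\n{}\n}}\nLIMIT {}".format(
        variables, patterns, limit
    )


query = build_query(["name"], FIELDS, limit=5)
print(query)
```

The person building the view only picks field names; the predicates behind them come from whoever contributed the definitions for that endpoint.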

It's a major shift from the way things were done in D6. I hope to have a new screencast out about it in 2 weeks. It will go into alpha within the next week (it could now, but I need some changes to go into RDF UI and SPARQL first).

i'm on a similar track

svild's picture

i tried 2 years ago to make a "semantic" library about a collection of old theatrized fairytales on LPs etc, having full relations between the notions (artists, authors, performances/pieces, issues, labels, etc), all over a moinmoin wiki, and i managed to do it although with quite a few hacks (python is easy, moin is not). Now as it has grown, i'm looking into reworking it into a less-demanding and more-social library of metadata + integration with the actual audio recordings + images (kept on a separate statically generated page), with all them ratings, tweets, who else likes this, what else he likes, other related stuff (texts, products, whatever) etc.etc. The wiki gave me freedom to type anything in but the semantic is still for human-eyes-only.

so far i see drupal7 is probably the way to go, still i want to ask whether it has these things: a correct list of who-points-to-me (i.e. a backref relation), inclusion of other nodes into a node, some way to use abbreviations (so people type A.Miln and it appears as AlanMiln), and eventually a chooser for those abbreviations/available items to reduce mistakes. The metadata schema is my own plain text (which i can parse into anything).

if u want to see the wiki, it's at www.svilendobrev.com/gramofonche, and the (static) list (with ~same metadata) is at www.svilendobrev.com/detski/zvuk/ (though it's all in bulgarian). Eventually my whole site would go into drupal from its current static multilang form.

ciao (sorry if this isn't the right place to ask)

SvilenDobrev.com - software by people, for people

If you are looking for a

linclark.research's picture

If you are looking for a Semantic Wiki, I would actually suggest something like Semantic MediaWiki. The RDF in Drupal core doesn't have support for RDF in free text, it only produces RDF tags around fields.

Does this RDF importer work on Drupal 7.0?

sharsha15's picture

Please help me here. Is it specific to version 6.0 only?

How do I enable RDF UI and SPARQL on my Drupal 7 site?

sharsha15's picture

I am new to Drupal programming.

Can someone help me with how to load SPARQL on my Drupal 7 site on EC2 and connect it to the RDF repository on my local desktop? And how can I run queries from the Drupal 7 site?

Any tutorial or hints would help me do this.

Thanks & regards,
-Solomon
