RDF: right way to fetch Wikipedia?

jospBln

Hi,

Before I start with RDF, one question:

I'm looking for a way to use short teasers and images from Wikipedia articles on my website. I would like to have a small block where people get a short overview of topics around sailing: different sailing boats, sails, and so on.

I would like to fetch the image URL and maybe the first 300 words of the article.

Is RDF the right starting point for this, or should I use another approach?

Best
Joerg

Comments

You can do this in multiple

linclark.research

You can do this in multiple different ways. If you are just looking to pull a few items from the page and you don't need to set up more complex relationships between the information, you may want to consider using YQL.

There is a YQL Query Backend for Views, http://drupal.org/project/yql_views_query. I imagine this would be quite helpful for your use case.
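For illustration, a YQL query for the first paragraph of the Sailing article could be sent to Yahoo's public REST endpoint roughly like this. This is a sketch only: the endpoint URL and the `html`-table query are assumptions about how YQL was typically used, and the YQL service has since been retired by Yahoo.

```python
from urllib.parse import urlencode

# Hypothetical sketch: build the GET URL for Yahoo's public YQL endpoint.
# (The service has since been retired; shown for illustration only.)
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"

def build_yql_url(query):
    """Return the full GET URL for a YQL query, asking for JSON output."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})

# Pull the first paragraph of the Sailing article via YQL's 'html' table.
query = ('select * from html where '
         'url="http://en.wikipedia.org/wiki/Sailing" and xpath="//p[1]"')
url = build_yql_url(query)
```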

Thanks, sounds great, I'll

jospBln

Thanks, sounds great, I'll have a closer look!

DBpedia

murrayw

I'd recommend taking a look at DBpedia, which is a semantic version of Wikipedia. It contains titles, abstracts, links to images, and a whole bunch of other things you might find of interest.

DBpedia contains information on subjects found in Wikipedia:
http://dbpedia.org/resource/Sailing

It returns different representations depending on content negotiation:
http://dbpedia.org/page/Sailing (HTML - easiest to view)
http://dbpedia.org/data/Sailing.json (JSON - easiest to parse)
http://dbpedia.org/data/Sailing.rdf (XML)
http://dbpedia.org/data/Sailing.n3 (N3)

You can use it as a source either to manually copy the content you want (HTML version) or to script the download (JSON) if you have a lot of content to copy.
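The content negotiation above can be exercised by sending an Accept header against the generic resource URI. A minimal sketch: the request is only constructed here, not sent, and the assumption is that an `application/json` Accept header gets you redirected to the JSON representation.

```python
from urllib.request import Request

# Sketch of content negotiation against the generic resource URI.
# Assumption: asking for JSON redirects to /data/Sailing.json.
# (The Request is only constructed here, not actually sent.)
req = Request(
    "http://dbpedia.org/resource/Sailing",
    headers={"Accept": "application/json"},
)
```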

There are a number of properties of interest:
- rdfs:label
- dbpedia-owl:abstract
- dbpedia-owl:thumbnail
- foaf:depiction
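In the JSON representation, those properties appear under the resource's URI, keyed by their full property URIs. A sketch of pulling them out of the decoded document; the exact JSON shape (lists of `{"value", "lang", ...}` dicts) is an assumption based on DBpedia's JSON format of the time.

```python
# Sketch of extracting the properties above from DBpedia's JSON
# representation, where `data` is the decoded /data/Sailing.json document.
# Assumed shape: data[resource_uri][property_uri] is a list of dicts
# with "value" and (for literals) "lang" keys.

RESOURCE = "http://dbpedia.org/resource/Sailing"
PROPS = {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "abstract": "http://dbpedia.org/ontology/abstract",
    "thumbnail": "http://dbpedia.org/ontology/thumbnail",
    "depiction": "http://xmlns.com/foaf/0.1/depiction",
}

def first_value(data, resource, prop_uri, lang=None):
    """Return the first matching value, optionally filtered by language."""
    for entry in data.get(resource, {}).get(prop_uri, []):
        if lang is None or entry.get("lang") == lang:
            return entry["value"]
    return None
```

You would then call, say, `first_value(data, RESOURCE, PROPS["abstract"], lang="en")` to get the English abstract.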

From the sounds of it, I reckon each DBpedia resource would map nicely onto a Drupal node. You could define a "dbp" content type to store this data and then build your Views on top of that. My Uriverse project did something similar.

A couple of quick notes on images:
1. It is probably best to download them and import them into Drupal as image files, because you get the benefits of ImageCache.
2. Not all images on Wikipedia are CC-licensed or public domain, so you need to be careful if you can't rely on fair use as Wikipedia does. Older versions of DBpedia had links to the license page on Wikipedia, but this newer version doesn't seem to. You will have to check the Wikipedia page manually to find out.
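For note 1, a small sketch of deriving a local filename from a foaf:depiction URL before downloading it into Drupal's files directory. The helper is illustrative; the download itself, and the license check from note 2, are left out.

```python
import os
from urllib.parse import urlparse, unquote

# Illustrative helper: turn an image URL into a local filename before
# downloading (download omitted; check the image's license on its
# Wikipedia page first, per the note above).
def local_image_name(image_url):
    """Return the decoded basename of an image URL's path."""
    path = urlparse(image_url).path
    return unquote(os.path.basename(path))
```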

Managing Director
Morpht

So after discussing this with

linclark.research

So after discussing this with one of the former DBpedia folks, it looks like YQL might not be as easy as I thought. He pointed out that it is hard to get an abstract from a Wikipedia page because it might have the table of contents or other things in it.

So DBpedia probably is the best route. However, there are some things to keep in mind.

The general DBpedia public endpoint is only synced with Wikipedia on a sporadic basis, but there is a version that is synced with Wikipedia quite frequently at http://dbpedia-live.openlinksw.com/sparql/.
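A query against that live endpoint might be built like this. This is a minimal Python sketch assuming the endpoint accepts the standard `query` parameter over GET (as Virtuoso-backed endpoints generally do); the query itself and the `format` parameter are illustrative.

```python
from urllib.parse import urlencode

# Illustrative sketch: build a GET URL for the live-sync SPARQL endpoint.
# "query" is the standard SPARQL protocol parameter; "format" is a common
# Virtuoso extension for picking the result serialization (assumption).
ENDPOINT = "http://dbpedia-live.openlinksw.com/sparql/"

SPARQL = """
SELECT ?label ?abstract WHERE {
  <http://dbpedia.org/resource/Sailing>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
      <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER (lang(?label) = "en" && lang(?abstract) = "en")
}
"""

def sparql_url(endpoint, query):
    """Return the GET URL that runs `query` against `endpoint`."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})
```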

However, you probably do not want to run the query live on DBpedia because the DBpedia public endpoint isn't reliable to the level that you would want for a production Web site or for a Web application. So to do this, you will want to have synced nodes in your system which can update from the endpoint (either at cron or when you click a button).

I actually just recently did this for the International Semantic Web Conference site at http://iswc2010.semanticweb.org/. I used Views to get the data in and Feeds to create the nodes. You can set the DBpedia URI as the GUID in Feeds, which will make it so nodes can be updated based on any changes in the original data.
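The GUID-based updating described here boils down to an upsert keyed by the DBpedia URI. A toy sketch of that behavior; the `nodes` dict stands in for Drupal's node storage, and none of this is Feeds' actual API.

```python
# Sketch of the update behavior described above: items are keyed by their
# DBpedia URI (the GUID), so a re-import updates existing nodes in place
# instead of duplicating them. `nodes` stands in for Drupal's node table.
def sync_items(nodes, fetched):
    """Upsert fetched {guid: fields} entries into nodes, keyed by GUID."""
    for guid, fields in fetched.items():
        if guid in nodes:
            nodes[guid].update(fields)   # existing node: update its fields
        else:
            nodes[guid] = dict(fields)   # new node: create it
    return nodes
```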

The modules necessary for this are:

  • SPARQL Views
  • Feeds View Parser
  • Feeds (I think alpha 16)

I will see if I can make a Drush .make file for this today. I am planning to do a screencast on how to get this all to work within the next week, so I will ping back here when that is posted.

Thanks for the info on which

yaph

Thanks for the info on which modules to use for getting DBpedia data into a Drupal site. Looking forward to your screencast.

Cool, they are almost done, I

linclark.research

Cool, they are almost done. I still need to put in a few things before I post them to YouTube, but I can post the rough cuts for download later today.

Proxy caching

JeremyFrench

"However, you probably do not want to run the query live on DBpedia because the DBpedia public endpoint isn't reliable to the level that you would want for a production Web site or for a Web application."

This is true, and synced nodes are a good way to go. However, I have also found that proxy caching can help with this as well. It is a very similar idea to synced nodes, but done at a different level in the application. It can even help a little with performance if you have a local SPARQL endpoint, as long as you have a fairly limited set of queries that you run.
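The proxy-caching idea can be sketched as a small TTL cache sitting in front of whatever actually runs the query. The fetch function and the TTL value are illustrative, not anything from the thread.

```python
import time

# Sketch of the idea: cache query results for a fixed TTL so only the
# first request for a given query hits the (remote or local) endpoint.
class QueryCache:
    def __init__(self, fetch, ttl=3600):
        self.fetch = fetch      # function that actually runs the query
        self.ttl = ttl          # seconds a cached result stays valid
        self._store = {}

    def get(self, query):
        entry = self._store.get(query)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]                      # fresh cached result
        result = self.fetch(query)               # miss or stale: refetch
        self._store[query] = (time.time(), result)
        return result
```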

Screencasts :)

linclark.research

OK, they are still rough and missing some bits, but I think these should get people going pretty well.

Save the following to a .make file and run Drush Make to get all the necessary modules.

core = 6.x
api = 2

projects[] = "drupal"
; Modules
projects[] = "admin_menu"
projects[] = "feeds_view_parser"
projects[] = "views_php_array"
projects[feeds][version] = 1.0-alpha16
projects[sparql_views][download][type] = "git"
projects[sparql_views][download][url] = "git://github.com/linclark/sparql_views.git"
libraries[sparql_views][destination] = "modules/sparql_views"

Videos

Scroll to the bottom of the page for the download links.
Installation and Query Building
Feed setup and import

I just watched your 2

yaph

I just watched your 2 screencasts Lin and they opened my eyes on how to approach the problem of getting DBpedia data into a Drupal site.
This is really excellent work you've done here. Thanks a lot for sharing!

Awesome screencasts

milesw

Great job, these really demonstrate how powerful your module is. Great introduction for those new to working with SPARQL too! :)

Also look at the linked_data module

febbraro

Lin's great work is far more flexible, as it uses Views and maintains the paradigms that folks have come to know and love.

That said, I wrote the http://drupal.org/project/linked_data module as a way to build some straightforward SPARQL or MQL queries and give you View-like theme options. There is an example module contained within that shows how to parameterize the queries, etc. Thought you might be interested. (Great work Lin!)
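Parameterizing a query in this way amounts to substituting a value into a query template while escaping it. A hedged sketch of the general technique; the template and escaping helper are illustrative, not the module's actual code.

```python
# Illustrative template: look up an abstract by an English label.
# (Assumes the endpoint predefines the rdfs: prefix; otherwise add a
# PREFIX declaration.)
TEMPLATE = ('SELECT ?abstract WHERE {{ ?s rdfs:label "{label}"@en ; '
            '<http://dbpedia.org/ontology/abstract> ?abstract }}')

def build_query(label):
    """Substitute `label` into the template, escaping quote characters."""
    safe = label.replace('\\', '\\\\').replace('"', '\\"')
    return TEMPLATE.format(label=safe)
```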

Semantic Web
