[Final] Views plugins to output node lists as XML/RDF/JSON/XHTML

allisterbeharry's picture

Abstract

This is a proposal to extend Views with the goal of enabling Drupal semantic data sharing and interop out-of-the-box by writing views style-plugins that can spit out node lists as XML, OPML, RDF, JSON, and XHTML.
For example if you created a view with a list of events and details, you could with one click generate a page containing events in the hCalendar microformat or as items in Exhibit JSON format. Similarly lists of user profiles could be output as FOAF profiles in RDF or in the XHTML hCard format. Lists of forum posts could be output as RDF documents using the SIOC vocabulary. Or a simple list containing books from a personal book collection could be output as OPML. This idea is motivated by the accelerating activity around Semantic Web data sharing. The ultimate realization of this idea would be to enable a Drupal site operator to share datasets from her site already constructed through the Views UI in one of several open and reusable formats based on XML, XHTML or JSON, which would be consumable by other Web agents. In the screenshots below, a views list of audio tracks has been enabled to render as JSON or OPML or raw XML.

Benefits to Drupal/open-source community

This project will provide a way for Drupal site operators to open up their site to semantic data sharing and processing without requiring new tools or skills to be learned. Views is one of the most well-known and understood Drupal features and being able to use Views alone to expose site data in formats like RDF and JSON will significantly decrease the learning curve and effort involved in making a Drupal site Semantic Web-aware, and increase the uptake of Semantic Web data sharing. Given the large user-base Drupal enjoys, this will also significantly benefit the entire net-wide Semantic Web activity as a whole. This project can be thought of as a 'front-end' complement to the work going on in the Services and RDF APIs, which are targeted at developers mostly. However developers too will benefit from this project as it may lead to a common library for node serialization in different formats that projects like the Services and RDF APIs could share.

Project Details

I was really blown away by the "Video from the future" from Drupalcon 2008 showing the type of computing possible by opening up data stores using standards like RDF and microformats. It was the first time I really saw what the Semantic Web was supposed to be about, and it got me interested in researching the core technology base for data representation, querying and presentation in this next step for Web applications. I got to understand the core standards for representation like RDF and "triple stores", extraction of RDF from existing documents through GRDDL, querying through languages like SPARQL, and agents for consuming data like MIT's Simile Exhibit and the Disco browser.

Hanging out in the introduced me to work that had been done in Drupal with the RDF API and Web Services API, and to PHP toolkits for enabling data interop like ARC and Triplify. One common motif was that data should be transformed and shared from existing sources, instead of attempting top-down manual markup of datasets, which would be far too costly. This is the reality and focus of most Semantic Web activity today. So this led me to think about how to share node data in Drupal using existing functionality.

If an axiom of Drupal content management is that data is separate from presentation, why not reuse Views and, instead of generating HTML lists from node objects, generate reusable and transformable XML, JSON, and XHTML? Views already has the ability to use style-plugins that can render node data in an arbitrary way: one can create a chart using the Google Charts API, and a current SoC 2008 idea is to standardize a Chart view type. The views_rss plugin can generate an RSS feed from a view, and the Views Bonus Pack can export views in CSV and .DOC format. So to extend this idea, I have started to write Views plugins that render node lists as:

  1. A raw XML document or an OPML document, using field-labels as attributes for each node element;
  2. A structured XML document that maps node fields to elements and attributes using a specified schema like Atom;
  3. A JSON data document which can either be a canonical representation of the data or use a predefined serialization format like Simile/Exhibit JSON;
  4. An RDF data document that can use one of the following three vocabularies:
    1. FOAF
    2. SIOC
    3. DOAP

    Each of these requires a defined set of fields to be present in the view.

  5. An XHTML document which contains nodes marked up in a microformat like hCard or hCalendar (http://microformats.org/wiki/hcalendar).

Each XML or XHTML document could include a GRDDL profile and a pointer to a transformation file, indicating that RDF data may be extracted from the document.
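
To make formats 1 and 3 concrete, here is a minimal sketch of what the serialization step amounts to. It is written in Python rather than Drupal PHP purely so it runs on its own; the node list, the function names and the field names are all made up for illustration and are not taken from the sandbox code. It assumes field labels are already valid XML attribute names.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical node list: each node is a flat mapping of field label -> value.
nodes = [
    {"nid": "1", "title": "First track", "artist": "Alice"},
    {"nid": "2", "title": "Second track", "artist": "Bob"},
]

def render_raw_xml(nodes):
    """Render each node as a <node> element with field labels as attributes.

    Assumes every label is a valid XML attribute name (no spaces etc.).
    """
    root = ET.Element("nodes")
    for node in nodes:
        attrs = {label: str(value) for label, value in node.items()}
        ET.SubElement(root, "node", attrs)
    return ET.tostring(root, encoding="unicode")

def render_json(nodes):
    """Render the nodes under a top-level "items" key.

    Exhibit JSON uses an "items" list; the extra per-item keys Exhibit
    expects (such as a label) are not handled in this sketch.
    """
    return json.dumps({"items": nodes}, sort_keys=True)

raw_xml = render_raw_xml(nodes)
as_json = render_json(nodes)
```

The point is only that both outputs come from the same flat node list; the style plugin decides which renderer to call.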

Implementation

The proposal specifies building 4 modules:

  1. views_xml.module
  2. views_json.module
  3. views_rdf.module
  4. views_xhtml.module

Each module implements:

  • *_views_style_plugins to provide Views types for the formats specified above;
  • *_views_arguments, in a like manner to views_bonus_export.module, to provide links and icons for existing views to render the nodes in the formats specified above;
  • theme_views_* to perform the actual serialization of the node data to the format specified. Serialization can use either only the node fields defined in the view or all the node fields; this data extraction method is specified by a parameter;
  • *_views_arguments to provide needed options for the serialization: for example, for RDF documents specify the vocabulary to be used (FOAF, DOAP, SIOC...), or for XHTML documents specify the microformat (hCard, hCalendar, Geo...) to be used for rendering each node.
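
The split described above (pick the fields, then hand them to a format-specific serializer chosen by an option) can be sketched roughly as follows. This is Python for the sake of a self-contained example, not the module code; `extract_fields`, `render_view`, the `SERIALIZERS` table and the `all_fields` flag are hypothetical names standing in for the theme function and its extraction parameter.

```python
import json
from xml.sax.saxutils import quoteattr

def extract_fields(node, view_fields, all_fields=False):
    """Return every node field, or only the fields defined in the view."""
    if all_fields:
        return dict(node)
    return {name: node[name] for name in view_fields if name in node}

def to_json(rows):
    """Serialize the rows as a plain JSON array of objects."""
    return json.dumps(rows, sort_keys=True)

def to_raw_xml(rows):
    """Serialize each row as a <node/> element with fields as attributes."""
    items = []
    for row in rows:
        attrs = " ".join(f"{key}={quoteattr(str(value))}" for key, value in row.items())
        items.append(f"<node {attrs}/>")
    return "<nodes>" + "".join(items) + "</nodes>"

# Hypothetical registry: one serializer per output format option.
SERIALIZERS = {"json": to_json, "xml": to_raw_xml}

def render_view(nodes, view_fields, fmt, all_fields=False):
    """Extract the requested fields from each node, then serialize them."""
    if fmt not in SERIALIZERS:
        raise ValueError(f"unsupported format: {fmt}")
    rows = [extract_fields(node, view_fields, all_fields) for node in nodes]
    return SERIALIZERS[fmt](rows)
```

Keeping extraction separate from serialization is what would let a new format be added by registering one more function.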

I've already written working proof-of-concept code available in my CVS sandbox which implements rendering Views as raw XML, OPML and JSON. I've posted screenshots in my application post in the SoC-2008 group. Based on this experimentation I believe that all the objectives 1-5 stated above are doable in the SoC timeline. One area of major effort will be in RDF serialization as RDF conceives of data as 'graphs' and the 'flat' nature of node data will have to be processed to fit the RDF model. However several PHP toolkits exist for working with RDF including the well-known ARC. Even if I'm not able to complete all the formats I have in mind, the existing formats implemented as well as design patterns gleaned from the project will provide an excellent point to pick-up and continue development after SoC - each format is implemented around a basic design which I believe is proven to be sound.
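
As a rough illustration of the flat-to-graph step for RDF, the sketch below turns one flat user record into subject/predicate/object triples using FOAF terms. It is a simplified stand-in, not the planned implementation: it is written in Python, it emits N-Triples rather than RDF/XML because that is easier to show without a toolkit like ARC, and the field-to-property mapping and the `/user/<uid>` URI pattern are assumptions made up for the example.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from view field names to FOAF properties.
FIELD_TO_FOAF = {"name": FOAF + "name", "homepage": FOAF + "homepage"}
# Fields whose values are URIs rather than plain literals.
URI_VALUED = {"homepage"}

def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def profile_to_triples(profile, base_uri):
    """Turn one flat profile record into (subject, predicate, object) triples."""
    subject = f"{base_uri}/user/{profile['uid']}"  # assumed URI pattern
    triples = [(subject, RDF_TYPE, ("uri", FOAF + "Person"))]
    for field, predicate in FIELD_TO_FOAF.items():
        if field in profile:
            kind = "uri" if field in URI_VALUED else "literal"
            triples.append((subject, predicate, (kind, str(profile[field]))))
    return triples

def to_ntriples(triples):
    """Write the triples out, one N-Triples statement per line."""
    lines = []
    for subject, predicate, (kind, value) in triples:
        obj = f"<{value}>" if kind == "uri" else f'"{escape_literal(value)}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)
```

Each flat field becomes one edge hanging off the node's URI, which is about as far as a node list can go without interlinking to other resources.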

Deliverables

  1. views_xml.module
  2. views_json.module
  3. views_rdf.module
  4. views_xhtml.module

Each module implements:

  1. *_views_style_plugins to provide Views types for the formats specified above;
  2. *_views_arguments, in a like manner to views_bonus_export.module, to provide links and icons for existing views to render the nodes in the formats specified above;
  3. theme_views_* to perform the actual serialization of the node data to the format specified. Serialization can use either only the node fields defined in the view or all the node fields; this data extraction method is specified by a parameter;
  4. *_views_arguments to provide needed options for the serialization: for example, for RDF documents specify the vocabulary to be used (FOAF, DOAP, SIOC...), or for XHTML documents specify the microformat (hCard, hCalendar, Geo...) to be used for rendering each node.

Each module will implement DrupalTestCase and provide unit tests for each piece of functionality.

Timeline

I favour an iterative development process: I would like to release new versions of the 4 modules every two to three weeks, each fixing the bugs from the previous release and adding new pieces of functionality incrementally. By July 14th I expect to have working implementations of the majority of formats I want to support.

May 26th - June 8th
views_xml - OPML, and Raw XML
views_json - Exhibit JSON
views_rdf - FOAF rev. 1
views_xhtml - hCard

June 9th - June 22nd
views_xml - structured XML using Atom schema rev 1
views_json - Canonical JSON
views_rdf - FOAF rev 2
views_xhtml - hCalendar

June 23rd - July 6th
views_xml - structured XML using Atom schema rev 2
views_json - JSONP/JSON in script (http://code.google.com/apis/gdata/json.html)
views_rdf - DOAP, SIOC rev 1
views_xhtml - Geo

July 7th - July 20th
views_xml - structured XML with arbitrary node-field to schema mapping rev 1
views_rdf - SIOC rev 2
views_xhtml - GRDDL embedding rev 1

July 21st - August 3rd
views_xml - structured XML with arbitrary node-field to schema mapping rev 2
views_xhtml - GRDDL embedding rev 2

August 4th - 18th
views_xml - GRDDL embedding rev 1 and 2
Complete docs and tests.

Bio

I'm currently in the 2nd year of a BA program, majoring in Mathematics and Linguistics. I participated in SoC 2007 with Drupal, working on the DAST project, which was a lot of fun and will hopefully start gaining more traction with all the activity around testing and auto-deployment. I think I am a good choice to work on this project because I have a lot of enthusiasm for the Semantic Web and what is possible for this next generation of the WWW. Enabling one of the best CMSs on the planet with this technology will be a very exciting and rewarding endeavour for me.

Comments

This is a great project

bonobo's picture

This is a great project --

+1 on this

I also think that for reasons of getting some solid deliverables completed by the end of SoC, choosing some specific formats to target would be ideal. Then, if there's time left over, additional views plugins supporting additional formats could be created.

Cheers,

Bill


FunnyMonkey
Tools for Teachers

omg

jpetso's picture

Major coolness. The idea is awesome, I'd greatly enjoy it being implemented :D

XML / RSS

alex_b's picture

Looks very good.

How far would this module help with creating a user defined RSS feed? Or would it really just provide the bare bones functionality of creating an XML from views and then leaving the specific format up to other modules or theme overrides?

Alex

If RSS-XML is a supported

allisterbeharry's picture

If RSS-XML is a supported view type in the module then yes, you could just specify what fields you want and the View would just be an XML document using the RSS schema. If RSS-XML is not a supported format (although it would be a very straightforward format to generate, mostly duplicating views_rss) then you could use the "Raw XML" view format and have your module transform it to RSS using your theme overrides or by implementing views_post_view.
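
To give a feel for how small that transformation is, here is a rough sketch that rewrites a raw XML node list into an RSS 2.0 document. It is plain Python rather than a theme override, and it assumes the raw XML uses `<node>` elements with `title`, `link` and `description` attributes; those names, and the `raw_xml_to_rss` function itself, are made up for the example.

```python
import xml.etree.ElementTree as ET

def raw_xml_to_rss(raw_xml, channel_title, channel_link, channel_description=""):
    """Rewrite <node title=... link=.../> elements as RSS 2.0 <item> elements."""
    source = ET.fromstring(raw_xml)
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for node in source.iter("node"):
        item = ET.SubElement(channel, "item")
        # Copy only the attributes that have a direct RSS item counterpart.
        for field in ("title", "link", "description"):
            value = node.get(field)
            if value is not None:
                ET.SubElement(item, field).text = value
    return ET.tostring(rss, encoding="unicode")
```

Any field without an RSS counterpart is simply dropped here; a real override would have to decide what to do with the rest.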

Incidentally, how is a 'user-defined RSS feed' different from creating an ordinary view and creating the feed for it?

I'm not too clear on the

moshe weitzman's picture

I'm not too clear on the relationship between your hook_views_arguments() and the fields that the admin put into the View. These overlap a lot, no? I would think that the style plugin could do validation on the usual fields array to make sure all the required bits are present. Not sure the best way to proceed here.
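
The validation idea could look something like the sketch below: the style plugin checks the view's field list against what the chosen format needs and reports what is missing. It is Python for illustration only, and the required-field table is an assumption, since the proposal does not enumerate the fields each format needs (the hCard and hCalendar entries follow the required properties of those microformats; the FOAF entry is just a guess).

```python
# Hypothetical required-field sets per output format.
REQUIRED_FIELDS = {
    "foaf": {"name"},
    "hcard": {"fn"},
    "hcalendar": {"summary", "dtstart"},
}

def missing_fields(output_format, view_fields):
    """Return the required fields that the view does not provide, sorted."""
    if output_format not in REQUIRED_FIELDS:
        raise ValueError(f"unknown output format: {output_format}")
    return sorted(REQUIRED_FIELDS[output_format] - set(view_fields))
```

An empty result would mean the view can be rendered in that format; anything else could be shown to the admin as a validation error on the view.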

We might throw in CSV as an output format too.

I'm misusing arguments here

allisterbeharry's picture

I'm misusing arguments here as I'm not using them to pass in field values for filtering or sorting. Instead I was envisioning using an argument that specified the document type based on the view type. So for example, if you selected the view type 'XML Document', I was thinking you could set an argument 'Document Type' that would select the type/schema of the XML document to render, say 'RSS-2'. This probably isn't the best way to go. I should just have all the document types in one place, in View Type, and leave arguments for what they were intended for.

IIRC, Views_bonus uses

bonobo's picture

IIRC, Views_bonus uses arguments exactly as you describe -- you can have both a view type and a views argument that generates the doctype --

This adds a level of flexibility, as you can specify arguments ahead of the final doctype arg to drill down into content.


FunnyMonkey
Tools for Teachers

OTOH Views Bonus pack does

allisterbeharry's picture

OTOH Views Bonus pack does the same with the CSV and DOC export - using arguments to select the file name.

So this isn't a misuse of arguments.

doh meant this for moshe's comment above - same thing you said basically.

Sounds great, tonnes of

catch's picture

Sounds great, tonnes of potential uses. +1

Sounds Awesome

robloach's picture

This would be a very handy system. The Services module's views.getView service allows you to parse view data into AMFPHP, JSON, REST, SOAP or XML-RPC data. I don't believe there is an OPML server around though.... The Services module doesn't provide views.getView as view types though, so this might be a very handy alternative.

Views Bonus export module does some of this...

webchick's picture

http://drupal.org/project/views_bonus

I think you already touched on it above, but could you further elaborate on how this is different from the views_bonus_export module? I assume it's not overlapping, because Rob Loach just responded saying it's a good idea. ;)

Views Bonus

robloach's picture

Ah yes, I forgot about Views Bonus. Views Bonus implements exporting as CSV and DOC. I believe there's a patch in the issue queue to export as XML.

How different is this going

catch's picture

How different is this going to be between views 1 and 2? Is there enough meat for a full project? On the face of it, it seems maybe not.

I see some value in getting

bonobo's picture

I see some value in getting some focused dev on both porting existing functionality, and getting json/microformat support. Additionally, it would be nice to get atom working once and for all -- add in the ability to create an atom feed, and add fields to a view that become part of an xml payload within an atom feed, and I think this would be very useful.

Although atom wasn't mentioned in the original post...

This was started a while back, then got switched to a GHOP task, but was never completed --


FunnyMonkey
Tools for Teachers

How much of this is covered

allisterbeharry's picture

How much of this is covered by Views 1/2? I haven't seen anything like it so far.

The only similarity is with

allisterbeharry's picture

The only similarity is with the export feature of Views Bonus: my idea basically does for XML/XHTML/JSON what Views Bonus does for CSV and DOC. The rest of Views Bonus is about different HTML presentation styles for node lists, not different rendering formats for node lists.

extra RSS fields

kvantomme's picture

I guess this is the RDF views plugin that several people have been talking about. I have heard people say that XML export should be implemented with the services module, but I think a lot of users would love to see an implementation in the views framework.

So +1 from me.

Probably you want to set a more limited goal first and then implement additional features when those are implemented.

In the coming days I will be contributing a module that allows you to add extra RSS fields based on the fields you selected in the Views UI. It also supports nested tags. I have been doing research for this module, so I'm happy to be a use case/big picture mentor for this project if it gets chosen.

--

Check out more of my writing on our blog and my Twitter account.

I haven't read about the RDF

allisterbeharry's picture

I haven't read about the RDF views plugin but it sounds like a similar idea, although this would be much more lightweight than something that plugs into the RDF API. This RDF plugin doesn't convert data to RDF graphs or interlink with other URIs or allow you to do SPARQL queries; it just generates an RDF/XML document from a node list.

The views service Rob Loach mentioned returns a list of nodes generated by views that could then be serialized in different formats. Perhaps the plugins could share the serialization code, or maybe there could be a common module for just taking node lists and serializing them to different formats?

There is no actual RDF

kvantomme's picture

There is no actual RDF plugin; it just kept popping up as an idea in Boston.

If you could make a more flexible friendly UI approach to outputting node lists in different formats, this would be a boon.

--

Check out more of my writing on our blog and my Twitter account.

Overlap

agentrickard's picture

There is -- though it may not be obvious from the description -- overlap with http://groups.drupal.org/node/10012

We have a developer (grndlvl) who has already developed some of this functionality.

--
http://ken.therickards.com/
http://savannahnow.com/user/2
http://blufftontoday.com/user/3

From View to Service

rpfilomeno's picture

Hi, Alex_b pointed out to me that there is a slight similarity between this project and Views as Web Widget. After much thought: would it be possible to reuse this if, rather than only as a View, it could also be exposed as a service?

There are a couple of ways

allisterbeharry's picture

There are a couple of ways the two modules could cooperate. The code that serializes views into the different formats could be factored out into a common module, call it node_meta.module, say, that the Services module could use to implement the different output servers. Right now inside views_json.php there is

<?php
views_json_exhibit_render($vid, $nodes, $type)
?>
and similar functions, which could be factored out into a separate module.

A second way is to have the widget use the views themselves. The ability to create data sources using the Views GUI builder might remove the need for a second user interface for building the widget; just build your view and export the data as JSON or XHTML. If you need more functionality like authentication or pre-processing then Services would be the way to go though.

I think the best approach would be to have the web widget consume data sources from URLs instead of tying in nodes directly in PHP. This way it can connect to either Services or Views data sources.

Custom style plugin for views

asif vhora's picture

How do I create a custom style plugin for Views?

Please help me..
