Dublin Core integration for CCK module

anthonyoliver's picture

Adding Semantic Web support into CCK

Motivation

The web is growing at an exponential rate, and so is the total amount of data on it. Consider, for instance, the volume of blog and video content generated within the past couple of years. If growth continues at the same rate, it won't be long before managing and organizing that data becomes impractical with today's conventional approaches. One of the major concerns is that data placed on the web is understandable by humans but not by machines. To address this, we extend the human-readable data with metadata, that is, data that machines can understand. This is commonly referred to as the Semantic Web. To describe the Semantic Web, here is a quote from Tim Berners-Lee, inventor of the World Wide Web:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”

With this being said, it is easy to see that he has a vision for the future of the Internet. As a side note, Tim Berners-Lee also runs Drupal for his personal blog. If we could get someone like him to promote Drupal as a platform for Semantic Web applications, it could help Drupal's growth and adoption tremendously. Semantic Web technology is still relatively new, and getting something like this into Drupal could put it far ahead of the curve among content management systems. Large sites like Wikipedia (ranked 11th in the world by total traffic as of the date of this proposal) are also gearing up to do something similar, in hopes of bringing information to the Internet in this fashion. There is currently a debate about which standards will be used to access and control the Semantic Web. One of the currently popular and preferred standards is Dublin Core (http://www.dublincore.org).

The idea of this proposal is to provide drop-in Dublin Core support for the CCK module. The module would make it possible to auto-generate RDF data from CCK nodes by mapping fields to the corresponding terms defined by Dublin Core (http://dublincore.org/documents/dcmi-terms/). I believe this module is somewhat of a necessity, even just looking at the growth of Drupal (http://acko.net/blog/drupal-org-explosion-and-trends). Incorporating something like this will help make data more manageable, especially for large organizations such as Drupal itself, or Ubuntu (which has just moved its website to Drupal), as well as a number of other large organizations out there.
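To make the field-to-term mapping concrete, here is a minimal sketch in Python (not Drupal code, and not the proposed module's design). It assumes a node is available as a plain dictionary of field values; the field names in FIELD_TO_DC and the example URL are hypothetical. Only the RDF and Dublin Core element namespaces are taken from the published standards.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from CCK-style field names to Dublin Core elements.
FIELD_TO_DC = {
    "title": "title",
    "field_author": "creator",
    "field_summary": "description",
    "created": "date",
}


def node_to_rdf(node_url, fields, mapping=FIELD_TO_DC):
    """Return an RDF/XML string describing one node with Dublin Core elements."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": node_url}
    )
    for field_name, dc_element in mapping.items():
        value = fields.get(field_name)
        if value:  # skip fields that are missing or empty
            ET.SubElement(desc, f"{{{DC_NS}}}{dc_element}").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Example node with made-up values; the empty summary is left out of the output.
example = {"title": "Hello", "field_author": "anthonyoliver", "field_summary": ""}
print(node_to_rdf("http://example.com/node/1", example))
```

Keeping the mapping as plain data is one way the per-type configuration described under Deliverables could be stored, but that is an assumption of this sketch rather than part of the proposal.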

Deliverables

  • Give CCK the ability to generate RDF documents
  • Map various fields of CCK nodes to Dublin Core terms (http://dublincore.org/documents/dcmi-terms/)
  • Allow a default configuration per CCK node type (auto-assign data if not present)
  • Allow manual overriding of metadata per CCK node type, if permitted by access control
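
The last two deliverables imply an order of precedence: per-type defaults first, then values mapped from the node's fields, then manual overrides if the user is allowed to make them. The following Python sketch shows one possible reading of that order. The TYPE_DEFAULTS table, the User class, and the permission string are all hypothetical and are not taken from Drupal or CCK.

```python
from dataclasses import dataclass, field

# Hypothetical permission name used only in this sketch.
OVERRIDE_PERMISSION = "override dublin core metadata"

# Hypothetical per-node-type defaults, keyed by Dublin Core element name.
TYPE_DEFAULTS = {
    "story": {"publisher": "Example Site", "language": "en", "type": "Text"},
}


@dataclass
class User:
    name: str
    permissions: set = field(default_factory=set)


def resolve_metadata(node_type, mapped, overrides, user):
    """Merge defaults, mapped field values, then permitted manual overrides."""
    metadata = dict(TYPE_DEFAULTS.get(node_type, {}))
    # Mapped values replace defaults only when they are non-empty.
    metadata.update({key: value for key, value in mapped.items() if value})
    # Manual overrides apply only if the user holds the permission.
    if overrides and OVERRIDE_PERMISSION in user.permissions:
        metadata.update(overrides)
    return metadata


editor = User("editor", {OVERRIDE_PERMISSION})
print(resolve_metadata("story", {"title": "Hello"}, {"publisher": "Custom"}, editor))
```

Whether empty field values should fall back to the default, as they do here, is a design choice the proposal leaves open.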

Remarks

I have discussed this proposal with Robert Douglass, and he seemed very interested. There has been a lot of discussion in the Drupal community about implementing something like this. Discussions can be found here:
http://drupal.org/node/30937
http://drupal.org/node/84751

There is currently the relationship module: http://drupal.org/project/relationship
but this module hasn't been upgraded to 5.0 and hasn't had updates in a few months. Nor does it integrate with CCK, which I believe it should, since CCK's content-type creation is now part of Drupal core.

Msmiffy (http://drupal.org/user/118479) was reportedly working on such a module. I tried to contact them around the time Google Summer of Code was first announced but haven't heard anything back. The user has only been registered for about a month and has shown no activity in the past couple of weeks, so I don't know what will become of the Dublin Core module they are supposedly working on.

Timeline

April 10 – May 27:
* Get a CVS account
* Start talking to mentors about the development plan and possible approaches
* Start experimenting with Semantic Web tools and gain in-depth knowledge of Dublin Core
* Talk to Robert Douglass about the design, the database schema, and any plans he recommends for the project
* Start going through http://drupal.org/contributors-guide, looking at keeping code secure, methods for submitting, test cases, etc.
* Start taking apart the CCK module and other contributed CCK modules to get a better understanding of CCK
* Get in touch with domain experts on Dublin Core and the Semantic Web

May 28 – June 15:
* Write a mock up module for CCK.
* Get a basic alpha version of the module going
* Keep in close contact with mentor(s) and make sure the project is heading in the right direction
* Get code reviews from mentors

June 15 – July 9:
* Get a beta version of the module working and submit it to Google for midterm review

July 16 – August 20:
* Make changes according to midterm review.
* Polish up any code and make design changes if needed; have mentors and peers review the code for security and bug fixes
* Find some beta testers for the project and get feedback and make adjustments.
* Submit module extension for final evaluation to Google.

August 31:
* Announce module to Drupal Community

Future

Eventually it would be nice to have a Dublin Core search module that lets you drill down through the data. It would also be nice to incorporate it into the content management part of Drupal, allowing you to do cleanups or archiving very easily on Drupal sites that have massive amounts of data. Integration with Views would also be very nice. In conjunction with Views, this would probably offer some really unique ways to display custom data to people, and it might eventually start to become something of an intelligent agent (http://en.wikipedia.org/wiki/Intelligent_agent) that displays certain types of data in certain ways to users.

About Me

I am currently completing my degree in Software Engineering at Michigan Technological University (one semester left). I have been involved with programming since about the age of 14, when I got a home certification in PC Repair. I am a heavy user of open source software (all of my software, including the operating system, is open source). I have been the president of the local Association for Computing Machinery (ACM) chapter, as well as a member of the local Linux Users Group (LUG). I have worked as a Summer Youth Teacher at Michigan Tech, teaching Java and C++ to younger students, as well as a Computer Science Learning Center Coach. As a Senior Design project I worked in conjunction with the Navy on a project dubbed Seabase, in which we used MATLAB to stabilize a payload on a sea crane. I worked for Professor Dr. Charles Wallace at Michigan Tech, helping develop an XML case study tool for Software Quality Assurance. I worked within the Robotics Enterprise at Michigan Tech and did embedded programming for hardware/software integration. I have worked for a vision-guided robotics company, doing software development in Visual Basic 6/.NET as well as serving as their routine IT person at the Houghton Innovation Center office in Houghton, MI. I am currently working for a roadway asset management software company that makes the software the Michigan Department of Transportation uses, while finishing my degree.
I have been involved with Drupal for about 8 months now. I am very active in the Drupal support channel on IRC and submit bug reports for Drupal. Recently I have been getting more involved in the community by testing new modules, keeping up with the Drupal Dojo, and listening to the Lullabot podcast, and I have been considering writing a module for a while now. I am in the process of starting my own consulting business, which of course would use Drupal as the platform for web development, so you can expect me to stay active within the Drupal community. On Dries's learning curve chart (http://buytaert.net/drupal-learning-curve) I am at about the theme and module development section. To show my initiative, I have started learning how to write a basic module on my own time and am trying to learn the basics of jQuery; I have created a Drupal install at http://xamox.net/drupal/ specifically for this. I also just pre-ordered the Pro Drupal Development book, which comes out April 9th, and believe it would be a good supplement and resource for the Summer of Code project. Taking into consideration my activity within the Drupal community over the past 8 months, as well as my Software Engineering background, I believe I am a strong candidate to be a Summer of Code student, if not for this project then for any other.

Comments

anthonyoliver's picture

This video is also informational:

Tim Berners Lee on Semantic Web


http://xamox.NET

Some other thoughts --

bonobo's picture

How much of this can be accomplished using the Computed Field module (http://drupal.org/project/computed_field)? I'd bet this would get things moving pretty well --

Two other things to consider: in addition to Dublin Core (or instead of Dublin Core), there's the LOC metadata standard -- http://www.loc.gov/standards/mets/

This metadata also seems ripe for sharing using OpenSearch -- this creates the opportunity for creating more flexible document/data repositories

what's the current status?

schuyler1d's picture

what's the current status of this? did the SoC get picked? if not, are you doing it anyway :-)

Dead Project?

mgifford's picture

I'm guessing this is a dead project outline.

I had wondered if perhaps a vocabulary had been developed for RDFa from the Dublin Core Metadata Initiative. Didn't find it here though:
http://drupal.org/project/rdf

UPDATE: I added some code for Dublin Core here - http://openconcept.ca/blog/mgifford/adding_dublin_core_metadata_to_drupal

Mike

OpenConcept | CLF 2.0 | Podcasting

SoC 2007
