Media Module Metadata

jmstacey's picture

We are currently discussing how metadata should be handled by the Media module. Help us by joining the discussion!

Aaron and I (jmstacey) had a discussion in #drupal-openmedia, and the transcript is attached for viewing. Further reading can be found on the issue "Meta-Data Plan and Structure".

The setup: The Media module will provide an API and central data store to fit a wide range of media management tasks. This gives media management modules that wish to become part of a unified Media ecosystem a common base to build upon.

The question: The RDF module was originally suggested, and we certainly do not want to reinvent the wheel. However, data stored in RDF may not be convenient to retrieve in the manner that we require for this project (I only looked at the DB schema, so I may have missed something).

Our sway: Aaron and I are currently leaning towards a new generic lightweight metadata engine that essentially provides a few hooks and a key/value storage system. This route would provide the Media module (and other modules) with the needed central store, provide other modules in the ecosystem a flexible metadata system, and make mapping metadata to RDF via a bridge module easier.
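To make the idea concrete, here is a minimal sketch of "a few hooks and a key/value storage system" plus a bridge that maps the stored pairs to RDF-style triples. It is written in Python purely for illustration and is not Drupal code; the names MetadataEngine, register_hook, to_triples, and the example.org namespace are all assumptions invented for this sketch.

```python
from collections import defaultdict


class MetadataEngine:
    """Hypothetical in-memory key/value metadata store with write hooks."""

    def __init__(self):
        self._store = defaultdict(dict)  # resource id -> {key: value}
        self._hooks = []                 # callables invoked on every write

    def register_hook(self, callback):
        """Let another module react whenever metadata is written."""
        self._hooks.append(callback)

    def set(self, resource, key, value):
        """Store one key/value pair and notify every registered hook."""
        self._store[resource][key] = value
        for hook in self._hooks:
            hook(resource, key, value)

    def get(self, resource, key, default=None):
        """Return a single value, or the default if it is not set."""
        return self._store.get(resource, {}).get(key, default)

    def items(self, resource):
        """Return a copy of all key/value pairs for one resource."""
        return dict(self._store.get(resource, {}))


def to_triples(engine, resource, namespace="http://example.org/meta#"):
    """Bridge sketch: map key/value pairs to (subject, predicate, object)."""
    pairs = sorted(engine.items(resource).items())
    return [(resource, namespace + key, value) for key, value in pairs]
```

A module would call engine.set("file/42", "license", "CC-BY") and a bridge module could later call to_triples(engine, "file/42"). The point of the sketch is only that the bridge sits on top of the store rather than the store depending on RDF.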

I've boiled things down to three options, but feel free to add alternatives as part of the discussion.

  1. Don't handle metadata. Let each module deal with it itself. - Far from ideal
  2. Fully depend on something like RDF - Possibly severe performance and ease-of-use implications
  3. Create a generic lightweight metadata engine - Best so far

Discuss!: We want to hear your thoughts and suggestions, particularly if you are a module maintainer who will eventually be using the Media API.

Attachment: 6-3-09_metadata_transcript.txt (16.91 KB)

Comments

We've put a lot of thought

kreynen's picture

We've put a lot of thought into how we're structuring metadata in the Open Media System. Most of our discussions have been happening on GDO, but there are other discussions going on with PEG and the Alliance for Community Media that haven't been as public. Some information about the secret ACM standards work was mentioned at a session during the ACM East conference...

http://blip.tv/file/2162139

According to Rich Desimone (who heads the ACM project) they have a 15 page document about their metadata standard, but since that document isn't public there is no way to know how compatible it is with the metadata standards we've been working on or how much work it will be to support it. The ACM took the approach of modifying the PBCore standard to meet their needs. We've taken the opposite approach and have been working with Jack Brighton to influence an update to the PBCore standard to meet our needs. We believe the ACM's approach will ultimately fail because PEG use alone isn't enough to define a standard. Unlike the ACM's centralized system, sites using the Open Media System will share only metadata with a centralized server, and the network won't be limited to public access stations. We will be limiting the videos shared to content that uses a Creative Commons license, so including licensing info in the media metadata is important to us.

I've presented about our metadata work at the American Library Association's national conference to their media librarian working group. I was hoping they'd have answers to these problems, but we're already ahead of where their systems are at.

Currently the locations using the beta version of the Open Media System use a modified version of the European ESCORT standard's genre list (PDF) to categorize their shows. We've submitted a proposal to the Open Web Foundation to help organize a governing structure to maintain that genre list. Jack is working with the PBCore standards group to add support for multiple genre authorities into the PBCore standard. We are working to transition from an XML export of the taxonomy to a service that is driven from www.openmediaproject.org.

I know that Media isn't being developed specifically for PEG's metadata needs and that open tagging works for the web 2.0/DruTube style sites, but I'm hoping what comes out of GSoC works for Drupal-using PEG stations too. If the method you end up using for handling metadata takes genre authority into account as well as licensing and geocoding, it should work well for us. For us, "works well" means if we move a video from one PEG station to another, we'll know exactly where that new video fits in the archive and when it would air... like a library knows where a book from another library belongs on its shelves.

I'm not sure if it is better to get just something basic for metadata into Media, shoot for the "library standard", or just ignore metadata and let other modules (like Open Media's Project and Show :) handle the metadata. I've cross-posted this at http://www.pbcoreresources.org/. I'm hoping someone from that group can offer more advice.

PBCore in a metadata module

jackatwill's picture

I do like the PBCore standard for AV and related objects, and it would be wonderful to have support for it in the Media module via a Metadata module or whatever develops from this. PBCore is light and simple enough for humans to grok, and to be useful in mapping to other standards and expressions. As Kevin Reynen and I have discussed, I do think the PBCore standard needs a bit more work in certain key areas. Now that it's officially part of the American Archive project, PBCore is getting more funding and support, and the user community seems to be growing rapidly.

It's the PBCore user community that's most effectively defining how PBCore is used, and how it needs to evolve. Kevin mentions the issue of the genre authority, where you have a particular controlled vocabulary of genre terms, and a name for that controlled vocabulary. It's already valid within PBCore to have multiple genre terms for a media object, with each genre term paired with the genre authority from which it originates. This is also true for Title and Subject values, where each value is matched with a titleType or subjectAuthorityUsed element. This comes in handy when you want a PBCore record to include user-generated tags, in which case the subjectAuthorityUsed value would be simply "keywords".

The point is the Media/Metadata module could facilitate the creation of these values, where Title, Subject, and Genre values are paired with title types, subject authorities, and genre authorities. Ideally the user could select a preexisting genre authority, whereupon they could simply select a genre from that list. You could choose to lock down the genre authority so users could only pick genre terms from one list, to aid interoperability in a "closed" system like the ACM PEG channels. You could also have multiple genre terms for one media item using multiple genre authorities, which aids interoperability with other "closed" systems. In exchanging media and metadata between server instances, each server could have a default genre authority it looks for, and disregard genre terms based on other genre authorities.

So if I as a producer want my stuff to play nicely in the ACM system, I'll select ACM genre authority and add the appropriate value from that list of terms. I could add multiple genre authority terms from different genre authorities, say Fox News (ha) and PBS, and now my metadata is useful to those systems.
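As a rough illustration of the pairing described above, the sketch below stores each genre term together with its genre authority, lets a receiving server keep only the terms from its default authority, and checks terms against a locked-down list. It is Python for illustration only, not PBCore tooling or Drupal code; GenreTerm, terms_for_authority, rejected_terms, and the placeholder term values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenreTerm:
    """One genre value paired with the authority it comes from."""
    term: str
    authority: str


def terms_for_authority(terms, default_authority):
    """Keep only the terms from the authority this server looks for."""
    return [t.term for t in terms if t.authority == default_authority]


def rejected_terms(terms, locked_authority, allowed_terms):
    """For a locked-down system, list terms that are not on the one allowed list."""
    return [
        t for t in terms
        if t.authority != locked_authority or t.term not in allowed_terms
    ]


# Placeholder values: the authority names come from the discussion,
# the term strings are invented for the example.
item_genres = [
    GenreTerm("Example Term A", "ACM"),
    GenreTerm("Example Term B", "PBS"),
]
```

A server whose default authority is "ACM" would call terms_for_authority(item_genres, "ACM") and disregard the PBS term, while the record itself still carries both.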

The module then becomes a very powerful cataloging system, capable of creating librarian-grade shareable metadata.

Bringing in getID3, especially for extracting technical metadata, would be useful. Ideally you could create a new list of terms and give it a name, then share that with other systems. Or you could import an existing controlled vocabulary. And I haven't mentioned geotags and microformats, support for which would be very useful.

This seems complex enough that it should probably be handled in a separate Metadata module. My interest in this would be an outcome where we can effectively catalog and easily exchange metadata and media among Drupal systems, which are likely to grow rapidly in public and community media.

Jack Brighton

Licensing?

aaron's picture

http://www.pbcore.org/licensing/index.html
http://creativecommons.org/licenses/by/2.0/

What is the compatibility between the CC-BY license and GPL? Before we could distribute it within a contributed module, that would need to be resolved.

Aaron Winborn
Drupal Multimedia (book, in October!)
AaronWinborn.com (blog)
Advomatic (work)

Incompatible

aaron's picture

Hmm... At first glance, it appears it's not compatible:

Creative Commons Attribution 2.0 license (a.k.a. CC-BY)

This is a non-copyleft free license that is good for art and entertainment works, and educational works. Please don't use it for software or documentation, since it is incompatible with the GNU GPL and with the GNU FDL.

Creative Commons publishes many licenses which are very different. Therefore, to say that a work “uses a Creative Commons license” is to leave the principal questions about the work's licensing unanswered. When you see such a statement in a work, please ask the author to highlight the substance of the license choices. And if someone proposes to “use a Creative Commons license” for a certain work, it is vital to ask immediately, “Which one?”

This would need to be resolved before anything could be released with Drupal. Any chance of the PBCore community dual-licensing their work?

http://www.gnu.org/philosophy/license-list.html#OtherLicenses

Aaron Winborn
Drupal Multimedia (book, in October!)
AaronWinborn.com (blog)
Advomatic (work)

PBCore licensing could be changed

jackatwill's picture

Chances are very good for dual licensing PBCore to allow its use in a distributed module. This would align directly with the purpose of PBCore in the first place, and the interests of the community. I'll see what boulder needs to be pushed to get it rolling.

Jack Brighton

A note on PBCore and GPL

jackatwill's picture

The developer of the fabulous pbcore vermicelli application offers the following comments, while noting that he is not a lawyer:

As I understand US copyright law, software which adheres to the PBCore standard simply cannot be considered a derivative work of the standard itself[*]. So the permissions which one might need to redistribute the standard are simply not relevant.

Furthermore, even if one did consider the license of PBCore to be relevant, it is not necessarily the case that PBCore should be licensed under the GPL in order to be included in a GPL-licensed work; the FSF publishes a list of Free Software licenses, many of which are compatible with the GPL.

[*] Exceptions to this would be if you were writing software which somehow used the XSD or otherwise directly included large parts of the specification, for example as help text.

Still, I intend to follow up with the folks at CPB, where lawyers can be found in some abundance...

Jack Brighton

That's great news! Sounds

aaron's picture

That's great news! Sounds like we'll be able to go forward on this. Can't wait to hear what they have to say about the matter.

Aaron Winborn
Drupal Multimedia (book, in October!)
AaronWinborn.com (blog)
Advomatic (work)

I just wanted to put this

jmstacey's picture

I just wanted to put this somewhere so that I didn't forget. I envision the Metadata module as being a very general foundation-level tool for attaching metadata to a particular resource. The system would be capable of handling any kind of key/value data, be it EXIF or the name of the stream wrapper used. Use of Metadata would be left to the discretion of the various modules, and they in turn would follow the requirements of the organization (e.g. strict or free-tag style). This prepares the way for a comprehensive search and filter system to be implemented by modules such as Media, and easier queries in Views. Additionally, bridges can be written on top to provide mapping to specific formats (e.g. RDF).

We may want to provide some format recommendations, but ultimately I think it should be left in the hands of the various module maintainers and trust that a relatively well-known standard will be used when appropriate. We need to keep the system extremely flexible. For example, an S3 module might use Metadata to determine whether a particular resource is mirrored on Amazon S3.

Ultimately, Metadata would consist of a database table and several API functions and hooks to add, edit, delete, and query metadata.
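A rough sketch of what "a database table and several API functions" could look like follows. It uses Python with SQLite purely for illustration and is not Drupal code; the table layout and the function names open_store, metadata_set, metadata_delete, metadata_get, and metadata_find are assumptions, and hooks are left out to keep it short.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the single metadata table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        " resource TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (resource, name))"
    )
    return conn


def metadata_set(conn, resource, name, value):
    """Add a key, or edit it if the resource already has that key."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO metadata (resource, name, value)"
            " VALUES (?, ?, ?)",
            (resource, name, value),
        )


def metadata_delete(conn, resource, name):
    """Remove one key from one resource."""
    with conn:
        conn.execute(
            "DELETE FROM metadata WHERE resource = ? AND name = ?",
            (resource, name),
        )


def metadata_get(conn, resource):
    """Return all key/value pairs attached to a resource."""
    rows = conn.execute(
        "SELECT name, value FROM metadata WHERE resource = ? ORDER BY name",
        (resource,),
    )
    return dict(rows.fetchall())


def metadata_find(conn, name, value):
    """Query: which resources carry this exact key/value pair?"""
    rows = conn.execute(
        "SELECT resource FROM metadata WHERE name = ? AND value = ?"
        " ORDER BY resource",
        (name, value),
    )
    return [row[0] for row in rows]
```

With this shape, the S3 case mentioned above is just metadata_set(conn, "file/1", "s3_mirrored", "1") followed later by metadata_find(conn, "s3_mirrored", "1"). Storing every value as text keeps the table generic, at the cost of leaving type handling to the calling module.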

Something to consider would be integration with Taxonomy, but I don't think we want to clutter up the system with this kind of information.

keep it simple

bkinney's picture

My feeling is that you should either leave metadata out completely, or keep it completely open. Neither one offers much by way of inter-operability, but at this point in time, I don't think you should let that slow you down!
