Making Metadata Work

Posted by johnthatcherjr on October 23, 2009 at 3:02am

In Denver, after continued review of the development spec for the theme recommendation engine, we have some concerns about the amount of metadata to collect from producers. More specifically, we are trying to find the right balance in the number of metadata fields to request and/or require. The stations that have provided feedback have similar concerns.

After a lot of discussion, we have narrowed down the genre list to about nineteen. I have attached another updated genre listing to this post. The listing is also below. I don't present this list with an assumption that it is finalized. I think that we all kind of need to chew it over and figure out what works best to meet our goals. The goals for the final product are to:

Simplify the producer experience,
Create a system where all stations control scheduling via theme blocks,
Facilitate sharing, and
Create an archive of public access content with accurate metadata.

Using the new list, please map the relationship between the listed genres and your station's theme blocks.

Comments

Listing

Posted by johnthatcherjr on October 23, 2009 at 3:06am

Doesn't look like I can attach an XML file to a post on GDO. I will e-mail the list to the listserv. The list is also typed out below.

Art and Design
Culture/Ethnicity
Education/Schools
Entertainment
Environment & Animals
Film & Animation
Gender/Sexuality
History & Biography
Home/Garden/Food
How-to/Style
Lifestyle/Health
Music
New Age/Spiritual/Religious
News & Politics & Business
Nonprofits & Activism
Science & Technology
Sports/Hobbies
Travel & Events
Youth

We continue to have our

Posted by civicpixel on October 23, 2009 at 6:30pm

We continue to have our debate here regarding simplicity of collection vs accuracy of metadata and are looking forward to seeing the results of the mapping exercise at other stations.

Building on what John mentioned, our debates are focusing around:
1. At some undefined point, asking for more metadata == less metadata / less accurate metadata received. Ex: Our current ESCORT implementation asks for a great deal of metadata but none of our producers are willing to spend the time filling it out accurately. It's a bell curve, and we're in disagreement internally over where the best balance is. Some of us think it's collecting very little metadata, others are pushing for more.
2. The more metadata we have, the more accurately we can automatically categorize content received via the sharing system. In Denver, the argument is being made on one end that we have to have at least the reduced topic list + the audience list in order to make the 'recommendation engine' work and automatically place new content into a station's custom theme blocks. On the other side of the argument we have people suggesting that the additional accuracy obtained from the audience list is not great enough to require producers to select from the additional options, and that we should focus on having just the topic list and assume that in any case there will need to be some manual intervention with some shows to make sure they end up in the appropriate theme block at a station. We also have someone here arguing for no topic list, and just mapping theme blocks to theme blocks, making the ingest form for the producer extremely simple.
3. With initial mapping results, we're starting to see another problematic situation where different stations would map the same show into different items on the topic list. If this happened too frequently, it would make the topic list fairly useless, and would be an argument towards just mapping theme blocks to theme blocks.
4. Projects vs Shows and metadata entry. With the new metadata system, we are going to make sure that any metadata entered into the project will automatically propagate into shows so producers can avoid double entry. This is great unless a project picks a general topic, say 'cultural issues', but has shows that vary significantly on topic like GLBT, environmental, etc. In this case, it is highly likely that a producer would just leave the show metadata the same as the project metadata, at which point the purpose of having a more fine-grained list of topics/audiences is defeated.
5. Finally, we are frequently coming into a debate on whether the automated system should play a role as content curator or whether that role should remain largely with the station directors / outreach / human-to-human interaction. If it needs to play the content curation role, then more metadata is necessary -- i.e. if you have a cultural block but want to be making sure that a good chunk of latino/a programming is automatically scheduled into that block, then you would need to have the audience metadata in order to do that. Otherwise a producer that had a show about well being and lifestyle, albeit primarily latino/a focused, is often likely to pick the well being and lifestyle theme block as opposed to the latino/a theme block. The alternative is for a station's staff to interact more with producers, have a greater understanding of the content being ingested, and make manual adjustments to the automated schedule when necessary.
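
To make the topic list vs. audience list trade-off a bit more concrete, here is a minimal Python sketch of how an automated placement step might work. Everything in it is an assumption for illustration: the Show and ThemeBlock classes, the place_show function, and especially the choice to weight an audience match twice as heavily as a topic match are hypothetical and are not part of the development spec.

```python
from dataclasses import dataclass, field


@dataclass
class Show:
    title: str
    topics: set = field(default_factory=set)
    audiences: set = field(default_factory=set)


@dataclass
class ThemeBlock:
    name: str
    topics: set = field(default_factory=set)
    audiences: set = field(default_factory=set)


def place_show(show, blocks):
    """Return the name of the best-matching block, or None for manual review."""
    best_name, best_score = None, 0
    for block in blocks:
        topic_hits = len(show.topics & block.topics)
        audience_hits = len(show.audiences & block.audiences)
        # Assumed weighting: an audience match counts double, so an
        # audience-specific block wins over a general topic block.
        score = topic_hits + 2 * audience_hits
        if score > best_score:
            best_name, best_score = block.name, score
    return best_name


blocks = [
    ThemeBlock("Well Being", topics={"Lifestyle/Health"}),
    ThemeBlock("Latino/a", audiences={"Latino/a"}),
]
with_audience = Show("Salud", {"Lifestyle/Health"}, {"Latino/a"})
topic_only = Show("Salud", {"Lifestyle/Health"})
unmatched = Show("Untitled")
```

With audience metadata the example show lands in the Latino/a block; with the topic alone it lands in Well Being; with no usable metadata the function returns None, which stands in for the manual intervention by station staff described above.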

Hopefully I didn't taint any of that with my personal opinions, anyone feel free to correct me if I did!

Open Media Foundation

Remind me again....

Posted by PetePO on January 28, 2010 at 5:37pm

I've been trying for the life of me to remember why media rss isn't being used for metadata/file transports... I know there's a reason... Anyone out there to remind me?

I've been reading up on the Participatory Culture Foundation RSS Standards and they seem really nice and simple: https://develop.participatoryculture.org/trac/democracy/wiki/RSSFeedSpecs

I'm also having trouble finding sections of Media RSS syndication that are missing for community media - EX: Want to rate your content similar to abc? ok. MRSS can do that. Just want to flag it as adult so it is categorized as an after-hours show? ok. MRSS can do that too. Oh - wait. you don't want to mark anything and have all content treated the same? ok. you get the idea. Check out http://video.search.yahoo.com/mrss and hopefully you'll get as excited as I'm getting.

MediaRSS is being used... for

Posted by kreynen on January 28, 2010 at 6:34pm

MediaRSS is being used... for the media elements. The issue isn't using RSS, it's defining the common metadata elements required to make RSS work. It's easy for Tevlue to extend RSS with custom elements because they control every device creating those feeds. It's harder to do in open source where any Open Media user can add and alter the original configuration or add/rename taxonomy terms.

MediaRSS allows <media:category> to define a scheme which could be something like PBcore, ESCORT, MPAA or any number of other category authorities. Without an authority, category becomes open tagging and we've seen the result of that approach in Denver. Unfortunately we were never able to agree on a common set of terms.
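
As a rough illustration of the difference between a category with an authority and open tagging, the Python sketch below writes and reads a <media:category> element using the standard library. The build_item and read_category helpers and the scheme URI urn:example:om-genres are made up for this example; no such authority has been agreed on.

```python
import xml.etree.ElementTree as ET

# Namespace used by Media RSS elements.
MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)


def build_item(title, category, scheme=None):
    """Build an RSS <item> carrying one <media:category>."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    cat = ET.SubElement(item, f"{{{MEDIA_NS}}}category")
    cat.text = category
    if scheme is not None:
        # The scheme names the authority the category term comes from.
        cat.set("scheme", scheme)
    return item


def read_category(item):
    """Return (scheme, term), or None if the item has no media:category."""
    cat = item.find(f"{{{MEDIA_NS}}}category")
    if cat is None:
        return None
    return cat.get("scheme"), cat.text


# "urn:example:om-genres" is a hypothetical authority URI.
item = build_item("Live at the Station", "Music", "urn:example:om-genres")
xml_text = ET.tostring(item, encoding="unicode")
tagged_only = build_item("Live at the Station", "Music")
```

A receiving site can only map the term automatically when the scheme it reads back is one it recognizes; when the scheme is missing, the term is effectively an open tag.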

Everything we do is complicated by the lack of a group capable of managing a standard. Something that seems simple like defining adult content is complicated by the fact that MediaRSS's use of <media:adult> as a true/false is deprecated. Unlike <media:category>, there isn't a standard for indicating whether a rating is MPAA, PBcore or a custom checkbox added to the content type with CCK.

Take a look at how Boxee has extended RSS with elements they use. I think this is much closer to what we're going to need than what PCF is doing with the addition of correctly defining Creative Commons Licenses in the RSS.

I probably missed something

Posted by ekes on January 28, 2010 at 6:55pm

I probably missed something from this comment, I've read up, but still feel like I'm parachuting into a conversation. But...

Both media:category and media:rating have (optional) scheme attributes.

It's clearly better to have a recognised scheme for rating. Otherwise what does the definition 'adult' actually mean?
In the case of category it is always an issue to agree a set to work to - been down that one often enough. Maybe it would make sense for sites to do much like they do with RSS2 categories and put the scheme as the site's own taxonomy by default, unless there is an external one chosen?

The Media module extends

Posted by aaron on January 28, 2010 at 7:06pm

The Media module extends files & streams as fieldable entities, which means you can now add fields to any media on the system, such as with taxonomy, text fields, node references, etc. Should make things much more usable from an architectural standpoint in this direction.

Aaron Winborn
Drupal Multimedia (my book, available now!)
AaronWinborn.com
Advomatic

The Creative Commons module

Posted by kreynen on January 28, 2010 at 9:39pm

The Creative Commons module now adds the licensing to the RSS as well. Unfortunately beyond Creative Commons for licensing, we struggle to find common ground on elements with authorities that can be easily mapped.

As far as ratings, I was responding to PetePO...

Just want to flag it as adult so it is categorized as an after-hours show?

It's just not as simple as flagging content as adult. While you can define a rating with a scheme like <media:rating scheme="urn:v-chip">tv-y7-fv</media:rating>, someone still has to write code to determine what that means to the system pulling that content in as well as establish the framework to add this metadata consistently to both the user UI and the markup in the feeds. Authorities for scheme only work if I can map a rating using v-chip to my system's PBcore based rating.
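
A minimal sketch of the kind of mapping code this implies, in Python. The scheme names (urn:simple, urn:mpaa, urn:v-chip) are the ones listed in the Media RSS spec, but the local policy labels ("any-time", "after-hours", "needs-review") and the local_policy function are hypothetical stand-ins for whatever rating system the receiving station actually uses.

```python
# Hypothetical lookup from (scheme, value) to a local scheduling policy.
# Each receiving station would have to maintain its own version of this.
RATING_MAP = {
    ("urn:simple", "adult"): "after-hours",
    ("urn:simple", "nonadult"): "any-time",
    ("urn:mpaa", "r"): "after-hours",
    ("urn:v-chip", "tv-y7-fv"): "any-time",
    ("urn:v-chip", "tv-ma"): "after-hours",
}


def local_policy(scheme, value):
    """Translate an incoming rating into a local policy label.

    Anything that is not in the table is flagged for a human to look at
    rather than guessed.
    """
    key = (scheme.strip().lower(), value.strip().lower())
    return RATING_MAP.get(key, "needs-review")
```

Even in this toy form, every scheme a station wants to accept needs its own rows in the table, and anything outside the table still ends up with a person.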

Beyond rating, we include information in om_show like locally produced and give preference to content that was created at Denver Open Media when scheduling. Locally produced in Denver is a checkbox that stores a 1 in the database. In addition to the Locally Produced value, we include geotags to define the locations in the video content. Somehow we need to convert that Locally produced 1 into something other Open Media Systems can recognize so when content from Austin airs in Denver we have some way to give them credit and additionally aren't re-downloading content from Austin we originally sent them.
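
One way to think about replacing the local 1 with something portable is to carry an origin identifier with each show. The Python sketch below assumes that approach; the SharedShow class, its field names, and the station identifiers are invented for illustration and do not describe how om_show stores anything today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedShow:
    origin_station: str  # hypothetical station identifier, e.g. "denver"
    origin_id: int       # the show's id at the station that produced it
    title: str


def is_locally_produced(show, local_station):
    """Derive the 'locally produced' flag from the origin, not a checkbox."""
    return show.origin_station == local_station


def shows_to_download(feed, local_station, already_have):
    """Skip shows we produced ourselves or have already pulled in."""
    wanted = []
    for show in feed:
        key = (show.origin_station, show.origin_id)
        if is_locally_produced(show, local_station) or key in already_have:
            continue
        wanted.append(show)
    return wanted


# A feed from Austin that includes a show Denver originally sent them.
austin_feed = [
    SharedShow("denver", 12, "Denver City Council"),
    SharedShow("austin", 7, "Austin Live"),
    SharedShow("austin", 8, "Austin Arts"),
]
wanted = shows_to_download(austin_feed, "denver", {("austin", 8)})
```

Here Denver would pull only the one Austin show it does not already have, and the origin field also records who should get credit when that show airs.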

While I'm looking forward to moving from a Media Mover/Filefield configuration to Media, most metadata belongs at the show level... not with the MPEG, H.264, flv, ogg, or other media files associated with the show.

We still have a lot of work to do on the sharing portion of the Open Media Project, but we need to get at least 2 sites using a common codebase they are capable of maintaining themselves before we need to worry about transferring videos between them. Most stations we worked with in the beta phase lacked the technical skills required to manage an Open Media configuration that includes encoding. We've added a section defining the technical skills required for each Open Media feature set. The metadata stored in an om_show node doesn't have much value without a properly encoded video.

The metadata discussions between the stations that participated in the beta stage of the project turned into a classic example of what color to paint the bike shed. It's easy enough to have an opinion about what should be in a genre list, but harder to understand why standardizing on authorities for schemes is so important when sharing content. The last attempt at coming to some type of agreement resulted in most stations (including Denver) acknowledging that it's more important to them to use genres that make sense to their local users than to use a list that would be more helpful when sharing.

Kreynen's Creative Common's module posting

Posted by fmoses on January 29, 2010 at 5:12am

Wonderful wisdom. Thanks so much for the "what color to paint the bike shed" link. Why not have it both ways - a general list with a coarse cut to break the logjam and the option of local lists?

On the other hand, even the Supreme Court had problems with the Adult issue - see I know it when I see it.

Optional RSS Elements?

Posted by PetePO on January 29, 2010 at 4:44pm

I'm way out of my element here. Pun intended. But if I've been reading the correct articles on media RSS, it seems that you can include or exclude whichever elements are most important to you. If this is right, the metadata question seems to me to be more of a Cathedral and Bazaar issue. I don't know if any of us have enough social capital to build a new cathedral these days, so I was thinking of the metadata being more malleable than it sounds. I can't find anything in the boxee rss standards that works with the either/or element that's found in the yahoo media rss primary element:
Added <media:adult> element to distinguish content of an adult nature.
If my assumptions on this are correct, and my assumptions are often not correct, then the media rss feed an organization would need would be based on its primary cross-posting service.

Bazaar != chaos We're not

Posted by kreynen on January 29, 2010 at 6:41pm

Bazaar != chaos

We're not trying to build a Cathedral. The easiest way to leverage the limited resources we have is to use existing standards as building blocks in each configuration to facilitate communication.

@fmoses We already have it both ways. Every station defines their Timeslot Themes. You can see Denver Open Media's Timeslot Themes and the schedule grid that shows the lead Theme as well as its Pairing...

http://www.denveropenmedia.org/timeslot-themes
http://www.denveropenmedia.org/timeslot-calendar/2010-01-29

One of my tasks is to build a pie chart that shows how the hours of programming are categorized. The idea is that the programming ratio should closely mirror the type of content the station has been adding over the last 6 months.

http://www.denveropenmedia.org/stats

My hope for a standardized genre list wasn't to find a list of common themes across multiple stations. That would just result in producers selecting the same information twice. Instead we want something more specific that might allow a station to split a theme up, or allow something that is considered part of a Mexican-American theme at another station to air in Denver's Cultural theme.

A good example theme that most stations have is Music. While a station just starting theme based scheduling may start with a general Music category, they may find that between music videos, recordings of local concerts, music documentaries, and live music related studio shows they have too much content for that theme. Rather than just expanding the amount of time allocated to Music, they may decide to break the content up into Music Videos and Live Performances. Unless you've captured additional metadata about the show, the only way to do this is to go through hundreds of shows and manually recategorize them... or in Denver's case, have an intern do that. It is easy to merge 2 themes into one. It is much more difficult to separate them.
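
The split described here can be sketched in a few lines of Python, assuming each show already carries some finer-grained field. The "format" key, the split_theme function, and the sample values are hypothetical; the point is only that the split is mechanical when the extra metadata exists and manual when it does not.

```python
def split_theme(shows, old_theme, format_to_theme):
    """Reassign shows in old_theme using a finer-grained 'format' field.

    Returns (updated, manual): updated holds every show that could be
    handled automatically, manual holds shows in old_theme that lack
    usable metadata and need a person (or an intern) to recategorize.
    """
    updated, manual = [], []
    for show in shows:
        if show["theme"] != old_theme:
            updated.append(show)  # other themes are left untouched
            continue
        new_theme = format_to_theme.get(show.get("format"))
        if new_theme is None:
            manual.append(show)
        else:
            updated.append({**show, "theme": new_theme})
    return updated, manual


shows = [
    {"title": "Top Videos", "theme": "Music", "format": "music video"},
    {"title": "Park Concert", "theme": "Music", "format": "concert"},
    {"title": "Old Tape 41", "theme": "Music"},
    {"title": "City Hall", "theme": "News"},
]
rules = {"music video": "Music Videos", "concert": "Live Performances"}
updated, manual = split_theme(shows, "Music", rules)
```

Only the show with no format value falls into the manual pile; without the extra field, every Music show would.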

My experience has been that the only people who understand the value of using both locally produced Themes and universally accepted genres are the people who've had to make sense of large video archives and/or categorize content from outside sources into their system.

From a development perspective I don't care what the definition of adult is or how you sub categorize music related videos. I just need some level of consistency to know that Denver's 101010110010111 matches Austin's 111011010010101. I know that there are several projects trying to programmatically derive meaning like Calais, but I think those are still 5-10 years away from being useful.
