Help define the scope of the Derivates API for Media GSoC project

Posted by kreynen on April 29, 2011 at 5:19pm

While several Media-related projects were proposed, only one made the list of 20 Google Summer of Code projects approved by both Drupal and Google. Janez Urevc's Derivates API for Media project will...

...implement Derivates API for Media Library (Media, Styles, ...) ecosystem in Drupal 7. This API will provide a flexible, extensible and abstract API to implement derivation engines for different types of files.

This is a great opportunity to improve on the work Arthur Foelsche did on Media Mover in D6. The next 3 weeks are designated for students to research their projects and connect with the community. Janez will be getting familiar with several media-related modules, including Media Mover and Video, but he also needs advice on the features of the API and on the project's scope.

In many ways, I see this project being similar to the FeedAPI -> Feeds Transition.

Which features should be included in the API itself, and which should be implemented by modules using the API?

If you used Media Mover in D6, what did you wish it did better/differently?

If you are already maintaining a D7 media module that works with 3rd party CDNs or encoders, what features would you want from a Derivates API?

Comments

Metadata?

Posted by pdcarto on April 29, 2011 at 5:40pm

I'm not sure if this fits in here, so ignore it if it doesn't. But it seems to me that any API which deals with moving files around should also deal with moving metadata, and any API which converts files should also deal with converting metadata.

Media Mover

Posted by darrick on April 29, 2011 at 6:09pm

We are in the process of reconfiguring how we use Media Mover. We want to use watch folders for everything and have Drupal sync to the contents of those folders. For example, for h264 VOD, Media Mover would pull in newly encoded videos and tie them to the om_show record. If the encoded video is deleted from the folder, then the om_show record would be updated. It would be good if Media Mover could implement the second part, i.e. harvest the files referenced in om_show records that no longer exist in the watched folders and then remove the reference.

Most of our encoding is done outside of Drupal via droplets located on our NLE systems. We sync metadata between Drupal and the Cablecast system. The Cablecast system basically has watch folders for the video files it uses. We are exploring ways of stashing more metadata directly in the video files, to more easily associate metadata records with the files in our watch folders.
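
The two-way watch-folder sync described in this comment could be sketched roughly as follows. This is only an illustration in Python, not Media Mover code: the function name `reconcile` and the `known` mapping (file name to record ID, standing in for om_show references) are assumptions made up for the example.

```python
import os


def reconcile(watch_dir, known):
    """Compare a watch folder against the files Drupal already knows about.

    `known` is a hypothetical mapping of file name -> record ID.
    Returns (new_files, stale_ids): files to harvest, and records whose
    file has disappeared from the folder and whose reference should go.
    """
    present = {
        name
        for name in os.listdir(watch_dir)
        if os.path.isfile(os.path.join(watch_dir, name))
    }
    # Files in the folder that no record points at yet: harvest these.
    new_files = sorted(present - set(known))
    # Records pointing at files that are gone: remove these references.
    stale_ids = sorted(known[name] for name in set(known) - present)
    return new_files, stale_ids
```

The function only reports the differences; what to do with a stale reference (update the record, unpublish it, notify someone) is left to the caller.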

Hello also from me

Posted by slashrsm on May 2, 2011 at 6:22pm

I'm glad to hear your suggestions. I am sure that we'll be able to add some great new features to Media.

We are currently in a research phase, in which we are trying to identify the exact features that this project will initially support. Architecture planning will follow. Coding is planned to start at the end of May, so most things should be clear by then. I welcome everyone interested in this to actively help with feature requests, architecture recommendations, ...

Janez Urevc - software engineer @ Examiner.com - @slashrsm - janezurevc.name

I chatted briefly about this

Posted by civicpixel on May 9, 2011 at 9:18pm

I chatted briefly about this with kreynen in IRC and it sounds promising. I've worked quite extensively with Media Mover, and I'm continuing to develop the Internet Archive module as I implement it in various places. From my perspective I would love to see the Derivatives API first focus on providing solid derivative storage:

remotely and locally stored derivatives
multiple providers (example: storing two h264 derivatives one generated by a local encoder and another by archive.org)
multiple file types. I know this is already in the plan, which is great. For Internet Archive purposes I would be storing 4 different video formats, two different image formats, and several audio formats.
ability to inform engines to delete / re-derive derivatives (ex: user uploads a video and it is derived, user later discovers they need to cut a segment, which means they need to be able to delete the derivatives, replace the source file and have it re-derived without deleting the node/metadata and starting from scratch. Another example is a derivative that encodes improperly at archive.org and needs to be re-derived.)
ability to delete source/original file without losing its derivatives (ex: media center has a large raw source file that they store temporarily before sending off to archive.org, once the encoding at archive.org finishes and the derivatives are available most centers opt to delete the original raw file)
For metadata, I think it would be best to focus on derivative specific metadata storage and let nodes/entities handle the more general stuff since it varies so widely.
Just as a reference, the derivative information Internet Archive stores now that I think would be relevant to this API includes: status (queued, transferring, transferred, validated, derived, failed), derivation start date, derivation completion date, item / bucket name, md5/hash, url, file name, filesize, format
File validation (that it exists, is > 0 bytes), remote & local. My experience with both Media Mover & IA is that sometimes files "succeed" that actually ran into a problem that neither module was expecting, so the file is never really completed. Sometimes files get stored locally with 0 bytes; sometimes a file gets stored somewhere temporary and disappears later. Having the ability to verify the existence of files makes it much easier to maintain a clean archive
a friendly UI for derivatives management

This would really allow me to strip all of that out of the D7 Internet Archive module, think of it more as an engine for the derivatives API, and really focus on all the other parts (managing transfer queues, better user settings, better reporting / problem solving, easier setup, etc). Unless there is a stable 7 port of Media Mover that could be rolled into this during the timeframe of the project with help from arthurf, I think focusing on a wizard process to handle file harvesting, derivative queues, file movement/transfers, "steps", etc in addition to the storage items would be really hard to pull off correctly.
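
To make the storage items in the list above more concrete, here is a rough sketch of what a per-derivative record and a local validation check might look like. It is written in Python purely for illustration and is not the API of any existing module: the names `DerivativeStatus`, `DerivativeRecord`, `validate_local` and the `provider` field are assumptions, while the status values and the other fields are taken from the list above.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class DerivativeStatus(Enum):
    # Status values as listed for the Internet Archive module.
    QUEUED = "queued"
    TRANSFERRING = "transferring"
    TRANSFERRED = "transferred"
    VALIDATED = "validated"
    DERIVED = "derived"
    FAILED = "failed"


@dataclass
class DerivativeRecord:
    file_name: str
    format: str
    provider: str  # hypothetical: e.g. "local" or "archive.org"
    url: str
    status: DerivativeStatus = DerivativeStatus.QUEUED
    bucket: Optional[str] = None  # item / bucket name
    md5: Optional[str] = None
    filesize: Optional[int] = None
    started: Optional[datetime] = None    # derivation start date
    completed: Optional[datetime] = None  # derivation completion date


def validate_local(record, path):
    """Mark the record validated only if the file exists and is > 0 bytes."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        record.status = DerivativeStatus.FAILED
        return False
    # Reads the whole file at once; fine for a sketch, not for large video.
    with open(path, "rb") as fh:
        record.md5 = hashlib.md5(fh.read()).hexdigest()
    record.filesize = os.path.getsize(path)
    record.status = DerivativeStatus.VALIDATED
    return True
```

Keeping one record per derivative rather than per source file is what would allow two h264 derivatives from different providers to coexist, and would let the source file be deleted without touching the derivative records. A remote check would need a provider-specific equivalent of `validate_local`.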

That's all I can think of at the moment… I would love to help in any way that I can. If the storage aspect becomes solidified, I can start work in the next month on building a D7 version of Internet Archive to leverage it. Look forward to hearing more!

Open Media Foundation

Blog post about thoughts of

Posted by slashrsm on May 30, 2011 at 2:34pm

Blog post with my thoughts on this project: http://janezurevc.name/derivates-api-feature-list

It is based on my own thoughts, conversations with other people, ...

Janez Urevc - software engineer @ Examiner.com - @slashrsm - janezurevc.name
