Open Media Project and Archive.org join efforts to offer free video encoding, file storage, and VOD for Public Access TV stations

civicpixel's picture

About three months ago I posted about Denver Open Media's initial results integrating with Archive.org via the Internet Archive module -- at that point we had transferred 348 shows to archive.org. As of today, we have transferred 2746 shows, almost 2TB of data. The majority of denveropenmedia.org's on-demand video content is now progressively streamed from archive.org, and more than two-thirds of our video content (including broadcast files) is available via our collection at Archive.org.

Even more exciting is the fact that we turned off our video encoder and disabled our previous video workflow process (media mover) two weeks ago. Due to amazing support from Archive.org, particularly the work of Tracey Jaquith, we no longer have to do any encoding locally. Stations submitting files to sub-collections of the Open Media Project collection can now apply to have uploaded content encoded, not only for the web, but also as a broadcast-quality MPEG2 based on the ACM Preliminary Video File & Metadata Standards.

Our video ingest workflow at DOM is now:

  1. Producer exports/saves their raw video from DVD, Final Cut, or any source, to our local storage (RAID)
  2. Producer creates a show on denveropenmedia.org, selects their raw video file and enters information about their show.
  3. The Internet Archive module harvests the raw video using a standard Drupal view, and transfers it to Archive.org using their S3 API.
  4. Archive.org encodes the file in a series of new formats, including OGG, MPEG4, and broadcast MPEG2, and generates thumbnails.
  5. Once Archive.org's encoding process is complete, the new download submodule of the Internet Archive module downloads the broadcast MPEG2 file and stores it locally, in our case to a filefield on the show node, ready to be scheduled on our broadcast server.
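
For anyone curious what the transfer in step 3 looks like at the HTTP level, here is a minimal sketch in Python. It is not the module's code (the module is PHP inside Drupal); it only builds, without sending, the kind of PUT request described in Archive.org's S3-like API notes. The endpoint, the `LOW` authorization scheme and the `x-archive-*` headers are taken from my reading of those notes and should be checked against the current documentation. The function name, the identifier and the metadata values are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed endpoint for archive.org's S3-like API; verify against their docs.
IA_S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(identifier, filename, access_key, secret_key, metadata):
    """Build (but do not send) a PUT request for one file in an archive.org item.

    `identifier` is the item (the "bucket"); `metadata` maps field names such
    as "collection" or "licenseurl" to values stored with the item.
    """
    url = f"{IA_S3_ENDPOINT}/{quote(identifier)}/{quote(filename)}"
    headers = {
        # archive.org uses its own "LOW" scheme rather than AWS request signing.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Ask archive.org to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
    }
    for name, value in metadata.items():
        headers[f"x-archive-meta-{name}"] = value
    return Request(url, headers=headers, method="PUT")
```

Actually sending the file would mean attaching the file body to the request (for example through the `data` argument) and passing it to `urllib.request.urlopen`. For multi-gigabyte files you would want to stream the body from disk rather than read it into memory.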

In theory, the above workflow combined with the Internet Archive module's remote submodule should allow a media center to handle media file storage, backup, encoding, on-demand streaming and sharing, provided they meet a few requirements:
1. Because of the file sizes, you need a decent internet connection (ideally 5 Mbps or more).
2. A basic web server with Apache and access to your local video file storage.
3. A Drupal website hosted locally or by a remote webhost.

Then, in order to submit content to the Open Media Project Collection, all that is required is that the files have a creative commons license.

From the beginning, this solution was designed to facilitate content-sharing among stations, especially automated, rules-based sharing that could automatically share top-rated content, or subject-specific content across stations. I believe a few other stations are storing or starting to store content on archive.org (Humboldt, BAVC, CTN), and some have committed to Creative Commons licensing, but I'm not aware of anyone yet storing metadata & licensing with their content in a way that would make it easy for us to automatically pull it down here at Denver Open Media. If anyone is interested in sharing, please reply here and I'll be happy to make minor modifications to the module or try to help you overcome any other barriers you might be worried about.

Brian

Comments

Looks exciting

jdcreativity's picture

Brian, congratulations to you and the DOM crew for getting this solution developed. I was talking to my coworker the other day about how great it would be to share files that were in an editable format. You are doing that.

I'm a ways off from implementing this solution, although we do manually upload files to the Internet Archive. http://www.archive.org/details/eastoncat We'd like to position ourselves to share in as many sharing projects as possible.

Does this solution exist independent of your playback server? That is, having a Leightronix vs. a Telvue - all are equally capable of rolling into this solution? Also - when you say "2. A basic web server with apache and access to your local video file storage." does that mean a local web server?

Lastly, how do you find the video playback/VOD solution on the Archive? I find it slow sometimes but perhaps there is a way to tweak it.

Hey Jason, thanks for the

civicpixel's picture

Hey Jason, thanks for the comment!

It's great to see you're uploading files to Archive. I think quite a few stations are currently uploading content to Archive, often with Creative Commons licenses. Maybe somebody already has a list going, if not I'll start one and post it up here.

We are currently running both Tightrope and Telvue servers at DOM, both of which are working great with this solution. Theoretically any playback server that supports the ACM Preliminary Video File & Metadata Standards should work with the files derived by Archive.org.

In regards to #2, that does mean a local web server -- but not necessarily one running your Drupal site. The internet_archive_remote module was specifically created to support the configuration of a station hosting their website on a remote host, while storing the broadcast files locally. The module only requires a very basic web server (no Drupal install, just a simple php script) with access to your local files so that it can send the transfer commands to move files from your local storage to archive.org. I've tested the remote submodule a bit successfully, but I don't think anyone is using it currently. If someone does want to experiment with it, I'm happy to assist.

In regards to VOD playback, we haven't seen any playback slowdowns -- but it could be happening once in a while and we're just not noticing... Archive.org recently went through a physical storage migration which has made things a bit rocky in all aspects for the past week or so, but everything seems to be calming down now.

jhauser14905's picture

PEG stations and Community Media Centers contributing MPEG2 Video to the Internet Archive

Brian, I pulled this list together on 3/11/2011 from queries using the advanced search interface on archive.org

format is CMC - Location - year first upload - # of videos on archive.org as of 03/11/2011

Community Media Archive collection:
Access Humboldt - Eureka CA - 2008 - 1,946
Worcester Community Cable Access - Worcester MA - 2005 - 1,625
Falmouth Community TV - Falmouth MA - 2009 - 166
BAVC - San Francisco CA - 2011 - 54 - still in testing
Easton Community Access TV - Easton MA - 2010 - 19

individual collections:
DOM - Denver CO - 2010 - 2,818
CMAP - Gilroy CA - 2010 - 83

generic opensource_movie collection:
CTN5 - Portland ME - 2011 - 28
GRCMC - Grand Rapids MI - 2008 - 188
GRTV - Grand Rapids MI - 2005 - 278 some overlap with GRCMC
Public Media Network - Kalamazoo, MI - 2009 - 81
other - Kalamazoo MI - 2006 - 222
KBVR - Oregon State University - 2010 - 90

not MPEG2 but MPEG4 versions:
Mendocino Coast TV - Fort Bragg CA - 2009 - 517
SCVTV - Santa Clarita Valley CA - 2009 - 1,171 via blip.tv ftp interface to archive.org
Community Access - Yellow Springs OH - 2010 - 107

the media centers listed use a variety of approaches from manual uploading via ftp to fully automated via ftp or S3-like interfaces.

the videos range from the traditional broadcast length (30 or 60 minute run times) to short format video.

i'm sure there are others, especially pioneers that started uploading early on but aren't uploading any more, or centers that upload into the generic opensource_movies collection instead of to their own collection. it's been at least a year since i took a good look at the contents of the blip.tv collection and there may be other stations (in addition to SCVTV) using that interface.

if centers/stations want to get their own collection set up on the Internet Archive, and their videos moved from the generic opensource_movies into their new collection, I can help.

John Hauser
Special Project Manager
Access Humboldt
john@accesshumboldt.net

Hi John, This could not have

jpiazzo's picture

Hi John,

This could not have come at a better time for us. We, Open Stage Media, P.E.G. in Schenectady New York have been experimenting with the Archive, particularly our Government programming. I have currently been posting as the City of Schenectady - all over the place I think ;-).

Anyway, I would love Open Stage Media to be a sub-collection under community media (and even have our own sub-collections under OSM for Public, Government, and Arts & Education)...

Still not completely up and running as to how to set this up. We are manually uploading at this point - but hope to move to a web / automated system integrated with our new Drupal based website. Anyway, any help or advice you can give me would be appreciated.

BTW - is there a way to "edit" or add files to a record beyond the 3 day limit? Particularly with our government meetings, it often takes me a while to collect the ancillary documents I like to upload along with the video.

I can be reached directly at jpiazzo@proctors.org.

This is great but...

inertialacoustic's picture

I've submitted content via the internet archive module successfully. However, I would like a user to submit a higher quality video. Our current PHP limitations lock us down at 50MB, is there a way to make us talk DIRECTLY to the s3 bucket we create with archive? If that's the working case, then we can upload higher quality (4-5GB) videos.

Is that possible?

There is no support for

civicpixel's picture

There is no support for directly uploading content to Archive.org in the current version of the module. I'm thinking about exploring that in the D7 version which I started on a week ago, but it's probably a ways off as it's not a simple task. Many of the files our organization sends to archive.org are in the 4-8GB range, but we get around the PHP limitations by having users either ingest the file here at the station locally, or upload it to our ingest directory via FTP -- then we use http://drupal.org/project/filefield_sources to allow the user to select the file from the ingest directory and attach it to the node. After that there are no PHP upload limitations to worry about.

I would love to help.

inertialacoustic's picture

I think this module has a lot of potential.

If we could get the user to communicate with archive without the use of their server, this module would be tremendous.

Could we do it with some form processing language other than PHP?

If you use an upload process

kreynen's picture

If you use an upload process outside of Drupal in D7, you can use http://drupal.org/project/media_feeds and http://drupal.org/project/media_archive to display the content on your Drupal site. This really reverses the workflow. Instead of the files passing through Drupal -> Archive.org, each producer gets their own Archive.org account and simply registers the feed.

inertialacoustic's picture

http://doc.s3.amazonaws.com/proposals/post.html

This seems like a good and valid use of direct s3 application. However, I think this isn't necessarily the most secure way of processing such information as people's Access keys would become public with a little bit of hacking.

That post method would be

civicpixel's picture

That post method would be useful; unfortunately I don't think archive.org has any plans in the near future to set that up. I have been experimenting with using the jquery upload plugin to send directly to archive's S3: http://aquantum-demo.appspot.com/file-upload, which does work. The challenge is getting it cleanly integrated with media_field, and I'm only intermediate at javascript development. I'll probably keep working on it in my spare time, as I'm also interested in being able to "chunk" uploads using the blob api (so that our producers can upload large files remotely with resume support in their browser), which that library also allows... but if you or someone else has the energy to integrate a library like that, handling the actual S3 transfer is not all that difficult.
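
To make the "chunk" idea concrete, here is a rough sketch of the bookkeeping behind a resumable upload. It is shown in Python only to illustrate the logic; in the browser the same arithmetic would drive slicing the file with the blob api. Nothing here is specific to the jquery plugin or to archive.org, and both function names are made up for the example.

```python
def chunk_ranges(file_size, chunk_size):
    """Return (start, end) byte offsets, end exclusive, that cover the file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, file_size))
        for start in range(0, file_size, chunk_size)
    ]


def remaining_chunks(file_size, chunk_size, bytes_confirmed):
    """Chunks still to send once the receiver has confirmed `bytes_confirmed` bytes.

    A chunk that was only partly received is sent again in full.
    """
    return [
        (start, end)
        for start, end in chunk_ranges(file_size, chunk_size)
        if end > bytes_confirmed
    ]
```

With a 10-byte file and 4-byte chunks, `chunk_ranges(10, 4)` gives `[(0, 4), (4, 8), (8, 10)]`, and after 8 confirmed bytes only `(8, 10)` remains. Resume support then comes down to asking the receiving end how much it already has before starting again.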

Get back to me about that!

inertialacoustic's picture

I would love to help out however, my javascript skills are good.

This module might be of some help and is currently in development. I'm sure you've messed with it, Uploadify.

This would allow a nice looking progress bar too.

It's nice that uploadify is

civicpixel's picture

It's nice that uploadify is already integrated with filefield.. that would probably make this a much easier project for the current D6 version of internet_archive. With a bit of fiddling it's not too hard to use uploadify to send straight to IA. I'm trying to focus energy right now on the D7 version and creating an engine to support the Derivatives API effort but would be happy to provide support to anyone adding uploadify/s3 support to the D6 version. I wish there was a D7 port of Uploadify started, that would make it even more appealing as I think porting that to media is going to be no small adventure...

see my comment in the main thread of this topic

jhauser14905's picture

i had intended my comment as a reply to your item, but it went into the main thread.

take a look at http://www.archive.org/help/abouts3.txt instead of amazon documentation.

Just to clarify

jdcreativity's picture

for us laypeople we are talking about at least two different approaches to working with the Internet Archive.

Humboldt pulls the videos off the server and moves them onto the archive. I'm not entirely clear (and I probably should be at this point) whether there is one or more specific Drupal modules involved in your workflow.

Denver pulls the raw files onto the Archive and uses specific modules - this is the Open Media Project as it was envisioned. I'm not clear (again, I should be) how many Open Modules are dependent on this flow.

And where does the CC uploader tool fit?

How do the ACM shared server projects fit? How great would it be to check off on my Telvue a button for the Internet Archive and boom. But if I was doing more of the nuts and bolts programming outside of Telvue, maybe a better solution exists.

The Internet Archive module

civicpixel's picture

The Internet Archive module is not dependent on any other modules in the OMP.

It is however highly encouraged to use it in combination with the Open Media Show and Creative Commons modules. With Open Media Show enabled, the Internet Archive module stores the fields/data listed at the bottom of this post with files uploaded to Archive.org. Enabling Creative Commons results in storage of licensing information at Archive.org. With those two additions, media centers can subscribe to content with a given license, PBCore Genre, series, etc and have it automatically downloaded and integrated into their workflow.
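
As a rough illustration of why stored metadata matters for that kind of subscription, here is a sketch that builds a query URL for Archive.org's advanced search interface, filtering a collection by license. This is an assumption about how one could look for matching content, not a description of how the module does it. The function name and the collection name in the usage note are hypothetical, and the `licenseurl` field name should be checked against what your items actually store.

```python
from urllib.parse import urlencode


def build_search_url(collection, license_url=None, rows=50):
    """Build an advancedsearch.php URL listing movies in a collection.

    If `license_url` is given, only items whose licenseurl field matches
    it are requested. Results are requested as JSON.
    """
    clauses = [f"collection:{collection}", "mediatype:movies"]
    if license_url:
        clauses.append(f'licenseurl:"{license_url}"')
    params = [
        ("q", " AND ".join(clauses)),
        ("fl[]", "identifier"),  # fields to return for each item
        ("fl[]", "title"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)
```

Calling `build_search_url("example_collection", "http://creativecommons.org/licenses/by/3.0/")` returns a URL that can be fetched on a schedule. Each returned identifier could then be checked against shows already downloaded locally.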

Link to Archive's S3-Like API Documentation

jhauser14905's picture

please refer to this document rather than any Amazon documents if you're going to discuss/investigate uploading to archive using the s3-like interface without PHP or Drupal.

http://www.archive.org/help/abouts3.txt
it shows examples of using curl to upload items to archive.

several of us have used the techniques shown to upload hundreds of multi-gigabyte videos to archive.org successfully.

and as noted, "POST and COPY aren't implemented."