Drupal 8 media sprint report

As a follow-up to the Prague media/file handling core conversation on Wednesday, a group spent most of the contribution day brainstorming and finding consensus on a way to clean up the media situation in contrib for D8. As Crell suggested during the core conversation, we have an opportunity to challenge current assumptions and "blue sky".

D8 media sprint

The initial architecture simplifies media handling and is centred around a new "media item" entity type whose bundles can be fielded with a variety of field types to provide the actual media resources.

Media architecture

In short, the file entity module will no longer be necessary for media handling solutions under this model, and media browsers, widgets and so on will build on top of this lightweight entity. We are taking advantage of entity reference being provided by core in D8.

It will no longer be assumed that the media resources on a media entity are files at all, which opens up the possibility of external resources and of media resources like tweets that are not managed by Drupal's managed file system. This is similar to how the Scald project works. Representatives of that project were involved in the sprint as well.
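To make the layering concrete, here is a minimal sketch in Python (not Drupal code) of how a node could reference media items whose fields hold resources that may or may not be files. All class names, field names and URIs are hypothetical and chosen only for illustration; nothing here was decided at the sprint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class FileResource:
    """A resource backed by a managed file (hypothetical URI format)."""
    uri: str


@dataclass(frozen=True)
class TweetResource:
    """A resource that is not a file at all, only a remote identifier."""
    tweet_id: str


Resource = Union[FileResource, TweetResource]


@dataclass
class MediaItem:
    """Lightweight media entity: a bundle plus fields holding resources."""
    bundle: str
    label: str
    fields: Dict[str, List[Resource]] = field(default_factory=dict)

    def resources(self) -> List[Resource]:
        # Flatten the values of every field into a single list.
        return [r for values in self.fields.values() for r in values]


@dataclass
class Node:
    """Any other entity that references media items (entity reference)."""
    title: str
    media: List[MediaItem] = field(default_factory=list)


photo = MediaItem("image", "Prague castle",
                  {"field_image": [FileResource("public://castle.jpg")]})
tweet = MediaItem("tweet", "Sprint announcement",
                  {"field_tweet": [TweetResource("1234567890")]})
report = Node("Sprint report", [photo, tweet])
```

The node only knows about media items. Whether a given item resolves to a file, a tweet or several resources is a detail of the item's fields.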

A set of initial meta tasks has been sketched out and will be added to the new media entity project page shortly. They include:

Core media entity module:

  • media item entity
  • entity reference field widget for embedding media items into other entities
  • media item field formatter for these entity references
  • example module that provides an image field based media item entity bundle

Some non-media entity contrib work was also identified such as WYSIWYG support and inline entity creation / editing.

The group agreed to follow the "first Friday" initiative and have future meetings during the first Friday of each month. Stay tuned to this group.

Comments

Very interesting...

cosmicdreams's picture

In Drupal 8, we have the entity reference field in core. What more will we need in order to reference media item entities to other bundles?

Sounds like most of this work will be getting the field formatter and field widget right.

Software Engineer @ The Nerdery

mallezie's picture

That's indeed sort of the plan: to use the entity reference field and create different formatters and field widgets.

snufkin's picture

One thing we will need to support is referencing via other fields, e.g. the way Media solves embedding is with special markup in the text field. I am not convinced that this conceptually belongs to the media entity, but it needs to be solved regardless.

Dave Reid's picture

What is the purpose of Media module in this architecture picture then?

Senior Drupal Developer for Lullabot | www.davereid.net | Gittip me!

There definitely is!

slashrsm's picture

The idea is to slice the ecosystem into smaller, decoupled pieces. That will give us a very solid foundation for different modules to build on. We will definitely need modules that provide various parts on top of that (UIs, various plugins, ...). We will probably also need something that glues all those pieces together to make installation, configuration and usage easier. That is one example of where the Media module could definitely fit in.

Decoupled components will bring many benefits. The whole ecosystem will be much easier to maintain, and it will hopefully help us distribute the load among more people. It will also allow different approaches on the same basic foundation. Imagine today's Media, Assets, Scald, etc. still existing, each with a slightly different experience, but living on the same set of basic APIs. This will save us resources, allow integration and still enable people to innovate and build solutions that fit their needs and expectations.

Janez Urevc - software engineer @ Examiner.com - @slashrsm - janezurevc.name

Dave Reid's picture

I'm not sold on needing a media entity. It's two layers of referencing. People want to have nodes and Views and other crazy stuff in their 'media' browser. Where does it stop?

Volx's picture

It is not just about nodes and views as media. There are other media types (or at least things that some people like to have handled as media) that are not based on a file, like tweets and Facebook statuses. Other types of "media" may consist of more than one file, like galleries or images prepared for responsive websites and retina displays, where image styles may not be good enough for some people.

Also some people may need two layers. Think about stock photos, where the licensing may be handled for a specific file, but the file is used as part of different "media" items where some metadata like title and caption are completely different.

And yes, some people even see articles as media items. Why should it stop there? I agree with you that this is a bit strange and I would not recommend doing it, but let others do it if they choose.

The point is to find a common basis to work with that allows implementing as many use cases as possible. Of course it is important not to expose the end user to all the referencing and to have a great UX, but that is absolutely possible.

willyk's picture

I'm very inclined to agree with Dave on this. I would not be intermingling nodes, views etc. into the media browser. If we look at other leading and competing platforms that's not what they do from what I've seen.

kreynen's picture

What other "leading and competing platform" has something that's the equivalent to Views?

Embedded widgets

Devin Carlson's picture

In regards to embedding media into textareas using WYSIWYG, I've been investigating a solution that uses the new Widgets API that was recently added to CKEditor.

It solves a lot of the problems that Media is currently trying to deal with, such as the WYSIWYG editor often stripping out valuable information, being able to easily move the embedded widget around and having a context menu for editing the widget's properties.

Just what I was thinking

martin_q's picture

Yes, I was excited by the new CKEditor widgets API that was demo'd in the CKEditor for Developers BOF at Drupalcon Prague, and I think it will make things nice and simple (or at least nicer and simpler) for media embedding!

A few questions

lslinnet's picture

So if I am understanding this correctly, the basic gist of it is to have a three-layered setup where the node references a media entity, which references a resource.

How is this different from what file_entity allows you to do today? Or is it basically just the idea of extending file_entity into a resource_entity that handles the URI and metadata for that single resource?

Has the complexity of multivalue fields been considered for the media_entity? Should it be possible to have a multivalue field which points at a resource type (file, YouTube, Twitter, website)?

I am not quite convinced by the approach outlined here, but it seems like there is potential to adjust it and end up with a solid foundation.

jcisio's picture

If File entity is extended to be a Media entity, then from the OOP point of view, it is no longer File entity. There were some discussions in the past:

https://drupal.org/node/1874994#comment-6917816
https://drupal.org/node/1602218

A media is atomic: one media is one tweet, not two. However, a node, or any entity, can reference multiple media. So a media collection (read: "gallery") can reference multiple media, each of which can be a tweet.

Code Smell

cosmicdreams's picture

So, to me, any time I see two levels of references it seems like a code smell. What do two levels of references give us that we can't get with one referenced entity + fields?

It needs to be seamless for a user

slashrsm's picture

There are various use cases that will benefit from this approach. Imagine a slideshow, or a photo or video gallery. The gallery itself can definitely be treated as a single, reusable media item with its own metadata. This requires references to multiple files, which cannot be done without the extra layer.

There is also the argument about external media items. Some people think that those do not belong in File entity.

It is a similar approach to Drupal Commerce and its products, product displays, line items, ... They proved that it is possible to provide a good experience with proper UIs.

We have these very powerful APIs in Drupal code (Entity API) and I think we shouldn't be afraid to use them.

redndahead's picture

Hmm I'm of the mind that this is taking it too far. I wouldn't consider video galleries media. They are views. Slideshows are views also. Why not have the media browser be able to handle that? Not sure why we need another entity to define that a view is a media item. Going down this road how different is this from entity reference?

mErilainen's picture

That is exactly the problem with the current media solutions. They are trying to provide users with fixed tools, when site builders want to solve special requirements.

As an example, let's say I want a gallery which:
- Has a title
- Description
- Location (name/tag/address/?)
- Thumbnail image
- Author
- Video field
- 0-n images

So it's definitely not a simple view.

Image (media_entity) might have an uploaded file, or linked image. It can also have a separate title and caption fields. Geo-coordinates. Date. Exposure.
Videos might be local also, or linked from Youtube, Vimeo and the like. If they don't need any extra fields, they can be just simple fields of the gallery content type. But the media_entity works like entity_reference with some additional field formatters and display mode support for other modules to extend.
Naturally this concept needs iterations, because it was the work of one day during the contribution sprints.

What user?

yoroy's picture

What user do you mean? What 'seamless' means is very different for a developer working directly with these APIs, a site builder working with the UIs, or a content creator working with the subset of the UI that the site builder exposed for her.

'Seamless' can become the direct opposite of 'drawing the border between APIs' if you don't make a clear distinction between the roles (dev, builder, creator) you are arguing for.

slashrsm's picture

I was referring to site builders and content creators.

larsmw's picture

One of the arguments for making an extra entity type was to be able to switch storage backends. Much of this functionality might already be in the Storage API module.

A second argument is to simplify some solid base functionality so that it can be easily extended to more advanced scenarios. The goal is to have a few "simple" modules with solid base functionality, on top of which we can build more advanced implementations.

mongolito404's picture

The URI of a File entity can be changed, and it does not need to point to a local file. AFAIK, the URI of a file must be supported by a PHP Stream Wrapper. So, if one were to provide Stream Wrappers for Amazon S3, YouTube and Vimeo, a File entity could be created first with a local file URI, then updated to use a file stored on S3, then a video stored on YouTube and finally a video stored on Vimeo.

Keep the separation between "media type" & "provider"

yched's picture

My two cents from an evaluation I made a couple months ago to choose between Scald and Asset on a client site:

One critical piece of the architecture that should be retained is the separation between the media type (e.g. "video") and the various providers for this media type (local file, YouTube, Vimeo...).

Each media type needs to be able to support an arbitrary number of "provider" plugins, and the mechanism that supports this pluggability needs to be provided by the "core media system" rather than being reimplemented (if at all) by each media type separately. IMO this is what Scald got right and Asset got wrong.
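As a rough illustration of that separation, the following Python sketch keeps a single registry in the "core" layer and lets each media type collect any number of provider plugins. The registry class, the matcher helpers and the provider names are all hypothetical; this is not how Scald, Asset or any Drupal module implements it.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlparse

# A provider plugin is reduced here to a predicate: "can I handle this source?"
ProviderMatcher = Callable[[str], bool]


class MediaTypeRegistry:
    """Core-level registry: media type -> provider name -> matcher."""

    def __init__(self) -> None:
        self._providers: Dict[str, Dict[str, ProviderMatcher]] = {}

    def register(self, media_type: str, provider: str,
                 matches: ProviderMatcher) -> None:
        self._providers.setdefault(media_type, {})[provider] = matches

    def resolve(self, media_type: str, source: str) -> Optional[str]:
        # Return the first registered provider that accepts the source.
        for name, matches in self._providers.get(media_type, {}).items():
            if matches(source):
                return name
        return None


def host_is(*hosts: str) -> ProviderMatcher:
    return lambda source: urlparse(source).hostname in hosts


def scheme_is(*schemes: str) -> ProviderMatcher:
    return lambda source: urlparse(source).scheme in schemes


registry = MediaTypeRegistry()
registry.register("video", "youtube", host_is("www.youtube.com", "youtu.be"))
registry.register("video", "vimeo", host_is("vimeo.com"))
registry.register("video", "local_file", scheme_is("public", "private"))
```

Because the registry lives in one place, a YouTube video and a Vimeo video are both "video" media, and adding a new provider does not require touching the media type itself.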

larsmw's picture

That's exactly what we are talking about as resources in the diagram! :-)

yched's picture

The diagram doesn't make it fully clear whether two resources in, respectively, a YouTube field and a Vimeo field will have something in common (both being treated as "video" media) or will be completely unrelated (which would suck).

slashrsm's picture

Implementation details of this part are not defined yet. There is an issue that tries to deal with that: https://drupal.org/node/2100515

slashrsm's picture

Some discussion about that topic also started in https://drupal.org/node/2103293.

Good point

slashrsm's picture

I added your comment to the issue that deals with this.

https://drupal.org/node/2100515

Image metadata & more ideas to consider

miro_dietiker's picture

I completely miss any consideration of structured metadata from the original media file (such as JPG/EXIF, or videos with information like length, codec, ...). This should be extracted, stored and available for further processing and querying.

Also, the concept of media collections seems to be missing. I'm not so sure this should be delegated to something like a Gallery media type with an N-media reference. Should the media system itself possibly introduce reusable collections?

Further topics I miss are:
- Mass upload
- How to deal with different scenarios: private media (a user owns media for his own use only) vs. a huge media collection (a managed repository that needs global order/structure/tags)
- Cropping and further effects (per use, per field instance, ...)
- Media text representation (for indexing/search), for instance of PDFs

Media entity's fields vs. PHP Streams

mongolito404's picture

I was not in Prague and did not participate in the discussion, so this may have already been discussed.

If I understand correctly, the idea of the Media entity type is to allow different resource providers to be implemented (local files, YouTube videos, etc.). So at the lower level, the field types are the resource providers, and developers wanting to add support for new providers use the Field API. At the Media API consumer level, the Media entities provide the known Entity APIs to manage the media and their meta-data.

But Drupal already offers two layers: Files and PHP Streams. At the lower level, a File in Drupal references a PHP Stream using a URI. And a PHP Stream is just a seekable, readable and optionally writable stream of bytes. So at the lower level, resource providers are Stream Wrappers. And at the Media API consumer level, the File entities provide the known Entity APIs to manage the media and their meta-data.

For developers, using PHP Streams may be harder than working with the Field API. But it avoids adding another layer and provides a clear, basic and simple definition of what a resource is (and is not). It also provides a standard interface to the resources, allowing, for instance, generic meta-data extraction from media resources (e.g. extracting EXIF from a Stream). That is something the Media entity cannot achieve without adding yet another (custom) layer.

Were PHP Streams and Stream Wrappers considered at the Prague Media meetings? Apart from the (maybe) harder API and the new concepts for developers to learn, what are the reasons for not using them?

Note: If I'm not mistaken, the current Media 7.x-2.x module implements, or aims to implement, (remote) resource providers (such as YouTube or Flickr) as PHP Stream Wrappers.

It can be, or not

jcisio's picture

The whole point is that we don't make any assumptions here. And we don't want to take over the file_managed table.

File -> PHP stream is exactly like Media entity -> media resource, which may itself be a stream or not; we don't want to know. It's up to the resource provider to decide. Of course, a resource provider can decide that its "media resource" is a "reference to a file", and only in that case does it add another layer, in order to reuse some contrib modules or core functions.

This is not to say that there are no other problems with building Media on top of File Entity.

mongolito404's picture

But in Drupal 8, files are (classed) entities so there is no need to overtake the file_managed table?

Conceptually, yes, the "File Entity - PHP Stream" relation is like the "Media entity - Media Resource" relation. And that's why I'm questioning the need to introduce the "Media entity" and "Media Resource".

Yes, using PHP Streams introduces the assumption that a media resource is a stream of bytes. But that does not seem unreasonable. It also clearly defines where the meta-data of a media is stored (i.e. the File entities), which again does not seem to be an unreasonable assumption. And if a media actually contains its meta-data, extractors could be added as plugins. Because the extractors deal with PHP Streams, they are naturally decoupled from the media resource provider. An ID3 meta-data extractor does not need to know anything about the Stream Wrapper.

Some media resource use cases:

  • A Tweet could be a media. The Stream Wrapper provides the JSON representation of the tweet as a byte stream. A Tweet meta-data extractor can extract the author, time, etc. from the JSON, while a Tweet formatter could format the Tweet's JSON.
  • An image hosted on Flickr can be provided through a generic HTTP Stream Wrapper. Meta-data could be provided through a generic EXIF extractor or by a generic oEmbed extractor (able to detect that an HTTP URL to a Flickr file can be resolved by the Flickr oEmbed endpoint).
  • A video hosted on YouTube could be provided through a dedicated PHP Stream Wrapper which does not provide the byte stream for the video itself. Instead it provides a JSON representation of the video, containing its meta-data. A dedicated YouTube formatter and extractor are used.
  • Or another PHP Stream Wrapper could expose a YouTube video as the stream of the actual video. A generic video meta-data extractor and formatter can then be used, without preventing specialized extractors and formatters from being used too (because they see the YouTube URL and can act accordingly).
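The decoupling argued for above can be sketched in a few lines of Python, with an in-memory byte stream standing in for a PHP stream. The `tweet://` scheme, the wrapper function and the JSON keys are invented for this example and do not correspond to any real Twitter API or Drupal stream wrapper.

```python
import io
import json
from typing import BinaryIO, Callable, Dict

# A "stream wrapper" here is just a function that opens a URI as a byte stream.
StreamWrapper = Callable[[str], BinaryIO]


def fake_tweet_wrapper(uri: str) -> BinaryIO:
    """Hypothetical wrapper: serves a canned JSON document for tweet://<id>."""
    tweet_id = uri.split("://", 1)[1]
    payload = {"id": tweet_id, "author": "drupal", "text": "Sprint day in Prague"}
    return io.BytesIO(json.dumps(payload).encode("utf-8"))


WRAPPERS: Dict[str, StreamWrapper] = {"tweet": fake_tweet_wrapper}


def open_stream(uri: str) -> BinaryIO:
    # Dispatch on the URI scheme, as a stream wrapper registry would.
    scheme = uri.split("://", 1)[0]
    return WRAPPERS[scheme](uri)


def extract_tweet_metadata(stream: BinaryIO) -> Dict[str, str]:
    """Extractor plugin: sees only bytes, never the wrapper that produced them."""
    data = json.load(stream)
    return {"id": data["id"], "author": data["author"]}


metadata = extract_tweet_metadata(open_stream("tweet://1234567890"))
```

The extractor would work unchanged on the same JSON read from a local file, which is the point being made: extractors depend on the byte stream, not on the provider.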

On the WYSIWYG side, media embedding could be based on oEmbed, decoupling the plugin from both Drupal and the way media are implemented in Drupal. Of course, Media would expose an oEmbed endpoint for all the managed media. For a remote media resource hosted on a platform which itself exposes an oEmbed endpoint (Flickr, YouTube), this endpoint could simply act as a proxy (possibly decorating the results with additional data), while for a media resource from a non-oEmbed provider (local files, S3, etc.), the Drupal side would use its formatters and extractors to provide the oEmbed data.

jcisio's picture

As I said, it is a generic approach, and it does not prevent you from using File entity, or Stream wrappers (read: URI, not File entity). But it does not force you to use them either.

Media entity vs File entity was discussed a long time ago. I've just compiled a list of interesting points at https://drupal.org/node/2101855 (disclosure: I'm a Scald co-maintainer).

slashrsm's picture

@mongolito404: I really appreciate your comments and you definitely have some very valid points.

I wouldn't agree that "Media entity -> Media resource" and "File entity -> PHP stream" are the same kind of relation. They are definitely similar, but they are not the same. A PHP Stream makes a lot of assumptions about the thing it brings into PHP. A media resource does not. I have the feeling all the time that we're trying to put everything in file_managed (i.e. PHP streams) although we have smaller or bigger problems with that approach. I admit that there are use cases that are debatable (YouTube, Vimeo, Flickr, ...), but I still believe that there are lots of them that simply don't make sense with PHP streams:

  1. Tweet. Yes, it could definitely be saved locally in JSON format and used like that. But this is IMHO a hack (similar to saving a serialized entity in an entity reference field). Tweets can change, be deleted, retweeted, ... The tweet URL/ID is the only thing that should be stored. Everything else should be fetched from the source.
  2. A photo gallery/slideshow can often be treated as a single media item (with meta-data in the context of the entire gallery, etc.). This could also be implemented via a PHP stream if we created a tar archive and referenced that from file_managed. You'll probably agree that this is not the way to go.
  3. Video in different formats. It is effectively one video, but you want to serve it in LQ, HQ, on YouTube, Vimeo, archive.org... With this use case we cannot even use the tar approach, as we have a combination of local and remote streams, which of course cannot be stored in the same local archive.
  4. Remote articles. Yes... there are sites that want to treat articles as media. I admit, it is a bit funky, and even I was surprised when I heard this. But it is a valid use case when you think about it. One thing that we should never do is make assumptions about how people are going to use our software.
  5. Most bigger publishers have 3rd party pubflow systems that they want to integrate their sites with. I believe that the "Media entity" approach makes this kind of integration much easier than the "File entity" approach. That's, again, simply because there is no assumption about the resource that will be referenced.
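The first point (store only the tweet ID and fetch everything else from the source) could look roughly like the Python sketch below. The fetcher signature and the dictionary standing in for the remote service are assumptions made for illustration, not a real Twitter client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical fetcher: returns the tweet's current data, or None if it is gone.
Fetcher = Callable[[str], Optional[Dict[str, str]]]


@dataclass(frozen=True)
class TweetReference:
    """Stores only the identifier; content is looked up at render time."""
    tweet_id: str

    def render(self, fetch: Fetcher) -> str:
        tweet = fetch(self.tweet_id)
        if tweet is None:
            return "[tweet no longer available]"
        return f"@{tweet['author']}: {tweet['text']}"


# Stand-in for the remote service, so the example runs offline.
remote = {"1234567890": {"author": "drupal", "text": "Sprint day in Prague"}}
reference = TweetReference("1234567890")

before = reference.render(remote.get)
del remote["1234567890"]            # the tweet is deleted at the source
after = reference.render(remote.get)
```

Nothing stored locally goes stale when the tweet changes or disappears, at the cost of a lookup (or a cache) on display.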

I am completely aware that there are concerns about this approach. The biggest concern is probably increased complexity, which I strongly believe can be minimized. Users, content creators, site admins and site builders can be almost completely spared this complexity if we provide simple and consistent UIs. For developers it is a bit harder, but it can still be reduced with good documentation.

There is another conclusion from Prague that I also find very important, which is not related to the media entity at all. I really believe that we should go with the approach of decoupled, unified and integrated components. Media is such a big problem space that it definitely cannot be handled by a single solution. With this approach we don't need to do that anymore. We can create different solutions that just glue together different components in different ways. We could have two general solutions, for example. A simple one would provide most of the important features (media library, mass uploads, ...) but with simpler configuration (simple display configuration at the field formatter level). A super-duper-funky-advanced solution would provide similar features with more complicated and powerful configuration (entity display configuration as we know it from File entity, ...).

This way we satisfy a very broad set of sites while still relying on the same API, which makes both solutions interchangeable.

Sorry for such a long comment. :)

mongolito404's picture

I continued the discussion at https://drupal.org/node/2099735

tl;dr: There are two approaches: media as files (a.k.a. mimetyped streams of bytes) and media as structured content managed in a library. IMHO, the current architecture addresses and favours the second approach, adding unnecessary complexity for those in need of the first one. And it does not need to be that way (i.e. the second approach could be built on top of the first, while the reverse is not true).

mallezie's picture

Thanks, jcisio, for the links. I followed the discussion in Prague, but I'm not aware of all the exact technical implications.
Thanks, mongolito404, for your input; it raises some valid points. We should really use existing tools, and not aim to make yet another solution if one already exists.
An interesting read is the Scald FAQ: https://drupal.org/node/2101855
If I map this article to the media discussion in Prague: in short, the discussion in Prague came to some similar conclusions.
Perhaps it's not really the right conclusion, but that's how I understood it.
The main concern and consensus in the discussion was:
"Media implies a media is a file, Scald does not. Scald believes file is just a special case for a media entity."
Which sort of led to the 'very simplified, not entirely correct' conclusion: we like the Scald approach, but want it in a more Drupal-agnostic way, with the known Drupal 8 terminology and UI.

aaron's picture

I am not certain what is intended to be accomplished here. I have always liked the idea of having a unified approach to storing file/media entities, but what is different between the proposed media entity and file entity? The way that I read this, as it stands, is a difference in semantics. Please correct me if I'm wrong.

Aaron Winborn
Drupal Multimedia (my book, available now!)
AaronWinborn.com
Advomatic

mongolito404's picture

See https://drupal.org/comment/7940615#comment-7940615 for a discussion on why file entity is not enough. The main reason, as I understand it, is that a media is not one and only one file. Even if you use a very broad definition of files as "mimetyped streams of bytes."

For instance:
- A YouTube video is not a file.
- A tweet is not a file.
- For some users, a .mpg file, a .mov file and a .srt file together are a single video media.

Using the File entity forces a media to be a file (which may, or may not, reference other files) and each file to be a media. Using a separate entity type allows a media to be zero, one or multiple files, without each file needing to be a media on its own.
