Next steps for Drupal 7 WYSIWYG

Gábor Hojtsy's picture

Now that the #input_format key is in Drupal 7, I'd like to propose a hook_content_editor() or something similar which would, at first, let Drupal get the "markup support" information from visual editors. That would let tinymce or fckeditor tell Drupal that they support HTML, so Drupal would offer them to be set up for input formats "of this kind". Also, bbdeditor would report it supports "bbcode", etc. This would be a great help for admins setting up editors for formats. There are a few interesting bits here though:

  1. How would Drupal know a format is "of HTML kind"? If the format contains only one HTML filter, it is obvious. But if it allows both HTML and markdown, for example, should both editors be allowed? What if HTML is escaped, then markdown is applied and then HTML is filtered for possible XSS generated by markdown? A human could tell it is a markdown format, but a machine will see HTML forbidden and then HTML filtered, huh :) We can always build a state machine to run through with the filters set up, but is it worth it? :)

  2. There are generic editors like bueditor, where buttons can be set up to generate either HTML or bbcode, or whatever other markup. These could not be put under the limitation of one input markup format.
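
A rough sketch of the state-machine idea from point 1, written as a language-neutral illustration in Python rather than Drupal code. The filter names (`html_escape`, `html_filter`, `markdown`, `bbcode`), the `FILTER_EFFECTS` table, and the `guess_markup` function are all hypothetical. The assumption is that each filter either neutralises a syntax the author typed, adds a syntax it interprets, or only sanitises.

```python
# Hypothetical description of what each filter does to the authoring syntax.
# "blocks": syntaxes the filter makes inert; "adds": syntaxes it interprets.
FILTER_EFFECTS = {
    "html_escape": {"blocks": {"html"}, "adds": set()},
    "html_filter": {"blocks": set(), "adds": set()},  # sanitises, keeps a subset of HTML
    "markdown": {"blocks": set(), "adds": {"markdown"}},
    "bbcode": {"blocks": set(), "adds": {"bbcode"}},
}


def guess_markup(filter_chain):
    """Walk the filters in order and return the syntaxes an author can still use."""
    usable = {"html"}  # with no filters at all, raw HTML reaches the page
    for name in filter_chain:
        effect = FILTER_EFFECTS[name]
        usable -= effect["blocks"]
        usable |= effect["adds"]
    return usable
```

Under these assumptions the chain "escape HTML, apply markdown, filter HTML" comes out as markdown only, which matches what a human would say, while "filter HTML, then markdown" comes out as both HTML and markdown.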

So I am not entirely sure whether we can reliably and generally execute on this idea, however it would help in configuration to reduce the possible options in setup. If we think people might not have too many editors set up on one site, this might not be a real issue after all.

Anyway, I think the next step for Drupal 7 core is definitely a way in the Drupal 7 database to associate editors with input formats, even if we don't provide filtering with the above mechanism for the admin. What do you think?

Comments

Input format associations

sun's picture

The last major improvement was to tie client-side editors to input formats instead of form elements (textareas). This means that we are now prepared to load and display a different editor for each input format, or none at all, e.g. if "PHP code" is selected. (btw, this works for D5 and D6 already)

This has a great impact on the issue you raised. It resulted in four tasks already on the task list:

  1. Wysiwyg implements so-called "profiles", which define the editor to use, the configuration for the editor, and most importantly, which buttons/plugins are available. Since editors are tied to input formats now, the administration UI must allow associating Wysiwyg profiles with input formats.
  2. When the Wysiwyg profile configuration is directly tied to an input format, it probably makes sense to move and integrate the complete Wysiwyg profile configuration into the input format configuration page. Hence, if a user alters the allowed markup of the HTML filter, the available editor buttons/plugins are directly altered, too. This would not even require proper degradation of the involved JavaScript, since client-side editors per se require JS to work at all.
  3. Since there is not only the HTML filter of Drupal core, but also HTMLpurifier and htmLawed, Wysiwyg API has to be able to gather information about the allowed markup from various input filters. This has been discussed in http://groups.drupal.org/node/15643. I am not yet sure how this could happen, but basically there are two options: either Wysiwyg API implements support for certain input filters only, or, for example, we enhance hook_filter() with $op 'allowed markup' and maybe also 'required markup', the former providing structured information about the markup a filter allows, and the latter allowing an input filter to tell which markup it requires. While being slightly OT, 'required markup' would primarily help users avoid setting up input filter combinations that do not work at all, e.g. having Image Assist's inline image filter in front of an HTML filter that may be configured to strip DIV, SPAN, and IMG.
  4. Last but not least, since editors/Wysiwyg profiles are tied to input formats, we might be able to rip out the complete access configuration for Wysiwyg profiles, since input formats are granted to certain roles already. I still have to think about possible use-cases for having more than one Wysiwyg profile associated with one input format, but Nathan (quicksketch) also agreed in an IRC discussion that we can probably leave this for contrib.
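
To make the 'allowed markup' / 'required markup' idea from point 3 more concrete, here is a minimal sketch in Python (a language-neutral illustration, not an actual hook_filter() implementation). The dictionary layout, the example tag lists, and the `find_conflicts` function are assumptions made for this example only. It flags any filter whose required markup would be stripped by a filter running after it.

```python
def find_conflicts(chain):
    """Return (producer, stripper, lost_tags) for every filter whose required
    markup would be removed by a filter that runs after it."""
    conflicts = []
    for i, producer in enumerate(chain):
        required = set(producer.get("required markup", ()))
        for later in chain[i + 1:]:
            allowed = later.get("allowed markup")
            if allowed is None:  # this filter does not restrict markup
                continue
            lost = required - set(allowed)
            if lost:
                conflicts.append((producer["name"], later["name"], sorted(lost)))
    return conflicts


# Hypothetical filter descriptions, loosely modelled on the example above.
inline_images = {"name": "inline images", "required markup": ["div", "span", "img"]}
html_filter = {"name": "HTML filter", "allowed markup": ["a", "em", "strong", "p", "img"]}

problems = find_conflicts([inline_images, html_filter])
```

With the example data, `problems` reports that the HTML filter would strip `div` and `span` produced by the inline image filter. Reversing the order of the two filters yields no conflicts.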

Also, to specifically reply:

Now that the #input_format key is in, I'd like to propose a hook_content_editor() or something similar which would, at first, let Drupal get the "markup support" information from visual editors. That would let tinymce or fckeditor tell Drupal that they support HTML, so Drupal would offer them to be set up for input formats "of this kind". Also, bbdeditor would report it supports bbcode, etc. This would be a great help for admins setting up editors for formats.

IMHO, it should work the other way around: users define input formats, and may or may not associate an editor with each input format. If editors "depend" on an input format, there is minimal need to validate the association itself programmatically, because, from a UX perspective, it would be illogical to set up a BBcode input format and associate an HTML editor like TinyMCE/FCKeditor with it.

1) How would Drupal know a format is "of HTML kind"? If the format contains only one HTML filter, it is obvious. But if it allows both HTML and markdown, for example, should both editors be allowed? What if HTML is escaped, then markdown is applied and then HTML is filtered for possible XSS generated by markdown? A human could tell it is a markdown format, but a machine will see HTML forbidden and then HTML filtered, huh :) We can always build a state machine to run through with the filters set up, but is it worth it? :)

Hm. While markdown is rather an edge case, this brings me back to the hook_filter() $ops in point 3 above.

2) There are generic editors like bueditor, where buttons can be set up to generate either HTML or bbcode, or whatever other markup. These could not be put under the limitation of one input markup format.

Given that Wysiwyg profiles (editor settings) are tied to input formats, a user is able to set up different configurations for BUeditor for different input formats.

So I am not entirely sure whether we can reliably and generally execute on this idea, however it would help in configuration to reduce the possible options in setup. If we think people might not have too many editors set up on one site, this might not be a real issue after all.

Actually I'm rather working towards encouraging users to have and use several editors on one site. My vision is along the lines of having a fully-fledged HTML editor for admins on "Pages", an editor that understands PHP code for developers on a PHP code input format, a BBEdit(or) on "Forum posts" and comments/replies that users can change into a BUeditor/jWysiwyg/markItUp editor if they prefer so, simply by switching the input format.

Daniel F. Kudwien
unleashed mind

Daniel F. Kudwien
netzstrategen

make it work

Gábor Hojtsy's picture

The last major improvement was to tie client-side editors to input formats instead of form elements (textareas). This means that we are now prepared to load and display a different editor for each input format, or none at all, e.g. if "PHP code" is selected. (btw, this works for D5 and D6 already)

Ok, I understand you have moved past this and are now looking for more exciting stuff to do, but this is still the next step for Drupal 7 core, and dwelling on the details that come after it is not an option until this is achieved. If this works with Drupal 7, you'll get many more eyes on your code, since Drupal 7 would be on par with your progress and have the same next-step problem you are at now. Until then, Drupal core is behind, and those who'd work on that will not reach your questions just yet.

...

sun's picture

Tying editors to input formats and allowing different editors and editor configurations to be loaded on a single page was definitely not as easy as it may sound. So, having that functionality at all is a major milestone, but it still has to mature.

There is not much code/functionality behind this association - effectively, all forms are scanned for form elements added by filter_form(), and if one is found, the configured editor profiles are loaded and attached to the available input format(s). #input_format in D7 simplifies the form processing, but until now, the API needs its editor profiles (not the editor integration files) to work at all, so I currently do not see any parts of the API that could be separated out. Anyway, the following screenshot shows how the new profile configuration form and input format association looks:

Coming back to point 1) of your list:

If we want to limit the possible options to editors that are actually compatible with the language used in an input format, I think the most trivial approach would be to add an "input language" selector to the input format configuration in Drupal core. By passing the language information to hook_filter() implementations, filters would be able to expose or hide themselves for certain languages. The default (and probably only) language in core would be "HTML", but for example, PHP module would add "PHP" to the select list, and a contrib module could add "XML" or even "BBcode". Markdown filter, however, would still be a special case, since it defines grammar, not language. If I'm not mistaken, it could work with HTML, XML, BBCode, but not PHP.

Based on this information, Wysiwyg API would be able to limit the available editors or, depending on the editor, automatically re-configure the editor to use that language. For instance, TinyMCE has built-in support for XML, FCKeditor seems to support BBcode and Wiki-syntax, and based on point 2) of your list, BUeditor seems to support arbitrary languages.
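
A minimal sketch of how an "input language" on the format could narrow the editor list, again in Python as a neutral illustration. The `EDITOR_LANGUAGES` table and the `editors_for` function are hypothetical, and the language support listed for each editor only restates the tentative claims in this post; it is not verified against the editors themselves.

```python
# Hypothetical capability table. None means the editor can be configured
# for arbitrary languages (as suggested for BUeditor above).
EDITOR_LANGUAGES = {
    "TinyMCE": {"HTML", "XML"},
    "FCKeditor": {"HTML", "BBCode"},
    "BUeditor": None,
}


def editors_for(language, editors=EDITOR_LANGUAGES):
    """Return the editors that could be offered for an input format's language."""
    return sorted(
        name
        for name, languages in editors.items()
        if languages is None or language in languages
    )
```

With this table, a "BBCode" format would offer BUeditor and FCKeditor, while a "PHP" format would only offer the freely configurable BUeditor.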

This would effectively mean that input formats would no longer be just containers for input filters, but start to describe the format / language.

Interestingly, I just posted a note about what makes up an input format in UMN usability: Rename 'input formats', which already contains the point "language", although I didn't thoroughly think about it like in this post.

Nice to see!

magico's picture

@sun: I was testing your latest API and I must say that the "input format" is a nice piece of work around one of my ideas. By the way, the development done by fall_0ut (WYMeditor) was the demonstration of this idea (perhaps you could give him some credit...)

  1. Agreed. This was implemented within WYMeditor
  2. Yes, I already asked for this. We need a way to have different configurations easily integrated in the input format
  3. The last barrier is always the editor. Do you think that editors (as they are now) will be able to use different kinds of markup implemented within filters?
  4. Like within WYMeditor, the editor does not need to control permissions through the main access control, because the input format will do that already. IMHO, one input format will be associated with only one editor.

I agree with all you said.

Nice work!

...

sun's picture

Thanks for your feedback. Most often I just need a few thumbs up/down to be sure how to proceed. :)

Re: Credits: Well...

  • the whole plugin architecture is forked from Panels, thanks a lot to merlinofchaos and sdboyer
  • the idea of identifying an input format selector/formatting guidelines in a form is based on WYMeditor's incomplete implementation, Gábor's write-ups about input format support, #translatable's crazy approach on walking through forms and also the #input_format patch that already hit D7
  • the idea of using function callback stacks for attaching/detaching editors is based on core's Drupal.attachBehaviors() by someone
  • the about-to-be-altered profile configuration is almost completely based on the obsolete TinyMCE module, which had about 10 maintainers until now
  • support for TinyMCE 3 was first coded by Katherine Bailey
  • ...

If I were to add credits for each and every single thing... oh well. Instead, I'm rather thinking of removing credits altogether.

btw: I've tested only a few editor modules in contrib so far. It was Hannes Lilljequist (zoo33) who originally pointed me to WYMeditor module's approach. So if anyone knows of features/approaches worth considering in another editor integration module in contrib, please let me know!

Glad to help

magico's picture

Impressive list...

btw, just for the record... http://drupal.org/node/125315 (comment #29, #31 and #36)

heh. I must have skipped

sun's picture

heh. I must have skipped the first follow-ups when I worked on that issue ;)

Well, if you already imagined that one year ago, I'd be highly interested in a "magico's vision of input format handling and editor support in Drupal" document here at g.d.o :) (like my own about Wysiwyg/Inline API)

Backwards

dragonwize's picture

From a user perspective, this seems backwards:

  1. Configure the Drupal allowed/disallowed (tags, attributes, etc.) filter
  2. Add editors to formats
  3. Configure editor buttons with the buttons that use only the configured tags set in #1

Users, even most developers, do not know exactly what HTML or other markup each editor will use to achieve an effect. Some may use spans, divs, inline styles with certain attributes, old-style HTML4 tags, markup, markup with attributes, etc. However, each editor defines how it spits out code (or this can be found by testing).

I think it makes much more sense to use this process:

  1. Configure a format with an "auto-config" filter
  2. Add an editor to the format
  3. Select editor buttons, which automatically configure the allow/disallow filter in #1 with the appropriate lists for the editor
  4. Allow additional configuration to either the "auto-config" filter for admin overrides (like never allow applet), or stack another filter to control admin overrides. This could also be moved into step #1 where it would then limit the buttons as is being described in this thread but then it would be an advanced feature instead of a required one.

This makes it very easy to use for users and developers alike. It also has the benefit of being able to quickly and easily change your configuration by just adding or removing buttons or configuration options with your editor.
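
The auto-config process above could be sketched like this, in Python as a neutral illustration. The `BUTTON_MARKUP` table, its button names, and the `auto_config` function are hypothetical; in practice the per-button markup would have to be supplied by whatever integrates a given editor.

```python
# Hypothetical per-editor data: the tags and attributes each button emits.
BUTTON_MARKUP = {
    "bold": {"strong": set()},
    "link": {"a": {"href", "title"}},
    "image": {"img": {"src", "alt"}},
    "applet": {"applet": {"code"}},
}


def auto_config(buttons, never_allow=frozenset()):
    """Build the filter's allowed tags/attributes from the enabled buttons.

    never_allow is the admin override from step 4: tags listed there stay
    forbidden even if an enabled button would emit them.
    """
    allowed = {}
    for button in buttons:
        for tag, attributes in BUTTON_MARKUP[button].items():
            if tag in never_allow:
                continue
            allowed.setdefault(tag, set()).update(attributes)
    return allowed
```

Enabling "bold" and "link" would allow `strong` plus `a` with `href` and `title`. Enabling "applet" while the admin override lists `applet` leaves that tag out of the allowed set.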

Different things

markus_petrux's picture

When one configures the editor, one is thinking about features, but when one configures the input filter, one is (or should be) thinking about security.

If the input format was auto-configured from editor features, it could happen that certain tags and/or attributes are enabled, for example onmouseover, etc.

Security

dragonwize's picture

Security is definitely something of great importance. However, very few tags are security-related issues, especially when you start considering markup tags like bbcode, etc. But tags like p and span are not a security solution either.

What I am suggesting is something similar to what WYSIWYG filter has done currently. Basically, allow all tags and attributes to be configured through the auto-config, but have a blacklist of security-related tags. In this manner, all the developer has to do is worry about what level of security he wishes to impose. This admin blacklist filter would come with defaults of best or common security practices, as does the current Drupal HTML filter.

While I say that the auto-config will allow all tags, that is a bit of a misnomer. It will only be allowing the tags the editor needs to accomplish its job. It will just allow the editor to configure any tags to allow.

Also since the editors themselves are not equipped to spit out exactly the tags and attributes they need, this all has to run through an API hook system that will provide that information for a particular editor. Which will be done when the editor is prepared to work with Drupal, like it is done in WYSIWYG API now. At that time the Drupal community will be scrutinizing security issues, if any, with a particular editor feature. As with everything in the world of open source, the larger editors like TinyMCE and FCKeditor will be more secure because more people will be looking at them, they are probably more secure already because of that fact.

Input filtering before input

sun's picture

Please also note that during the work of integrating further client-side editors I discovered that most of the smaller editors (i.e. not TinyMCE or FCKeditor) do not have a built-in security filter. TinyMCE (and I think also FCKeditor) has a configuration option that sanitizes the contents of a textarea before rendering it in its IFRAME. Other editors may not have this feature at all. This means that we need a solution that works independently of any editor.

That said, at least TinyMCE's sanitizer is horrible and a pain to configure. I would be happy to disable it once we have found a better solution that works for all editors.

Can Wysiwyg API provide the bridge?

markus_petrux's picture

The input filter could give you the data to set up the valid_elements option of the editor, if supported. There are also other options that the input filter could provide, for example the invalid_elements option of TinyMCE, and maybe rules for class names, ids, and so on. As an example, these options are already managed by the WYSIWYG Filter.
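
As a small illustration of that bridge, the sketch below (Python, language-neutral) turns a filter's allowed markup into a string in TinyMCE's valid_elements notation, where elements are comma-separated and attributes are listed in square brackets separated by `|`. The input structure (a mapping of tag to attribute set) and the `to_valid_elements` function are assumptions for this example, not an existing API.

```python
def to_valid_elements(allowed):
    """Render {tag: {attributes}} as a TinyMCE-style valid_elements string."""
    parts = []
    for tag in sorted(allowed):
        attributes = sorted(allowed[tag])
        if attributes:
            parts.append(f"{tag}[{'|'.join(attributes)}]")
        else:
            parts.append(tag)
    return ",".join(parts)


# Hypothetical allowed markup reported by an input filter.
example = to_valid_elements({"a": {"href", "title"}, "strong": set(), "em": set()})
```

Here `example` is `a[href|title],em,strong`. Tags and attributes are sorted only to make the output stable.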

Sorry, I haven't followed the latest changes of the Wysiwyg API. Is there an API that the input filter could use to configure the editor?

Editor cleaning & design considerations

dragonwize's picture

As sun has mentioned above, one of the big issues with passing valid and invalid elements to an editor is that many do not have filtering capabilities and many of those that do don't do it well. Also we would be putting the security of filtering in the hands of a third party, not something that I think most site admins would be willing to accept.

Beyond the security discussion, limiting tags, attributes, etc. is also high on the designer's list of things to control. Every site is different, but a good designer will limit to some degree what can be used in a site or sections of a site. A bold tag is harmless, but it may not be appropriate for that part of the site. The same goes for lists and all other tags. After the security of the site, the design comes second in keeping things correct. Many editors allow you to turn them off to use the code underneath or to see the source. Just because a button is not there does not mean people will not try to use a design feature not allowed to them. So this method of syncing the buttons with the allowed code goes beyond the security aspects.

markus_petrux's picture

That's beside the security aspect, which is clearly controlled by the server-side filter, since one could easily hijack a post by bypassing the client editor filters.

I guess the expected behaviour is that the editor should not allow the user to do something that will be filtered on the server. That's what is missing right now, I think. But we would have to define "the bridge" between the server-side filter options and the editor features so that we can provide a user experience as consistent and secure as possible.

The input filter could provide some kind of structured information that the API could request and pass on to the editor, depending on the features supported by each one of them.
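
One possible shape for that structured information, sketched in Python as a neutral illustration: the filter reports its allowed tags and attributes, and the API keeps only the buttons whose output would survive the server-side filter. The data layout and the `usable_buttons` function are hypothetical. This only hides buttons for convenience; the server-side filter still has to do the real filtering.

```python
def usable_buttons(button_markup, allowed):
    """Return the buttons whose tags and attributes all pass the server filter."""
    usable = []
    for button, needs in button_markup.items():
        if all(
            tag in allowed and attributes <= allowed[tag]
            for tag, attributes in needs.items()
        ):
            usable.append(button)
    return usable


# Hypothetical data: what each button emits, and what the filter allows.
button_markup = {
    "bold": {"strong": set()},
    "link": {"a": {"href"}},
    "image": {"img": {"src", "alt"}},
}
allowed = {"strong": set(), "a": {"href", "title"}, "img": {"src"}}

enabled = usable_buttons(button_markup, allowed)
```

With this data the image button is dropped, because the filter would strip the `alt` attribute it emits, while bold and link stay enabled.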

Help but not trust

dragonwize's picture

I guess the expected behaviour is that the editor should not allow the user to do something that will be filtered on the server.

There is only so much we can do to limit what is done on the client side. As mentioned, some editors do not have a limiting ability, some do, and NONE of them can be trusted because they are client side. All we can do is remove the convenience of submitting disallowed code by only limiting buttons or other configurations. The user can always submit whatever they want, especially if they are being mischievous.

My point in syncing the filter with the buttons by means of auto-config is to make sure the user will not be filtered for something the editor allows them to do by normal means. That happens when the developer or admin forgot, or didn't know, to add one or more tags needed for that specific editor's enabled features, and they then have to set that manually every single time they make a change. In this manner we are putting the configuration on the Drupal integration, so it happens one time and has thousands of eyes looking at it, making it bug free. Plus, when a third-party editor changes versions and a tag API changes, it only takes a few people keeping track of it and fixing it for everyone to use, instead of every single site admin having to be a security and API guru for every editor they plan to use.

That takes care of 90% of both security and design decisions, because most standard editors do not write lists with swf files or other security-related issues. So by limiting the code to only what the admin and designer wish to allow through the editor buttons or config, we are already limiting much of our concern on the server side. Then all we have to do is apply an advanced safety-net feature for site admins with security concerns, which double-checks that the code is up to their standards. I say advanced because we can default this to what the Drupal community considers appropriate, and most sites will not change it, or will not need to.

While integrating the admin overrides at the beginning of the process to further limit editor features would be great, I see the admin overrides as a feature we can cut or change to lower the complexity at first. If you are modifying the admin overrides to allow or disallow certain code, then you should know what you are doing, hence an advanced feature. So if it can be done, so much the better, but I think it is a feature that can be dealt with as it is now if we are running out of time or it becomes too complex.

I am going to start writing up some code over the next month and we can work out the issues in code instead of theory.

FYI - Related issue in Wysiwyg API queue

markus_petrux's picture

I was just reminded that the following issue is already open in the Wysiwyg API queue, which you may wish to check out: Integrate with input filters.
