HTML5 Input Filter

jensimmons's picture

We are discussing adding functionality to the HTML Tools module that allows more HTML tags ("elements") in the "filtered" input filter.

Let's create a list here of what tags should be allowed.

Examples:

<article>, <section>, <mark>, <time>

Comments

Context?

apperceptions's picture

It might be good to add some indication of best uses for an element. For example, how is article or section to be used? The W3C spec indicates that these should not be used as general purpose containers like div. The article element seems a good match for Drupal syndication, and the section tag might be appropriate for use with Drupal tabs. I imagine there are many more uses for these two.

trickiness...

legion80's picture

if section and article are allowed, shouldn't header and footer be allowed as well, since you can have either of those within a section?

in a talk by tantek celik (@ voices that matter san francisco 2010), he mentioned that with the new html5 standard a couple tags got revisions:

a lot of old, presentational elements have been repurposed to have semantic meaning. for example:

  • __b__ was bold, but now it means a structural "bold": product name, first line of an article
  • __i__ was italic, now it should be used for new voices, like introducing a new technical term
  • __br__ was implemented as a carriage return, but now represents the semantics of line breaks (__wbr__ is a separate element that marks a line break opportunity)
  • __hr__, as flourish breaks between paragraphs, semantically represents a thematic break between paragraphs
  • __small__ literally means small print, like legalese

moreover, li value and ol start have been restored.

see: http://www.w3.org/TR/2010/WD-html5-diff-20100624/#changed-elements

FWIW

apperceptions's picture

I'm theming a site where the designer is delivering mock-ups using HTML5, CSS3 and jQuery :)

So far, the only new elements that he has used include header, footer and nav.

Also, we are using a very simple html tag with only the language attribute specified at this time (no namespace).

Also here's a list of all the

ericduran's picture

Also here's a list of all the text level semantic tags http://www.w3.org/TR/html5/text-level-semantics.html

video

apperceptions's picture

DOH! What am I thinking, I'd love to see video tag allowed :)

Oh, right. We probably could

jensimmons's picture

Oh, right. We probably could allow the video tag — yes? I'm so used to excluding the object and embed tags for security reasons. Are the video and audio tags secure? If so, let's include them! Wow — videos in comments, ftw!

Also, let's not use this space for debating what the html tags do, or everything about them. Let's list and discuss which tags should go into something that's just like the current "filtered html" input filter in Drupal 6.

For reference, the default D6 input filter is

<a>
<em>
<strong>
<cite>
<code>
<ul> 
<ol>
<li>
<dl>
<dt>
<dd>

It looks to me like D6 core HTML4/XHTML defaults were chosen with several things in mind:
1) don't allow anything that's insecure
2) keep the list small and simple

There's no h1, h2, h3, etc. There's no blockquote, del, q, sub, pre, b, u, sup, img, table, strike, acronym, etc.
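
To make the idea of a tag allowlist concrete, here is a minimal sketch in Python. It is not Drupal's actual filter code: the names `D6_ALLOWED`, `AllowlistFilter` and `filter_html` are made up for illustration, and the sketch drops all attributes (including `href`), which a real filter would have to handle more carefully.

```python
from html import escape
from html.parser import HTMLParser

# The default D6 "Filtered HTML" tags, as listed above.
D6_ALLOWED = {"a", "em", "strong", "cite", "code",
              "ul", "ol", "li", "dl", "dt", "dd"}


class AllowlistFilter(HTMLParser):
    """Keeps allowed tags (without attributes) and drops every other tag.

    Assumption for this sketch: attributes are always discarded, and the
    text inside a dropped tag is kept as escaped plain text.
    """

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so it cannot be read back as markup.
        self.out.append(escape(data, quote=False))


def filter_html(text, allowed=D6_ALLOWED):
    """Return text with only the allowed tags left in place."""
    parser = AllowlistFilter(allowed)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

With this sketch, adding a tag to the filter is just a matter of passing a larger set, for example `filter_html(text, D6_ALLOWED | {"mark", "time"})`.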

If we want to not have too many new tags, how can we decide what's more likely to be needed?

Here's a list of what's been suggested above:

section
article
header
footer
video
audio
mark
time
b
i
wbr
hr
small

What is considered insecure?
from html4: embed, object, img... what else?

Are any of the new HTML5 tags dangerous? Canvas for sure. Any others?

block vs. inline

greggles's picture

It's also been noted that the original list is only inline elements, though I'm not sure if that was on purpose.

Regarding security, I think it's somewhat too early to say for sure which are going to be safe/unsafe.

There is a list at http://heideri.ch/jso/#html5 that is long, but I haven't reviewed it fully to see which of the html5 tags specifically cause new problems.

new elements will also be

seutje's picture

new elements will also be inline unless specifically stated otherwise, this is the default behavior for unknown elements (like if you were to serve XML and make up your own tags)

also do note that our dearest friends, the IE family (excluding 9), refuse to render unknown elements unless they're shivved in -> http://remysharp.com/2009/01/07/html5-enabling-script/

that one also includes the print fix (even with a regular shiv, IE doesn't apply those styles when printing, so u need extra magic)

I also think that IE8 refuses to apply it if there is no body tag present (which is optional now btw, but for obvious reasons, should be retained)

this means that the IEs won't style these elements if js is disabled, a usable workaround is to use an inner wrapper with an element IE does understand like this for example:
instead of just

<!doctype html>
<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <header>
      ...stuff
    </header>
  </body>
</html>

you could use

<!doctype html>
<html>
  <head>
    <title>foo</title>
  </head>
  <body>
    <header>
      <div class="header">
        ...stuff
      </div>
    </header>
  </body>
</html>

and then style .header instead of header
I know, it brings back the horrible nightmares of divitis...

Headings & Accessibility

mgifford's picture

Just a quick note that h2, h3, etc. should be part of the default but didn't make the cut for D7. See http://drupal.org/node/514008

It's the semantic way to break up long pieces of content (even if you aren't blind) so really should be considered when adding to the default.

The trick is you don't want to allow someone to add an h1 or sometimes even an h2 depending on the document structure.
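
One way to express "headings, but not the top levels" is to build the allowed heading tags from a minimum level. This is only a sketch; the function name `allowed_headings` and its default of starting at h2 are assumptions, and the right minimum depends on the site's document structure.

```python
def allowed_headings(min_level=2, max_level=6):
    """Return heading tag names from h<min_level> to h<max_level>, inclusive."""
    if not 1 <= min_level <= max_level <= 6:
        raise ValueError("heading levels must satisfy 1 <= min <= max <= 6")
    return {f"h{n}" for n in range(min_level, max_level + 1)}
```

A site whose node titles are already rendered as h2 could call `allowed_headings(3)` so that body content starts at h3.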

if img is not included for

legion80's picture

if img is not included for security reasons, i would guess that video and audio might be subject to the same issues. maybe this is something to ask the drupal security team, i dunno. with the video and audio tags you can specify multiple sources, and browsers can choose which source to use, based on which codec they support. could specifying fake src's, sort of like tracking pixels, be a security threat?

as for section and article, they are considered tags for "sectioning content" (http://www.w3.org/TR/html5/content-models.html#sectioning-content-0). sectioning content tags "potentially [have] a heading and an outline". the heading includes the h1-h6 tags, which are currently not allowed. so if you can't have headings in the current D6 html input format, then should you be allowed sections?

Headings being recursive is new for HTML5, so i think two arguments could be made:

  1. Users may define their own sections in the node body, in which case you allow h1-h6 and hgroup into the input format (and header and footer).
  2. Users may not define their own sections and only create pure textual content, in which case neither section nor article would be allowed.

My personal vote is the latter, since it seems to me most consistent with the intent of the original D6 Filtered HTML input format.
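
The two options can be sketched as two allowlists. The set names and the `build_allowlist` helper below are hypothetical, not part of Drupal or the HTML Tools module; the tag lists are taken from this thread.

```python
# Default D6 "Filtered HTML" tags, as quoted earlier in the thread.
D6_ALLOWED = {"a", "em", "strong", "cite", "code",
              "ul", "ol", "li", "dl", "dt", "dd"}

# Tags that option 1 would add so users can define their own sections.
SECTIONING = {"section", "article", "header", "footer"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6", "hgroup"}


def build_allowlist(user_sections):
    """Option 1 (True): sections plus headings. Option 2 (False): D6 tags only."""
    if user_sections:
        return D6_ALLOWED | SECTIONING | HEADINGS
    return set(D6_ALLOWED)
```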

Hmm...

perusio's picture

I think that the document structural elements like section, article, footer, header, nav, aside, figure, should be left out of a basic input filter. How much latitude each content creator gets should be left to each site builder/owner. Furthermore it's inconsistent to allow the new elements without allowing div also.

Also if you provide <audio> and <video> you also have to provide <source>.
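
The point that some tags only make sense together can be checked mechanically. A rough sketch, assuming a hand-written map of parent tags to the child tags they rely on (the map `REQUIRED_CHILDREN` and the function name are invented for illustration, and the audio/video entries follow the comment above rather than a strict reading of the spec):

```python
# Hypothetical map: parent tag -> child tags needed for it to be useful.
REQUIRED_CHILDREN = {
    "audio": {"source"},
    "video": {"source"},
    "ul": {"li"},
    "ol": {"li"},
    "dl": {"dt", "dd"},
}


def with_required_children(allowed):
    """Return a copy of the allowlist extended with the children its tags need."""
    result = set(allowed)
    for tag in allowed:
        result |= REQUIRED_CHILDREN.get(tag, set())
    return result
```

For example, `with_required_children({"video", "em"})` also contains `source`, so nobody has to remember to add it by hand.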

Here's a flowchart from HTML5Doctor about the new HTML5 elements.

http://html5doctor.com/wp-content/uploads/HTML5Doctor-sectioning-flowcha...

Don't forget FIGURE and FIGCAPTION!

spaceninja's picture

Two of my favorite new tags - I would love to be able to use them in the text editor.

Oh my.

adrinux's picture

Most all of these new tags belong in the 'full' input filter but not in the 'filtered'. People need an easy to use, very simple, secure filter as a default, something that will work for comments, for instance. Much better to think of the minimum required, rather than what would be 'kewl' to a front end web dev.
