Filter h1, h2, and h3 tags from allowed html in comments

Cliff's picture

The semantic structure of a discussion can be ruined if a comment includes code for headings that are at a higher level than the title of each comment. In discussions in Drupal, the title of the page is an <h1>, key regions are headed with <h2>s (which sometimes do not appear on screen but are available to screen readers to define regions that are obvious on visual inspection), and each comment begins with an <h3>. So if any heading is added within a comment, it should be no higher than an <h4>, right?

For people who can see, an <h1> or <h2> in the middle of a comment is merely a visual anomaly. But for people who must rely on screen readers to determine the organization and content of a page, those misplaced headings can make the page ungrokable.

Drupal already filters the html allowed in comments. So to prevent people who are unaware of accessibility issues from making an otherwise accessible discussion fail WCAG, why not remove <h1>, <h2>, and <h3> from the list of tags allowed in comments?
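
To make the proposal concrete, here is a minimal sketch of what removing those tags could look like. It is written in Python purely for illustration and is not Drupal's actual filter code; the name `strip_high_headings` is hypothetical. It assumes the filter drops the disallowed tags but keeps the text inside them.

```python
import re

# Hypothetical pattern: matches opening and closing <h1>, <h2>, <h3> tags,
# with or without attributes. <h4> through <h6> are left alone.
BLOCKED_HEADING = re.compile(r"</?h[1-3]\b[^>]*>", re.IGNORECASE)


def strip_high_headings(comment_html: str) -> str:
    """Remove h1-h3 tags from a comment but keep the text they wrapped."""
    return BLOCKED_HEADING.sub("", comment_html)
```

With this sketch, `<h2>Big</h2><h4>ok</h4>` would come out as `Big<h4>ok</h4>`: the heading text survives as plain text and the lower-level heading is untouched.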

Comments

Agreed. And from the body too!

mgifford's picture

There should be some filter that sets the highest level heading you're allowed so that it fits in nicely with the page's layout.

Unfortunately Drupal's flexible enough that it could really be almost anything. See:
http://drupal.org/node/514008#comment-2014934

Also, would it be removing the H1s or making the H1s into H2s and the H2s into H3s, etc.?

I was just thinking of comments in discussions

Cliff's picture

I was thinking only of discussion pages like this one, where:

  • The heading of each individual comment is an H3.
  • Everything in the comment is semantically subordinate to that.
  • So why should the list of HTML tags that I am allowed to enter as I write this comment include H1, H2, and H3?

Information that is so complex as to need several levels of headings should be introduced with a comment and loaded as an attachment.

I think the new drupal_render

joachim's picture

I think the new drupal_render system would eventually allow the whole of the page to have properly nested headings.

If we have an array of data ready to render, and mark each piece that is to be a heading with a special type (say #heading), then on its way down the array the iterating function can keep count of how many headings it has seen so far and so make sure H3s come below H2s and so on.
I had a brief poke at the page array though and I think some more restructuring would be required for this to be possible.
Speak to MortenDk about this though!
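
A rough sketch of that idea, in Python rather than PHP and with a made-up data shape (the keys `heading`, `body`, and `children` are assumptions for illustration, not the real drupal_render array format): the walker derives each heading's level from how many headed ancestors it has, so nobody has to hard-code an H2 or H3.

```python
from html import escape
from typing import Any


def render(node: dict[str, Any], level: int = 1) -> str:
    """Render a nested structure, deriving heading levels from nesting depth."""
    parts = []
    has_heading = "heading" in node
    if has_heading:
        capped = min(level, 6)  # HTML only defines h1 through h6
        parts.append(f"<h{capped}>{escape(node['heading'])}</h{capped}>")
    if "body" in node:
        parts.append(f"<p>{escape(node['body'])}</p>")
    # Children sit one level lower only if this node contributed a heading.
    child_level = level + 1 if has_heading else level
    for child in node.get("children", []):
        parts.append(render(child, child_level))
    return "".join(parts)
```

Usage, under the same assumptions:

```python
page = {
    "heading": "Title",
    "children": [
        {"heading": "Comments", "children": [
            {"heading": "First comment", "body": "Hi"},
        ]},
    ],
}
print(render(page))
```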

A user warning would be a

kat3_drx's picture

A user warning would be a good idea here, at least a note on the filter that the allowed tags do NOT include h1, h2, or h3. It makes sense that, if the title of the node/comment starts as an h3, then anything below it should be no greater than h4. Which would mean that the user loses three heading tags at the get-go. Not a problem as long as they're warned (and as Cliff points out, most comments should not be as complex as that anyway [though the possibility isn't eliminated]).

Also, if we render headings by declaring the highest allowed level as a variable and then stepping down from there, as has also been suggested, heading tags that are too high can be rendered down. For example:

"* Allowed HTML tags: <h4> <h5> <h6>

* All header tags converted to proper semantic hierarchy where <h3> is topmost."

In this case, h3 is the variable.

What if we revise "Input format"?

Cliff's picture

We could simply delete those codes from the list of allowed html tags.

We could, but I'd argue that

kat3_drx's picture

We could, but I'd argue that simply removing the code without an explanation may confuse users who don't realize that, in the section they're currently in, headings begin smaller than h1. A short explanation with the removal of those codes is needed to keep the user informed.

I understand

Cliff's picture

I'll try a longer comment again. (Last night, Mollom kept identifying everything I tried as spam until I cut it back to that brief response.)

I was thinking that an explanation would add clutter everyone would have to deal with, but very few people actually try to enter headings in their comments. So is it better to add that clutter to announce a change? Or is it less troublesome in the big picture to revise the information under "Input format" and leave it to people to look there? I'm honestly not sure.

On the one hand, you're right: It is a change, and it should be announced. On the other, the few people who find their comment filtered will be able to preview (and edit) their comments. And if something about formatting comments in D7 didn't work the same way as it had in D6, the first place I would look as I tried to edit my comment would be under "Input format." So I'm thinking we would be troubling a few people a little, but putting an announcement out where everyone could see it would be troubling a lot of people… a little.

Not that big a difference. I wouldn't die if it were stated more prominently. But if we do, perhaps it would be good to state it in an affirmative way: "Because each comment's heading is an H3, only levels H4 through H6 are available for use within your comment."

I guess this is all premature until we find out whether the idea is doable and accepted.

I agree; in my opinion, the

kat3_drx's picture

I agree; in my opinion, the notice of heading changes should be under the input format note, as in the little code example above (with better wording than I put just there, something like what Cliff wrote in his last comment would be good).

The other question is, what if the user does put things into higher headings than is allowed? When a tag isn't supported by an input filter, users are used to just having it ignored, so this might be the best course of action. The other option is to convert them to the proper semantic hierarchy (h1 becomes h3, h2 becomes h4, and so on), but there are two possible problems there: 1) the user may mark up a comment with more heading levels than the filter can support, and 2) it is unclear whether this remains operable. One could argue that reformatting the user's input changes what they meant and undermines the operable and understandable principles.

Regardless of the second option, I think at least a note and excluding headings above the level used in the title makes sense.
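
For comparison, the conversion option could be sketched like this (Python for illustration only; `demote_headings` and its `top_level` parameter are hypothetical, and the default of 4 assumes the comment title is an h3). It shifts every heading down by the same amount so the relative order is kept, and clamps at h6, which is exactly where the first problem shows up.

```python
import re

# Captures: optional closing slash, the level digit, and any attributes.
HEADING_TAG = re.compile(r"<(/?)h([1-6])\b([^>]*)>", re.IGNORECASE)


def demote_headings(comment_html: str, top_level: int = 4) -> str:
    """Shift headings down so the highest one used becomes top_level."""
    levels = [int(m.group(2)) for m in HEADING_TAG.finditer(comment_html)]
    if not levels:
        return comment_html
    # Only ever shift downward; headings already low enough stay put.
    shift = max(0, top_level - min(levels))

    def replace(match: re.Match) -> str:
        new_level = min(int(match.group(2)) + shift, 6)  # no h7 in HTML
        return f"<{match.group(1)}h{new_level}{match.group(3)}>"

    return HEADING_TAG.sub(replace, comment_html)
```

A comment using h1 and h2 comes out as h4 and h5. A comment using h1 through h4 would need h7, so the two lowest levels both collapse to h6 and the distinction the author intended is lost.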

Block higher headings and let the user sort it out

Cliff's picture

Most people never put headings in comments, so that would be the rare exception to begin with. I think headings that are too high fall into the category of "mistakes we can't prevent people from making."

When the headings they use are higher than allowed, we should just strip them out. The person making the comment will be able to see the result when they preview their comment or just after they post it. They can revise their comment after the preview and, for that matter, after they post it, too. I would expect anyone who sees unanticipated results in their preview or post to double-check the section on input format and see if the answer is there. Because that's where we plan to put the explanation, this problem is solved.

The only legitimate case I can think of for placing higher headings (H1, H2, and H3) within a comment is to illustrate a problem (or a solution to a problem). To avoid wrecking the semantic structure of the page, those kinds of illustrations should be placed in an attached file, not in the comment itself.

Headings are no mistakes but give structure

Frank Ralf's picture

I have started using headings myself over at [#467296] to keep such a long thread manageable at all. I must admit that I was not aware that the next hierarchical heading level should be H4.

The easiest way would indeed be to strip those tags from the filter. A more user-friendly solution would be to automatically convert all H1, H2, and H3 tags to H4, with a short notice to the user explaining why.

Cheers,
Frank

Autoconvert is better? It depends.

Cliff's picture

If an autoconvert is doable, that's worth trying. But if we can be that elegant, wouldn't it make more sense to give the user an explanation and a choice: Autoconvert all to h(n+2) or let me revise? Some might choose a different presentation.

Agreed

Diva's picture

As mgifford said, there should be some filter that sets the highest level heading you're allowed. Then it will fit nicely with the page interface.


Having to look

mgifford's picture

Just as an example of why it helps to know what the highest heading level in the body should be: when putting together this GDO wiki page, I wasn't sure what heading to use because I didn't look. I think it still isn't nested all that well, as there aren't consistent headings in the side blocks.

Ran it through WebAIM's WAVE to get a good view of the hierarchy.

Even without a WYSIWYG it would have been handy to have known where to logically begin.

Looking ahead to Drupal 8, this is a good task to take on

Cliff's picture

@mgifford, we've discussed before the notion of using WAI-ARIA as a tool to deal with many of these issues. I am no expert in that field, but perhaps the apparent structure of this page would demonstrate the problem and a potential goal:

  • The <h1> on this page is the title of the initial post: "Filter <h1>, <h2>, and <h3> from allowed html in comments"
  • "Post new comment" is an <h2>
  • The other <h2>s are the headings for the auxiliary navigation features in the right column:
    • Accessibility
    • Accessibility (again!)
    • New Groups
    • Group Notifications
    • My Groups
  • The <h3>s are the headings of each comment other than the initial comment.
  • So far, there are no h4s on this page.

Semantic level is one important clue to the page organization. The semantic levels in this page tell me that the auxiliary navigation is connected quite closely to the first comment, but the responses are connected less closely. That doesn't quite make sense. And it makes even less sense when we consider the reading order. The reading order itself makes pretty good sense: It goes through the top of the page, including the main navigation (My account, Recent, Jobs, Groups, Events, Contact, and Log out) and then to the initial comment. After the initial comment, it goes to the second comment and then to the remaining comments in order. After going through all of the comments, it goes over to the auxiliary navigation and proceeds through it from top to bottom. So here's the confusing part:

  • The first <h2> is after the last <h3>.
  • The main navigation, which presumably is at least as important as the auxiliary navigation, has no heading to help me find it.
  • Some of the comments posted here are direct replies to the initial post. For example, the first five responses were at least submitted as new comments responding to the initial post. Other comments are replies to those replies or even replies to replies to replies. (Whew!) But each reply has an <h3> as its heading, so I can't tell by the semantic level of a heading whether that comment is a reply to the initial response or a reply to a reply. In other words, nothing in the page structure conveys the meaning sighted people get from that extra indent that replies to replies take on.
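
The heading order described in the list above can also be checked mechanically. This is only a sketch using Python's standard `html.parser` module; the helper names `heading_levels` and `skipped_levels` are made up for illustration. It lists heading levels in reading order and flags any place where a level is skipped on the way down.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects the level of every h1-h6 start tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; "hr" is excluded by the digit check.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_levels(html: str) -> list[int]:
    parser = HeadingOutline()
    parser.feed(html)
    return parser.levels


def skipped_levels(levels: list[int]) -> list[tuple[int, int]]:
    """Return (previous, next) pairs where the outline jumps down by more than one."""
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]
```

On a page shaped like this one (an h1, then a run of h3 comments, then the h2 side blocks), the level list would read 1, 3, 3, ..., 2, 2, and the jump from 1 straight to 3 would be flagged.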

So here's the goal I'm thinking over:

  1. Get WAI-ARIA techniques incorporated into Drupal 8. Among other advantages, WAI-ARIA would allow us to assign roles to regions of the page. These roles tell the purpose of the text — for example, "site navigation" or "navigation within this group" — and can be read by screen readers. So they enhance the experience for people who use screen readers. If we could achieve this, we could stop straining heading levels beyond their best use. (Roles are not necessarily used instead of heading levels — you can assign both a role and a heading level to the same block of text.)
  2. Within the main body of the page, use heading levels to show how the comment fits into the flow of the discussion. If this were done on this page, we would see this structure within the comments:
    • <h1>: Filter <h1>, <h2>, and <h3> from allowed html in comments
      • <h2>: Agreed. And from the body too!
      • <h2>: I was just thinking of comments in discussions
      • <h2>: I think the new drupal_render
      • <h2>: A user warning would be a
      • <h2>: What if we revise "Input format"?
        • <h3>: We could, but I'd argue that
          • <h4>: I understand
      • <h2>: I agree; in my opinion, the
        • <h3>: Block higher headings and let the user sort it out
      • <h2>: Headings are no mistakes but give structure
        • <h3>: Autoconvert is better? It depends.
      • <h2>: Agreed
      • <h2>: Having to look
      • <h2>: Looking ahead to Drupal 8, this is a good task to take on

I'm not absolutely sure this is the right approach. It would call for a filter to recognize the semantic level of the current comment's heading and prevent any headings of that level or higher from appearing within the comment itself. And because a series of replies to replies could quickly run us through all the available heading levels, we might want to make it clear that one should reply to a reply only under specific circumstances — for example, to insert a new response into the thread after the thread had already moved forward. (If we were to adopt this approach, perhaps the "Reply" link should not appear in the last comment in a thread, because we would want the next item in the discussion to be a new comment, not a reply to the last comment. This isn't changing the type of comment that would be accepted; it's merely acknowledging that my model of differentiating between replies to the initial comment and replies to other replies is not working well. Perhaps we need to think in terms of whether a reply extends the discussion beyond the last comment or adds new information to an earlier comment. Extending beyond the last comment would call for another <h2>; adding new information to an earlier comment would call for an <h3> or whatever the next level below that comment's heading would be.)
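
As a thought experiment, the filter rule described above might be sketched like this (Python for illustration; the function names and the convention that depth 0 means a direct reply to the initial post are my own assumptions). It also shows how quickly replies to replies exhaust the available levels.

```python
def comment_heading_level(depth: int, first_level: int = 2) -> int:
    """Heading level for a comment title; depth 0 is a direct reply to the post."""
    return min(first_level + depth, 6)  # h6 is the lowest level HTML offers


def allowed_heading_tags(depth: int) -> list[str]:
    """Heading tags a commenter may use inside a comment at this depth."""
    own_level = comment_heading_level(depth)
    return [f"h{n}" for n in range(own_level + 1, 7)]
```

A direct reply would carry an h2 title and could use h3 through h6 inside. By the fifth level of nesting (depth 4) the title itself is an h6 and no headings at all remain for the body of the comment.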

Anyway, it's worth thinking through these issues as we look ahead to Drupal 8.

Header

Drecka's picture

I have to agree: it can destroy the heading structure.
