Shortening HTML IDs and classes automatically to reduce document size and improve loading speed

Anonymous

When viewing Drupal's HTML output, one notices that the IDs and classes on HTML elements can at times be numerous, and some of the names are quite long as well. While clear names make for excellent readability and ease of design, they are extra bytes that slow down the transfer of data and add to bandwidth usage.

What about automatically renaming the IDs and classes on HTML elements, and the matching CSS selectors, to short letter-and-number combinations, and storing the result?

For example:

<div class="block block-block ">
  <div class="block-inner clear-block">
     <div class="content-wrapper"><div class="content">

could also be done as:

<div class="a1 a2">
  <div class="a3">
       <div class="a4"><div class="a5">

How many letter and number combinations would you need? Well, say you use a scheme of AAA or AAAA, where AAA can be A12, B87, C93, etc. Would this allow for enough variations to use in a theme?

So instead of having something like:

<li class="views-row views-row-1 views-row-odd views-row-first">

You'd have:

<li class="B14 B15 B16 B17">
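To put rough numbers on the question of how many combinations a scheme like AAA or AAAA allows, here is a small Python sketch. It assumes every name starts with a letter (a CSS class or ID selector cannot begin with an unescaped digit) and that only one letter case is used; the scheme_size helper is hypothetical, made up for this illustration.

```python
import string

LETTERS = len(string.ascii_lowercase)  # 26, using one letter case only
DIGITS = len(string.digits)            # 10
ALNUM = LETTERS + DIGITS               # 36

def scheme_size(length, rest_alphabet):
    """Count names of `length` characters: a leading letter, then `length - 1`
    characters drawn from an alphabet of size `rest_alphabet`."""
    return LETTERS * rest_alphabet ** (length - 1)

print(scheme_size(3, DIGITS))  # A12 style: 2600
print(scheme_size(4, DIGITS))  # A123 style: 26000
print(scheme_size(3, ALNUM))   # letter + 2 letters/digits: 33696
print(scheme_size(4, ALNUM))   # letter + 3 letters/digits: 1213056
```

Allowing letters as well as digits after the first character grows the space faster than adding another digit does.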

The (PHP) script would have to recognize the class and ID names in the HTML and find the same names in the CSS files used. It would then write the result out as new CSS files and a new theme folder with PHP files, stored in a subfolder of the theme, which a module could automatically redirect to and load instead of the normal theme. This way the conversion would only have to be done once, and Drupal would load straight from the optimized theme folder, saving CPU time and memory compared to doing a live conversion. A whitespace remover could also be added to the process.
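A minimal sketch of the one-off conversion described above, written in Python rather than PHP and assuming regular-expression matching is good enough: it collects every name from class="..." and id="..." attributes, assigns short codes (a1, a2, ...) in order of first appearance, and rewrites both the HTML and the CSS selectors with the same mapping. The function names are hypothetical. This is not a real HTML/CSS parser: it ignores single-quoted attributes, attribute selectors such as [class^="views-"], and class names used in JavaScript, and a mapped name that happens to equal a hex color (an ID called fff, say) would corrupt color: #fff.

```python
import re
from itertools import count

# class="..." or id="..."; the lookbehind skips attributes such as data-id="..."
ATTR = re.compile(r'(?<![\w-])(class|id)="([^"]*)"')
# A .class or #id selector name in a stylesheet
SELECTOR = re.compile(r'([.#])([A-Za-z_][\w-]*)')

def build_mapping(html):
    """Give every class/ID name a short code (a1, a2, ...) in order of first appearance."""
    mapping, codes = {}, count(1)
    for m in ATTR.finditer(html):
        for name in m.group(2).split():
            if name not in mapping:
                mapping[name] = f"a{next(codes)}"
    return mapping

def rewrite_html(html, mapping):
    """Replace every name inside class/id attributes with its short code."""
    def repl(m):
        short = " ".join(mapping.get(n, n) for n in m.group(2).split())
        return f'{m.group(1)}="{short}"'
    return ATTR.sub(repl, html)

def rewrite_css(css, mapping):
    """Rename .class/#id selectors; names not in the mapping are left alone."""
    return SELECTOR.sub(lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)), css)

html = '<div class="block block-block"><div class="block-inner clear-block">'
css = ".block .block-inner { margin: 20px; } .clear-block { color: #fff; }"
mapping = build_mapping(html)
print(rewrite_html(html, mapping))  # <div class="a1 a2"><div class="a3 a4">
print(rewrite_css(css, mapping))    # .a1 .a3 { margin: 20px; } .a4 { color: #fff; }
```

Because the mapping is built once and then applied to every file, the output could be written to the optimized theme folder and reused until the theme or modules change.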

A module with an administration page would let you switch the shortening on and off, in case you need the human-readable version for design work.

In conclusion, using this together with (file) caching on, for example, a website with 10,000 pages and 3,000 visitors a month, the saved kilobytes could add up. There would be benefits in both faster download times and lower bandwidth usage compared to a non-optimized theme. Studies repeatedly show that visitors prefer a faster browsing experience, and some search engines report that they take a website's speed into consideration (which hopefully doesn't lead to building larger, more powerful server stacks solely for the sake of better rankings, unless the energy issue is addressed, of course).

How often do a theme or the modules used on a website change? A theme's layout and the set of modules used sometimes rarely change over a long period, sometimes years (YMMV). Even if a module or theme changes, running the module again would update the theme and re-optimize it, and you'd be good to go. With less code and less energy spent by servers and visitors' computers, this might even benefit the environment, and who doesn't want that?

Comments

One big thing you have

Jamie Holly

One big thing you have forgotten: JS. IDs and classes aren't used only by CSS and HTML, but also as selectors in JavaScript. Add to that the fact that you can build selectors programmatically inside JavaScript (e.g. var SelClass="comment-" + currentCommentStatus; $("."+SelClass).hide(); ), and parsing those becomes nearly impossible.
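The following Python sketch illustrates the problem, assuming a converter that naively renames selector names appearing literally inside JavaScript strings (rewrite_js_selectors and the mapping are hypothetical). The literal ".views-row" gets renamed, but the bare class name passed to addClass and the selector assembled at run time from "comment-" + currentCommentStatus slip through, so they would no longer match the renamed markup.

```python
import re

# A quoted string that starts with a .class or #id selector, e.g. ".views-row"
SELECTOR_LITERAL = re.compile(r'(["\'])([.#])([A-Za-z_][\w-]*)')

def rewrite_js_selectors(js, mapping):
    """Naively rename selector names that appear literally inside JS strings."""
    def repl(m):
        quote, sigil, name = m.groups()
        return quote + sigil + mapping.get(name, name)
    return SELECTOR_LITERAL.sub(repl, js)

js = ('$(".views-row").addClass("active");\n'
      'var SelClass = "comment-" + currentCommentStatus;\n'
      '$("." + SelClass).hide();')
mapping = {"views-row": "B14", "active": "A1", "comment-unpublished": "C3"}
print(rewrite_js_selectors(js, mapping))
```

Only the first selector is changed; "active" and the "comment-" prefix come through untouched, even though both correspond to renamed classes.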


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

Hmmm, I didn't think of that.

design_dolphin

Hmmm, I didn't think of that. I've only recently started learning JavaScript properly. I've been sticking to HTML and CSS for speed, and only recently found out that screen readers can navigate (some) JavaScript. I'm really not all that at home with JavaScript.

It's not an agile solution

esod

An interesting and well thought-out idea. However, agile software development includes writing code that is simple and, to some extent, self-documenting.

<li class="views-row views-row-1 views-row-odd views-row-first"> simply means to me "a view of row 1 that is odd and first." This is a self-documenting class definition.

<li class="B14 B15 B16 B17"> doesn't mean anything to me. I also suggest it would not be simple for me to learn what it does mean.

Developing an administration module to go between the two class definitions sounds complicated to develop and complex to understand.

I consider Drupal an agile approach because of its initial minimalist functionality. Where Drupal views start to seem onerous is after the functional modules have been enabled. But consider what we are getting in return -- real-world business solutions.

Lighter-weight approaches such as Ruby on Rails are also wonderful, but until one is deeply immersed in the RoR ecosystem, nothing like an Ubercart website will be anywhere near a production roll-out.

I think discussion topics like Mercury, nginx, omega8cc, boost, memcache, APC, and all the amazing things that get discussed in this group are the way to go.

Thanks

Thank you for your comments.

design_dolphin

Thank you for your comments. I really appreciate them, and they bring an interesting and refreshing point of view.

However, agile software development includes writing code that is simple and, to some extent, self-documenting.

That is an interesting way of looking at it. It gave me new insight into how themes are built up in Drupal. For a long time I just saw it as 'all this code' that was overcomplicating things from a design point of view. On the other hand, I've come to understand it, and actually like it; it does make sense at times. However, sometimes I find it a bit overkill, and I can't help but wonder... Take, for example, the following divs, which are grouped together. This is a real-life example:


<div class="panel-region-separator"></div><div class="panel-pane pane-block pane-views-very_long_number_here" >
<div class="pane-content">
<div class="view view-czg-vocab-frontpage-artikelen view-id-czg_vocab_frontpage_artikelen view-display-id-block_1 view-dom-id-1">
<div class="view-content">
<div class="views-row views-row-1 views-row-odd views-row-first">
<div class="node sticky node-first node-teaser node-type-story">
<div class="view view-czg-vocab-frontpage-artikelen view-id-czg_vocab_frontpage_artikelen view-display-id-block_1 view-dom-id-1">

Now, as a designer I look at that and just shake my head (not knowing whether I should cry, be dumbfounded, or laugh; usually it results in a "What the?"). I mean, how many classes does a div need for a 20px margin? All right, all right, fair enough, don't ask me to iterate over an array while processing input from Ajax, which has to be stored in a cookie, in a secured login session, while overcoming cache problems. To each his own, and I respect that fully. I see some amazing stuff being built, including those two modules from the example (excellent interface, incredible constructor options, for example). The HTML could use some brushing up here and there, but that can be improved upon; this is just from a designer's point of view. It's an amazing bit of kit, and I'm very impressed by the usability of those modules.

<li class="B14 B15 B16 B17"> doesn't mean anything to me. I also suggest it would not be simple for me to learn what it does mean.

This is a very good point. Speaking for myself as a designer, I usually know the themes I work on (in co-operation) backwards and forwards. If I want to examine a piece of code, I fire up something like Firebug, look at the relevant area, and compare it to the CSS to see what is going on. What the ID or class is called doesn't mean that much to me; I look at the structure of the document and the surrounding content. However, how easy it would be to remember this and navigate through it would need further testing. I've worked with this kind of structure before, and I found it surprisingly easy to navigate, using the search function on the CSS file, for example. Code like the example above does not make me happy as a designer, and costs me a lot of time. Of course, I'm not sure whether changing this to AAAA classes and IDs would improve readability, but it would improve speed and file size. :-)

However, from a programmer's standpoint, I can imagine it would be difficult to work with a module author if the output differs between the uncompressed (module) and compressed (theme) versions. Still, one could use line references in the code as a common point of reference in conversation, and once everything works, convert the theme to the optimized version again. Whether that would work in the real world remains to be seen.

One way of solving that could be:

<li class="R1 views-row views-row-1 views-row-odd views-row-first">

<li class="R1 B14 B15 B16 B17">

Where R1 stands for Row 1. One could automatically add the row number to every id=" and class=" in Drupal (module) code with a simple str_replace, at least if only HTML is involved. I don't know how JavaScript, or possibly even some PHP code, would be affected by this.

The 'row 1' designation would give each line a unified reference point. One could also search for it in a document, as well as use it when talking about the code: "Row 1, second segment."

I've found with this system that I very quickly switched to looking up code by searching instead of, for example, scrolling through the CSS file, which actually improved my workflow, but it did take some time getting used to.
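A sketch of the R1 idea in Python, assuming "row" means the line number in the template source; the helper name is hypothetical, and str.replace plays the role of PHP's str_replace here. It only touches class attributes: an id attribute holds a single token, so prefixing it with R1 would change the ID itself rather than add a second name.

```python
def tag_class_attrs_with_line_numbers(source):
    """Prefix every class attribute with R<n>, where n is the 1-based line number.

    Hypothetical helper: a per-line version of PHP's str_replace('class="', 'class="R1 ', ...).
    """
    tagged = []
    for n, line in enumerate(source.splitlines(), start=1):
        tagged.append(line.replace('class="', f'class="R{n} '))
    return "\n".join(tagged)

template = '<ul class="menu">\n<li class="views-row views-row-1 views-row-odd views-row-first">'
print(tag_class_attrs_with_line_numbers(template))
# <ul class="R1 menu">
# <li class="R2 views-row views-row-1 views-row-odd views-row-first">
```

Since the R<n> marker is added before any shortening, it survives into the compressed version and gives both versions the same reference point.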

What do you think?

they are extra bytes which

dalin

they are extra bytes which slow down the passing of data, and add kb to the bandwidth.

That sounds fine in theory. But I think if you actually compare gzipped HTML with gzipped and minified HTML, you'll see only a few percent difference, nothing very substantial. I think it would be difficult to justify the extremely high cost of developing such a system for such a small gain unless you are operating a website in the Alexa top 20, where such a small size change would result in justifiable cost savings.
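To see why the gzipped difference is so much smaller than the raw one, here is a rough Python sketch that builds a repetitive, Views-like list with long class names and the same list with short ones, then compares raw and gzip-compressed sizes. The markup generators are made up for illustration and the exact byte counts will differ from real pages; the point is that gzip already squeezes out much of the repetition that short names would save.

```python
import gzip

def build_list(row_markup, rows=200):
    """Build a repetitive, Views-like <ul> with one <li> per row."""
    items = "\n".join(row_markup(i) for i in range(1, rows + 1))
    return f"<ul>\n{items}\n</ul>\n".encode("utf-8")

def long_row(i):
    # Verbose, self-documenting class names in the style of Drupal Views
    parity = "odd" if i % 2 else "even"
    return f'<li class="views-row views-row-{i} views-row-{parity}">Item {i}</li>'

def short_row(i):
    # Hypothetical shortened equivalents of the same classes
    parity = "o" if i % 2 else "e"
    return f'<li class="r r{i} {parity}">Item {i}</li>'

long_html, short_html = build_list(long_row), build_list(short_row)
long_gz, short_gz = gzip.compress(long_html), gzip.compress(short_html)

raw_saving = len(long_html) - len(short_html)
gz_saving = len(long_gz) - len(short_gz)
print(f"raw:  {len(long_html)} -> {len(short_html)} bytes, saves {raw_saving}")
print(f"gzip: {len(long_gz)} -> {len(short_gz)} bytes, saves {gz_saving}")
```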

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

How much is it really going

choloepus

How much is it really going to save? In my book this is a mini-optimization, and I would rather spend time elsewhere, but here are some numbers.

Let's take the current page and look at some data. I've removed ALL the classes and IDs, but you still need them, so the real data sizes would be larger.

The current page is 24919 bytes; after stripping out all the classes/IDs the size is 21144. That is not a lot of difference. Is it worth it?

In addition, the majority of sites use mod_deflate/gzip content compression, which brings the sizes down to 6371 and 5696 bytes respectively:

$ cp 91439 91439_stripped
$ perl -pi -e 's/class="(.+?)"/class=""/g' 91439_stripped
$ perl -pi -e 's/id="(.+?)"/id=""/g' 91439_stripped
$ ls -l 91439 91439_stripped
-rw-r--r-- 1 user user 24919 2010-09-06 20:15 91439
-rw-r--r-- 1 user user 21144 2010-09-06 20:26 91439_stripped
$ gzip 91439 91439_stripped

$ ls -l 91439*
-rw-r--r-- 1 user user 6371 2010-09-06 20:15 91439.gz
-rw-r--r-- 1 user user 5696 2010-09-06 20:43 91439_stripped.gz

Long class definitions are necessary and agile

esod

Long class/ID definitions are necessary and agile, even though they do become quite long indeed. Your examples demonstrate their length well:

<div class="view view-czg-vocab-frontpage-artikelen view-id-czg_vocab_frontpage_artikelen view-display-id-block_1 view-dom-id-1"> does give one pause.

If, for example, we look at a bit of code from page.tpl.php:

<div id="navigation" class="menu <?php if (!empty($primary_links)) { print "withprimary"; } if (!empty($secondary_links)) { print " withsecondary"; } ?> ">

we can see how this id/class definition can become <div id="navigation" class="menu withprimary withsecondary">, where withprimary and withsecondary are descriptive words added only when primary or secondary links exist. But this id/class definition does describe itself, which, in spite of its length, is simple.

I also think there is precedence at work in Drupal id/class definitions and wish I could refer to a more complete explanation.

There is also another issue

design_dolphin

Developing an administration module to go between the two class definitions sounds complicated to develop and complex to understand.

One way to solve that could be to have two paths:

/maintenance/node/123
/node/123

where /maintenance/ would show the original theme. This is where programmers and designers would look when working, while /node/123 would be the deployed side of things.

There is also another issue: table rows (tr) and list items (li), for example, can be assigned the same class in a theme.

So, for example, in Drupal sometimes we see:

<ul class="ul element">
<li class="list element first active"></li>
<li class="list element"></li>
<li class="list element"></li>
<li class="list element last"></li>
</ul>

At some level this makes sense for maximum ease of design implementation. However, I would prefer code such as the following, as in most cases the class on the ul, combined with the classes assigned to specific li elements, is sufficient:

<ul class="ul element">
<li class="first active"></li>
<li></li>
<li></li>
<li class="last"></li>
</ul>

The reason I mention this as possibly troublesome for conversion to an AAAA scheme is that you would need a custom conversion rule for repeating elements; otherwise the same classes would get different automated numbers. For a couple of li's that's not good; for a table with 100 rows, it's pretty much a disaster. And how many custom rules would you end up needing? That could become quite unwieldy: doable, but possibly unwieldy, adding to the complexity.
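If the mapping is keyed on the class name rather than on the element, repeated elements automatically get the same code, without a per-element rule. A minimal Python sketch under that assumption (shorten_classes is a hypothetical helper): the shared views-row class maps to one code for all 100 rows, but numbered classes such as views-row-1 ... views-row-100 each consume a new code.

```python
from itertools import count

def shorten_classes(rows, prefix="B"):
    """Map each distinct class name to a short code; a repeated name reuses its code."""
    mapping, codes = {}, count(1)
    shortened = []
    for classes in rows:
        short = []
        for name in classes.split():
            if name not in mapping:
                mapping[name] = f"{prefix}{next(codes)}"
            short.append(mapping[name])
        shortened.append(" ".join(short))
    return shortened, mapping

# 100 table rows: a shared class plus a per-row numbered class
rows = [f"views-row views-row-{i}" for i in range(1, 101)]
short_rows, mapping = shorten_classes(rows)
print(short_rows[0], "|", short_rows[99])  # B1 B2 | B1 B101
print(len(mapping))                         # 101
```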

How much is it really going to save? In my book this is a mini-optimization, and I would rather spend time elsewhere, but here are some numbers.

Let's take the current page and look at some data. I've removed ALL the classes and IDs, but you still need them, so the real data sizes would be larger.

The current page is 24919 bytes; after stripping out all the classes/IDs the size is 21144. That is not a lot of difference. Is it worth it?

In addition, the majority of sites use mod_deflate/gzip content compression, which brings the sizes down to 6371 and 5696 bytes respectively.

Well, one could additionally run it through a minifier script. See, for example, the write-up "Experimenting with html minifier", which covers some of the pros and cons, and the ins and outs, of doing it this way.
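For reference, here is about the simplest possible whitespace remover, sketched in Python (collapse_whitespace is a hypothetical helper, not a real minifier). It drops whitespace between tags and collapses other runs of whitespace to a single space. A real minifier has to leave <pre>, <textarea> and inline scripts alone and keep spaces that matter between inline elements, which this sketch does not do.

```python
import re

def collapse_whitespace(html):
    """Naive whitespace remover (hypothetical helper, not a production minifier)."""
    # Drop whitespace that sits entirely between two tags, e.g. "</li>\n  <li>"
    html = re.sub(r">\s+<", "><", html)
    # Collapse any remaining run of whitespace to a single space
    return re.sub(r"\s+", " ", html).strip()

sample = '<ul>\n  <li class="first">One</li>\n  <li>Two  items</li>\n</ul>\n'
print(collapse_whitespace(sample))  # <ul><li class="first">One</li><li>Two items</li></ul>
```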

Is it worth it? Well, some rough estimates: say you save about 5 KB per page view unzipped, and let's say for now this is about 1.4 KB gzipped per page. Multiply this by 100,000 page views a month, and the savings come to approx. 488 MB unzipped and 136 MB gzipped per month.

According to this source, about 7.19 million websites use Drupal. Let's say, for example, that each website has 1 page view a day of gzipped content, and let's take the 1.4 KB gzipped saving from above. That would be a saving of approx. 10,000 MB (10 GB) a day, 300 GB per month, and 3.5 TB a year in global internet traffic and energy consumption (depending on whether one gzips, minifies, and how often, and on the type and amount of caching). Food for thought, including for myself.
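The arithmetic above, as a small Python sketch. The inputs (5 KB and 1.4 KB saved per view, 100,000 views a month, 7.19 million sites with one view a day) are the rough assumptions from this comment, not measurements, and 1 KB is taken as 1,024 bytes, which is what the 488 MB and 136 MB figures imply.

```python
KIB = 1024  # bytes per KB, as implied by the 488 MB / 136 MB figures

def monthly_savings_mib(saved_kib_per_view, views_per_month):
    """MB (MiB) saved per month on one site."""
    return saved_kib_per_view * views_per_month / KIB

def global_savings_gib(sites, views_per_site_per_day, saved_kib_per_view, days):
    """GB (GiB) saved across many sites over a number of days."""
    return sites * views_per_site_per_day * saved_kib_per_view * days / KIB / KIB

print(f"{monthly_savings_mib(5, 100_000):.1f}")    # 488.3 (unzipped, per month)
print(f"{monthly_savings_mib(1.4, 100_000):.1f}")  # 136.7 (gzipped, per month)
print(f"{global_savings_gib(7_190_000, 1, 1.4, 1):.1f}")          # 9.6 per day
print(f"{global_savings_gib(7_190_000, 1, 1.4, 30):.0f}")         # 288 per 30 days
print(f"{global_savings_gib(7_190_000, 1, 1.4, 365) / KIB:.2f}")  # 3.42 TiB per year
```

Counting in units of 1,000 instead gives about 10.1 GB a day, 302 GB a month and 3.7 TB a year; either way the rounded figures above are the right order of magnitude.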

Bad idea

sreynen

Sorry, but I think this is just a bad idea. Markup should be meaningful.

We can (and have begun to) go through various reasons why meaningful markup is practically important, but more fundamentally this just goes against the whole point of markup, which is to describe meaning of content independent of specific use cases. This idea already ran into a problem in missing a known use case, JS in Drupal, but the larger problem is we can't possibly predict the use cases for markup.

Both the id and class attributes have a stated purpose of "general purpose processing by user agents." So to know all the ways these attributes might be used, we'd first need to know all user agents. Keeping in mind our HTML documents will exist in a future with user agents that don't exist today, this is clearly impossible.

We do the best we can with commonly understood class names right now. And Drupal is moving towards more descriptive markup with RDFa in D7. This idea goes in the opposite direction.

Reducing bandwidth is a good goal, but removing meaning from all classes and IDs would do far too much damage for far too little gain (10GB a day isn't really very much). Looking at your original examples, there's plenty of clear redundancy in the class names. Making the markup speak more succinctly (e.g. when inside class="view", we could just say "row" instead of "views-row") would probably save just as much bandwidth as removing all meaning from class names, and it would also move our markup in the right direction.

semantic views

whatdoesitwant

I fully agree with sreynen. This discussion goes against all best practices. Also, much better alternatives exist. Check out Semantic CCK and Semantic Views, for example; they should actually be a requirement on all D6 sites. If you are concerned about the effort involved in creating semantic HTML, you may want to start working with Features if you haven't already.

this is just silly, instead

seutje

This is just silly. Instead of using a simple solution in the theme layer (where it should be), you're proposing some complex and resource-hungry solution to come in, parse everything, and swap things out?

I don't see any upside that couldn't be solved in a much saner way, but I do see a ton of downsides that wouldn't occur when using a saner approach...

performance

whatdoesitwant

Actually, I've never noticed performance penalties from using Semantic Views versus my own preprocess functions. But I only do small sites and leave the hosting to the hosting company. Care to clarify the performance costs? I don't pick them up in Firebug. I must admit that I just assumed that semantic_cck, with its similar approach, wouldn't be costly either, but I never tested it.

I believe seutje was

james.elliott

I believe seutje was referring to the original proposal and not semantic_cck or semantic_views.

I agree with seutje, this is

james.elliott

I agree with seutje: this is silly and overcomplicated for such a massively minor performance gain. I'll bet that far more than 10GB a day could be saved with better sprite usage. And I'm not sure, but I don't think the energy consumed by 10GB a day of bandwidth would even come close to the energy consumed by parsing and reparsing every page, CSS, and JS file to use these shorter classes and IDs.

This is the best suggestion

Jacine

In my opinion, this suggestion by sreynen is the best in this thread:

Looking at your original examples, there's plenty of clear redundancy in the class names. Making the markup speak more succinctly (e.g. when inside class="view", we could just say "row" instead of "views-row") would probably save just as much bandwidth as removing all meaning from class names, and it would also move our markup in the right direction.

Nothing would be lost with an optimization like this, as .view .row can easily be used instead of .views-row. It seems like a sensible optimization that we could all enjoy, and it also helps with the class-bloat issue.

Instead of proposing to revamp the entire system because a small piece of code could be improved, it would be a much better use of everyone's time to identify a single problem area at a time and post a proposed solution in the proper issue queue. This is how the community works and how the code evolves.

There's nothing wrong with bringing up these issues here, but when sensible solutions are proposed and generally agreed upon, they should end up in the proper issue queue to be fixed for all. If not, we'll be back here having the same discussion next year.

+1 I also like this solution

james.elliott

+1

I also like this solution because it pushes towards more generic classing. It means I can just style .row.even and .row.odd instead of worrying about whether it is a .views-row or not.

agree

jessebeach

+1

Concatenated class names that just repeat the scope of the element (e.g. .views-row contains the scope views-, which can easily be designated by the .view parent class in the selector) are not ideal. They make it really difficult to write object-oriented CSS selectors.

Yes, I agree, sreynen's

design_dolphin

Yes, I agree, sreynen's comment has a good take on this.

As for where to post these things, I'm still searching around. I'm glad I got so many well-thought-out ideas. It has given me new insight into the how and why of Drupal design, and I've learned new things.

This is what I love about the Drupal community. The ability to discuss ideas, and have developers and designers give some excellent feedback. Great job, impressed.

Edit: @Jacine, just reread your comment, and got what you're saying. +1
It's a solid approach.

What about shortening some (much-used) names such as 'row' to 'r', 'left sidebar' to 'ls', 'footer' to 'f', 'primary menu' to 'pm', and 'comment' to 'c', to name a few, provided there is a glossary/dictionary for themers? Or should this be left to the individual themer, with, of course, the best practice of providing documentation for the next themer who may have to work on it?

I'm still concerned about the overuse of div upon div, but maybe somebody else has a solution for that. I don't see how to solve it yet, and it goes beyond the scope of this topic.

glossaries are a recipe for confusion

jessebeach

CSS is a semantics-enriching language. Obfuscating the semantics in minified language makes it really difficult to read and even more difficult to maintain. I have to agree with previous comments that although technically one would see reduced bandwidth usage, the tradeoff is decreased maintainability and a higher barrier to entry for new themers.

It's a noble goal to want to take a knife to bloat. Props for seeking out ways to make Drupal more efficient. Personally, I would resist this particular approach quite emphatically.

As would I.

srjosh

I'm also really skeptical that the processor load required to pull this off in the first place would represent a net savings in terms of request-to-display round trip time. Add to that the immediate failure of tons of JS, and I think you have a pretty untenable solution.

I actually use a lot of

Jeff Burnz

I actually use a lot of shortened class names, mostly for Skinr. Each class holds just a few style definitions, maybe only one. The site-building folks can build their block styles by applying different combinations of these classes. While this is quite different from the OP, I think there are use cases for short class names: if I were to use verbose class names, there would be a lot of pointless bloat, both for me and for the site. After all, no one is actually reading this stuff apart from me and other machines...

Don't get me started on code bloat in Drupal modules and themes, I'll go on for days about it...