Views refactoring ideas #Round 1: The View object.

Events happening in the community are now at Drupal community events on www.drupal.org.
sylvain lecoy's picture

The main problem is highly related to the view object itself, which has grown organically. The main goal of this discussion is to separate concerns from the view object (reduce its capabilities) to creates an API that is loosely coupled (that is, 3 interfaces that deal with: Configuration, Data Source and Rendering). An anecdotal evidence is that this object is known to be the heart of the system.

Do you remember the original black-and-white movie The Blob? Perhaps you saw only the recent remake. In either case, the story line was almost the same: A drip-sized, jellylike alien life form from outer space somehow makes it to Earth.

Whenever the jelly thing eats (usually unsuspecting earthlings), it grows. Meanwhile, incredulous earthlings panic and ignore the one crazy scientist who knows what’s happening. Many more people are eaten before they come to their senses. Eventually, the Blob grows so large that it threatens to wipe out the entire planet.

The movie is a good analogy for the View class, which consumes entire object-oriented architectures.


Figure 1 The Blob.

This is characterized by a class diagram composed of a single complex controller class sourrounded by other data classes, as shown in figure 2. The key problem here is that the majority of the responsibilities are allocated to this class. Thus making the "API" completely unusable.

Here we have a software module that is given responsibilities that overlap most other parts of the system for system control or system management. The View class contains the majority of the process, and the other objects contain the data. Architectures with the Blob have separated process from data; in other words, they are procedural-style rather than object-oriented architectures.


Figure 2 View Class.

A procedural design separates process from data, whereas an object-oriented design merges process and data models, along with partitions. The allocation of responisibilities is not repartitioned during system evolution, so that one module becomes predominant. The is often accompanied by unecessary code, making it hard to differentiate between the useful functionality of the View Class and no-longer-used code.

Some funny facts:

  • render method. The doc says: "Render this view for a certain display. Note: You should better use just the preview function if you want to render a view."
  • execute_display method. "To be called externally by whatever mechanism invokes the view, such as a page callback, hook_block, etc. [...] This function should NOT be used by anything external as this returns data in the format specified by the display. [...]". - WTF ?!
  • Private/protected methods are not really privates, they are public with an underscore as prefix to the function name.
  • There is a mix-and-match of camelCase functions and php_procedural_style function name.
  • An evil copy method.
  • A clone_view method having a sudden clarity about itself: "Because views are complicated objects within objects [...]" - mhhh
  • There is a very curious fix_missing_relationships method that would need some love/reconsideration.
  • The View::views_object_types() make me think of the speech by David Zuelke Designing HTTP interfaces and RESTful Web Services. Especially the slide 77 about the Twitter API: DELETE api.twitter.com/l/status/destroy/12345.json. "DELETE destroy", RPC much ?

The View class can (but is not limited to): determine a user access to the view, run attachment displays for the view, build an individual set of handlers, build a query for the view, force the view to build a title, clean the house, buy you a beer, choose a display among those accessible to the user, delete - save - update the view from the database, use ajax, integrates pagers, makes nice hamburgers..., etc

Views can do a lot more than that, but those are some of the obvious uses of Views. You guessed it, a lot of things... It is by the way clearly stated on the project page.

Consequences

  • Single class with a large number of attributes (49), operations (67), OUCH!
  • A disparate collection of unrelated attributes and operations encapsulated in a single class. An overall lack of cohesiveness of the attributes and operations.
  • A single controller class with associated simple, data-object classes. Views expert I havn't yet digged into plugins and handlers so please correct me on this point.
  • An absence of object-oriented design. The single controller class often nearly encapsulates the applications entire functionality, much like a procedural main program.
  • A migrated legacy design that has not been properly refactored to an object-oriented architecture.
  • The View Class is typically too complex for reuse and testing. It may be inefficient and expensive to load into memory, using excessive resources, even for simple operations.

Refactoring

I mentionned a complete rewritting from the ground-up, but hopefully the solution involves a form of refactoring. The key is to move behavior away from the View class. It may be appropriate to reallocate behavior to some of the encapsulated data objects in a way that makes these objects more capable and the View class less complex.

  1. Identify or categorize related attributes and operations according to contracts. These contracts should be cohesive in that they all directly relate to a common focus, behavior, or function within the overall system. Therefore, the first step is to identify cohesive sets of operations and attributes that represent contracts. In this case, we could gather operations related to query management like, the db_table, base_table, base_field attributes along with the build, _build methods. The CRUD operation to make the View persistent can also be grouped together. We could also identity all operations and attributes related to individual display objects, such as argument, field, sort, filters, etc. And finally all UI related attributes and methods.
  2. The second step is to look for "natural homes" for these contract based collections of functionality and then migrate them there. In this example, we gather operations related to query management and migrate them from the VIEW class and move them to the QUERYBUILDER class, as shown in Figure 3. We do the same with CRUD operations and attributes related to persistance and exportable to the VIEWENTITY class. This both simplifies the VIEW class and makes the QUERYBUILDER and VIEWENTITY classes more than simple encapsulated data tables. The result is a better object-oriented design.
  3. The third step is to remove all "far-coupled", or redundant, indirect associations. In the View class, the display handler is initially far-coupled to the VIEW class in that each display handler really belongs to a display ARRAY, which in turn belongs to a VIEW.
  4. Next, where appropriate, we migrate associates to derived classes to a common base class. In the example, once the far-coupling has been removed between the VIEW and DISPLAY classes, we need to migrate DISPLAYs to a separate class. I need some View experts here to define a strategy, because it seems to be partly done with plugins ?
  5. Finally, we remove all transient associations, replacing them as appropriate with type specifiers to attributes and operations arguments. In our example, an EXPORT or a GET_ITEMS_PER_PAGE would be a transient process, and could be moved into a separate transient class with locale attributes that establish the specified export plugin or pager criteria for a specific instance of an export or a paged display.


Figure 3 Regrouping the Blob by contracts.

While I described general practises along with some examples that may be erroneous, it really need some reviews and a Views expert architectural viewpoint. Also for the sake of conventions, please use a camel case approach for naming functions. I propose in the discussion to establish how to explode the View class, then creates a set of interfaces (which will defines the API) and will be implemented by the re-architectured components.

Having a high-level architecture will let us create interfaces that will represents the final API, while highly-skilled components implementers will be able to work in a loosely coupled fashion on each components. The whole thing will be more unit testable and I am sure we will enjoy to work programatically with Views.

AttachmentSize
Views.png187.85 KB
theblob.jpg15.78 KB
Views-regrouping-web.png113.09 KB

Comments

So it seems I am working on

sylvain lecoy's picture

So it seems I am working on my own, I will try to resolve the first point.

Identify or categorize related attributes and operations according to contracts.

There could be the following contracts, it may be not perfect but this will give a good idea:

  • Entity: All Entity Operations, do we want View to be an entity ? With an ID, a language, and maybe fields ? But what for ? Views can be an entity, without fields. It will embrace the drupal entity mechanism, with exportable capablities for instance. In this category, there will be: $db_table, $base_table, $base_field, $name, $vid, $description, $human_name, $core, $api_version, $disabled, update(), get_base_tables(), fix_missing_relationships(), get_human_name(), load_views(), save(), delete(), access(), export(). I am not sure of the benefits for View to be an entity however, this needs to be discussed.
  • Query: All things that creates the query. Here we have to choose either $base_table and $base_field goes on Entity or Query. In this category, there will be: $built, $executed, $query, $build_info, $result, $base_database. What about: $field, $argument, $sort, $filter, $relationship ? There will be: _pre_query(), _post_query(), init_query(), build(), _build(), pre_execute(), post_execute(), execute(), is_cacheable()
  • Pager: as a decorator, like Drupal makes it. There will be: $current_page, $items_per_page, $offset, $total_rows, set_current_page(), get_current_page(), get_items_per_page(), init_pager(), get_offset(), set_offset(). Also, the use of a decorator is appropriated because not each views needs pagers, this will make the View object lighter. We could get rid of use_pager() function.
  • Rendering: all related templating stuff and rendering methods. This would include: $attachment_before, $attachment_after, $is_attachment. What are $exposed_data, $exposed_input and $exposed_raw_input ? $display_handler, $display, $header, $footer, display_objects(), set_use_ajax(), init_display(), init_style(), _init_handler(), init_handlers(), $inited <- what is this ?, render(), render_field(), execute_display(), preview(), attach_displays().

View object would coordinate the above objects, and will have methods such as getResponse().

Do you think we could have better separation of concerns ? Is this make sense or absolutely not ?

Good Start!

mikeytown2's picture

I agree that rendering should be a separate thing. I think most people agree with this and it should be the focus of development initially. Keeping entity, query, and part of the pager on one side and all rendering on the other would be the right starting point.

If doing a new code base, then having lots of different objects sounds like a good idea. Things get tricky with the pager when you take things like this into consideration http://drupal.org/project/views_calcpager which uses the main query for the pager instead of doing 2 queries. Another project that makes things a little tricky is something like http://drupal.org/project/views_javascript_random which will change how thing get rendered based off of how you sort. As long as complex things like this are kept in mind when making the new code base, most people will be happy.

While I'm not enthused about

merlinofchaos's picture

While I'm not enthused about engaging Sylvain in any way, at least this post is less insulting than the last few I've read. I wish it were simply not insulting, rather than less, but I guess I will take what I can get.

Sylvain, a word for the future: I will not engage you in any way if you continue to insist on using emotional and denigrating comparisons and arguments when discussing the Views' architecture. You have made it very clear that you have no respect for me or my work. That's fine, you are welcome to your opinion, but that opinion is plenty clear. I have no interest in having it reiterated repeatedly. If you wish to help us, the very, very first thing you have to do is take a step back from yourself and stop attempting to reinforce that you are smarter, better or whatever it is than you are than us. Maybe you're smarter than me. A whole hell of a lot of people are. Now that we've established that, it's another thing I have no interest in having reiterated repeatedly.

render method. The doc says: "Render this view for a certain display. Note: You should better use just the preview function if you want to render a view."

During the very initial growth phase of the Views object, I started with 3 simple phases: build, execute, render. I had intended for render to do the full rendering, but as the display class came into being, I had to extend that.

execute_display method. "To be called externally by whatever mechanism invokes the view, such as a page callback, hook_block, etc. [...] This function should NOT be used by anything external as this returns data in the format specified by the display. [...]". - WTF ?!

I'm sorry that's not understandable. hook_block expects an array that includes a subject and content. For example. execute_display is meant to provide data in the format that is expected.

Private/protected methods are not really privates, they are public with an underscore as prefix to the function name.

Views 2 was written for PHP4, and we've spent very little effort to try and take advantage of PHP 5 features. Note that Views 3 still exists, with 90% of the functionality, for Drupal 6, and as such a lot of pieces have maintained that compatibility. Because of the PHP4 compatibility, there is no such thing as private, protected, or a whole host of other nice OO features, nor has anyone had the time and energy to devote to a full scale architecture revamp at this point.

There is a mix-and-match of camelCase functions and php_procedural_style function name.

No there is not. If there is any camel case, it's likely coming from transitions to Drupal 7 where camel case was adopted as a standard for OO over my objections. Or rather, without me having a chance to object because it was put into the standard before I had any idea it was happening.

An evil copy method.

Really?

A clone_view method having a sudden clarity about itself: "Because views are complicated objects within objects [...]" - mhhh

Prior to PHP 5.3, it was very bad about removing referenced references. So there's a lot of extra cleanup we do to try to alleviate PHP problems with memory. This is much less important as of PHP 5.3 thanks to improved garbabe collection.

There is a very curious fix_missing_relationships method that would need some love/reconsideration.

This is one of those transition methods that exists when you have several years worth of updates to deal with.

The View::views_object_types() make me think of the speech by David Zuelke Designing HTTP interfaces and RESTful Web Services. Especially the slide 77 about the Twitter API: DELETE api.twitter.com/l/status/destroy/12345.json. "DELETE destroy", RPC much ?

Really?

The big picture

As you know, it is already on the VDC road map to separate the view executable from the view storage. The actual storage object has been undergoing very serious work at this very moment as the team has helped with the CMI project to figure out how Drupal, in general, is going to do this.

Views was written at a time when there was nearly no OO in Drupal at all. I went from a purely procedural system to a simplified OO system. Since that was written, Views has grown somewhat organically as a bunch of different developers have added to it, and we've had to do smaller in-place refactorings to improve the system. There are a couple of fairly important errors in the initial design that have been worked around to keep things working as we move forward.

Finally, there is a constant struggle in software to achieve a balance between architectural perfection/purity and pragmatic realism. Trust me, I find it annoying when I work with code that has imperfect APIs, but I find it more annoying to work with code that doesn't actually function. I personally think we've done reasonably well on that balance. Not everything is perfect, but Views does okay. It's certainly no worse than Drupal core itself, in terms of design debt over several generations of software.

Also: No one on the Views or VDC team actually reads this group. It seemed like a good idea back in 2006 or whenever this group was made, but realistically, it hasn't worked out as a good place to work.

One of the interesting

merlinofchaos's picture

One of the interesting problems with the architecture is that the prominence of the display object actually came fairly late in terms of the design. Originally it was a very small object that was only intended to hold specifics about location -- i.e, where pages were and how they were accessed.

This is interesting and important, because once the display was solidified, it became the home of nearly all actual settings, and therefore the home of a lot of functionality around things that can be controlled in the UI. However, the view object had originally been architected thinking it would be the home of all the settings. This led to a mismatch (pagers are the most notorious thing affected) and caused extra methods (the unfortunate pre_execute() method in particular) to appear in order to make this happen. While an argument can be made that I should've gone further in my refactoring when I introduced displays the way I did, obviously that never happened.

We have to balance how much we refactor that with the danger of how much code will have to be modified to respond to it. We're already going to be biting off a pretty big chunk by refactoring the view object into two.

I see Views as a fascinating

sylvain lecoy's picture

I see Views as a fascinating project and a pretty motivating incubator for innovation. If Wordpress guys and Joomla folks are impressed by the functionality given, let's amaze developers community by its wonderful API. I pledge that if we do this right, we'll have good feed-backs. I'll propose myself to help writing the entire documentation related to this API change, would be also a way to excuse me for the tone employed against Views architecture.

I have also well understood that the change-set size need to be controlled, I will not propose too big changes or a dream architecture plan that is impossible to implement. By separating the View object into unit-testable, separates objects will, I am sure, help to gain into stability. For me ViewExecutable and View is not enough, for reasons that I explained above.

Having a View layer: that is theming/rendering layer (in the Model-View-Controller or Model-View-Presenter schema), not knowing anything about the datamodel is definitely a good direction. mickeytown2 and pounard seems to be fairly convinced, what about you ?

The Query should not know anything about the View object, this is already the case it seems. I have to go deeper in the code but it takes time.

I think the fundamentals phases you mentioned can be a good starting point.

  1. Build: gather the parameters (arguments, fields, sorting, ...), and build the query.
  2. Execute: this step is obscure for me but I have to investigate the code. It seems its dealing with caching, display handlers, pager.
  3. Render: displays the view as a lovely array that drupal likes after being processed into theming hooks, row plugins and so on.

These three steps are also a standard in drupal where Queries and EFQ along with entity_ methods mimics this behavior.

A last question, when you speak of the display object, what is it exactly ? There is a ViewsDisplay object, a $display property which is an array of display handlers in the View object. You are referring to it but its confusing for me.

Just for the record, I did

sylvain lecoy's picture

Just for the record, I did some works on the Query and have found an interesting concept into core, which is called "Data Source": http://groups.drupal.org/node/237443. And here is the discussion: http://drupal.org/node/1696640.

Views Developers

Group organizers

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds:

Hot content this week