merge and select from, incomplete or incompetent?

We encourage users to post events happening in the community to the community events group on https://www.drupal.org.
mixel's picture

While converting my exercises from Drupal 6 to 7 I've tried to stay loyal to the new DB-abstraction layer. One of the exercises has some advanced mysql. In an attempt to make this query work, I've encountered several issues that make me question if I've discovered some incompleteness in the DB-abstraction layer or if I'm incompetent and do not understanding the DB-abstraction layer.

Let me start with the query, it is used for data analysis between users. The query will populate a table based on how often a user has replied to another user. The key of user_interaction is (sid, rid), which allows me to write one query to populate the table:

INSERT INTO user_interaction (sid, rid, count)
SELECT s.uid as sid, r.uid as rid, 1 as count
FROM comments s
INNER JOIN comments r ON s.pid = r.cid
ON DUPLICATE KEY UPDATE count = count + 1

From the DB documentation I understand that the above code is not exactly according to the SQL standard. The "INSERT ..ON DUPLICATE KEY UPDATE" means we are combining an insert an an update query and so need to use db_merge. However, this query is also an "INSERT ... FROM", which is demonstrated here. How I think the query should look like is:

  $query = db_select('comment', 's');
  $query->join('comment', 'r', 'r.nid = s.nid');
  $query->addField('s','uid', 'sid');
  $query->addField('r','uid', 'rid');
  $query->addField('',1, 'count');
  $tmp = db_merge('user_interaction')
  ->from($query)
  ->expression('count', 'count + :inc', array(':inc' => 1))
  ->execute();

Normally the db_merge needs a "key" and "fields", but we expect this should be replaceable by "from".

So time to dig into the DB abstraction layer to understand what is happening.

We see that both InsertQuery and MergeQuery inherit from Query and of course the from method is located at "InsertQuery::from". Logical as from only is relevant for insert and should not be part of Query.

The evaluation of the "from" method can be found in the "execute", were a the actual object is asked to transform the query to a string:

if (!empty($this->fromQuery)) {
  $sql = (string) $this;
  // The SelectQuery may contain arguments, load and pass them through.
  return $this->connection->query($sql, $this->fromQuery->getArguments(), $this->queryOptions);
}

With Query being an abstract class and the string transformation is and abstract method (abstract public Query::__toString();), we see how (string) $this only becomes executable in the concrete InsertQuery (e.g. InsertQuery_mysql, InsertQuery_pgsql, ...).

So I got a warm and fussy feeling about the design and tried debugging my code by calling (string) $tmp, but MergeQuery does not implement the string transformation properly it does: public function __toString() { }... byby warmth, hello cold shower ...

MergeQuery does not seem to be implemented properly, but to suggest a solution, I need to have a look at some software architecture patterns in the DB abstraction layer. By putting some code behind db_insert together we get:

$class = $this->getDriverClass('InsertQuery', array('query.inc'));
  return new $class($this, $table, $options);
//and inside getDriverClass($class, $driver);
$this->driverClasses[$class] = $class . '_' . $driver;

As a small remark, I'm not sure why the class creation needs to be outside the getDriverClass, as separate method calls (insert, update, delete and merge) all create the class. More importantly, this is a creational patter (I'm guessing the builder). The breaking of the "abstract class/method" structure seems a validation of the architectural patter.Two possible change could be done to MergeQuery:

1) If we do not need unique syntax for can MergeQuery, we may use a structural pattern (something like the wraper). This would make MergeQuery a simple object and have all default method calls go to InsertQuery, so that we only need to regulate the update case ... I'm not sure if this is possible
2) If we cannot wrap the object than we need MergeQuery to follow the creational structure (e.g. MergeQuery_mysql, MergeQuery_pgsql, ...)

Considering that I'm bringing this problem up, I'm more than happy to try and solve it. I would actually prefer to solve it as this would be a good exercise for me to contribute code. Still I could use some support from experienced DB abstraction layer architects. It is also possible that I misunderstood the whole problem, in that case I hope you will educate me.

Comments

Not Merge

Crell's picture

Greetings.

Merge queries are... nasty. That's not a Drupal thing but an SQL thing. The problem is that Merge queries are a very new introduction to ANSI SQL, and many databases don't support them. In fact, none of our core databases support them. We are therefore emulating the behavior of Merge queries as best we can, which is only kinda-sorta.

Merge queries by default do not use the toString() compilation pattern because our emulated versions do not have a string representation. If you look in MergeQuery::execute(), you'll see that it's actually a series of queries internally that may or may not get run. There is no way to fold that down to a single string in the degenerate case. If a database did support ANSI Merge queries properly (I think Oracle does but I'm not sure) then its driver implementation could use the toString() method to compile it, just like the other queries do. (Arguably using __toString() was a bad decision in the first place; that may or may not change in the future. I'm not sure yet.)

To address your aside, getDriverClass is simply a utility method to handle the optional-inheritance logic that the database layer uses. If we're in driver foo, then insert queries may be represented by either InsertQuery_foo or, if that is absent, by simply InsertQuery. It's not complex logic, but it's repeated in every factory method so it made sense to move it to a simple utility method.

The key here is that the original SQL query you are demonstrating is not, in fact, ANSI SQL. ON DUPLICATE KEY UPDATE (ODKU) is a MySQL-specific extension, and one that does not map to MergeQuery semantics at all (much as it would be nice if it did so). Early versions of the database layer did use it, and we tried to keep it alive, but in the end the semantics were simply wrong. As our goal was database portability, we switched the MySQL implementation to use the degenerate multi-query-transaction approach. So in native DBTNG, what you are trying to do is impossible because it is impossible (as far as I am aware) in ANSI SQL.

Now, you have three options:

1) Feed that query literal into db_query(). That will still work, although it is not recommended for insert queries. db_query() will take pretty much any SQL string fed to it and pass it on to the database, so you certainly can do any database-specific logic you want that way. If your goal is a releasable module, however, that is naturally not recommended.

2) Just because the core system and default drivers do not support MySQL-proprietary extensions doesn't mean that you cannot. The database layer is designed to be quite extensible. You can if you so choose subclass InsertQuery_mysql to InsertQuery_mixel, instantiate it manually (as everything is constructor-injected that is quite easy), and use it anywhere that you would a normal Insert query in the exact same fashion. Then, add functionality to your subclass to support ODKU. The implementation of such is left as an exercise to the reader.

3) If you don't like the idea of manually instantiating a query object, you could implement your own complete driver. The process is actually quite straightforward. You would simply create a new directory in includes/database, create the proper class names in the appropriate files following the pattern of the other drivers, and include stub copies of all relevant classes. Have each class inherit from the MySQL implementation (class UpdateQuery_mixel extends UpdateQuery_mysql {}, etc.), and make sure that the driver file manually includes the appropriate parent driver file. Then add your extensions to InsertQuery_mixel only as in #2, and leave all others as empty extensions. (In a derived driver like this, all classes are required even if trivial.) Finally, switch your driver type in settings.php to your new driver (mixel), and Drupal will start using your new classes instead. No changes are necessary to your module code, and your custom code may now use your new, enhanced insert query builder.

Naturally all three options above are MySQL-specific and not portable to different drivers.

why not MergeQuery_mysql?

mixel's picture

Hi Crell,

Thanks for taking the time to answer this fully.

I'm a bit surprised you suggest InsertQuery_mixel over MergeQuery_mysql. The query is not some kind of strange mixel-extension to mysql, it is mysql. Of course this mean I'll have to create the MergeQuery_pgsql and MergeQuery_sqlite, still this looks less like a hack.

Could you elaborate why you don't like to have extension of MergeQuery. Is it because merge is nasty and not really implemented any way or is there another problem I'm not aware of?

MySQL-specific

Crell's picture

The reason is simple: The operation you are trying to run is not a Merge query. It is a MySQL-specific extension to Insert queries that does not, as far as I am aware, even exist on non-MySQL databases. ODKU is not, we discovered the hard way, a merge query. The semantics are quite different except in the degenerate case of a single opaque primary key. Early implementations of MergeQuery_mysql used ODKU, but we had to change it because it was semantically wrong.

ODKU is a part of the MySQL Insert query syntax, so that is the only place it would even run at all. Merge queries are single-record operations, not multi-record operations.

Remember: Despite the existence of the ANSI SQL standard, SQL is not a standard in actual practice. It is an approximation of the lowest common denominator between a variety of incompatible database vendors each with their own syntax. Life sucks like that at times. :-(

Ow, It has been tried, sorry

mixel's picture

Ow, It has been tried, sorry for not knowing this.

I've tried searching if the solution was being discussed. There are some footprint left of MergeQuery_mysql in the class hierarchies of API manuals, but without extra information. Only now with the ODKU abbreviation I found some issue and group discussion dating back to 2008. ... Let's hope that the Prairie Initiative will help with finding the existing discussions easier.

Database

Group organizers

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds:

Hot content this week