A review and proposal for the Drupal 7.x database stack

Posted by hswong3i

I started my Drupal + Oracle research near the end of the Drupal 4.7 life cycle (around Oct 2006), at my client's request. The project is still running, but I am not satisfied with its progress. After tracking the Drupal database stack implementation for more than a year, I think it is time to focus on Drupal 7.x development.

So what has changed since 4.7.x? And what will happen in 7.x? I would like to share my research progress with you, so let's brainstorm what comes next :-)

Drupal 4.7.x/5.x

Fig 1.1 Drupal 5.x database stack.

Drupal 4.7.x and 5.x are very similar: we support MySQL and PostgreSQL backends, each with its own database access API file (e.g. database.mysqli.inc) loaded by database.inc, and that's all. A few extra abstractions are built on top of this simple model, e.g. tablesort.inc and pager.inc, to provide useful application-layer helpers.

During this period, core and contributed-module developers needed to understand the differences between MySQL and PostgreSQL, since database schemas had to be written by hand for each backend. I would call this the dark age of our history: we expected every contributed-module developer to be an expert in both MySQL and PostgreSQL (although most of us came with a MySQL-only skill set), or else the code was not portable. This made PostgreSQL support complicated, and so most contributed modules were MySQL-only.

Drupal 6.x

Fig 2.1 Drupal 6.x database stack.

Our newest stable release, Drupal 6.x, introduces the Schema API. We spent a lot of time on it during the one-year 6.x development cycle. It targets the schema problem mentioned above: contributed-module developers no longer need to be experts in two or more databases. We just need to define our schema in correct Schema API syntax, and the Schema API handles the rest for us. IMHO, it is very powerful, and an important step toward supporting other databases.
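The core idea, one portable definition rendered into per-backend DDL, can be sketched as below. This is a minimal Python illustration, not Drupal's PHP implementation: the schema dict only loosely mirrors a Drupal 6 hook_schema() array, and TYPE_MAP, create_table_sql() and the example table are hypothetical, covering just four types on MySQL and PostgreSQL.

```python
# Hypothetical, simplified type map: portable Schema API-style types -> backend DDL.
# Only a handful of types and two backends are covered.
TYPE_MAP = {
    "mysql": {
        "serial": "INT NOT NULL AUTO_INCREMENT",
        "int": "INT",
        "varchar": "VARCHAR({length})",
        "text": "TEXT",
    },
    "pgsql": {
        "serial": "SERIAL",  # SERIAL columns are implicitly NOT NULL
        "int": "INTEGER",
        "varchar": "VARCHAR({length})",
        "text": "TEXT",
    },
}


def create_table_sql(table, schema, backend):
    """Render one portable table definition as CREATE TABLE for one backend."""
    types = TYPE_MAP[backend]
    columns = []
    for name, spec in schema["fields"].items():
        col = types[spec["type"]].format(length=spec.get("length", 255))
        # serial types already carry their own NOT NULL semantics
        if spec.get("not null") and spec["type"] != "serial":
            col += " NOT NULL"
        columns.append(f"{name} {col}")
    columns.append("PRIMARY KEY ({})".format(", ".join(schema["primary key"])))
    return "CREATE TABLE {} (\n  {}\n)".format(table, ",\n  ".join(columns))


# One definition, written once by the module developer (the table and field
# names are made up for this example).
example_schema = {
    "fields": {
        "id": {"type": "serial", "not null": True},
        "title": {"type": "varchar", "length": 128, "not null": True},
        "body": {"type": "text"},
    },
    "primary key": ["id"],
}

for backend in ("mysql", "pgsql"):
    print(create_table_sql("example", example_schema, backend))
```

Adding another backend would mean adding another entry to TYPE_MAP; the module's schema definition itself would not change.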

drupal_write_record() is introduced too. This handy function looks up the cached schema for a table and builds the INSERT/UPDATE query for us automatically. It tries to handle column type binding on behalf of ordinary developers, which is usually very bulky to implement by hand.
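As a rough illustration of what "build INSERT/UPDATE queries from the cached schema" involves, here is a hypothetical Python helper. It is not drupal_write_record() itself: the placeholder map only imitates the idea behind Drupal 6's db_type_placeholder(), and the schema and record are made up.

```python
# Hypothetical placeholder map in the spirit of Drupal 6's db_type_placeholder():
# numeric types use unquoted placeholders, blobs use %b, everything else '%s'.
PLACEHOLDERS = {"serial": "%d", "int": "%d", "float": "%f", "blob": "%b"}


def placeholder(col_type):
    return PLACEHOLDERS.get(col_type, "'%s'")


def write_record_sql(table, schema, record, update_keys=()):
    """Build an INSERT (no update_keys) or an UPDATE (keyed on update_keys).

    Only keys that exist in the schema are used, so stray record entries are
    ignored; serial columns are left out of INSERTs so the database assigns them.
    Returns (query, args) in Drupal's printf-style placeholder syntax.
    """
    fields = schema["fields"]
    if update_keys:
        set_cols = [c for c in fields if c in record and c not in update_keys]
        sets = ", ".join(f"{c} = {placeholder(fields[c]['type'])}" for c in set_cols)
        where = " AND ".join(f"{k} = {placeholder(fields[k]['type'])}" for k in update_keys)
        query = f"UPDATE {{{table}}} SET {sets} WHERE {where}"
        args = [record[c] for c in set_cols] + [record[k] for k in update_keys]
    else:
        cols = [c for c in fields if c in record and fields[c]["type"] != "serial"]
        values = ", ".join(placeholder(fields[c]["type"]) for c in cols)
        query = f"INSERT INTO {{{table}}} ({', '.join(cols)}) VALUES ({values})"
        args = [record[c] for c in cols]
    return query, args


example_schema = {
    "fields": {"id": {"type": "serial"}, "title": {"type": "varchar"}, "score": {"type": "int"}}
}
record = {"id": 7, "title": "Hello", "score": 3, "not_a_column": "ignored"}
print(write_record_sql("example", example_schema, record))
print(write_record_sql("example", example_schema, record, update_keys=("id",)))
```

The developer only supplies the record; column selection and per-type placeholders come from the schema.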

However, the work is not very elegant: everything is mixed into database.*.inc, so each file is internally split into two sections (the "database" and "schemaapi" groups). This may make core maintenance harder.

This design is quite similar to Moodle 1.7+'s XMLDB implementation.

So what's coming next for Drupal 7.x?

Fig 3.1 Well... So what's coming next for Drupal 7.x?

The truth is: I don't know the overall schedule at this moment. The Data Architecture Design Sprint group is working hard on this and will present their progress at DrupalCon Boston 2008. Since we are going to use PHP PDO (instead of the legacy PHP database extensions), people are also focusing on implementing a Data API or an Active Record model for a higher level of abstraction. Before they reach a final decision, I would like to offer some successful case studies as a reference, so let's brainstorm the optimal solution :-)

Fig 3.2 Moodle 1.7+ XMLDB stack. From doc.moodle.org.

When we talk about making our database abstraction more powerful and universal, the Moodle 1.7+ XMLDB stack is a perfect example (it has been available since Nov 2006). Because Moodle core uses ADOdb internally, multi-database support has been possible from the beginning. Combined with their XMLDB abstraction, Moodle 1.7+ handles MySQL, PostgreSQL, Oracle and MSSQL while keeping everything working properly. I will not detail their implementation in this article; please refer to their introduction, roadmap and problems documents.

The most important idea in Moodle's XMLDB stack is what I call split-brain handling: every layer focuses on its own duty ONLY, and so the final design is clean and elegant. This is similar to the well-known OSI model, where a layer is a collection of related functions that provides services to the layer above it and receives services from the layer below it.
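A minimal sketch of split-brain layering, assuming Python's standard sqlite3 module as the driver layer. The class names (Driver, SchemaLayer, DmlLayer, DataApi) are hypothetical and do not come from Moodle or Drupal; the point is only that each layer calls the layer directly below it and nothing else.

```python
import sqlite3


class Driver:
    """Lowest layer: owns the connection and only knows how to run SQL."""

    def __init__(self, dsn=":memory:"):
        # isolation_level=None puts sqlite3 in autocommit mode.
        self.conn = sqlite3.connect(dsn, isolation_level=None)

    def execute(self, sql, params=()):
        return self.conn.execute(sql, params)


class SchemaLayer:
    """Turns portable field types into this backend's DDL; uses only the Driver."""

    TYPES = {"serial": "INTEGER PRIMARY KEY AUTOINCREMENT", "int": "INTEGER", "text": "TEXT"}

    def __init__(self, driver):
        self.driver = driver

    def create_table(self, table, fields):
        cols = ", ".join(f"{name} {self.TYPES[kind]}" for name, kind in fields.items())
        self.driver.execute(f"CREATE TABLE {table} ({cols})")


class DmlLayer:
    """Builds common INSERT/SELECT statements; uses only the Driver.

    Table and column names are assumed to be trusted identifiers; values are
    always passed as bound parameters.
    """

    def __init__(self, driver):
        self.driver = driver

    def insert(self, table, row):
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        cur = self.driver.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({marks})", tuple(row.values()))
        return cur.lastrowid

    def select_one(self, table, key, value):
        cur = self.driver.execute(f"SELECT * FROM {table} WHERE {key} = ?", (value,))
        row = cur.fetchone()
        if row is None:
            return None
        names = [d[0] for d in cur.description]
        return dict(zip(names, row))


class DataApi:
    """Top layer: application objects only; no SQL and no driver details."""

    def __init__(self, dml):
        self.dml = dml

    def save_note(self, title, body):
        return self.dml.insert("note", {"title": title, "body": body})

    def load_note(self, note_id):
        return self.dml.select_one("note", "id", note_id)


driver = Driver()
SchemaLayer(driver).create_table("note", {"id": "serial", "title": "text", "body": "text"})
api = DataApi(DmlLayer(driver))
note_id = api.save_note("Hello", "Each layer only calls the layer below it.")
print(api.load_note(note_id))
```

Supporting another database would ideally mean replacing only the lower layers (the driver, the type map and any dialect-specific DML); DataApi would stay unchanged.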

Fig 3.3 Research progress of Siren + personal suggestion.

Siren is my personal research project alongside Drupal 6.x. It aims to explore multiple-database support like that of Moodle, Gallery2 and eGroupWare, and to have research results ready for when Drupal 7.x opens for public development (which is now). Within this project, we support 5 databases through a total of 9 PHP database access drivers: mysql, mysqli, pgsql, oci8, pdo_mysql, pdo_pgsql, pdo_oci, pdo_sqlite and pdo_ibm.

Sounds interesting, doesn't it? So what changes did I make to support this? Just some incremental improvements: I split the function implementations into several individual files, so that each layer focuses only on its own job. E.g. Schema API functions go into schema.*.inc, common DML into common.*.inc, and so on. Of course I also made some extra updates to our core query syntax, but overall the idea is very straightforward. Compared with the OSI model, this work belongs to the session layer.

IMHO, a Data API is a must for our future if Drupal is going to be a web service framework. Its best position is as a high-level abstraction, similar to the OSI presentation layer, one level below our core application APIs. It may be implemented with OOP or otherwise; however, given its duty and responsibility, it should be kept separate from the PDO database access API (PDO being much like an OSI transport layer implementation).

In one phrase: split-brain handling.

Any other suggestions?

A database driver only needs to be implemented once. Once the logic is designed, the rest is a matter of implementation style, e.g. OOP as in ADOdb, or traditional function callbacks as in Drupal.

The abstraction layer is similar. Once you have completely studied the differences between the databases and can foresee ALL of their specific limitations, the job is done. There is no such thing as a "Contribute Supported Database" within these 2 layers; this is what I call "No gray area, only black and white": either you support a database, or you don't.

The Data API, on the other hand, is very different. It serves the needs of our applications, so its requirements will keep changing over time. Therefore split-brain handling is the most suitable design. Please correct me if you have a better idea :-)

But the truth is: we lack PostgreSQL/Oracle/DB2/SQLite/etc developers, and we don't have enough manpower or coordination to support them completely and perfectly. So I would like to propose the idea of a "Contribute Supported Database" as follows:

  1. Implement ALL possible database drivers in core, and build an abstraction that covers ALL of their needs. This is a once-and-for-all task, and the basis for the rest of this discussion.
  2. ALL core libraries and modules essential for installation MUST be implemented universally, so a basic installation MUST work on ALL databases, e.g. node, comment, users, block, etc. Ensuring this before a stable release is the duty of our core developers. Meanwhile, if MySQL (and maybe SQLite too) is our main focus, just keep its support as perfect as usual.

    P.S. An additional prerequisite for this point: try to write ALL core modules as SQL-99 compatible by default, which saves a lot of time on later follow-up work.

  3. Label optional modules as "Not supported" or "Not perfectly supported" on the module selection page when that is the case for a specific database backend; show different levels of support status; or maybe even disable the module's activation checkbox by default. Don't let our second/third-priority databases hold up our stable release!
  4. Leave the issue open for contributed developers who really want that module to work on database A/B/C/etc, and let them release sub-versions alongside the stable release for this purpose. E.g. if the search and forum modules are not compatible with Oracle now, just leave that as the Oracle developers' duty!
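Points 2 to 4 could be backed by a per-backend support table. The Python sketch below is hypothetical: the module names, statuses and helper functions are invented for illustration. It shows one possible policy where only required (installation-critical) modules can block a release, while optional modules are merely labeled and left unchecked by default.

```python
from enum import Enum


class Support(Enum):
    FULL = "Supported"
    PARTIAL = "Not perfectly supported"
    NONE = "Not supported"


# Hypothetical support matrix: module -> backend -> status.
SUPPORT_MATRIX = {
    "node": {"mysql": Support.FULL, "pgsql": Support.FULL, "oracle": Support.FULL},
    "search": {"mysql": Support.FULL, "pgsql": Support.PARTIAL, "oracle": Support.NONE},
}
# Modules needed for a basic installation (point 2): these must be FULL everywhere.
REQUIRED_MODULES = {"node"}


def module_row(module, backend):
    """Return (label, checkbox_enabled_by_default) for the module selection page.

    Unknown modules or backends are treated as unsupported.
    """
    status = SUPPORT_MATRIX.get(module, {}).get(backend, Support.NONE)
    return status.value, status is Support.FULL


def release_blockers(backend):
    """Only required modules that are not fully supported can block a release."""
    return sorted(m for m in REQUIRED_MODULES
                  if SUPPORT_MATRIX[m].get(backend) is not Support.FULL)


print(module_row("search", "oracle"))
print(release_blockers("oracle"))
```

Under this policy, an Oracle gap in the search module changes only its label and default checkbox; it never appears in release_blockers().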

This is my ideal proposal. How about yours? Here is the raw file for the images above; please feel free to use it if you want to visualize your ideas the same way :-)

Comments

Enough of this

Posted by chx

You are not the first to fork Drupal but anyone who walks out of peer review finds himself with bogus code.

We have also written a PDO layer and were shocked to find no way to pass in CLIENT_FOUND_ROWS, as reported at http://bugs.php.net/bug.php?id=44135 -- I read the pdo_mysql driver source and I am fairly confident there is no way. For Drupal this means that when you variable_set something to its current value, there should be an error, because of the way variable_set is written and how MySQL reports affected rows. If you have not stumbled on this, then

a) you have not tested your code on MySQL at all
b) because you use PDO::ERRMODE_WARNING, the errors in your implementation are covered up. In our testing, we found only exceptions to work reliably.

In either case, Siren is bogus and you have no idea what those bugs are.
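The failure chx describes can be reproduced without a real database. The Python toy below is a simulation, not MySQL or PDO code: FakeMySQLTable and variable_set() are hypothetical stand-ins, and variable_set() is assumed to issue an UPDATE and fall back to an INSERT when zero rows are reported as affected. The duplicate key surfaces as an exception, which is the behavior chx recommends over warnings.

```python
class DuplicateKeyError(Exception):
    """Raised when an INSERT hits an existing primary key."""


class FakeMySQLTable:
    """Toy in-memory table mimicking how MySQL reports affected rows.

    By default, MySQL counts rows actually *changed* by an UPDATE, so
    writing a value identical to the stored one reports 0. With the
    CLIENT_FOUND_ROWS connection flag it counts rows *matched* instead.
    """

    def __init__(self, client_found_rows=False):
        self.rows = {}
        self.client_found_rows = client_found_rows

    def update(self, name, value):
        if name not in self.rows:
            return 0
        changed = self.rows[name] != value
        self.rows[name] = value
        return 1 if changed or self.client_found_rows else 0

    def insert(self, name, value):
        if name in self.rows:
            raise DuplicateKeyError(f"duplicate key {name!r}")
        self.rows[name] = value


def variable_set(table, name, value):
    """Assumed pattern: UPDATE first, INSERT only if 0 rows were reported."""
    if table.update(name, value) == 0:
        table.insert(name, value)


for flag in (False, True):
    table = FakeMySQLTable(client_found_rows=flag)
    variable_set(table, "site_name", "Drupal")
    try:
        variable_set(table, "site_name", "Drupal")  # same value again
        print(f"CLIENT_FOUND_ROWS={flag}: ok")
    except DuplicateKeyError as err:
        print(f"CLIENT_FOUND_ROWS={flag}: {err}")
```

Without the flag the second call raises DuplicateKeyError; with CLIENT_FOUND_ROWS set it succeeds, because the matched row is counted.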

Don't shift the focus of my article

Posted by hswong3i

As stated many times, Siren is just a personal research project, which targets multiple-database support with minimal logical change and minimal bother to end developers, based on the existing DB API implementation. Judged by that focus, Siren did its job: it supports 5 databases through a total of 9 PHP database access driver implementations (both legacy and PDO).

I can NEVER discover ALL hidden bugs; what one person can do is very limited. If you are asking me to propose a PERFECT solution for MySQL, PostgreSQL, Oracle, DB2, MSSQL and SQLite, or else to wait for your work and progress: sorry, that is not my cup of tea, because I can't see any interest on your side in supporting databases other than MySQL and SQLite. Even now that a solution has been proposed with Siren, you still tell me "Enough of this", and close your eyes and ears ;-(

Siren is bogus!? With all that solid research and implementation? With those concrete benchmarks? Please don't try to shift the focus with such a minor smear. IMHO, your comment is just an escape from reality.


Rome Was Not Built In One Day.


Edison Wong
CEO, Co-founder
PantaRei Design Limited

Oh SNAP!

Posted by pcorbett

Hey, chx, if you're pushing for peer review of code, you might be a bit less crass in your commenting; it's a bit of a turn-off :)

I'm not necessarily taking either side here, but I think that we all need to get together, perhaps at DrupalCon Boston, and seriously review all of our options and develop some kind of plan going forward. While the Drupal community is amazing, it is sometimes difficult to take on larger-scale efforts such as this in a purely virtual setting.

I'm not as advanced a developer as you, hswong3i, or Souvent22, but I have a well-trafficked SQL Server site running now, a lot of experience getting something like this going, and the desire to get alternative DB support into Drupal. Souvent22 and I have proposed a session at DrupalCon Boston; perhaps we can all manage to meet up or present together?

Either way, I think there are a lot of people who would love to see this happen, including Dries, and I think we just need to organize ourselves and discuss this together.

What do you think?

Community Spirit

Posted by nickvidal

I think the Drupal Community is very lucky to have members such as hswong3i who has dedicated much time and effort to researching better database support for Drupal. From his work and his speech, we can easily recognize three things:

1) He has demonstrated that he is a very hard-working person, given all his research, implementation, time and effort dedicated to solve Drupal's database challenge;
2) He has no intention of forking anything. That's why he has proposed a database stack for Drupal 7;
3) He wants feedback from the community, and he wants to be part of the community. That's why he has asked for help and tried to engage with the community.

I think the Drupal Community has two alternatives:

1) Either ignore him completely, reject him, and shut him off; or
2) Welcome him, and make him feel part of the community, and collaborate with him.

Chx, I know very little about you. But from what you've written so far, I have a very bad impression of you. You have chosen alternative 1 and that's just terrible, especially because it contrasts with hswong3i's attitude and ability. It seems as if you feel challenged by this young man who has embraced this task, as if this were a competition of vanity and popularity. I hope I'm wrong about you, and I hope you'll choose alternative 2.

Best regards to the Drupal Community!

Sincerely,
Nick Vidal

very little?

Posted by catch

Chx, I know very little about you.

Perhaps you should find out before making pronouncements: http://drupal.org/user/9446

You can't really defy

Posted by rszrama

You can't really defy someone's first impression of someone else, but this link might help someone get a sense of core dev frustration with hswong3i's "contribution" practices: http://drupal.org/node/183148

Community Spirit

Posted by nickvidal

I did find out. But I still know very little about chx since I've never met him or talked with him in person. I basically gave him the opportunity to make a good second impression, since perhaps I might have misunderstood him.

Anyway, a long list of contributions doesn't give anyone the right to be arrogant. Quite the contrary. This is what I admire in Dries, as he sets a true example of a leader who cares about the community spirit and who is always gentle to everyone.

first impressions, and second, and third

Posted by catch

Well chx is quite capable of defending himself (and rubbing people up the wrong way as well), however the continual cross-posting of Siren - which has been all over the D6 issue queue (now D7), development list, drupal.org/planet has become quite frustrating.

It's not hard to familiarise yourself with either the discussion or personalities at hand here, in fact it's almost impossible to avoid if you regularly look in any of those places. So if this is the first you've seen of this, please ask yourselves why you're wading in, and consider looking into the background if you feel strongly about it.

Personally, I'm pissed off seeing the same stuff being cross-posted everywhere, issues and development list threads hijacked and derailed, a process which has been continuing for several months now. I'm fortunate enough not to actually be doing productive work in the area under discussion, so it's less disruptive to me than some, but still. First impressions may count, but so does a year of FUD and derailment.

What's wrong with Siren?

Posted by hswong3i

I can't say it any more clearly than this:

1) He has demonstrated that he is a very hard-working person, given all his research, implementation, time and effort dedicated to solve Drupal's database challenge;
2) He has no intention of forking anything. That's why he has proposed a database stack for Drupal 7;
3) He wants feedback from the community, and he wants to be part of the community. That's why he has asked for help and tried to engage with the community.

First of all, I am not going to fork; Siren is aimed at research: solid results to support discussion. If you believe discussion alone can get the work done, while some people are against getting support for any other database into core, I can only call that idealistic. I have tried to discuss the needs of other databases, with contributions, for more than a year, but the most common feedback is "who needs it?", "who cares about it?", "let's vote on which database you are using": asking for positive figures about something we do not yet support. Well, this is just an endless loop :s

And that's why I did Siren. If no one really cares about the needs of each database, I care for all the ones I can; if no one is willing to offer good ideas for brainstorming, I dig in deep and search for the solution myself; if you are asking for a perfect solution, I offer the Siren code as a working example for further development and discussion.

So what do you mean by cross-posted? Hijacked? Derailed? I don't understand :s


Let's see. A couple of

Posted by catch

Let's see.

  • A couple of dozen D6 database issues, including quite mundane cleanups like removing old updates, retitled to "Siren: blah blah". I can see no purpose for this other than to advertise the project inappropriately. A by-product is that it distracts from the real purpose of the issues that were there, and did so primarily at release candidate stage. That covers hijacking.
  • Daily updates about Siren on planet for a few months, sometimes multiple posts at once filling much of the first page with little discernible difference between them. As you know, this resulted in the temporary removal of your blog, since a few people considered this unacceptable use of the feed. Many of these posts were also linked to the development list, and sometimes again linked from the issue queue, or from groups, or from all of these. That covers cross-posting.
  • Constant taking over of bug fixes to try to impose Oracle-friendly solutions at the expense of getting the bug solved with minimal cruft added - again, at release candidate stage when this was unnecessary and counter-productive.
  • Ignoring community decisions on things like postponing issues to D7 - resetting status every five minutes despite several different people explaining that there was no way a particular patch would be accepted into D6. This pattern is continuing with D7, along with abuse of the 'critical' status.

In short, you completely ignore collective processes built up over several years both in regards to use of community resources like mailing lists or planet, and code review. This makes it increasingly a life-sucking experience participating in any issue that's remotely related to databases, no matter how mundane, because there's always a risk that it'll be taken over, bumped up 'my issues' over and over again and generally go completely off track with what often boils down to off-topic and spammy followups.

Now, you have knowledge of a particular niche area within Drupal development which is not shared by that many people (and certainly not by me). However that knowledge will be squandered if you don't take some time to review how effective your contributions actually are in the context of what is a collaborative project. I for one have sworn off (albeit unsuccessfully) interacting with you on the basis of the above behaviour, many others may not bother to tell you that they have, or why.

bravo

Posted by moshe weitzman

wow, that's one terrific post, catch. you've put into nice words the frustration that i too have been feeling for a while.

People like hswong3i are

Posted by drupalio-gdo

People like hswong3i are visionaries, who fight for the next level of Drupal.
A level where companies will embrace Drupal, since it will cooperate better with a database (Oracle) that a LOT of companies use.
When this day comes, it will be a better day for Drupal and all of us.
Even ordinary open-source users will benefit from more attention and from more people and developers coming to Drupal.
The point is to attract more attention and a bigger installation base, hence more good developers, whose work will benefit all.
Thanks hswong3i for inspiring us toward this better future :-)
Continue your great work, and soon people will come to contribute.

Now, now

Posted by chx

If we want more people like hswong3i then I will quit contributing to core. Think again.

What's up?

Posted by nickvidal

Hi chx,

Would you care to explain to me what's going on? I tried digging more about this and I don't know why exactly you are basically ignoring and rejecting hswong3i.

Thank you,
Nick

See below

Posted by Crell

Hi Nick. Please see my comment below for background: http://groups.drupal.org/node/8907#comment-28451

Thanks for clarifying

Posted by nickvidal

Hi Crell,

Thanks for clarifying the issue. I apologize for any misunderstandings on my part. I think it's great that this general opinion is now registered publicly. It will avoid any misunderstandings in the future.

Best regards,
Nick

catch's summary

Sorry

Posted by nickvidal

I hope I'm wrong about you

My first impression was wrong. I apologize sincerely.

Best regards, Nick

Recall the pinpoint idea

Posted by hswong3i

This review is target for some experience and technical readers (e.g. members of group "Data Architecture Design Sprint"), so that we can brainstorm positive, solid and achievable solutions for D7. Moreover, if we can visualize our ideas, interested readers (e.g. members of the "Enterprise" and "PostgreSQL" groups) will be able to take a quick look and review, and so contribute without losing sight of the global direction.

I would like to restate the key points of this review:

  1. What is the general plan of the "Data Architecture Design Sprint" group for D7? It may be too complicated for interested readers to follow every detail of the team's discussion. Some people have asked me whether this group is working in a black box, without accepting others' suggestions and contributions. Wouldn't some figures be a good way to explain this clearly?
  2. The database abstraction layers of Drupal 4.7.x/5.x/6.x are incremental upgrades, moving toward an optimal model like that of Moodle 1.7+ XMLDB. Do we really need a total rewrite from scratch because of the introduction of PDO (which is most likely overlooked)? Or should we fully build on our existing successful progress for a solid improvement?
  3. A "Contribute Supported Database" is a realistic solution from some points of view. However, the most important question is "How do we implement it without losing the minimal support and attention required from core developers?" I guess this needs more in-depth research and discussion.

A first-phase implementation of the split-brain handling is now posted at http://drupal.org/node/199217. The implementation is quite straightforward and shouldn't overlap with our next-generation development. I would like to ask for help, support, reviews and comments toward a better approach.

Thank you very much, members of the "Data Architecture Design Sprint" group. Your hard work and progress should give us a better future in D7. Hopefully we will get some professional feedback after this message, rather than unfocused and unproductive attacks ;-)


Off topic

Posted by Crell

This review is target for some experience and technical readers (e.g. members of group "Data Architecture Design Sprint")

Ed, while I appreciate being called "experience[d] and technical readers", if you had read the many pages worth of writeups that we have already posted on this group you would know that one of the important take-aways from the design sprint was that we want to get away from being local-database-bound. It is a bad thing. We actually didn't talk about database implementation details much at all. As a result, this entire thread is completely off topic for the DADS group and has no business being here at all. I repeat, talking about the database layer implementation is off-topic.

That's the pattern of behavior that others in this thread have mentioned. I appreciate that you have done a lot of work to try and improve Drupal's database layer, but the way you have gone about it has been consistently off topic, pushy, and flippant. Your signal/noise ratio is horrid. It is frequently a better use of time to ignore you and re-learn whatever it is your research has found than to try and decipher something useful from your lengthy rants that seem to show up in entirely the wrong place. I recall one thread about making a critical query faster in Drupal 6 that was stuck on making it work in Postgres that was derailed for weeks because of your insistence on compatibility with other databases such as Oracle, which for Drupal 6 is not a target database and therefore of no interest or consequence. I think that patch didn't even manage to get into core as a result. That's the first impression you're leaving people with, and this post here does nothing to improve that impression.

You say you don't want to create a fork, but that is exactly what you have done. Last summer when you were talking about Oracle support using the oci8 suite of functions, I told you that I was already working on a move to PDO for the database layer (it had its first public demo at DrupalCon Barcelona in September) which should make supporting Oracle et al easier, and asked for your feedback on how easy Oracle support would be via that mechanism.

Instead, you first resisted PDO entirely then ran off and embraced PDO as if it was your own invention and started implementing what became Siren: Supporting both ext/mysql and ext/pdo.mysql (why?), and various other redundancy in a way that did not offer any potential for doing, well, anything else to the database layer besides more parallel, duplicate implementations that required even more string parsing than we do now. How is it inaccurate to say that you created a development fork from the database work that was already ongoing, that you knew about, and that you were invited to help with?

You then flooded the issue queue with Siren work, and argued with everyone under the sun about whether it was appropriate to add Oracle support (or changes that were only useful for Oracle support) via massive changes to the database layer while Drupal 6 was in beta state for weeks. All that does is make people not want to work with you, no matter how valid your proposals might be on their own.

If you want to thank us for the work we've been doing to improve Drupal's data access layer, please do so by keeping unfocused and off-topic posts out of this group.

Posted by drupalio-gdo

My company needs to connect Drupal with Oracle as fast as possible.
What would be the fastest way?
This project? Another project? Please decide and act accordingly.
I hope I won't have to wait just because of internal Drupal politics or community communication problems.
As I understand this issue, we have a ready Oracle support solution, except for some small issues still to be settled. Maybe I have misunderstood.
Please tell me when you expect an Oracle solution, or whether I should continue researching other CMSs.
thanks
no offence

Find another CMS

Posted by Crell

Really. I'm not kidding. The existing "implementation" mentioned in this thread is rough, and only helps core. Contrib modules for Drupal 6 will not be compatible with it without a lot of rewriting. Unless you're prepared to do without the major contrib modules and heavily modify the ones you do use, I don't see Oracle as a viable platform for Drupal in version 6.

The primary reason for that is Oracle's out-dated requirements for field type handling, where any remotely interesting field type needs special handling and even an extra query. That's not something Drupal is currently designed to handle, nor is it something most contrib authors are even aware of.

The new database layer being developed for Drupal 7 will, hopefully, help and make Oracle support more feasible. Not guaranteed, feasible. However, that is a year away from production release at this point (Drupal averages a major release about every year), and still doesn't guarantee that Oracle will work perfectly.

Unless you can wait about 12-18 months to launch your site, if Oracle is a hard requirement then Drupal is not really an option for you right now. If you can drop the requirement for an out-dated, expensive, proprietary app then Drupal/MySQL may or may not be what you need. It depends on your site.

Hoping to clarify the bottlenecks of Drupal + Oracle

Posted by hswong3i

Core queries (and contributed module queries too) need massive updates because of:

  1. Reserved word conflicts. Please refer to http://drupal.org/node/371 for more information; I will not go into detail here.
  2. Syntax update for PDO, e.g. remove '%s' and %% syntax (detailed in http://drupal.org/node/199101).
  3. Utilize drupal_write_record() for LOB handling (detailed in http://drupal.org/node/183148).
  4. Writing code that strictly follows the Drupal query syntax requirements, e.g. mapping data types correctly, or escaping empty strings with %s, etc.
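Items 2 and 4 can be combined in a converter that drops '%s' and %% but keeps the type each Drupal placeholder implies. The Python sketch below is hypothetical (to_pdo_style() is not a Drupal or PDO function, and the bind-type names are assumptions); a real driver would map each type to its own binding call, such as a LOB bind for %b.

```python
import re

# Drupal 6-style placeholders; quoted '%s' must be tried before bare %s.
_TOKEN = re.compile(r"'%s'|%[dsfb]|%%")
# Hypothetical bind-type names a driver could map to its own binding calls.
_BIND_TYPE = {"'%s'": "str", "%s": "str", "%d": "int", "%f": "float", "%b": "blob"}


def to_pdo_style(query, args):
    """Rewrite Drupal-style placeholders as positional '?' markers.

    Returns (sql, [(value, bind_type), ...]) so that the type information the
    developer supplied (e.g. %b for a LOB) is kept for the driver instead of
    being lost. Placeholders inside quoted literals, such as LIKE '%%%s%%',
    are NOT handled and need to be rewritten by hand.
    """
    bound = []

    def replace(match):
        token = match.group(0)
        if token == "%%":
            return "%"  # escaped literal percent sign becomes a plain '%'
        if len(bound) >= len(args):
            raise ValueError("more placeholders than arguments")
        bound.append((args[len(bound)], _BIND_TYPE[token]))
        return "?"

    sql = _TOKEN.sub(replace, query)
    if len(bound) != len(args):
        raise ValueError("more arguments than placeholders")
    return sql, bound


sql, params = to_pdo_style(
    "INSERT INTO {cache} (cid, data, created) VALUES ('%s', %b, %d)",
    ["menu:1", b"\x00serialized", 1200000000],
)
print(sql)
print(params)
```

Keeping the (value, type) pairs is what lets a non-MySQL driver bind LOBs and numbers correctly without guessing from the PHP values.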

As mentioned above, most of these changes are not Oracle-specific and cannot be handled inside a Drupal driver implementation. Moreover, they only require code written to Drupal standards. That's why I call this "minimal amount of logical change", which is totally different from "minimal amount of code change".

Special handling of "remotely interesting" field types is not specific to Oracle but applies to most databases: PostgreSQL (pdo_pgsql LOB handling), DB2 and MSSQL have similar requirements. Conversely, the updated field type handling (in PDO style, with ? or named placeholder binding) is only suitable for MySQL and SQLite. Meanwhile, supplying field type information from the client side is already a Drupal 6.x requirement, e.g. via %d, %f, %s, %b, etc. So the needs of Oracle and other databases do not conflict with our existing Drupal implementation and requirements.

Supporting other databases doesn't mean trading off our MySQL users, as already shown by the rough implementation mentioned in this thread and by other successful cases like ADOdb, Moodle, Gallery2, eGroupWare, etc. We just need to start Drupal 7.x with full consideration of the needs of other databases, and the work will soon be done without difficulty.

