Posted by Souvent22 on May 23, 2006 at 3:35pm
The issue of supporting more databases has been around for a while. It was rescently brought up today on the development thread. I personally think that more database support is needed also. The more flexilble that Drupal is, the healthier it is in general. My currently list to implement is MS-SQL, and then Oracle. This list is definatly going to grow, but I believe that those two are a good start.
Comments
One barrier towards
One barrier towards supporting new database systems is the lack of functions for database structure alteration (mainly table creating and modification) in drupal's current database abstraction layer.
Therefore, a new database port must not only consist out of a new database.inc, probably some more core sql modification and a translation of the database scheme, but also must ALL contrib modules that want to be available for the new database port their database schemes and update routines.
And as most contrib developers will probably work only with one database system (mostly mysql), they won't be interested or won't even be able to do this port.
So even if a new database system would be supported by core, most contrib modules won't run on it.
There had been a long discussion on the dev mailing list on this started by adrian [1]. He proposed the extension of the database abstraction layer.
I did exactly this, and provided a patch [2]. To be included in core, it currently lacks mainly of some more reviews and a postgres implentation (I'm totally unfamiliar with postgres, so I won't be able to do it).
However, for me the patch works perfectly in mysql.
Hence I think the first important step to ease supporting more database systems is to get this patch into core.
So go ahead and review it! ;)
(see http://drupal.org/node/63049)
[1] http://lists.drupal.org/archives/development/2006-05/msg00456.html
[2] http://drupal.org/node/63049
But based upon the
But based upon the discussion on the list that idea needs more than just reviewers, it needs people to agree to it in principle as well as having good code, right?
It seems that editing the database.inc file is the best short term method of getting another database included.
On that subject - what databases are worth targetting? Clearly MSSQL and Oracle. Firebird? SQLite?
Greg
Growing Venture Solutions
Drupal Implementation and Support in Denver, CO
knaddison blog | Morris Animal Foundation
Is there not an alternative model
Whilst I'm happy to go through and add the mssql databse inc and the associated changes to database.inc I'm concerned that it feels like the changes for MS SQL will not easily be covered by this approach. In particular their use of TOP instead of LIMIT the latter of which isn't supported.
I was wondering whether it might not be better to use an approach the abstracts database calls completely into a separate file. I guess this would be akin to using a stored procedures / user defined functions approach but without having to actually use those. I realise this would be a huge amount of work from where Drupal is now and probably might have implications for performance but it could be a way forward.
Another approach is to pur the SQL commands in the database itself - but that's probably a bit radical and has other performance issues.
already discussed
Nick,
Rather than re-hashing that argument I point you to the link [1] in frando's post. http://lists.drupal.org/archives/development/2006-05/msg00456.html
It was discussed in detail and no resolution was made about which model is the easiest, but many people wanted to stick with using SQL rather than an abstraction layer. And when I say "many" perhaps the most important is Dries...
Regards,
Greg
--
Growing Venture Solutions
Drupal Implementation and Support in Denver, CO
knaddison blog | Morris Animal Foundation
not too surprising
I imagined that would be the case. I had kind of worked out a way to resolve the limit problem which I suspect is the major stumbling block. Looking at the SQL in some of the modules I imagine that there shouldn't be too much problem implementing the normal commands. I guess I imagined a scenario in which SQL wouldn't be the only access language for a given DB.
In which case I'll start making the MS SQL access file and see what people think.
Regards,
Nick
MS SQL DDL
I have started making (remaking) a database.mssql for creating the drupal databases in Drupal.
I am also starting to outline a "conversion" function that will be run on Create table statements for modules to convert them to MSSQL DDL. This is only a quick fix for now until a better solution comes along. I'm thinking either people will start including MSSQL in their modules DDL (Data Definition Language)/Table creation statements (yeah, not likley), or a new database abstraction comes along (currently in the talks).
MS SQL
I have created a basic script that creates all the tables for MSSQL if that would be of use - not sure how to post any of the stuff I've done up - so some advice on that very welcome
Nick
I'll create an issue on drupal.org for discussion, and you'll be able to attach your script there. It'll also open up a "case". I'll post a comment with a link after I make it. Thanks for your work. it's a first step.
Upload area
Have you managed to create the issue on drupal.org so that I can upload the files so far.
Thanks
Targets
Greg,
I think MSSQL/Oracle are worth targetting, but they are the hardest because the drupal community does not like/support them much. Firebird and SQLite would be easier because it is used more so by Drupal developers in other projects (this is my opinion/assumption, I have no facts or data to back this up).
So, I guess my list is:
MSSQL
Oracle
Firebird
SQLite
DB2
FoxPro
I suppose the next question is order of importance/community needs.
Substituting for LIMIT
As you probably all aware the Postgres and MySQL database...inc files use LIMIT for queries requiring a range of results. LIMIT sadly isn't SQL:2003 compliant which specifies the use of syntax like:
SELECT
ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
columns
FROM tablename
Sadly (and somewhat obviously) SQL Server 2000 doesn't support this or anything approaching the LIMIT functionality and only has the simple TOP command i.e. SELECT TOP n ... So in order to make the MS SQL interface work I need to find a mechanism for replacing LIMIT. I though of using a temporary paging table (which is the normal trick) except I don't know the structure of the original table in the query. So I'm left with the following:
1) Run some kind of query to get the size of the result set likely and then create a paging table to join onto it and then run a third query to get the paged results - yuck!
2) Run the query using SELECT TOP n where n would be the sum or start and count and then remove the rows upto start prior to returning the results - which is messy but doable.
3) Some better suggestions - please
Thanks
Nick
Most of the solutions I've
Most of the solutions I've seen for doing paging involve using temp tables and T-SQL stored procedures :(
--
Growing Venture Solutions
Drupal Implementation and Support in Denver, CO
knaddison blog | Morris Animal Foundation
Paging and Temporary tables
Likewise - sadly. I have also come accross another problem with the temporary table query - MSSQL doesn't have CREATE TEMPORARY TABLE - instead it has CREATE TABLE #temptablename. Since I will not know whether a table is temporary in following queries I have know way to know whether to add the the # or not. I have a couple of ideas but nothing very elegant - so ideas welcome!
An idea - Needs discussion
At the moment all table names passed from modules to the data layer are surrounded by {} in order to allow for replacement of table prefixes. Could we not have an additional type which is akin to {{...}} (or anything sensible e.g. [...], {#...}, etc.) that would signify temporary tables - this would be a relatively small change in a few modules at the moment I suspect.
Nick
Following Up ...
Any updates on this activity. I'm definately interested in MSSQL Support.
Cheers,
Matt
Pinging about MSSQL
Just another quick ping about Microsoft SQL Server interest. Are folks actively working on this in the background or did interest die off? Very interested in learning more ...
Paging
I have off and on. But I can't figure out a nice, solid 'Paging' solution to it. perhaps I should throw up the 'paging' options for a vote.
MS SQL Support Patch
It looks like Peter Heinrich has added a patch for this ...
http://drupaldigest.com/development#comment-138
I won't be able to test it until mid-to-late October, but I'm eager to give it a go.
Update
I ran across this thread, i think this could help in the "paging" problem:
We have received lot of feedback on making server-side paging syntax easier and considering it for a future version of SQL Server.
As for ROW_NUMBER, why do you say that it doesn't work when doing JOINs or ORDER BY? The ORDER BY of the outermost SELECT affects only the order in which the rows are returned to the client. ROW_NUMBER() has it's own ORDER BY clause so you can effectively number rows in different order and it doesn't impact the presentation order. With ROW_NUMBER you are just counting rows based on set of column(s) and optional ordering so joins doesn't really matter either. Can you give some specific examples where you have problems converting LIMIT syntax to SQL Server?
And the effect of paging results is often misunderstood. Generating a row number without ordering the rows in the result can produce unpredictable behavior. You can end up retrieving some rows in say page #1 in subsequent pages depending on how the engine executes the query and so on. So application requirements is a key consideration before implementing paging. The use of ROW_NUMBER forces you to think about these issues carefully. Note that Oracle also implements the same and it is effective replacement for ROWNUM pseudo-column.
IBM DB2 Database Support (Module) for Drupal 4.7
Hi,
I am a developer working on a drupal db2 database abstraction module for a client. My development database server is the free DB2 Universal Database from IBM, running on a windows machine, however my client has a Z series IBM server. I am using the PHP ibm_db2 API which accesses the database via a db2cli.dll library. After hacking in support using multiple disimilar databases concurrently to the drupal 4.7 database.inc I created a working database.db2.inc abstraction module.
Initially I found it sufficient for accessing some ancillary data by switching from the default drupal db running on a mysql server to some tables in the db2 database via db_set_active(). However i decided to go further by porting the mysql drupal db creation schema files of drupal 4.7 as best of possible to db2 sql syntax using a reference manual. Well to make a long story short, after much debugging with db2cli traces, and a fair amount of hacking of some of the drupal modules I finally have drupal running from db2. The drupal source code hacks though a very unfortunate and reluctant act on my part were absolutely necessary in my opinion due mostly to the loose type casting used in Drupal which was no problem for mysql. However since db2 has very strict casting and data types, doesn't support double quote string delimiters, and requires prepared statements for working with clobs some changes had to be made. Never-the-less I made every effort to ensure my modifications were backwards compatible with atleast the mysql database module.
Right now i'm still running tests for more bugs but i am very confident that i can have this working 100% error free. So if there is anyone interested in the work I have done please let me know, since I would be more than happy to share some patches. Drupal is great software and I would love to recontribute.
Regards
Kris
db2? whew!
Kris ... while I haven't used db2 in about a decade, I retain a deep fondness for it and would love to my favourite cms support my old favourite rdbms. I can't see myself ever needing db2 support again, but kudos to you for doing this work and contributing it back! No doubt at least a few people out there will be able to check off another operational requirement that can be met by drupal.
Paul Chvostek — Developer, integrator, Internet plumber.
Nice
Kris,
Nice. Yes, I have been interested in playing more with DB2 as I hear it has wonderful native-XML-database-table support. Did you try using the base code by the IBM Developer works for your DB abstraction layer, or did you start from scratch?
I too ran into some issues with SQL Server using the right connection client library, although, once I found the newer library, it was not problem. Keep us posted on how things go.
Common problems
This seems like the most likely place to pose my question. Is there a resource anywhere of the common problems that people have run into in getting other databases running with Drupal and the common solutions to these problems. I suspect that although not completely identical the other databases may share common restrictions. A summary of problems I have run into:
1) num_rows not supported by all database drivers
2) Liberal use of single quote characters with numeric types in core modules (i.e. using '700' instead 700 on an insert of an integer)
3) DISTINCT, GROUP BY, and comparison on "CLOB" fields somewhat common.
4) Use of the MySQL specific IF(expr1, expr2, expr3) function.
5) Need to use prepared statements to insert CLOBS once they reach a certain size.
6) One shouldn't look at the MySQL CREATE TABLE statements for guidance, use the PostGres ones instead.
7) TEMPORARY tables require a schema name prefixed on DB2
8) Perhaps others I've forgotten.
I now have a simple install of Drupal 5.0 working with DB2 9 although there are probably some bugs lurking around. But I am not too happy with some of the measures I needed to take to get it going (I changed db_query in database.inc and I require the implementation of a couple of extra simple methods by the database specific database..inc files. Are these problems common for new DB adapters? and are there recommended methods of handling them?
No Manual Unfortunatley
Hello. Yes, I have run into those problems, but no, there is no real "how to" in how to deal with them. the "CLOB" issue is something the MySQL/PostGRES developers have never dealt with if they have never used a larger/robust RDBMS system. It is nice in the MySQL world that you can treat just about everything as a large or small "string" :).
Anyway, i'll try and shed some light on issues:
1) Yes, num rows aren't supported by some drivers, especially for larger DB's because they like to keep the cursors/information server side. most usually grab the entier dataset, count, and set that as the row count. This sounds in-efficient, but you usually don't have that many rows returned, and if you do, you need to look at your SQL (why in a web-app would you return 10,000 rows to the user?). I know the Oracle driver that's been floating around does that, and I've run it, and it runs sweet, just as fast as MySQL if not faster.
2) Yes, type casting is always an issue, even in PHP (sometimes I have to cast vars). That's something that needs to do into the DOCS that people need to watch their data types during their inserts.
3) Yeop, "Dirty SQL". SELECT * is so much easier that listing your columns, but makes it hell when trying to move to other DB's; cause the DB is saying "Why do you want me to try and 'DISTINCT' on a CLOB which some binary video of you're wedding? Seriously...here's what I think of that....ERROR". Somtimes I wish modules came with a little stamp that said "Clean use of SQL. Safe for most DB's"....but...no.
4) I usually try and create equivilant functions in my db (user functions in SQL Server). E.g. I had to create the GREATEST function. But, if it's some random MySQL specific function, I just re-write the logic.
5) No getting around that one.
6) Yes, to a certain degree. Try "normalizing" your datatypes. Use INT, instead of NUMERIC, SINGLE, etc.
7) not too familiar with that one
8) Reg-ex is your friend...but don't abuse it (re-writing SQL statmeents)
9) pager_query can be tricky, but after that mile-stone, it gives you a good feel for how hard it's going to be to implement your DB (at least I thought so)
10) On larger RDMS, pre-fixing with your schema usually adds speed. e.g. SELECT nid FROM [drupal_space].[node]
11) Ok, I'll say it....I did find the ODBC wasn't THAT much slower than a native client. However, things work diff. through ODBC. Me personally, I like to stick with native clients over TCP/IP.
But yeah, what you have are common problems. I'm sure sometime in the next few months there will be another "more RDBMS" push. There are a few circling like buzzards as "patches", etc. (DB2, Oracle, SQL Server, I think a sqllite), so we'll see.
RE: No Manual
Well I guess now that I'm done (as far as I can currently tell at least) it's less important to have a knowledge base somewhere, although I find it interesting that similar problems are encountered with other DBMS. Anyway about my experience:
1) Yes I also ended up fetching all the rows, this is different from what was done in the IBM developer works example but I was finding their solution didn't work in all cases. This was probably mostly to do with the fact that I'm working with Drupal 5.0, I seem to recall their solution dying because of the caching.
2) Agreed.
3) Agreed. Although if the core ones aren't clean...
4) This was fine except for the IF example, because DB2 doesn't allow passing around of a BOOLEAN type. I just had to rewrite these as CASE statements.
5) This is where I had to modify the base db_query method so I could get access to the parameter replacement.
6) The problem I encountered was MySQL's implicit DEFAULT '' on longtext NOT NULL columns, PostGres actually requires you to specify a default on a not null column if you want it to have one (gasp!). Of course I doubt any other DBMS behaves like MySQL in this regard.
7) DB2 forces temporary tables to be created in the SESSION schema and this is not where your main tables are. Only thing I could think of was getting the modules to call a db_temp_name() function I had created, which adjusts the name of their temporary table (and this could be a simple identity function for other DBMS).
8) I tried to keep my reg-ex to a minimum. I tried to just "clean up" the bad SQL I found, and I believe it will still work on other DBMS.
9) I didn't have too much trouble with db_query_range but I fortunately had the IBM developerWorks to base mine on.
10) Something to try I guess.
11) I'm also using the native client. I haven't actually looked at the ODBC one, although I suspect I'd have had to compile the driver either way.
Changes/additions will need to be made to the Drupal database layer and as a result the correct use of it by modules in addition to just cleaning up the SQL. Otherwise I don't see it being possible to support a good selection of DBMS.
DB Layer re-write.
Sedave,
Yes, I like your thinking and I agree that the db layer needs to be updated/changed to support more DB's. I think what needs to happen is what basically happened with FORMS API....."Suck it up, and re-write your forms (SQL)". When FAPI came out, people had to totally re-write their forms. Although there was a lot of moaning about this, in the end this as made for a much more powerful, faster, and some what easier forms system. This has allowed many advances in Drupal to take place. I think with the next release, I am going to try and push for the same thing with the DB layer. What I need though is a consensus from the other db layer creators (meaning NOT post-gres or MySQL) to create some basic foundation that will make it easy to translate across most DB's. Perhaps even some sort of "SQL verifier" that makes sure people are doing things like "..if you have a column set to NOT NULl, you MUST set a default value".
So, i'm not saying totally re-write the db-layers, but perhaps how people use the db_layer, and that functions are supported.
I also think the SQL statements need to be "normalized"/"concentrated" to function calls and/or an .inc file. Why? Well, when you get into larger sites, it is nice to have "some" speration of the DB layer. If you have all the SQL statements in some include file, and you'r ejust calling "_my_module_get_item_list()" or "_my_module_make_new_item("item_name")", etc., then the DBA can change the schema, update the SQL, etc., and you, the application developer are just using the same function calls and not having to worry about it. Just a thought though; but for Drupal to handel larger sites, this NEEDS to happen. Or at least, this is how one should go about coding their module.
not likely
i find the prospect of moving all SQL statements out of the code to be quite unweildy for developers and thus unlikely to get implemented. you could probably implement such an inventory with a small modification to db_query()
True
Moshe,
True. I didn't mean the core-code. i just meant for custom module development. E.g. If I'm creating a custom module for a client, and I know it's a large project with a dedicated DBA, it would help for me to use my .module file for my application logic, and let the DBA control (to some degree), the included .inc file for my function calls for information to/from the DB.
I don't like messing with core much, but I believe I can and should make up for any core short-coming in my own module. It's nice that core is abstracted in this manner. But yeah, in essence, the SQL statements wouldn't be totally moved out of code, but just kind of "logically" seperated; thus giving the app developer his own space for development, and the DBA/DBO their own space for development w/o having to step on one another's toes too much.
As a general rule though, this would be inefficient and unwieldy for most general module development. It's again, a "situational" approach and most implementations are.
I Agree, but wonder if there is enough developer interest
While I agree, it will be harder to get support for changing/updating the DB layer than the forms API. I'm relatively new to Drupal but I assume the overhaul to the forms API overall benefited all module developers, whereas they probably don't want to worry about whether their SQL is or is not tailor made for MySQL unfortunately.
Also if we weren't forced to support MySQL I wouldn't necessarily require defaults on NOT NULL columns, sometimes you want it to be a fatal error if you do not supply a value on insert. It's those implicit defaults that throws a spanner in the works.
As for what would actually be required I agreed it wouldn't require a total rewrite, but a few things I think would help:
-Slightly modify the db_query parameter list to separate text replacement from actual string DB parameters (to make it easier to do prepared statements). Currently both are %s, and I'm having to work off of DB parameters being '%s'.
-Move db_query to the DB specific implementation or somehow make it so we can do prepared statements without it.
-Add a few more small functions to the DB specific list, that module developers should be using with certain tasks. (Like my temporary table example).
Unfortunately the biggest problem is still module developers writing SQL that only works on MySQL, it will be difficult to change this when the only DB they are familiar with is MySQL, and all they test with is MySQL (and sometimes PGSQL). Although I suppose once installation and core works with other databases some of them may take the cross DB thing seriously and look to the core modules for guidance on how to write cross DB SQL (something that doesn't help at all currently).
Large/Small
yeah, for the large large majority, this is not an issue. And then there's the problem as your said of introduction more databases to support, thus more schema's to support. There is an attempt being made to at least abastract the DDL statements (db_create_table, etc.) to a FAPI like interface. So, I'd say that's a step in the right direction.