So, you want to speed up your large site? Well, start by cleaning up your SQL if you're accessing large tables. We are running Drupal on top of MS-SQL for a client. We have one table that has 7.5 millinon rows in it. Now, I had mentioned what I am about to talk about in another post before, but I know have some hard numbers to base it off of.
So anyway, this table had 7.5+ million rows, and the page was loading mighty slow. Running it 10 times, Devel query showed where I was bleeding...more like gushing time:
- 13839.11 1 SELECT * FROM IMPORT_MCRD_D_TEMP WHERE niin_search = '000032734'
- 13526.18 1 SELECT * FROM IMPORT_MCRD_D_TEMP WHERE niin_search = '000032734'
- 13481.94 1 SELECT * FROM IMPORT_MCRD_D_TEMP WHERE niin_search = '000032734'
- 13232.37 1 SELECT * FROM IMPORT_MCRD_D_TEMP WHERE niin_search = '000032734'
- 13771.38 1 SELECT * FROM IMPORT_MCRD_D_TEMP WHERE niin_search = '000032734'
etc...
Now, what was strange was the node table has 20+ million rows, and this query was running great:
SELECT n.nid, n.vid, n.type, n.status, n.created, n.changed, n.comment, n.promote, n.sticky, r.timestamp AS revision_timestamp, r.title, r.body, r.teaser, r.log, r.format, u.uid, u.name, u.picture, u.data FROM node n INNER JOIN users u ON u.uid = n.uid INNER JOIN node_revisions r ON r.vid = n.vid WHERE n.nid = '4234891'Times:
- 0.44
- 0.43
- 0.67
- 0.34
- 0.41
etc..
So..what do notice? The node query has it's columns fully enumerated out. So, I followed my own advice and did the same:
SELECT REF_NUM, CAGE, RNCC, RVCC FROM {IMPORT_MCRD_D_TEMP} WHERE niin_search = '%s'Times:
- 8.43 1 SELECT REF_NUM, CAGE, RNCC, RVCC FROM IMPORT_MCRD_D_TEMP WHERE niin_search = '000032734'
- 2.04 1 SELECT REF_NUM, CAGE, RNCC, RVCC FROM IMPORT_MCRD_D_TEMP WHERE niin_search = '000032734'
- 3.03
- 2.23
etc..
Moral of the story...enumerate your columns already! It'll save you headache as your grow and an instant performance boost.
FYI, the hardware stack:
App-server:
HP Proliant DL360, (5) 72GB 10K SCSI Drives, RAID 5
2GB Ram
One-proce, drupal core
Apache
Zend Optimzer
DB-server:
HP Proliant DL360, (5) 72GB 10K SCSI Drives, RAID 5
4GB Ram
2 dual core with HT per-core (8 virtual procs)
MS SQL Server 2005
SAN Array (directly connected only to the DB-server)
HP SAN Array
(14) 300GB 10K SCSI Drives (3 tera-bytes)
- Souvent22
Comments
Impressive. Thanks for
Impressive. Thanks for posting that! I'm digging your hardware, too... ; )
Table Paritioning
Hi all, I just subscribed to the group. I'm a DBA for a medium sized .com and have been using drupal on my own time for three years. I've got four small drupal sites that I run.
SQL Server 2005 supports table partitioning which allows you to divide your tables/indexes into seperate physical files based on a given criteria (created date for example). The physical files are then grouped together to create a logical table/index.
Choosing what to partition on is a bit tricky. You'd typically partition on a a range that is commonly filtered on so that the range is always within a single partition. SQL Server supports upwards to 1000 or so partitions but it is recommended that you keep the number under 100. You could partition on the month+year of the node, or on the node type (probably a good one for Drupal). It's not something that's easily accomplished. It requires a lot of thought and a smart way of maintaining the partitions if you change the data on the "partition function".
The speed improvements can be quite significant. Instead of searching through an index of 10million items, if the field the partition is based on is included, it will only search through the data file in question. For example, if you're partitioning on node type, and you're executing a query in the forum module, only the forum node types will be searched w/o having to filter the superset. It won't load the rest of the table in memory, because it knows that it only has to load that one data file.
PT is a very powerful feature
Skorch,
I agree. Table partitioning is a VERY powerful feature. One other feature you forgot to mention about partitioning is multi-threaded SQL completion. Meaning, if you did search across partitions, the job is split so that 1 threads runs the sql against multiple partitions as needed, and then is merged into one and returned. But it (partitioning) is much like a hash; A hash is only as good as it's hash-function, much like a partitioning scheme is only as good as what you choose to partition on (partition function). Me personally; I would say, it depends on the type of site you have. Some sites do not have a lot of diff. node-types, and on the other had, some times have very static information. But it is as you said before, the partitioning function is tricky and requires a lot of thought.
I believe it is items like these where you must start to separate the DB from the application layer (fore-warned, the information I am about to type is just my own thoughts and opinions). Most say it is bad to have the DB logic in the with app logic; and others say good. well, I would say, when it comes to Drupal and large sites, having both is good.
What I mean by this is that if I create a custom module, the information and tables at the module level should be some what controllable, maintainable, and changeable by my DBA w/o me having to worry about it. There are a few ways to accomplish this (and I am not talking about stored procs), however that is for another subject thread.
DBA should be a developer's friend
Sounds like you're volunteering to write a Drupal O/R mapper.
I agree, you could have the interoperability vs performance debate until the end of the universe. The C vs OOP factions are still at each others throats. While you may have gains at the DB level by isolating it from the developers (no offence, developers, but you sometimes write sloppy SQL), it'll cost you trying to translate relational data into OOP constructs. I guess that's why DBAs will always have a job.
Sql query Execution Fast
Dear,
I have four table and table have 50,000 record.
i have joined that table and execute that query but its take 1 minute.i want fast Execution.
tell me what's process i have do for fast Execution.
Thanks,
Brijesh Shah