This question is probably a bit on the beginner end for this group but you've all probably been in my shoes previously so hopefully you'll be able to help me out.
I have a site with 6,000 nodes and 15,000 users. I make liberal use of Views, CCK and Taxonomy on all my pages and I use normal drupal caching and views caching. I run this site on a single VPS however my CPU usage lately has been a bit high (10% on a regular basis) and I need to figure out what my next step is to scale so as not to get on the bad side of my host.
Given modest financial investment (this is a niche site and thus doesn't generate much revenue at all) - what do you think the best way to reduce my CPU load / scale my site will be?
...
From my searching here are some that I've come up with (I'd appreciate some advice on whether these are good ideas or not too)
- Check with host and/or verify that some sort of PHP cache is running. Cost: cheap or free
- Migrate to Pressflow Drupal which supposedly is higher performing (how much would this even matter?). Cost: low however possibility of higher problem rate with drupal with edge case incompibilities using 3rd party modules.
- Separate search from drupal by getting Acquia Search thus offloading all search tasks to Acquia. Cost: low although cost grows with total # of nodes on site.
- Separate HTTP and MYSQL by getting another VPS. One VPS for HTTP and one VPS for MYSQL. Cost: moderate
- Add reverse proxy (varnish?) by getting another VPS. One VPS for HTTP/MYSQL and One VPS for RProxy. Cost: moderate/high?
...
Thank you all so much for any input you give. I'll be the first one to admit I'm out of my league when it comes to scaling drupal. My site seems odd from what I see on the forums - many people have small sites and no performance problems or they have huge sites like you all and plenty of money/experience to throw at it. Hopefully I can learn a lot here.
Thanks,
Chris

Comments
reverse proxy & other thoughts
I'm not sure how much a reverse proxy will get you, since if I'm reading this right, the vast majority of your page hits are from logged in users (15,000). If you are getting a lot of hits from anonymous users then using something like Boost/Varnish will help to take some load off your server. And if your doing this on the cheap I would recommend giving Boost a shot. One thing you could try is tuning mysql: http://groups.drupal.org/node/57213 Since your using lots of CCK fields in views I think this will help your page generation times.
Pressflow is a no-brainer; simple drop-in, 99% of modules work with it.
Offloading search will help a lot but that can get pricey depending on your site. Running your own solr search box is another option.
PHP caching is always a good idea.
Pressflow Question
Is it pretty easy to upgrade to Pressflow? and I guess should I encounter any problems, is it pretty easy to downgrade to a standard Drupal setup?
If you have no core hacks...
...it's very easy - just as easy as doing a minor version upgrade. And yes - going the other way is just as easy.
Best,
/Johs.
Very straightforward for me
And I'm really enjoying it with Varnish. My server's cool as a cucumber.
I did run into a bug with openID - they are using lazy session creation now. Here's commentary and a patch, in case you use it: https://bugs.launchpad.net/pressflow/+bug/521868
indexing
Also, as mikeytown2 recently brought up in another thread, there are quite a few opportunities to add indexes in the drupal database. You can use Devel to look for modules that seem to generate long queries, then look through the corresponding module code and examine the db query statements you find there. In general it's helpful to have an index on any field specified in a WHERE clause, so that's where you want to look. You can use PHPMyAdmin to add indexes to these columns. Doing this should make those long queries somewhat easier on mysql.
why is this a concern
You could implement a type of user objects caching that is not generally available in 6 without patching.
Can you tell us what your vps specs are in terms of cpu and memory and whether it is openvz or xen.
your 10 percent utilization needs better context since my vps runs on a .40 load with a dual quad core xeon and I think nothing of it.
Managing your Drupal sites performance
The first step towards Performance is understanding server resource utilization. This intern depends on the type of traffic: Authenticated or Anonnymous users, The type of Content served, The Frequency of content updates and the Features that have been enabled.
Broadly the Tuning of the site can be handled at 2 levels:
1) Fine tuning of the LAMP stack - Finetuning Apache and Mysql, Introducing Server side caching tools.
2) Drupal Resource Otimization
Some Analytic tools to fine tune Drupal resource consumption:
1) YSlow can be used to fine tune the HTTP, performance parameters.
2) Devel module to fine tune the Queries. Optimizing the Queries can give a sizeable benifit to the performance of the resource consumption by Mysql.
Some good server analytic tools:
1) Mysql tuner
2) Apache Log
3) Slow Query logs
Check out The Handbook page on Managing your sites performance if you already haven't.
As regards for Search, are all your pages indexed and does google crawl your site regularly, then Google Search could be a good option. Google search could also be a source of Revenue.
Shyamala
Unimity Solutions
My advise. Optimize each
My advise. Optimize each pages first (with help of Devel). Replace all query generated with Views by custom query. Optimize your code is the first priority.
After, implement some cache system (which depend of the type of your website (many anonymous users, authenticated users)).
Good luck !
Julien Didelet
Founder
Weblaa.com
Custom query
I'm interested in replacing queries generated by Views by my own custom queries. But How? Any help? Any docs?
Thx.
Karim
KarimB - Read the blog Le blog en français
If you want to replace the
If you want to replace the Views query, the best solution is to create your own module and input your own query inside your module (with the help of db_query and db_fetch_array (or db_fetch_object)). You'll find all information here http://drupal.org/developing/modules (how create your own module) and of course on http://api.drupal.org/ (e.g for db_query http://api.drupal.org/api/function/db_query/6) where you'll find all functions of drupal's api.
That's for how you can replace your query. After, you have to understand how you can replace different display included in Views (like display the result of your query in the table). For that, Drupal's api help you too (here, for display your result in table http://api.drupal.org/api/function/theme_table/6). With experience, you'll be able to replace all your Views by custom request. Of course, you can overwrite the default function by your own function (see how you can create your own display's code in your template.php file.
Other advantage is for further update. Update a code is very easy :).
Good luck !
Julien Didelet
Founder
Weblaa.com
Not that easy
I use Views to keep from those manual modules. If you want your own module, then you must integrate it into Panels, learn how to do all the pagination, AJAX, arguments... stuff.
I was thinking of using db_rewrite_sql, check if a query is exactly what we want, then replace it with two or there other simpler queries. But not sure if it is easy.
queries in views
Thx for your answer Julien.
If you want to replace the Views query, the best solution is to create your own module
Does it mean that we can not alter the query generated by Views directly in views ?
Creating a module for altering the queries seems okay, any doc about it? (I'm not speaking about how to create a module, but how to alter a query generated by views and return the results to views)
Thx and best regards
KarimB - Read the blog Le blog en français
You can overwrite Views query
You can overwrite Views query but you have to create your own module too :). For that, you have to input this in your .module file:
function xxx_views_api() { //xxx = name of your modulereturn array(
'api' => 2,
);
}
and after, create in your module folder, a file named xxx.views.inc (xxx = name of your module) and use xxx_views_query_alter(&$view, &$query) to overwrite your views query.
As you can see, it's more easier sometime to create your own module and put your own query. Of course, it's my opinion.
Good luck !
Julien Didelet
Founder
Weblaa.com
Coule one not use
Coule one not use hook_views_query_alter(&$view, &$query) to alter a views query to make it more efficient?
Canadaka.net
Developer time is way more
Developer time is way more expensive than hardware. The cost of rewriting your views is usually better spent in some other way. Especially since you loose a lot of flexibility and maintainability. If there are one or two cases where Views wasn't able to generate a good query and it's affecting your performance you can use some of its many hooks to alter things. See this recent post by Trellon on how to do that:
http://www.trellon.com/content/blog/view-views-api
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
Huum I'm not sure with your
Huum I'm not sure with your remarks. Of course, you can sent less money at the beginning with the hardware but in fact in the time, probably invest some money and time to have clean and optimize code is better (hardware has a cost in time, developer cost is a one shot). It's my point of view.
Julien Didelet
Founder
Weblaa.com
Me too, is it possible?
Me too, is it possible? Generally Views's queries are too complex, so uneffient. When I try to split it into two queries (from a SELECT JOIN to a simple SELECT then a SELECT IN (...)), the execution time of this query drops from 700 ms to less than 100 ms. A lot of queries like that.
It sounds like you must be
It sounds like you must be doing something very complex. A query that takes 100ms is still probably waaaay too much. If it is affecting your overall performance you may need to look into a custom caching system where the cached data is refreshed on cron and the block/page always pulls from the cache.
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
CCK in filters
Long story short if you take out the filters that use CCK then you get your speed back. Has to do with database indexes, joins and other fun things.
Give this a spin, only for Drupal 6.16+ ATM...
http://drupal.org/project/dbtuner
Sometime with Views, you can
Sometime with Views, you can have a complex query where the executive time is > 100 ms (or more)... And unfortunately, sometime, you cannot input your result in the cache. I'm working actually to create a crm where all results have to do in real time (with authenticated users). Impossible to use cache in this case. And if your database is big, you can have a problem. In this case, the only one solution I know is to create your own custom code inside your module. But if you know an other way, I'll be happy :).
Julien Didelet
Founder
Weblaa.com
For example, a Views with
For example, a Views with TermID + depth:
SELECT node.nid AS nid, node.sticky AS node_sticky, node.created AS node_createdFROM node node
WHERE (node.status <>0) AND (node.vid IN (
SELECT tn.vid
FROM term_node tn
LEFT JOIN term_hierarchy th ON th.tid = tn.tid
LEFT JOIN term_hierarchy th1 ON th.parent = th1.tid
WHERE tn.tid =40
OR th1.tid =40))
ORDER BY node_sticky DESC , node_created DESC
LIMIT 0,20
When I split it to SELECT only the first 20 vid, then SELECT nodes with thoses vid, speed increases by 500%. And there are queries with dozens of JOINs...
The problem with Views is it does complex queries that don't always use indexes (see http://hackmysql.com/case1). And sometimes, multiple queries are way faster than one complex.
Cache doesn't always help, neither. I have MySQL cache, Views cache and boost. But if there are queries that take extremely long time, a few requests to pages that aren't in cache can take server down by the too high iowait.
*Enable mysql's query cache
*Enable mysql's query cache in the my.cnf file
*Increase the query cache size
*Install APC
*Install memcached
*Get Squid or Varnish in place
*If migrating to pressflow make sure your database runs innoDB as default and convert all myISAM tables to innoDB
For 6000 nodes you really dont need to think about 3rd party search ... But if so see if you can install Apache Solr or Xapian.
Drupal Rocks !!!