Hello,
I am going to build a D6-based web site for a very large number of users (> 800,000). We expect the site to have tens of thousands of nodes (> 20,000).
What is the performance impact of these numbers? I could not find any information on how Drupal scales with the number of user accounts. Is there really no performance problem in having so many user accounts?
Here is what I have gathered so far:
- I should use the Solr module for search, because the core Drupal search will not scale to that number of nodes.
- I should use caching heavily (between the web site and the database, and between the browser and the web site), because the load could be too high for the web server or the database.
- I intend to use Pressflow to spread the database load across several MySQL servers.
If you have any advice, or if you know of a reference site on best practices for scaling Drupal 6, please let me know :)
Many thanks
Comments
The gateway question here to
The gateway question here to start with: how many users will be anonymous and how many authenticated?
And for both: how many page requests per second do you want to serve? Then you'll have to set up your system and server, benchmark, test, etc. to make that happen.
There is a huge difference in server impact between these two roles, since anonymous users perform far fewer CRUD actions on your Drupal site.
From there on, there are a lot of scaling possibilities; check out http://www.2bits.com/ and of course d.o.
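To make the requests-per-second question a bit more concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is a made-up assumption (the share of accounts visiting per day, pages per visit, the peak factor, the authenticated share), and the function name is purely illustrative; plug in your own estimates before drawing any conclusion.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def peak_requests_per_second(daily_visits, pages_per_visit, peak_factor=3.0):
    """Average page requests/sec over one day, scaled by a peak factor.

    peak_factor is a hypothetical multiplier for busy hours; measure it
    on a real site rather than trusting this default.
    """
    average = daily_visits * pages_per_visit / SECONDS_PER_DAY
    return average * peak_factor


# Hypothetical inputs: 5% of 800,000 accounts visit per day, 10 pages each.
daily_visits = 800_000 * 5 // 100
total_rps = peak_requests_per_second(daily_visits, pages_per_visit=10)

# Hypothetical split: 30% of the traffic is logged in (not page-cacheable).
authenticated_share = 0.3
authenticated_rps = total_rps * authenticated_share
anonymous_rps = total_rps - authenticated_rps

print(f"peak total: {total_rps:.1f} req/s")
print(f"  authenticated: {authenticated_rps:.1f} req/s")
print(f"  anonymous:     {anonymous_rps:.1f} req/s")
```

The authenticated figure is the one to benchmark against first, since those requests are the ones that cannot simply be served from a page cache.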
anonymous users do a lot less
That is an important point, thank you for it.
It will be a commercial site (sort of), in which authenticated users will be able to comment on nodes and buy articles; most of them will probably perform no CRUD on nodes, except for editors.
I will have a look at 2bits.com, thanks!
Are you scaling for a
Are you scaling for a possible future peak, or are these actual visitors who will visit your site and register an account at launch?
--
Vegard
Good question vegardx. This
Good question, vegardx. This is not a possible future peak; this is the user base to start with.
Registered users will most probably appreciate being logged in by default on the site... however, I was wondering whether that is good practice performance-wise.
Any hints will be appreciated :)
The number of node and user
The number of node and user records won't be a problem; the question is how many simultaneous users you will have logged in at any given time.
Large tables can be a problem
This isn't necessarily true. Load a million nodes and users, and problematic SQL queries can become nightmarish since the temp tables no longer fit in memory. Large data-sets generally contribute to performance problems. There's no hard limit, but it's definitely something to watch from the database performance side.
https://pantheon.io | http://www.chapterthree.com | https://www.outlandishjosh.com
Basically, this is the sort
Basically, this is the sort of problem I would like to protect myself against.
For example, do core modules behave correctly with a huge user database? Or should I already tune some performance settings for that? I have the same question for nodes, even though there will be fewer of them, and I believe relying on Solr should help us a lot for searching.
Sorry, what I meant to say
Sorry, what I meant to say was that the number of nodes and users won't be a problem. Just make sure those tables are InnoDB. It's your concurrent set that will be an issue: that is what will push you past what your SQL query cache can keep in memory. If you have 2,000,000 nodes but only about 500 are accessed on a regular basis, you are fine, though you may want to partition your table if that is the case.
Sessions
Like a few others have said, how many users you have really doesn't matter that much. What I think matters is how many are active in any given period of time, and how long someone can stay logged in. Depending on site policies, your sessions table could be a lot bigger than your users table, and it is more likely to need scaling attention. As a start, I would recommend looking at storing the sessions table in memcache. I know that once upon a time that wasn't as performant, but I am not sure what the size/activity was, or whether that has changed.
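To get a feel for how large the sessions table might grow, here is a rough sketch in Python. It assumes that session rows stay in the table until PHP's session garbage collection removes them after `session.gc_maxlifetime` seconds, and that new sessions arrive at a steady rate. The 200000-second lifetime is what I believe the stock Drupal 6 settings.php sets (check your own file), and the daily session count is a hypothetical figure, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_session_rows(new_sessions_per_day, gc_maxlifetime_seconds):
    """Steady-state row count: arrival rate times how long a row survives."""
    retention_days = gc_maxlifetime_seconds / SECONDS_PER_DAY
    return round(new_sessions_per_day * retention_days)


# Hypothetical: 40,000 new sessions per day, rows kept for 200,000 seconds.
rows = estimated_session_rows(40_000, 200_000)
print(f"estimated session rows: {rows}")

# A longer lifetime (e.g. keeping people logged in for 30 days) scales linearly.
rows_30_days = estimated_session_rows(40_000, 30 * SECONDS_PER_DAY)
print(f"with a 30-day lifetime: {rows_30_days}")
```

The point is only that the "stay logged in" policy multiplies the table size directly, which is why the session lifetime deserves as much attention as the number of accounts.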