One million or more users may create problem with buddylist.

Events happening in the community are now at Drupal community events on www.drupal.org.
sid3000's picture

Hi,
I have a question about scalability issue of buddylist. Suppose I have a site with 1 million or more users and admin(or any user) have near about 200000 buddies or more. My question is can buddylist module handle the access of buddylist of admin specially when buddylist calls the function buddylist_get_buddies and execute a query :

$sql = 'SELECT b.buddy, u.name, u.mail, u.uid FROM {buddylist} b
INNER JOIN {users} u ON b.buddy = u.uid
WHERE b.uid = %d';

that's mean whenever this function call is made to fetch admin buddylist, a sql INNER JOIN will be made between 1000000 users and 200000 buddies of admin. So how buddylist can handle this situation?

Any help?
Thanks

Comments

It would be good to cache

shyamala's picture

It would be good to cache the buddy list for each user and update the cache table only when a particular user updates the buddy list.
We will have to use a separate cache table specific to this module and store user and the buddy list as an array.

Checkout, http://www.lullabot.com/articles/a_beginners_guide_to_caching_data.

Netlink Technologies Ltd
http://shyamala-drupal.blogspot.com/
Follow me on Twitter :)

Thanks for your suggestion.

sid3000's picture

Thanks for your suggestion.

That query really isn't a

Jamie Holly's picture

That query really isn't a problem as long as both tables have uid indexed. Actually you are looking at the best case scenario on a join query, in the fact that your join fields are integers. I handle one system that has more than 50,000,000 records in one table and 5,000,000 in another and it constantly runs inner joins, most of the time with 2-3 where criteria and it returns the results in under .001 seconds. That's on a lower end dual core system.

Actually if you think about this query, its not uncommon, even in Drupal core. For example, sites with more than 1,000,000 comments is rather common, and to load the comments on those nodes you are doing a far more complex join:

SELECT c.cid as cid, c.pid, c.nid, c.subject, c.comment, c.format, c.timestamp, c.name, c.mail, c.homepage, u.uid, u.name AS registered_name, u.signature, u.picture, u.data, c.thread, c.status FROM comments c INNER JOIN users u ON c.uid = u.uid WHERE c.nid = 83391 ORDER BY SUBSTRING(c.thread, 1, (LENGTH(c.thread) - 1)) LIMIT 0, 90

With no query cache, that query runs in 0.002 seconds on the same lower end dual-core server. That database snapshot has 1.4 million comments and over 30,000 users.

Don't think because there are a lot of records on a join that it instantly means a performance hit. MySQL is a relational database and for it to be that, it has to be able to handle joins effectively.

Caching the data, like Shyamala suggests, is always a big bonus. But to see the bonus on a query as simple as yours, you would need to really utilize something like memcached. Caching it in the DB (the Drupal default) would most likely end up not doing much, or even decreasing performance, especially when you start thinking about table overhead from updates and deletes on the cache table.


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

intoxination - thanks for

sid3000's picture

intoxination - thanks for your advice.

How you can show so much

Alexandr Kosarev-gdo's picture

How you can show so much buddies for one page and keep nice look for page? More why someone want to see page with so much buddies and what admin can do with so much data???

my suggestion is to use limit for this query and add pagination to admin page.

hi Alexandr Kosarev, If you

sid3000's picture

hi Alexandr Kosarev,
If you can see the buddylist.module code, you will probably see that the buddylist_get_buddies function is being called at user load (hook_user) like:

function buddylist_user($type, &$edit, &$thisuser, $category = NULL) {
global $user;
........
........
else if ($type == 'load') {
$thisuser->buddies = buddylist_get_buddies($thisuser->uid);
}

.......
}

and even more time and it loads a large buddylist(some time large amount of data) in an array at:

function buddylist_get_buddies($uid = NULL, $key = 'uid') {
static $buddies;

if (!$uid) {
global $user;
$uid = $user->uid;
}
if (!isset($buddies[$key][$uid])) {
$buddies[$key][$uid] = array();
$sql = 'SELECT b.buddy, u.name, u.mail, u.uid FROM {buddylist} b
INNER JOIN {users} u ON b.buddy = u.uid
WHERE b.uid = %d';

$result = db_query($sql, $uid);
while ($row = db_fetch_object($result)) {
$buddies[$key][$uid][$row->buddy]['uid'] = $row->uid;
$buddies[$key][$uid][$row->buddy]['name'] = $row->name;
$buddies[$key][$uid][$row->buddy]['mail'] = $row->mail;
if (variable_get('buddylist_buddygroups', FALSE)) {
$buddies[$key][$uid][$row->buddy]['groups'] = buddylist_get_buddy_groups($uid, $row->buddy);
}
$buddies[$key][$uid][$row->buddy]['online'] = 0;
$selectlist .= $row->buddy.",";
}
// Add the online flag
if (db_num_rows($result)) {
$sql = 'SELECT uid FROM {sessions} WHERE uid IN (%s) AND timestamp > %d';
$result = db_query($sql, substr($selectlist,0,-1), time()-1800);
while ($row = db_fetch_object($result)) {
$buddies[$key][$uid][$row->uid]['online'] = 1;
}
}
}

return $buddies[$key][$uid];
}

so I'm not talking about viewing buddylist page.
In this case users maybe admin or anybody. So Suppose 1000 such users who have average 200000 to 400000 buddies and online at a time then what will happen?

an solution

Alexandr Kosarev-gdo's picture

db_query('SELECT b.buddy, u.name, u.mail, u.uid FROM {buddylist} b
INNER JOIN {users} u ON b.buddy = u.uid RIGHT JOIN {sessions} s ON (u.uid=s.uid)
WHERE b.uid = %d AND s.timestamp > %d', $uid, time()-1800) - will return online users only.

I think it will be better if you design new function for this query, because several part of code have relations with current buddylist_get_buddies

My thoughts exactly. I would

Jamie Holly's picture

My thoughts exactly. I would also limit it, unless you are planning on showing a list of 200000-400000 names on each page view. If that's the case then you will have for more problems, especially if you are using path aliases.

So I would break it down into 2 queries. One for a count, in which case you can actually drop the join on the user table, then a limited query to get the actual display limited to X buddies for the block, with a link at the bottom - maybe something like "plus X other online buddies". That would go to a new page created in your module, which I would definitely page out the results on (maybe 50-100 per page).


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

I'm thinking about ajax like

sid3000's picture

I'm thinking about ajax like flickr !

Also I'm thinking about

sid3000's picture

Also I'm thinking about Client-Side Ajax Cache. I think facebook and many big players can use it.

Ditto on what was said about

nicknickoli's picture

Ditto on what was said about it being a fast query and the slower part being a return of the result set. Although that should be pretty quick by selecting columns. You could also hook_cron? and do some kind of reverse update so mostly current results were kept with your buddy list.

....

minesotaa's picture

This may not be exactly ontopic and I am not sure either :
but are some big snw sites not actually restricting the number of buddies
that a member can have ? For example, 500 or 1000.

High performance

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds: