6.x is moving to mysql 4.1. I think this removes the need for any support for temporary tables, but first, I'd like to confirm that 4.1 supports subselects. We also need to evaluate the pgsql options. Does pgsql support the sub-selects? Does pgsql support something similar to the ALTER IGNORE TABLE construct for adding a unique index to a table that has duplicated?
If you're just joining us, please also read: #143160 (search_index has duplicate entries) and #102088 (views_fastsearch 5.x-dev release notes)
This is what I mean by subselects. (I'm not sure if I'm using that term correctly.) In the query below, the subselect is the LEFT JOIN (SELECT ...):
SELECT n2.nid FROM node n2 LEFT JOIN (
SELECT n.nid FROM node n LEFT join search_index x on n.nid=x.sid
WHERE x.fromsid=0 AND (x.word='bush' OR x.word='cheney' OR x.word='kerry' OR x.word='gore' OR x.word='reid')
GROUP BY n.nid HAVING COUNT(*) > 5) ns on ns.nid=n2.nid
WHERE ns.nid IS NOT NULL;search.module currently uses temporary tables. The temporary tables are bad -- they're slower (I think that they force additional writes?), and their use prevents mysql caching of the results (page 2, next, etc are just as slow as the first page).
The actual query that I'm generating in views_fastsearch is below. The extra scoring joins don't appear to have any impact on performance. (It's all in the search_index join.)
SELECT count(node.nid) FROM node node INNER JOIN (SELECT n.nid, SUM(5 * (i.score * t.count) + COALESCE(5 * (2.0 - 2.0 / (1.0 + c.comment_count * 0)), 0)) AS score FROM node n LEFT JOIN search_index i ON n.nid=i.sid LEFT JOIN search_total t ON i.word=t.word LEFT JOIN node_comment_statistics c ON c.nid = i.sid WHERE i.fromsid=0 AND i.word IN ('bush', 'cheney') GROUP BY n.nid HAVING COUNT(*)=2) vfs ON node.nid = vfs.nid WHERE (1)
Comments
Progress Update
Why is {search_dataset} needed? I've been able to perform all my queries without using it. And I think it's a contributor to the slowness, given the use of SQL LIKE.
I almost have a 6.x patch ready that uses subselects. All that is left is to add the exclude join similar to how views_fastsearch does it, and then (pending the answer to the above question) remove all the search_dataset code.
Doug Green
www.douggreenconsulting.com
www.dougjgreen.com
I was under the impression
I was under the impression that search_dataset was used for phrase matching.
my blog (mostly drupal)
search_dataset
Thank you! Yes, I learned that earlier this week. The table is also used for exclusions. I'm pretty sure we can skip the join for exclusion terms and only add it in for phrases.
Doug Green
www.douggreenconsulting.com
www.dougjgreen.com
Proof of Concept
Here's the 6.x patch - http://drupal.org/node/145560.
Doug Green
www.douggreenconsulting.com
www.dougjgreen.com
subselects
Hi .
I am using 4.1.21 and this form of sub-selects works just fine...
select (select 1),(select 2);
I also have a bunch of non-drupal queries that correlate to the outside FROM clause so the sub-select is dependent--however, the data must be normalized or things crash.
When I get a few minutes, I will try to render it into that standardized SQL form that drupal uses.
Hope that helps.
t.