I manage a site with over 19 million business listings. Cloning a database this size is daunting and time-consuming. Most of this data is static, but there are definite advantages to managing it as nodes. In a perfect world, I'd move these 19M nodes to a separate database so that I could easily clone the rest for development and testing, but I can't think of a way to do that while still working with the data through the Drupal Node module and its attendant advantages.
Has anyone else wondered whether some content can be managed in a separate database? Is it even feasible (meaning I won't have to take the next 2-3 years off to do it)?

Comments
Hotcopy?
Since you are using Drupal, I will assume you are using MySQL. If that is the case, I highly recommend checking out: http://dev.mysql.com/doc/refman/5.0/en/mysqlhotcopy.html
I was doing a standard backup/restore on a fairly large DB for testing, and the process was taking about 20 minutes in total. mysqlhotcopy was able to clone it in a matter of seconds.
If you have innodb
mysqlhotcopy works only for backing up MyISAM and ARCHIVE tables
Xtra Backup for InnoDB
https://launchpad.net/percona-xtrabackup/
Thanks, this is like
Thanks, this is like mysqlhotcopy's brother: it basically does the same thing, but only for InnoDB.
Drupal is MyISAM out of the
Drupal is MyISAM out of the box. Well, Drupal 6 anyway; D7 is InnoDB now.
Generally you can't have
Generally you can't have flexibility, extensibility and performance. Choose two. But don't confuse Nodes with CCK. You don't have to store your data in the CCK tables or even in the node table. See hook_node_info(). This is the way that everything worked back in the Drupal 4.7 days. You can even keep your data in an external database by having $db_url as an array in settings.php.
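As a rough illustration of the $db_url array mentioned above, a Drupal 6 settings.php could look something like the fragment below. The 'listings' key, the database names, and the credentials are all placeholders made up for this example; only the 'default' key is required by Drupal.

```php
<?php
// settings.php (Drupal 6): $db_url as an array of named connections.
// 'default' is the connection Drupal itself uses; 'listings' is a
// hypothetical extra connection for the large, mostly static data set.
$db_url = array(
  'default'  => 'mysqli://drupal_user:password@localhost/drupal',
  'listings' => 'mysqli://drupal_user:password@localhost/listings',
);
```

Code can then switch connections with db_set_active('listings') and must switch back with db_set_active('default') before Drupal runs any of its own queries.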
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
And if you're doing D7
And if you're doing D7 there's entities as well:
http://www.istos.it/blog/drupal-entities/drupal-entities-part-3-programm...
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
hook_node_info()
Dave, how can hook_node_info() help me keep nodes in a separate database? It returns only an array with some basic info for the node type, but the docs don't mention anything that describes an alternate database or table.
hook_node_info() is just the
hook_node_info() is just the tip of the iceberg for defining your custom node types in code that do not use CCK. You'll also need to use hook_view(), hook_insert(), et al. You can then save your data anywhere, in any structure that you desire, and still make use of any modules that extend nodes.
Another option is to bypass the node system completely, create your own tools for CRUD, and then expose your data to the Views module to use as your display mechanism.
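To make the idea a bit more concrete, here is a minimal sketch of such a module, assuming Drupal 6, a hypothetical module named "listing", a hypothetical table listing_data with columns nid, phone and address, and a second connection keyed 'listings' in the $db_url array. It is not a complete module: hook_form(), hook_update() and hook_delete() are left out.

```php
<?php
// listing.module: Drupal 6 sketch. Module, table, and column names are
// assumptions made for this example.

/**
 * Implementation of hook_node_info(): declares the node type and tells
 * Drupal which module's hooks (listing_load(), listing_insert(), ...) own it.
 */
function listing_node_info() {
  return array(
    'listing' => array(
      'name' => t('Business listing'),
      'module' => 'listing',
      'description' => t('A business listing whose details live in a separate database.'),
    ),
  );
}

/**
 * Implementation of hook_load(): fetches the extra fields from the
 * 'listings' connection and returns them to be merged into the node.
 */
function listing_load($node) {
  db_set_active('listings');
  $extra = db_fetch_object(db_query('SELECT phone, address FROM {listing_data} WHERE nid = %d', $node->nid));
  // Always switch back, or Drupal's own queries will hit the wrong database.
  db_set_active('default');
  return $extra;
}

/**
 * Implementation of hook_insert(): writes the extra fields to the
 * 'listings' connection when a new node of this type is saved.
 */
function listing_insert($node) {
  db_set_active('listings');
  db_query("INSERT INTO {listing_data} (nid, phone, address) VALUES (%d, '%s', '%s')", $node->nid, $node->phone, $node->address);
  db_set_active('default');
}
```

Note that with this approach the core node and node_revisions rows are still written to the default database; only the type-specific fields live in the external one.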
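A sketch of the Views side of that second option, assuming Drupal 6 with Views 2, a hypothetical module named "listing", a hypothetical table listing_data with columns id and name, and a connection keyed 'listings' in $db_url:

```php
<?php
// listing.module (continued): exposes a non-node table to Views.
// Table, column, and connection names are assumptions for this example.

/**
 * Implementation of hook_views_api(): tells Views 2 this module
 * provides Views integration.
 */
function listing_views_api() {
  return array('api' => 2);
}

/**
 * Implementation of hook_views_data(): describes the table to Views.
 */
function listing_views_data() {
  $data = array();

  $data['listing_data']['table']['group'] = t('Business listings');

  // Declaring a base table makes it selectable as the starting point
  // of a new view.
  $data['listing_data']['table']['base'] = array(
    'field' => 'id',
    'title' => t('Business listings'),
    'help' => t('Listings stored outside the node system.'),
    // Connection key from $db_url; omit this to use the default database.
    'database' => 'listings',
  );

  // One column, usable as a field, a filter, and a sort.
  $data['listing_data']['name'] = array(
    'title' => t('Name'),
    'help' => t('The name of the business.'),
    'field' => array(
      'handler' => 'views_handler_field',
      'click sortable' => TRUE,
    ),
    'filter' => array('handler' => 'views_handler_filter_string'),
    'sort' => array('handler' => 'views_handler_sort'),
  );

  return $data;
}
```

Each additional column would get its own entry in the same shape, with a handler suited to its type.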
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
Exposing Data to Views module
I am a beginner to Drupal (I'm not a PHP developer, but I am an experienced database developer). I have been struggling to find an easy way to expose my external data to the Views module. Ideally, I would like to replace the Views query builder (the part of Views that gets the data) with either a call to a remote web service or a stored procedure call to another database, etc.
I would like to link these external sources to the display management part of the Views Module WITHOUT involving programming as I am really hoping to avoid having to learn PHP if possible.
Thanks for any guidance or direction you might be able to provide.
Views 3
Helpful info
http://www.lullabot.com/articles/querying-slave-database-with-views
http://nodeone.se/blogg/views-3-yql-views-attach-presentation-at-drupalc...
Almost forgot
http://drupal.org/project/tw
bg1, if periodically synching
bg1, if periodically synching a local table(s) with your remote table(s) would be acceptable, you may be able to do this without code by using Feeds to fetch the remote data and put it into a Data table. Data contains all the code to automatically display its tables to Views, so as long as you can fetch the data, Data will make the rest easy.
The Boise Drupal Guy!
Yup
On our last job we had to collect data from several other sites.
We didn't feed directly into the table, though; we used cron to collect the data, building and discarding nodes.
One other solution could be
One other solution could be to use node export:
http://drupal.org/project/node_export
This module integrates with drush, so it could easily be scripted.
You could tie it to a cron job and have it automated at night if you wanted.
thank you all
Thank you all so much for your great comments and suggestions! Is this a great group of professionals or what?!
I just got a new partner to work with, and because this data set is essentially read-only and gets searched for and displayed on only one page of the site, we're hard-coding our own custom callback that gathers and themes the search results and presents them in a list that, to the end user, looks just like any view-created list, even though it doesn't use the Node or Views modules. But that's not to say we won't consider your suggestions in the future if our requirements become more complicated.