Posted by JuliaP on October 7, 2008 at 5:03am
We need to run the bulk update feature in Pathauto for approximately 100,000 articles. Even after increasing the maximum allowed bulk update size, this is going to take ages. Any ideas? Thanks in anticipation.
Comments
Increase PHP execution time and update count, or trigger frequently
I hope your site runs on a dedicated server! If so, you can do this efficiently.
A:
Stay at a bulk update value of about 1,000 and trigger it very frequently, each run starting right after the previous one finishes. 100 calls shouldn't take that long.
This way you can still watch the server's behaviour during each run. I'd expect a processing time of 30-60 seconds per batch, and you can stop triggering again and again if the server isn't as healthy as it should be.
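One way to do the frequent triggering is a crontab entry. This is just a sketch, not something from the original posts: it assumes you have wired the Pathauto bulk update into Drupal's cron (the handbook page mentioned later in this thread describes that), and the URL is a placeholder for your own site.

```
# Hypothetical entry: trigger a 1,000-node batch every 5 minutes.
*/5 * * * * wget -O - -q -t 1 http://www.example.com/cron.php
```

With batches of 1,000 every 5 minutes, 100,000 nodes would take roughly 100 runs, i.e. around 8-9 hours, while leaving you free to stop the cron job if the server struggles.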
B:
Increase PHP's maximum execution time (e.g. in php.ini), raise the bulk update amount to 100,000 and call the trigger once.
Don't forget to allow PHP enough memory (I don't know the exact amount needed; it will probably work with normal memory settings, but you want to avoid the script being killed by such side conditions when doing a hack like this).
I'd recommend taking the site offline during this process, since the server will be under heavy load meanwhile.
Back up the DB first, and be ready to kill apache/httpd if something goes wrong.
Calculate the expected total time beforehand (by measuring the cron runtime with e.g. 100 nodes) and have a lot of patience during the single long call.
There's no progress bar and nothing you can do except wait. I wouldn't like that, because there's no way to interrupt the update cleanly. Finally, undo your changes to php.ini and the bulk update count.
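To put numbers on that estimate, here's a small shell sketch. The 45 seconds per 100 nodes is an invented sample value; substitute your own measured cron runtime.

```shell
#!/bin/sh
# Extrapolate the total bulk-update time from a measured sample run.
# SAMPLE_SECONDS is a made-up measurement -- replace it with your own.
TOTAL_NODES=100000
SAMPLE_NODES=100
SAMPLE_SECONDS=45

# Scale the sample runtime up to the full node count.
TOTAL_SECONDS=$((TOTAL_NODES / SAMPLE_NODES * SAMPLE_SECONDS))
echo "Estimated total: $TOTAL_SECONDS seconds (~$((TOTAL_SECONDS / 3600)) hours)"
```

If the estimate comes out at many hours, that's a strong hint to prefer option A's chunked approach.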
Processing a job like this in chunks is always the best idea, and processing a lot of data always takes time.
Have fun .-)
Otherwise
Otherwise it will take ages, as you've already noted. If you have 100,000 nodes and process 100 of them every 10 minutes, you will have updated 600 in one hour and 14,400 in 24 hours; the full run will take just shy of 7 days. If you increase the 100 to 1,000, it will take less than a day to process all of them.
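That arithmetic can be checked with a quick shell sketch, using the numbers from this comment (100,000 nodes, one cron run every 10 minutes):

```shell
#!/bin/sh
# Days needed to alias 100,000 nodes at a given batch size,
# with one cron run every 10 minutes (144 runs per day).
NODES=100000
RUNS_PER_DAY=$((24 * 60 / 10))

for BATCH in 100 1000; do
  PER_DAY=$((BATCH * RUNS_PER_DAY))
  # Round the day count up so a partial final day still counts.
  DAYS=$(((NODES + PER_DAY - 1) / PER_DAY))
  echo "batch $BATCH: $PER_DAY nodes/day, done in $DAYS day(s)"
done
```

At 100 nodes per run that's 14,400 per day and about 7 days total; at 1,000 per run it's 144,000 per day, so the whole job fits inside a single day.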
Create a cron.php in a subfolder where PHP settings can vary
If you create a subfolder under your Drupal installation, something like /scripts, then provided that your Apache configuration allows you to override PHP settings per subdirectory, you could create a .htaccess file similar to this:
php_value memory_limit 64M
php_value max_execution_time 3600
php_value mysql.connect_timeout 3600
In that folder you could create a supercron.php script like this:
<?php
/**
* This supercron.php script is just a wrapper to Drupal's cron.php
* that can run with different PHP settings.
*/
// Move execution context to Drupal root.
chdir('..');
// Now, run normal Drupal's cron.php
require './cron.php';
Finally, adjust your crontab to run this script instead of Drupal's cron.php.
# This is the normal Drupal cron invocation.
#0 * * * * wget -O - -q -t 1 http://www.example.com/cron.php
# This is our new cron script with potentially more PHP resources available.
0 * * * * wget -O - -q -t 1 http://www.example.com/scripts/supercron.php
I haven't tested this, but it might be an option that lets you avoid raising PHP limits for the whole site, applying them only to Drupal cron runs.
all great ideas
Lots of great feedback here - thanks everyone.
There is also a handbook page Bulk updating Pathauto node aliases from cron or command line. That could be updated with some of the ideas in this thread and could also help inform your decision, JuliaP.
--
Growing Venture Solutions | Drupal Dashboard | Learn more about Drupal - buy a Drupal Book
knaddison blog | Morris Animal Foundation
Thanks for the suggestions!
cron it
Idea 1:
Idea 2:
Write a module that does the same, using a JavaScript page refresh to keep cycling for as long as you want.
a different approach.
I take a different approach. One of my clients has over 1,000,000 url_aliases. In order to generate all of the aliases, I enabled php-cli on the server and wrote a simple script to bootstrap Drupal and execute the bulk update.
In the long run, I think a better solution is adding a database table to keep track of which items have url_aliases. It could simply contain the url_alias key, the URL type (node, term, etc.), and the type ID (nid, tid, etc.). I believe apachesolr does (or did) something similar to track which nodes have been indexed.