MERCI Master Inventory Sync Slow Going

Events happening in the community are now at Drupal community events on www.drupal.org.
stefanwray's picture

We have a lot of resources and equipment here at channelAustin: a variety of microphones, light kits, cameras, 3 studios, and more. Just the checkout gear alone has added up to about 95 content types for MERCI. This includes both buckets and resources for singular items like the Conference Room, Main Studio, or Glide Cam. It has taken our staff a long time to clean up our data that's been housed in Facil. Before doing an import into MERCI, we wanted it to be in the best shape. Relative to the amount of time needed to address the poor condition of equipment data, modifying the merci_import.php script to deal with the import was a breeze. But now we're at another bottleneck.

The process for getting the data out of Facil and translated into CSV files for import is like this. (1) Step one is to use Facil's export feature to export an Excel document containing all the equipment records. (2) Step two is to filter out which equipment records you want to import into MERCI. Facil has something called EquipStatusCode. With this, in our case, equipment is marked as Available, Assigned, Repair, Missing, Surplus, or Retired. We've had a big turnover of new equipment (more than $1 million in new gear in last several years), so we have a good amount of EQ in the Surplus and Retired category. Overall we have about 2400 records in our EQ inventory. We decided there is no reason to import the Missing, Surplus, or Retired gear. Our first priority is what's marked as Available, because that consists of our checkout gear.

(3) The third step is to start to put the Available equipment into uniform categories that are user readable and lend themselves to content type names that make sense. Our records had some similar gear being referred to with multiple naming conventions. One goal was to create content type names so that family of EQ would be grouped in alphabetical listings. So we ended up with content type names like: Camera HDZ1U, Camera iYouth, Camera Tripod HD, etc.

The cleanup of our Facil database also coincided with a thorough inventory and accounting of all the gear in the building. This included noting the location (Rm #) of the gear, which hadn't been done previously. First discussions about this data and the inventory began in April. We're still not completely finished, but we are at a point where we're now ready to start, and have started to import data into MERCI.

I cannot stress more that the amount of time in data preparation is considerably greater than the amount of time spent customizing the merci_import.php code and understanding the import process.

(4) The fourth step is to turn the equipment records into two separate documents: one for content types, the other for the items. I'm not going to go into this part right now. But obviously there is a need to match the equipment records to the fields in MERCI. MERCI Inventory Master allows you to customize and add other fields. We did this. We looked at what were the out-of-the-box fields in Master Inventory and looked at what fields in Facil's records we wanted to keep. For some there was a match. For a few others we created extra ones. For example, Master Inventory has a fields that correspond to Facil serial number, manufacturer, and model, but not to purchase date and purchase cost.

(5) The fifth step is to get the records out of Excel (if that's what you're using) and translated to a CSV file. Excel does a poor job of that, leaving unwanted characters like quotation marks in the wrong place. But, OpenOffice's spreadsheet software works well. So we used that.

(6) The sixth step is to actually engage in importing the content types. Like I said we have about 95. Bad idea to try importing all at once. In fact the merci_import.php script won't let you. First, of course, back everything up. This import was done in batches and went well, mostly. Only ran into one issue with 5 content type names being longer than 32 characters. Wrote a bug report that has already been addressed: http://drupal.org/node/911404

(7) And now to the annoying step. After importing the content types, the next step is to sync the content types to the fields in MERCI Inventory Master. It's a simple task. The problem is that it takes a very long time. I've clocked it at anywhere from 8 minutes 30 seconds to 9 minutes 45 seconds. And these need to be done one at a time. . . Check the check box for the content type, click on sync, wait, do it again, do it again. If you have 10 content types, maybe this is no big deal. But with 95 content types, this adds up to between an estimated 13 and 15 hours of sync time to complete the job. I've done about 30 syncs now and expect to be done by tomorrow some time.

With, kreynen we looked at the server, looked to see what else might be drawing resources like apache and mysql (used htop) and could not find anything particular. We re-started apache and mysql to see if that would help. I spent a good part of yesterday afternoon looking at information on how to optimize mysql and did find a few suggestions that seemed like they might make a difference. http://drupal.org/node/51263

The runtime information in phpMyAdmin shows a few in "red", like this.

Handler_read_rnd 51 k The number of requests to read a row based on a fixed position. This is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan whole tables or you have joins that don't use keys properly.

Handler_read_rnd_next 2,044 k The number of requests to read the next row in the data file. This is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.

Created_tmp_disk_tables 12 k The number of temporary tables on disk created automatically by the server while executing statements. If Created_tmp_disk_tables is big, you may want to increase the tmp_table_size value to cause temporary tables to be memory-based instead of disk-based.

Key_writes 4,281 k The number of physical writes of a key block to disk.

Select_full_join 1 The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables.
(Not sure how to do this)

Opened_tables 2,923 The number of tables that have been opened. If opened tables is big, your table cache value is probably too small.

So, it is doing a lot of queries, a lot of table scans, and there may be an issue with indexes.

I have no idea how to even begin to address these issues, or if these issues are "normal" on a Drupal installation like this where obviously there are going to be a lot of queries.

All of this raises some questions.

It is obvious that syncing content types to MERCI Inventory Master uses a lot of resources (CPU, memory) and involves complicated back-and-forth between tables. It is going to be time consuming, BUT is the slowness in our case solely attributable to this function, OR could Apache, PHP, MySQL be better optimized, OR is there something else going on? And how can I know?

I'm trying to answer those questions, because if this slowness not solely attributable syncing process, then performance across the board will be impacted.

Comments

The merci_inventory sync

darrick's picture

The merci_inventory sync functions calls content_field_instance_create and content_field_instance_update repeatedly. If you read the docs at http://drupalcontrib.org/api/function/content_field_instance_create/6 you'll notice that those functions both call the content_clear_type_cache(TRUE); and menu_rebuild(); functions. The menu_rebuild function is horribly slow. The docs link above states the fastest way to update/create fields is:

<?php
// Create several fields at a time.
   
foreach ($fields as $field) {
content_field_instance_create($field, FALSE);
}
// Clear caches and rebuild menu.
   
content_clear_type_cache(TRUE);
menu_rebuild();
?>

The merci_inventory module has been updated to sync the fields in this way.

Stefan reported the time taken to sync five content types is about 2 and half minutes which is down from about 45 minutes.

Open Media Project

Group categories

Audience

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds:

Hot content this week