Manchester Meetup: Thursday, May 8, 2014 at The Farm Bar & Grille

Posted by miche
Start: 2014-05-08 18:00 - 22:00 America/New_York
Event type: User group meeting

Anyone who has any interest in Drupal is encouraged to attend, whether you use it in your day job and want to learn more, have used it for a while and need help with something, or are just interested in finding out about the most popular open source content management system.

Next Meeting

Thursday, May 8, 2014
Meeting: 6:00pm to 8:00pm
Social: 8:00pm to 10:00pm

The Farm Bar & Grille

1181 Elm Street, Manchester, NH 03104
(Located at the cross of Elm Street and Bridge Street)
Website: http://www.farmbargrille.com/manchester/home
Map: http://g.co/maps/uegpa
Parking: diagonally northeast of The Farm

  • Take Elm Street going North
  • Cross over Bridge Street (passing The Farm on your right)
  • Turn right on either Pearl St or Orange St. Follow the loopdy-loops to the inside of the lot.

Signup!

Please, please, please use the built-in sign-up functionality so we know how many people to expect.

Topics/Presentations

We'd like to see everyone come prepared either to pose a question/issue they've run across in the past month or to share something cool/amazing they've discovered. There are plenty of questions out there, so let's make sure we know what they are so we can get them all answered at this month's meetup!!

Additionally, please use the comments on this event node to suggest topics. If there is a consensus, we will try to have someone prepared to do an informal presentation.

Dinner & More Drinks

After each meetup, we usually depart from the "structured" environment, and continue our discussions over beers and food. So, when we are done being geeky, let's order food!

We look forward to seeing you all there!!

Comments

hook_cron and queues

Richard Damon

I am working on a module that will be doing a lot of "background" operations. Currently I am using hook_cron to run periodically; in the hook I limit how much work I do, save my state, and resume on the next invocation.

The documentation seems to be saying that I really should be using a queue for this sort of task, but I can't find detailed documentation on this, or good extended examples of how to do it.
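For reference, the Drupal 7 Queue API pattern in question works roughly like this: something (often hook_cron itself) creates queue items, and a worker callback declared in hook_cron_queue_info() processes them a little at a time on each cron run. A minimal sketch; the module name `mymodule`, queue name, and helper functions are illustrative:

```php
<?php
/**
 * Implements hook_cron().
 *
 * Instead of doing the heavy work here, just enqueue it.
 */
function mymodule_cron() {
  $queue = DrupalQueue::get('mymodule_tasks');
  foreach (mymodule_find_pending_work() as $item) {
    $queue->createItem($item);
  }
}

/**
 * Implements hook_cron_queue_info().
 */
function mymodule_cron_queue_info() {
  return array(
    'mymodule_tasks' => array(
      'worker callback' => 'mymodule_process_task',
      // Maximum seconds to spend on this queue per cron run.
      'time' => 30,
    ),
  );
}

/**
 * Worker callback: processes a single queued item.
 */
function mymodule_process_task($item) {
  // Do one small unit of work. Throwing an exception leaves
  // the item in the queue to be retried on a later run.
}
```

Core then drains the queue during cron, respecting the per-queue 'time' limit, so the hook itself stays fast.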

For what it is worth, I

Bob Newby

For what it is worth, I frequently employ a flag of a given type to queue up a task that needs doing. Like you, I use hook_cron implementations to process selected flagged tasks, removing the flags upon task completion. I also have flag-specific variables configured in order to throttle the number of tasks processed upon any cron invocation. This is easy to set up and it works well.

Also, given that a flag is an entity, you can even attach priorities to flags and process the queues in priority order. I don't yet do this, as I process these queues in a FIFO order, but it is another nice thing about using flags to queue up the tasks. Of course, additional information can be stuffed into each flag, if needed -- even argument values and callback names, etc.

On the non-flag front, the off-the-shelf Simplenews module blasts out all queued-up emails upon a single cron run. This can be a huge number of emails that show up as a spike on the recipient ISPs' mail servers, which in turn signals them that the emails are likely "spam". So, I altered Simplenews' default behavior, more or less as above, even to the point of throttling different mailing lists at different rates, with the aim, in my case, of spreading each newsletter out in chunks over weekdays across a 2-week period. Again, this has been working just fine using cron (and Elysia cron).
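The throttled, flag-driven cron pattern described above could be sketched roughly as follows. The flag name, the throttle variable, and mymodule_handle_task() are all illustrative, and the {flagging} table layout assumed here is from Flag 7.x-3.x:

```php
<?php
/**
 * Implements hook_cron().
 *
 * Processes at most N flagged tasks per cron invocation.
 */
function mymodule_cron() {
  $limit = variable_get('mymodule_tasks_per_run', 25);
  $flag = flag_get_flag('needs_processing');

  // Oldest flaggings first, i.e. FIFO order.
  $entity_ids = db_select('flagging', 'f')
    ->fields('f', array('entity_id'))
    ->condition('fid', $flag->fid)
    ->orderBy('timestamp', 'ASC')
    ->range(0, $limit)
    ->execute()
    ->fetchCol();

  foreach ($entity_ids as $entity_id) {
    mymodule_handle_task($entity_id);
    // Remove the flag once the task is done.
    flag('unflag', 'needs_processing', $entity_id);
  }
}
```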

Just a bit of feedback on

Richard Damon

Just a bit of feedback on implementing the queue after the discussion. I now have a hook_cron function that reads my files and queues up the messages to process, and a queue callback to process those messages. When I was doing all the work in hook_cron, I found I needed to limit processing to no more than 10 messages a minute, or I got warnings about cron occasionally conflicting with itself.

In the new cron hook I put a limit of 20 seconds of running, and it is often able to process a 3,000-message / 4 MB file in a single run; the queue then typically drains in 5-10 minutes. (I have the cron hook hold off starting a new file until the queue is mostly drained, to avoid building up too big a backlog.)
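A sketch of a cron hook like the one described: it refuses to start a new file while a backlog remains, and stops parsing after roughly 20 seconds. The backlog threshold and the helper functions (mymodule_pending_files() etc.) are illustrative:

```php
<?php
/**
 * Implements hook_cron().
 *
 * Parses pending files into the queue, stopping after
 * ~20 seconds and holding off while a backlog remains.
 */
function mymodule_cron() {
  $queue = DrupalQueue::get('mymodule_messages');

  // Don't start new work until the queue is mostly drained.
  if ($queue->numberOfItems() > 100) {
    return;
  }

  $deadline = time() + 20;
  foreach (mymodule_pending_files() as $file) {
    foreach (mymodule_parse_messages($file) as $message) {
      $queue->createItem($message);
      if (time() >= $deadline) {
        // Save the parse position and resume next run.
        mymodule_save_position($file);
        return;
      }
    }
    mymodule_mark_file_done($file);
  }
}
```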

The Java way (no, not javascript)

Bob Newby

Hey Richard,

My background is framework level Java enterprise engineering. The classic way your use case is handled in Java enterprise contexts is to employ independent "worker" threads.

I cannot give you a tutorial on this. However, there seems to be a decent PHP-specific intro at http://www.mullie.eu/parallel-processing-multi-tasking-php/, along with some doc references.

Here's how this might work... Every once in a while (on cron?), you ask me if I am done processing the last batch of work you handed to me. I am in charge of how a batch is processed, including at what rate I want to work through it. I might answer back to you, "Rich, stop bugging me, I am still working on the last batch you handed to me!" Or I might say, "Oh good, I was getting bored just twiddling my thumbs here. What have you got for me now?" In the latter case, you hand off the next batch to me.

In short, you and I have negotiated a protocol for communicating about batches of work, for handing off a batch, etc. The protocol is the key. The threading part is a matter of mechanics.

I hope this helps.

Bob

P.S. There are also numerous Drupal-specific discussions on the use of threads, just one of which is at http://deeson-online.co.uk/labs/multi-processing-part-3-jumping-drupal-q....

P.P.S. There's also the question, of course, of whether more than one thread is needed to tackle work within the required time frame.

Since my primary use case is

Richard Damon

Since my primary use case is a one-time (or just occasional) import of historical data, there isn't a major hard deadline, so I don't think I need to go multi-threaded. There is also the fact that, due to the server's TOS, crontab tasks are limited to once an hour, so drush-based solutions don't work well here.

Richard, I wrote a blog

rbayliss

Richard, I wrote a blog article about this a while ago. You can check it out here: http://rbayliss.net/drupal-queue-api. Hope it's helpful!

Drush FTW!

DamienMcKenna

I also strongly encourage using Drush to trigger the commands rather than cron; the less work that's run via the master cron task, the better, IMHO.

Can you put some flesh onto

Bob Newby

Can you put some flesh onto your counsel to employ a mechanism other than cron (in my case fine-grained Elysia cron on the scheduling side) to accomplish regular housekeeping activities?

Damien, good point. Another

rbayliss

Damien, good point. Another nice thing about the queue API is that you can process it using drush. "drush queue-cron" will pull items out of the queue one at a time and process them using whatever callback you've specified in your hook_cron_queue_info().

Thanks

Richard Damon

Rob,
Thanks, that article is clearer than the previous things I read, and it's making it clear that I am going to have to figure out which method is best to abuse for this. The problem I am having is that the "tasks" I want to set up and schedule aren't a lot of small independent tasks to queue up, but big tasks that get broken down while processing: for example, uploading a multi-megabyte file that gets parsed and generates a couple of thousand nodes, or scraping a couple hundred pages from a web site, where the URLs of most pages are determined from some of the pages being scraped.

Perhaps we can discuss this Thursday with more details.

I guess the question is how

rbayliss

I guess the question is how you can break your large tasks into smaller ones. For example, take that multi-megabyte file, parse it into individual rows (assuming it's a CSV or something), stick each row in the queue, then use your worker callback to turn each row into a node and save it. In that case, uploading the file would just parse the file and populate the queue, which could be worked off over multiple queue runs (or immediately if you were to also use the batch API to get one of those progressbar pages for long running jobs).
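That CSV-splitting approach might look something like this; the queue name, content type, and column mapping are illustrative:

```php
<?php
/**
 * Called after the file is uploaded: parse it into rows
 * and enqueue each row as one small unit of work.
 */
function mymodule_enqueue_csv($uri) {
  $queue = DrupalQueue::get('mymodule_rows');
  $handle = fopen($uri, 'r');
  while (($row = fgetcsv($handle)) !== FALSE) {
    $queue->createItem($row);
  }
  fclose($handle);
}

/**
 * Worker callback (declared in hook_cron_queue_info()):
 * turns one CSV row into a node.
 */
function mymodule_row_worker($row) {
  $node = new stdClass();
  $node->type = 'article';
  $node->language = LANGUAGE_NONE;
  node_object_prepare($node);
  $node->title = $row[0];
  // ... map the remaining columns onto fields here ...
  node_save($node);
}
```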

In the case of the scraping, it might help to think of each page that gets requested as a single unit of work which could be a queue item. Depending on how you're doing it, you'll either start with a list of pages, stick each one in the queue and process, or start with a single page, queue and process it, then use the results to generate more queue items (if you're following links on the page or something).
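The second, self-replenishing variant can be sketched as a worker that pushes newly discovered URLs back onto its own queue; mymodule_save_page() and mymodule_extract_links() are placeholders for your parsing logic:

```php
<?php
/**
 * Worker callback: one queue item == one page to scrape.
 * Newly discovered URLs are pushed back onto the queue.
 */
function mymodule_scrape_worker($url) {
  $queue = DrupalQueue::get('mymodule_pages');

  $result = drupal_http_request($url);
  if ($result->code != 200) {
    // Throwing leaves the item queued for a later retry.
    throw new Exception("Fetch failed: $url");
  }

  mymodule_save_page($url, $result->data);

  // Follow links found on the page. Deduplication is left
  // out here; in practice, track visited URLs (e.g. in a
  // table) so the crawl terminates.
  foreach (mymodule_extract_links($result->data) as $link) {
    $queue->createItem($link);
  }
}
```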

I won't be there on Thursday, but good luck!

automation

dgamache

Drupal is my favorite CMS, and the current thread is very interesting to me. I've got a very busy schedule and love to build Drupal solutions for people, even though I never have as much time for it as I'd like. I have even less time to make it to meetups, sadly. Any automation practices that make for efficient solutions are of interest, from the development environment for themes and modules, to staging, to building modules that import or migrate external content. It's a pretty broad topic; I'm happy just drilling down on the topics already mentioned for tonight. Glad to have the opportunity to meet up!

David Gamache

New Hampshire
