Web-to-print workflow for newspapers (SoC proposal)

stdbrouw@groups.drupal.org's picture

I'd like to propose building a web-to-print module for Drupal as a SoC project.

Most newspapers still work with a print-to-web workflow (all editorial work is done on a platform other than Drupal, and is subsequently imported) which can be a hassle because in effect it means that you have to work with two content management systems and get them to work together. A web-to-print module for Drupal that exports content to XML so it can be easily imported into InDesign and Quark would be the first step in establishing Drupal as a one-stop solution for newspapers. It could also help online media that are already using Drupal to expand their market with a print product. And I mean "media" in a broad sense: schools could use the module to make a print newsletter (or let a designer make it with less effort) based on their online news/announcements page.

Some of us at the Newspapers on Drupal group, including myself, already have modules that accomplish this in a production environment (so the usefulness is not in question), but the functionality is accomplished with very workflow-dependent solutions that were written specifically for one use-case. The dilemma is that it is more costly and takes time to develop something sufficiently generic so that it can accommodate different sorts of web-to-print workflows, but at the same time it's wasteful to reinvent the wheel every time. Ken Rickard wrote some time ago "What we need to design is a module flexible enough to support multiple use cases. ... I don't think that [a web-to-print module] should dictate work habits; I think it should support them." and I agree fully.

Features

Therefore, one of the challenges will be finding out what kind of features users want from this module. Here are some possibilities:

What to export?

  • export based on user-selected content (e.g. "let's select the most interesting articles for our print edition")
  • export based on a shared term from a vocabulary (e.g. "I want to export everything from this edition's sports section")
  • export based on a search
  • ...

How to export?

(XML is supported by both Quark and InDesign so that seems like the obvious choice)

  • Automatic export: instantly or on a cron routine ("I want my print edition to be in sync with what's online")
  • Manual export ("Alright, everything is proofread, let's send it to the designer")

Content of the exports

  • special formats like NITF or NewsML
  • entire nodes; only selected fields; in its original form or with computation on the fields (e.g. "I want the date of this event to display as 'x days from now'")
  • ...
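
To make the "only selected fields" and "computation on the fields" options a bit more concrete, here is a minimal sketch. It is written in Python purely for illustration (an actual module would be PHP), and the node dictionaries, field names and XML element names are all hypothetical; they do not follow any existing Drupal, InDesign or Quark schema.

```python
import xml.etree.ElementTree as ET
from datetime import date


def days_from_now(event_date, today):
    """Computed field: render a date as 'x days from now'."""
    delta = (event_date - today).days
    return f"{delta} days from now"


def export_nodes(nodes, fields, today):
    """Serialize only the selected fields of each node to an XML string."""
    root = ET.Element("export")
    for node in nodes:
        article = ET.SubElement(root, "article", id=str(node["nid"]))
        for name in fields:
            value = node[name]
            # Date fields are exported in computed form rather than raw.
            if isinstance(value, date):
                value = days_from_now(value, today)
            ET.SubElement(article, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical node data; "web_only_note" is deliberately not exported.
nodes = [{
    "nid": 12,
    "title": "Town fair",
    "body": "Come along.",
    "event_date": date(2008, 4, 10),
    "web_only_note": "Watch the video",
}]
xml_out = export_nodes(nodes, ["title", "event_date"], today=date(2008, 4, 7))
print(xml_out)
```

Which fields go into the export, and which computations apply to them, would be configuration in a real module rather than hard-coded as it is here.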

The Plan

As you can see, there is a plethora of options (many of which I probably haven't even included) and it won't be possible to add all of them to the module within the timeframe of a Summer of Code project. Therefore, the main focus should be on writing a solid basic module (or an API if needed) that is flexible enough to accommodate all these features. There have been different suggestions so far: an approach based on Views, something that uses the import/export API, or a start from scratch. It's not clear yet which approach would be most future-proof, so researching that and trying out some different possibilities before committing to a specific approach will be part of the project. Once that is decided upon and coded, the remainder of the time should still be ample to add the most-requested features.

Previous discussions: http://groups.drupal.org/node/5004 and http://groups.drupal.org/node/3009

Anyway, input and suggestions are more than welcome.

Comments

Note

agentrickard's picture

One of our developers (grndlvl) is already working on something similar and has signed on to be a SoC mentor, with this project in mind.

--
http://ken.therickards.com/
http://savannahnow.com/user/2
http://blufftontoday.com/user/3

I would love to be involved

jlmeredith's picture

I would love to be involved with this project. We currently have a need within our organization for a simple web-to-print solution. About half of our staff have access to computers daily, and the other half only interface "when it is needed". As a rural telco, many of these people are out on the roads a lot and can spend time reading while riding from one place to the next. Being able to take local intranet information and push it to a printed form would be wonderful.

What can I do?

tmg-studio (Jamie)

--
Jamie Meredith
Technical Account Manager
Acquia, Inc.

Good ideas

ChrisBryant's picture

This would certainly be a great module/system. One thing to possibly add would be the ability to output directly to PDF on the site. I know there are pdf generation modules available that do this, but they do it on a basic level. The idea here is that you could have a separate theme for your site that is your print theme and the module could output a nicely designed/styled PDF based on the site content. It could also integrate with imagecache for instance to support high resolution imagery in the pdf.

I understand this would be somewhat separate from the above mentioned goals, but it's something to think about during the process of planning for this. It would allow you to create high-resolution print ready pdfs that could be sent directly to the printer.

We've created something along these lines but it is somewhat of a specific use case and not a generic solution. It would be great to explore the possibilities of making it generic and integrating with a common solution for web-to-print.

--
Gravitek Labs

PDF/POD

catch's picture

As well as PDF, it'd also be great to have it compatible with Print on Demand services (in fact I have a feeling they often take PDFs, so this could be the same thing). Either way, I agree that as generic a solution as possible would be best.

PDF

stdbrouw@groups.drupal.org's picture
  • The proposal above is geared mostly toward the needs of newspapers, but it's interesting to see that people outside of that industry would have use for it as well, and that is something I should indeed keep in mind. (PDF is the preferred format for printers, and I'd be surprised if POD were any different.)
  • I'm wondering, Jamie, if you could describe what you'd need in a bit more detail? Print-friendly pages, but of a bunch of arbitrarily selected documents rather than of single ones?

Could go on top of import/export framework

jonathan.morgan's picture

I posted a proposal and design of an import export framework for summer of code. I think this web-to-print workflow could be implemented using that framework, so you would only have to develop the higher-level processes for pulling together lists of things you want to publish, then you could assign an export type and send them to the export module.

If you have a chance, check out the post entitled "create modular, extensible, API-based import and export framework for drupal" - I think it shows up in the newspapers group, but also is in the SoC 2008 group (I am new to these groups).

It is lower-level than this, but it is designed to abstract out the importing and exporting of data, and I think it might make sense to consider coordinating these two projects.

For PDFs, etc., this import/export framework way of dealing with the nuts and bolts of your export would let you abstract out the part of the process that formats the output, so that you implement the part that deals with setting up the exports, then just assign an output format type from the export framework. This would enable you to add in different formats as modules and then just pick a format type when you set up a given channel (I'd recommend enabling this to store multiple output channels, not just one, so you can output some things to print, but also output some things to blog software, an archive system, etc.). It would also enable you to just change the export type if you have to change pagination systems (or archive systems, or blog software) without having to change any of the other settings for a given channel.
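
A rough sketch of that idea, in Python for illustration only: output formats are registered as small plug-ins and each channel simply names the format it uses. `FORMATTERS`, `register_format` and `Channel` are hypothetical names made up for this example, not part of the proposed framework or of any existing Drupal module.

```python
import json
import xml.etree.ElementTree as ET

# Registry of output formats; each format would live in its own module.
FORMATTERS = {}


def register_format(name):
    """Decorator that adds a formatter function to the registry."""
    def decorator(func):
        FORMATTERS[name] = func
        return func
    return decorator


@register_format("xml")
def to_xml(items):
    root = ET.Element("export")
    for item in items:
        article = ET.SubElement(root, "article")
        for key, value in item.items():
            ET.SubElement(article, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


@register_format("json")
def to_json(items):
    return json.dumps(items, sort_keys=True)


class Channel:
    """An output channel: a name plus the format type assigned to it."""

    def __init__(self, name, format_name):
        self.name = name
        self.format_name = format_name

    def export(self, items):
        return FORMATTERS[self.format_name](items)


items = [{"title": "Council vote", "body": "The council voted."}]
channels = [Channel("print", "xml"), Channel("archive", "json")]
outputs = {channel.name: channel.export(items) for channel in channels}

# Switching systems only means assigning a different format type;
# nothing else about the channel has to change.
channels[0].format_name = "json"
switched = channels[0].export(items)
```

The selection logic (which items go to which channel) stays entirely outside the formatters, which is the separation described above.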

You could consider integrating XSL-FO into your project regardless. It can be a little tough to generate the XSL-FO, but once you have it, you can convert to PDF, to printer control languages (the native formats that printers speak, like PostScript; when you print a PDF file, for example, it is translated to a printer control language before it is sent to the printer) and to many other formats. I know Apache's FOP XSL-FO transformer is in good working order for Java: http://xmlgraphics.apache.org/fop/. I don't know if there is a good equivalent for PHP.
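
For a sense of what generating XSL-FO involves, here is a minimal sketch (Python, for illustration only) that wraps a list of articles in a single-page-master FO document. The article data, the page size and the master name "A4" are assumptions, and the output has not been run through FOP here, so treat it as a starting point rather than a tested stylesheet.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def articles_to_fo(articles):
    """Build a minimal XSL-FO document with one title and body block per article."""
    root = ET.Element(fo("root"))

    # Page geometry: one simple page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4",
        "page-height": "29.7cm",
        "page-width": "21cm",
    })
    ET.SubElement(page, fo("region-body"))

    # Content: all articles flow into the body region.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for article in articles:
        title = ET.SubElement(flow, fo("block"), {"font-size": "16pt", "font-weight": "bold"})
        title.text = article["title"]
        body = ET.SubElement(flow, fo("block"), {"space-after": "12pt"})
        body.text = article["body"]

    return ET.tostring(root, encoding="unicode")


fo_doc = articles_to_fo([{"title": "Town fair", "body": "Come along."}])
print(fo_doc)
```

The resulting string is what you would hand to an FO processor to get a PDF; real layouts need considerably more markup (margins, columns, images) than this.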

I read your post and that

stdbrouw@groups.drupal.org's picture

I read your post and that looks promising. However, I'm a bit wary of relying on something that, well, doesn't exist yet :-) That said, it would definitely be possible for me to start out with a rudimentary export to plain xml, then dedicate my time to the "pull stuff together" and UI, and develop other output options later in the timeframe of this project. That would allow me to check if your project is already in a usable form by that time and if there is a possibility for using your import/export module. So if our projects get accepted, we should definitely keep in touch.

Anyway, it's all a bit vague at this moment because there are so many ways that the export could be handled (e.g. Views theming; XSLT transformations of a standard XML output; with an independent export module and so on) and I don't know what grndlvl has in mind.

There's various things in

catch's picture

There's various things in the works for D7 for rendering nodes in various formats - so I'd look carefully at those. (Drupal pipes and popups are worth taking a look at to get started).

Something which hooks into views (and supports CCK) is likely to have the most forwards compatibility and wide use. Being able to take a view (or a nodequeue etc.) and send that to print would be pretty slick.

Web Versus Print

Mike Wacker's picture

Sounds like a good idea. Even with our web publication system at The Cornell Daily Sun, I know that The Daily Sun still tends to think in terms of print-to-web in some respects. Here are a couple of challenges I see in terms of going the other way.

  1. How do you specify if content does not appear in the print edition? For example, if for the web edition, I added a YouTube video to my post and put a caption under that, that could not directly be exported to print, and I would probably want to strip the video caption out for the print edition. Another area where this is relevant is hyperlinks, though that can probably be addressed by stripping out the <a> tags. However, the words that were hyperlinked may not make as much sense in the print edition if they lack an accompanying link. Also, I know that Quark likes to do the "fl" character a different way sometimes in the print edition.
  2. I know this may sound a little counter-intuitive, but what about also thinking about the problem the other way around? Sometimes the print edition can put more constraints on the article than the web edition can. There's only so much space on a print page, but there's unlimited space on the web. Going back and forth between web and print would be essential in this scenario. Print design can get pretty tricky, so you simply can't wait until the last moment to send it out to print. Some would even argue that exporting to print as the last step is just as bad as adding the web edition as an afterthought. The design editors would kill me (and rightfully so) if I tried that.

The fundamental idea is solid, but often the web and print world differ in significant and meaningful ways. Going between the two often is not as simple as just exporting some XML.

@mike

stdbrouw@groups.drupal.org's picture

1 > Web-to-print does not mean that your online and offline content should be the same. You can put additional content for the web in different cck fields that you don't include in your print version. (This is how we do it at this moment.) Or, a bit more advanced, you could work with a tag to indicate content that the export should filter out but that is still displayed in the online version. That might be worthwhile to include in this project, so that's a good suggestion. Video captions shouldn't be a problem as the preferred method using Drupal is with a CCK video field rather than inline anyway. Hyperlinks and other tags that you don't need in print don't pose a problem either because you have to allocate tags to styles anyway in DTP-software, so if you don't define a special style for hyperlinks, they are displayed just like regular text. And ligatures (fl, ff, ...) do not need to be specified in the XML, the dtp-software takes care of those.
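
A sketch of the filter-tag idea (Python, for illustration only): content wrapped in a marker tag is dropped from the export, and <a> tags are unwrapped so only the link text remains, as Mike suggested. The `<webonly>` tag name is made up for this example; nothing like it exists in Drupal as far as I know. Note that this simple version also drops attributes on the tags it keeps.

```python
from html import escape
from html.parser import HTMLParser


class PrintFilter(HTMLParser):
    """Drop <webonly> sections and unwrap <a> tags, keeping other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <webonly> section

    def handle_starttag(self, tag, attrs):
        if tag == "webonly":
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag != "a":
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag == "webonly":
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag != "a":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Re-escape text so the output stays valid markup.
            self.out.append(escape(data, quote=False))


def for_print(html_text):
    """Return the markup with web-only content and link tags removed."""
    parser = PrintFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


body = (
    '<p>Read the <a href="http://example.com/report">full report</a>.</p>'
    "<webonly><p>Watch the video below.</p></webonly>"
)
print_body = for_print(body)
print(print_body)
```

The online version would render the same body unchanged (minus the marker tag itself), so editors only maintain one copy of the text.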

2 > Exporting to print, in my proposal, isn't the last step (that's just one possible workflow). If you set up an automatic export your print version is in sync with what is online, so the dtp-staff can start working as soon as there is a rough draft of the article available and stop updating the XML when the deadline is near and the final touches need to be made. I'm not sure how that would work for Quark, but in InDesign that's cake. I should look into the functionality that Quark offers, though. The word "export" is perhaps a bit off the mark to describe the full scope of this project, as it seems to imply a one-time last-minute export - what you're talking about.

The main benefit of web-to-print is that you can use a free (yet advanced) content management system that is available from anywhere where there is internet access. In some cases where I've used web-to-print, we didn't even have a public website, and we only used Drupal as an easy-to-use central place where everybody could edit the content that would appear in print. It saves time, and if you do have a public-facing website it's easier because you don't need a second piece of software for print content management.

Web-to-print?

davidw's picture

Did this project ever get off the ground? I am looking to put together something remotely similar, for a commercial printing operation.

I would like to have a PDF "template" with static text and images, and also fields that accept variable data. Then use CCK to create a form to have a user input the data for the variable fields, let the user view the results, and then create a hi-res PDF print ready file.

Any ideas??

Any further development?

chrisdwells's picture

Did this ever get off the ground and, if not, is there any plan or development happening elsewhere? I'm not sure how I could contribute as I'm no PHP developer but I'm an advanced InDesign user, graphic designer and I have some usability and information architecture experience. I'm also able to write good documentation.

I'm interested in helping to develop a web-print/print-web publishing system with a Drupal backend/frontend.

Drupal web2print

Ivo.Radulovski's picture

So what's the state, and are there any no-go's?

I want to get involved too!

-----

Drupal Development by Trio Interactive

Me too

canishk's picture

Count me in too.

Anish Karim.

any development on this front please

hassanali20's picture

Has anyone developed some kind of module or workaround to pull articles from Drupal (web) to InDesign (print)?

Kindly share if there is something in place.
