HOWTO make a news site with Drupal

nikolai's picture

We're finally getting ready to port our main site to Drupal (so far we've only been dipping our toes with our blogging site). We've been holding back waiting for 5.0.

There seem to be many ways to go about it. It would be a fantastic help for us (and hopefully others) if some of you who have been through the process of building a newspaper site with Drupal would share some of your insights.
So: How did you do it?

It would be great to hear things like:

    - Which modules do you use?
    - Which content types do you have? Custom made or by CCK?
    - How do you use taxonomy?
    - Do you use one or more themes?
    - How is your workflow when articles are imported from the print edition?
    - How about performance - do you use caching?
    - What about premium content?
    - What do you use for handling advertising?
    - What does your back end look like?
    - How did you build your front page?
    - How do you handle images?
    - How do you handle trolls and spam?
    - Your basic setup: Do you run everything (newspaper, blogs, community etc.) from the same install?

and tons of other stuff I haven't even thought of.

Maybe even this very basic one: Why did you choose to go with Drupal instead of alternatives like Ellington?

It would be great if this thread could become a collection of case studies. It could be a great way to attract more newspapers to Drupal. I can't wait to contribute our own case, hopefully in less than four months!

All the best

Nikolai

Comments

a few thoughts

darthcheeta's picture

i chose drupal because drupal is free, ellington is not. it also has a very big community working on it and is well loved. if i had to do it all over again now, i might opt for joomla, but it really is six of one, half a dozen of the other.

since drupal is free, though, there are some trade-offs. the biggest is that drupal is a resource hog and, in my experience, tends to get pretty mucky and leave mysql calls open, which eventually clogs up the server... caching or not. we found it was pretty crucial to avoid the poormanscron module, but even with a system crontab, we still get bogged down, which limits our traffic.

While taxonomy management is fantastic, it is also so flexible that it gives users a lot of rope to hang themselves with. I am interested in seeing how others manage the creation of sections and the combining of subcategory taxonomies into a major category. For instance, when I try to make a news section by combining subcats of international, national, politics, etc. using taxonomy/term/1+2+3, it is anything but elegant.
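For readers unfamiliar with the combined-term paths mentioned above, here is a minimal Python sketch of how such a path is put together. It assumes the core taxonomy convention as I understand it: "+" lists nodes tagged with any of the terms, "," requires all of them, and an optional trailing depth pulls in child terms. The function name and parameters are hypothetical, for illustration only.

```python
def term_listing_path(term_ids, match_all=False, depth=None):
    """Build a Drupal-style taxonomy listing path from term IDs."""
    if not term_ids:
        raise ValueError("at least one term ID is required")
    # '+' lists nodes tagged with ANY of the terms; ',' requires ALL of them.
    separator = "," if match_all else "+"
    path = "taxonomy/term/" + separator.join(str(int(t)) for t in term_ids)
    # An optional trailing depth ("all" or a number) also includes child terms.
    if depth is not None:
        path += "/" + str(depth)
    return path


# Term IDs 1, 2 and 3 stand in for international, national and politics.
news_section = term_listing_path([1, 2, 3])
```

Building the path in one helper at least keeps the term IDs in a single place when a section's subcategories change.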

I also have not been a big fan of the epublish module or how inflexible the templates for blogs are. These have been pretty big stumbling blocks for us.

-daj

if everyone is thinking alike, chances are no one is thinking.
www.davidandrewjohnson.com

Eh?

yelvington's picture

"tends to get pretty mucky and leave mysql calls open which eventually clogs up the server..." Please explain. Drupal opens only one database connection, in bootstrap. That connection is used in all subsequent database transactions.

You may be having PHP/Apache problems that aren't Drupal's fault, especially if you're using one of the accelerators.

Check also to see that you're not serving Drupal-based 404 pages on non-Drupal paths (such as image directories). That can result in trouble if you have any bad IMG src URLs, for example, in your templates, because every broken image request then goes through a full Drupal bootstrap. And check your logs to see if you're getting pounded by botnets. We had to disallow certain types of connections, such as referrers containing the usual spam terminology.
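The referrer filtering described above could be sketched roughly like this in Python. This is not what was actually deployed (that would more likely live in the web server configuration); the blocklist terms and the function name are made-up placeholders.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real one would come from your own access logs.
BLOCKED_TERMS = ("casino", "poker", "pills")


def is_spam_referrer(referrer, blocked_terms=BLOCKED_TERMS):
    """Return True if the Referer header contains a blocked term."""
    if not referrer:
        return False  # Many legitimate requests send no referrer at all.
    parsed = urlparse(referrer.lower())
    # Look at host, path and query string, but ignore the scheme.
    haystack = parsed.netloc + parsed.path + "?" + parsed.query
    return any(term in haystack for term in blocked_terms)
```

A request whose referrer matches would be refused before Drupal is ever bootstrapped, which is where the savings come from.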

explanation

darthcheeta's picture

some things i can't explain, but this nutshells some of the angst we've experienced. i should note, we're using a fairly tricked out version of 4.6.x and are somewhat wedded to it while we get our heads around the replacement(s) to aggregator2, which is my main means of auto feeding content to the site.

4.6.x shipped with persistent connections off, but even after making those fixes, we're still dealing with mysql congestion that is hard to nail down, holding back our traffic during peak times, and limiting our growth. a huge piece of that for us was poormanscron, which fired on viewed pages and is to be avoided at all costs in any deployment outside of a small traffic vanity site. I'm gathering best practices and lessons learned for hosting, config, etc. and will share whatever knowledge I uncover if it is worth sharing.
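As a rough illustration of the system-cron alternative to poormanscron, here is a small Python sketch that requests cron.php once, so that a system crontab entry can run it on a fixed schedule instead of piggybacking on page views. The function name, the injectable fetch parameter and the example URL are assumptions for illustration.

```python
import urllib.request


def run_drupal_cron(base_url, fetch=None, timeout=30):
    """Request the site's cron.php once and return the HTTP status code."""
    cron_url = base_url.rstrip("/") + "/cron.php"
    if fetch is None:
        # Default: a real HTTP request, intended for use from a system crontab.
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.status
    return fetch(cron_url)


# Example call (performs a real request, so it is left commented out):
# run_drupal_cron("http://example.com")
```

The point of the design is that cron work happens on the server's clock, not inside a visitor's page request during peak traffic.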

if everyone is thinking alike, chances are no one is thinking.
www.davidandrewjohnson.com

and another -rather important- thing....

darthcheeta's picture

we also experienced some very strange behavior with how google news bots crawled our site a few months into our first launch, that we have yet to explain with any level of satisfaction. at one point, the bot just quit crawling us, and their response came down to the bots not liking the breadcrumbs that drupal automatically creates at the bottom of index pages. one SEO guru came back at us with a theory that most drupal sites aren't playing in breaking news turf. we're constantly working on this aspect of our business as well, since google news is one of our biggest referrers.

if everyone is thinking alike, chances are no one is thinking.
www.davidandrewjohnson.com

One other issue is that

merlinofchaos's picture

One other issue is that google doesn't like the same page with multiple URLs; if you're using aliasing (particularly pathauto) this can burn you when node/298734 serves the same page as articles/20070116/foo_bar_baz.

Another thing that can cause a lot of database load is statistics.module; if you have a lot of posting activity on your site and you use anonymous page caching, you need to set your page cache lifetime up from 0 or you can waste a lot of time clearing the cache too often.
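To make the cache lifetime point concrete, here is a toy Python model of a page cache with a minimum lifetime. It is only a sketch of the idea, not Drupal's actual cache code; the class and method names are hypothetical.

```python
class PageCache:
    """Toy anonymous page cache with a minimum lifetime between flushes."""

    def __init__(self, min_lifetime):
        self.min_lifetime = min_lifetime  # seconds; 0 flushes on every change
        self.pages = {}
        self.last_flush = None
        self.flush_count = 0

    def set(self, path, html):
        self.pages[path] = html

    def get(self, path):
        return self.pages.get(path)

    def content_changed(self, now):
        """Called when a post is saved; flush only if the cache is old enough."""
        if self.last_flush is not None and now - self.last_flush < self.min_lifetime:
            return False  # Too soon: keep serving the existing pages.
        self.pages.clear()
        self.last_flush = now
        self.flush_count += 1
        return True
```

With a lifetime of 0, every new post or comment empties the cache, so a busy site spends its time rebuilding pages that were cached seconds ago.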

What ways are there to avoid Google duplicate issue?

geoffb's picture

So, is using the rel="nofollow" attribute or restricting node URLs in robots.txt the best way to avoid Google thinking you're serving up duplicate pages if you're using Pathauto?

This module is the best

merlinofchaos's picture

This module is the best solution, I think: http://drupal.org/project/globalredirect
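The idea behind that approach, sketched in Python: when a request arrives on the internal node path and an alias exists, answer with a permanent redirect to the alias so that only one URL gets indexed. This is a simplified illustration, not the module's actual code; the function name and the alias table are assumptions.

```python
def canonical_redirect(requested_path, aliases):
    """Return (301, alias_url) if the path has an alias, otherwise None."""
    path = requested_path.strip("/")
    alias = aliases.get(path)
    if alias and alias != path:
        return 301, "/" + alias
    return None  # Already canonical, or no alias exists.


# Alias table using the example paths from the comment above.
aliases = {"node/298734": "articles/20070116/foo_bar_baz"}
```

A request for /node/298734 is redirected, while a request for the alias itself is served normally.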

Brilliant.

geoffb's picture

Thanks kindly. It looks to be exactly what we need. I hadn't spotted that module.

Google News Crawl Requires unique number in URL

geoffb's picture

We have experienced the same issue, where Google News didn't seem to be crawling us and News Alerts stopped arriving.

As I just posted on the drupal.org forum, the reason seems to be that according to Google (http://www.google.com/support/news_pub/bin/answer.py?answer=40741), "in order to have your articles crawled by Google News, the URL for each article must contain a unique number consisting of at least three digits"

It seems that, as we are using the pathauto module to give us clean urls, the News Crawl doesn't want to know us.
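A quick way to check which of your aliases would satisfy the quoted rule is a small Python test like the one below. It only checks for a run of at least three digits in the URL path; it cannot tell whether the number is unique per article, and the function name is hypothetical.

```python
import re
from urllib.parse import urlparse


def has_numeric_id(url, min_digits=3):
    """Return True if the URL path contains a run of at least min_digits digits."""
    path = urlparse(url).path  # Ignore host, port and query string.
    return re.search(r"\d{%d,}" % min_digits, path) is not None
```

Note that a date-only alias such as articles/20070116/foo_bar_baz passes this check even though every article published that day shares the same number.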

A reply on the forum suggested we can generate numbers using Pathauto, but we're looking into creating a News Sitemap as per this Google page: http://www.google.com/support/webmasters/bin/answer.py?answer=42738.

Hope that helps.

Geoff

My 2 cents

merlinofchaos's picture

I wrote this article:

http://www.angrydonuts.com/publishing_articles_a_tutorial

Sometime later I refined some of that into the publishing module:

http://drupal.org/project/publishing

I've been meaning to set up a demo site for this but it's still a good bit of work to collect enough fake content to properly demo it.

Upgrading for 5.X?

femrich's picture

(edit to clarify) Merlin of Chaos, Are you planning on upgrading the publishing module for 5.X?

Thank you,

Frederick Emrich

Yes...

merlinofchaos's picture

...but it is dependent upon workflow module which has not yet been fully updated (unless mfrederickson has made some progress and didn't tell me).

Our case studies

yelvington's picture
  • Which modules do you use?

buddylist, devel, emailpage, event, guestbook (for commenting on profiles), htmlcorrector, image, img_assist, mysite, nodequeue, print, privatemsg, tagadelic, urlfilter, views, webform. OG-related modules on some sites.

  • Which content types do you have? Custom made or by CCK?

The usual stuff, plus a custom content story type and data loader to accommodate NITF-loaded data on SavannahNow, and a custom event calendar that is likely to be replaced by CCK. Going forward, CCK looks very powerful.

  • How do you use taxonomy?

We're using free tagging quite a bit. Some users take to it. Others don't.

  • Do you use one or more themes?

We're using single themes, but some of them have some pretty heavy conditional behaviors.

  • How is your workflow when articles are imported from the print edition?

In Savannah, I believe it's DTI to NITF, then through some scrubbers to fix DTI's crappy NITF, then twinned into Oracle for longterm storage and into Drupal for immediate publication. Nodequeue is used to manage linksets that appear in blocks on highlights pages. I don't think they're using any heavy workflow or doing any significant editing on the Web side unless the stories are chosen to be "enhanced" with multimedia components.
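For anyone who hasn't seen NITF, here is a minimal Python sketch of the "NITF into a story" step. It is not the Savannah loader; it assumes a very plain NITF document using the standard hl1, byline and body.content elements, and the sample story is invented.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, for illustration only.
SAMPLE_NITF = """<nitf>
  <head><title>Council vote</title></head>
  <body>
    <body.head>
      <hedline><hl1>Council approves budget</hl1></hedline>
      <byline>By <person>Jane Doe</person></byline>
    </body.head>
    <body.content>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </body.content>
  </body>
</nitf>"""


def _text(element):
    """Collapse an element's text, including child elements, to one line."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def story_from_nitf(nitf_xml):
    """Extract headline, byline and body paragraphs from an NITF string."""
    root = ET.fromstring(nitf_xml)
    content = next(root.iter("body.content"), None)
    paragraphs = [] if content is None else [_text(p) for p in content.iter("p")]
    return {
        "title": _text(next(root.iter("hl1"), None)),
        "byline": _text(next(root.iter("byline"), None)),
        "body": [p for p in paragraphs if p],
    }


story = story_from_nitf(SAMPLE_NITF)
```

The resulting dictionary is the sort of thing a data loader would then map onto a story node's title and body fields.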

Savannah is stuck in 4.6.x at the moment.

Our other websites aren't using Drupal for news content, only for community content, blogging and profiles. They are all on 4.7.x.

  • How about performance - do you use caching?

Savannah had some serious performance problems before we figured out where we were shooting ourselves in the foot. It runs on a pair of webservers with a separate database server, and it's seriously overpowered. We can get you some data if you're interested. We're "flattening" highlights pages, basically caching complete pages in the filesystem.
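The "flattening" idea can be sketched in a few lines of Python: write the fully rendered page to a static file that the web server can deliver without touching PHP or the database. This is only an illustration of the technique; the function name and directory layout are assumptions, not the Savannah setup.

```python
import os
import tempfile


def flatten_page(static_root, url_path, html):
    """Write rendered HTML to <static_root>/<url_path>/index.html atomically."""
    relative = url_path.strip("/")
    if ".." in relative.split("/"):
        raise ValueError("refusing to write outside the static root")
    target_dir = os.path.join(static_root, relative) if relative else static_root
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "index.html")
    # Write to a temp file first so readers never see a half-written page.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    os.replace(tmp_path, target)
    return target
```

A scheduled job would regenerate the handful of highlights pages every few minutes, leaving everything else to be served dynamically.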

Bluffton runs both database and webserver on a low-end AMD pizza box and has very low load unless it's being hit by a botnet attack.

Both sites run APC, which can introduce some odd side effects.

  • What about premium content?

Not doing any.

  • What do you use for handling advertising?

Open AdStream for banners and such; Morris classifieds.

  • What does your back end look like?

I think I covered that.

  • How did you build your front page?

We're big fans of the front page, nodequeue and Views modules.

  • How do you handle images?

Savannah has some custom Flash code, driven by XML files. The rest of our sites offer img_assist to our users, and they really like it.

  • How do you handle trolls and spam?

Muttering and cursing! Most of the spammers don't register, and most of those who do fail to return quickly to actually post their spam, so we keep a close eye on the registrations.

  • Your basic setup: Do you run everything (newspaper, blogs, community etc.) from the same install?

At this point each newspaper has a single Drupal installation. We've discussed using the multisite functionality to cluster multiple small sites on a single server, but we're not doing it.

Node queue

nikolai's picture

Hi Steve - thanks for your reply, it's a fantastic help.

Could you tell a little more about how you use node queue?

Nodequeue

yelvington's picture

Nodequeue is a handy-dandy all-purpose facility for maintaining lists of nodes. These lists potentially can be ordered.

The resulting list (queue) is available to Views, which can be used to create the code necessary to display elements of the queue on a page.

The simplest example can be seen on the homepage of BlufftonToday.com, where there are two featured blog postings, followed by a cluster of five links to secondary blogs.

Each cluster comes from a nodequeue. Any staffer with nodequeue access can promote an item into those queues and/or manage the sort order within the queue. Since the nodequeue tab shows up on the top of a node display (right next to "edit"), it's easy and natural to promote an item when you stumble across it.

With Views, you can do more complex tricks, such as randomly choosing five items from a queue that might contain eight or ten or so.

The bottom line is that newsroom staffers can keep the site up to date without having to know any HTML.
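For those who think in code, the queue behaviour described above can be modelled in a few lines of Python. This is a toy model and not the module's implementation; the class name, the newest-first ordering and the size limit are assumptions for illustration.

```python
import random


class NodeQueue:
    """Toy ordered list of node IDs with an optional maximum size."""

    def __init__(self, name, size=None):
        self.name = name
        self.size = size
        self.nids = []

    def promote(self, nid):
        """Put a node at the front of the queue; drop the oldest if over size."""
        if nid in self.nids:
            self.nids.remove(nid)
        self.nids.insert(0, nid)
        if self.size is not None:
            del self.nids[self.size:]

    def move(self, nid, position):
        """Change a node's position within the queue."""
        self.nids.remove(nid)
        self.nids.insert(position, nid)

    def pick_random(self, count, rng=random):
        """Choose up to `count` nodes at random, as a view might."""
        return rng.sample(self.nids, min(count, len(self.nids)))


featured = NodeQueue("featured_blogs", size=2)
for nid in (10, 11, 12):
    featured.promote(nid)
```

Here the two-slot queue keeps only the two most recently promoted nodes, and a staffer reorders them with move() without touching any markup.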

ibm developers page

darthcheeta's picture

this is a pretty good resource if you haven't uncovered it yet:

http://www-128.ibm.com/developerworks/ibm/osource/implement.html

-daj

if everyone is thinking alike, chances are no one is thinking.
www.davidandrewjohnson.com

information.dk

newswatch's picture

Nikolai: I like the look of your existing site. It is neat. When are you going to port information.dk to Drupal? And are you changing to 1024*800?

We're almost

nikolai's picture

We're almost there....hopefully we will launch within a few months.

Nice looking and help me out

king0007's picture

Hi Nikolai,

I like the look of your site, and I am preparing to make my site more news-related.
However, I would appreciate it if you could help me with a tutorial on setting up my site in Drupal, the modules, etc. I am new to Drupal. You can check my site below, and I am looking forward to your support and advice.
Thanks,
Kingshuk

http://www.webinfotain.com

drupal.org/information.dk

nikolai's picture

Hi Kingshuk,

We did a writeup on the site; I hope it is helpful: http://drupal.org/information.dk

Nikolai

Currently in development

Izz ad-Din's picture
  • Which modules do you use?

Core is built off panels, views and CCK. Other modules are used for more specific tasks.

  • Which content types do you have? Custom made or by CCK?

100% CCK

  • How do you use taxonomy?

1 Vocab for categories linked to the menu;

Remainder is mainly CCK Taxonomy.

  • Do you use one or more themes?

1 main theme and subthemes.

  • How is your workflow when articles are imported from the print edition?

We do not have a printed edition at the moment, but when we do, articles will not be imported from it.

  • How about performance - do you use caching?

Cannot answer this yet, since we are in development.

  • What about premium content?

Paid content is, imo, a spasm of the ancien régime: why pay for news if you can get it for free from most sites?

  • What do you use for handling advertising?

n/a

  • What does your back end look like?

CiviCRM, Node Form Cols. Still in heavy development.

  • How did you build your front page?

? With CSS ?

  • How do you handle images?

n/a

  • How do you handle trolls and spam?

n/a

  • Your basic setup: Do you run everything (newspaper, blogs, community etc.) from the same install?

Single codebase and database.

to make site online

sarthi's picture

Hi,
I am Varsha, and I was building a website in Drupal. By mistake the settings changed and the site went offline. How do I bring it back online so I can proceed?

/admin/settings/site-maintena

arnieswap's picture

/admin/settings/site-maintenance

and make it Online.
