Scalable file upload backend?

BartVB's picture

I'm in the process of setting up Drupal as the CMS for a large community site. It is going to get a fair amount of image/video uploads, which are put into /sites/default/files/. This works well for a few hundred to a few thousand files, but after that performance and maintainability start to degrade.

So I'm wondering how you have solved this problem. Are you using a module (which one?) to put files in a directory tree like /files/0/4/04ac8e2683d.jpg ? Are you using MogileFS, GlusterFS (Acquia seems to be using that) or something else? How do you deal with backups? How do you deal with replication?
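As an illustration of the directory-tree idea mentioned above, here is a minimal Python sketch that derives a path like /files/0/4/04ac... from a hash of the file contents. The function name `sharded_path`, the choice of SHA-256, and the two-level depth are assumptions made for this example; it is not how any particular Drupal module does it.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(filename, content, depth=2, root="files"):
    """Build a storage path whose directories are the first hex digits of a hash.

    Hypothetical helper: hashing the content spreads files evenly over
    16**depth directories, so no single directory grows too large.
    """
    digest = hashlib.sha256(content).hexdigest()
    # Keep the original extension (lowercased) so the web server can still
    # guess the MIME type from the file name.
    suffix = PurePosixPath(filename).suffix.lower()
    # One directory level per leading hex digit, e.g. "0", then "4".
    shards = list(digest[:depth])
    return PurePosixPath(root, *shards, digest + suffix)


if __name__ == "__main__":
    print(sharded_path("Holiday.JPG", b"example image bytes"))
```

With `depth=2` this gives 256 leaf directories; raising `depth` trades deeper paths for fewer files per directory.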

MogileFS looks promising: built-in replication, built-in high availability, multiple 'file classes', compression, striping, checksums, everything written in Perl, and no need for NFS. The main disadvantage is that it seems to require quite a few modifications to Drupal, because it doesn't expose a 'normal' filesystem; it's more like a web service. That can be solved with the FUSE adapter, but that one doesn't seem to be too stable.

Comments

Paths

...

BartVB's picture

Thanks! Hadn't found that one yet :) Not sure if it can be used to prefix the filename with a 'segmentation path' like /0/d/ but I guess it shouldn't be too hard to extend the module if this isn't possible out of the box.

FileFramework

rjbrown99's picture

Have a look at:
http://drupal.org/project/fileframework
http://drupal.org/project/rdf

RDF specifically may be something you like for this purpose.

...

BartVB's picture

These modules mainly seem to deal with metadata and the semantic web. They are not solutions to scalability problems, or are they?

Im looking for exactly the

oxford-dev's picture

I'm looking for exactly the same solution. I would like to use a separate file server that files are uploaded to and served from, leaving the web servers to serve only the web pages.

I've been searching for quite a while but cannot seem to find a real solution yet.

The added benefit of having a separate file server is that you can then start adding extra web servers and uploads will all go to a single location.

Our firm is also looking for

jason ruyle's picture

Our firm is also looking for a better solution, but right now the way we handle things is:

Smaller client side items are uploaded to normal webserver.
We use cloudfront to serve them through Amazon S3.

For our content (it's a photography site) we actually process our photos, send them to Amazon S3, and create references to those links instead of our local system. Usually this is done through the tpl files. Then we just do something like:

<?php print $node->content['field_file']['0']['filename']; ?>

So our files are served from S3. We then clean out our web server periodically of the older files that we already know are on S3.

Our servers are on "the cloud"

Same issue. I know it's been

jordanmagnuson's picture

Same issue. I know it's been a while, but has anyone made any progress on this? Pushing files to S3 isn't ridiculously difficult... but how do you handle tens or hundreds of simultaneous file uploads?

Update

BartVB's picture

I've implemented Drupal on my site and I'm using a modified version of http://drupal.org/project/hash_wrapper for file-uploads. Currently I only have approximately 200GB of uploads but I'm looking into distributed filesystems like MogileFS.

I don't really like the Amazon-like cloud solutions :)

Amazon EBS

jordanmagnuson's picture

Does anyone know whether/how Amazon EBS handles multiple simultaneous uploads/downloads?

Looking for more information

aczietlow's picture

We are also looking for a backend solution to handle the file system. Has anyone successfully deployed MogileFS or GlusterFS? If so, did you get the results you were looking for?

NFS

mikeytown2's picture

We are using NFS and have around 3 TB of files hosted across 6 servers (3x2). Something like http://drupal.org/project/imageinfo_cache (6.x) is key. Something like that for 7.x doesn't exist yet. We were using S3 but switched to NFS for performance reasons.

We run this every week to free up space:
find /mnt/DrupalNFS3/s3cache/hlocal/sites/*/files/imagecache/*/ -iname "*.jpg" -type f -atime +90 -exec rm '{}' \;
We do keep atime, if you're wondering.
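For anyone who wants to preview what such a cleanup would remove before deleting anything, here is a rough Python sketch of the same idea as a dry run. The function name `stale_files` and its parameters are made up for this example, and the age check only approximates `find -atime +90` (which rounds ages down to whole days). Like the command above, it relies on the filesystem actually recording access times.

```python
import time
from pathlib import Path


def stale_files(root, suffix=".jpg", max_age_days=90, now=None):
    """Return files under root not accessed for more than max_age_days.

    Hypothetical dry-run helper: it only lists candidates, it never deletes.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        # Case-insensitive suffix match, similar in spirit to -iname "*.jpg".
        if (path.is_file()
                and path.suffix.lower() == suffix
                and path.stat().st_atime < cutoff):
            stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Lists candidates in the current directory tree; nothing is removed.
    for candidate in stale_files("."):
        print(candidate)
```

Reviewing the list first and deleting in a separate step makes it harder to wipe a cache by accident when a path or mount point is wrong.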

Is drupal really up for the

gateway69's picture

Is Drupal really up to the challenge of being a highly scalable image server? For instance, Drupal still uses MySQL, so the number of nodes you can have is limited by whether the node ID is a signed or unsigned integer. Instagram recently had 1 million file uploads on Thanksgiving; with that type of load you will run out of node IDs quickly. I have been struggling with using Drupal for a similar project we have in the works.
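To put a rough number on this concern, here is a small back-of-the-envelope Python sketch. It assumes the node ID is a 32-bit integer and that every upload consumes exactly one ID at a steady 1 million per day (the figure quoted above); both are assumptions for illustration, not measurements of any real site.

```python
SIGNED_32_MAX = 2**31 - 1    # 2,147,483,647
UNSIGNED_32_MAX = 2**32 - 1  # 4,294,967,295


def days_until_exhausted(max_id, ids_per_day):
    """Whole days of headroom before an auto-increment ID hits max_id."""
    return max_id // ids_per_day


if __name__ == "__main__":
    uploads_per_day = 1_000_000  # assumed constant load
    for label, max_id in (("signed", SIGNED_32_MAX),
                          ("unsigned", UNSIGNED_32_MAX)):
        days = days_until_exhausted(max_id, uploads_per_day)
        print(f"{label} 32-bit: {days} days (about {days / 365:.1f} years)")
```

Under these assumptions the headroom is roughly 5.9 years for a signed and 11.8 years for an unsigned 32-bit ID. Extra rows created per upload (revisions, comments, ratings) or a higher upload rate would shorten that.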

Using something like Drupal + Services + S3 seems very reasonable, but the bigger worry is the number of nodes that are created, especially if you add things like ratings and comments. Also, I don't think Services currently supports raw image upload, so everything has to be base64 encoded (granted, I'm thinking of pushing binary data from a mobile device).
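For a sense of what base64 encoding costs on the wire, here is a minimal Python sketch using only the standard library. The helper name `base64_overhead` and the 3 MB payload are arbitrary choices for this example.

```python
import base64


def base64_overhead(raw):
    """Return (encoded_size, ratio) for a base64-encoded payload."""
    encoded = base64.b64encode(raw)
    return len(encoded), len(encoded) / len(raw)


if __name__ == "__main__":
    photo = bytes(3_000_000)  # stand-in for a 3 MB image
    size, ratio = base64_overhead(photo)
    print(f"{len(photo)} raw bytes -> {size} encoded bytes (x{ratio:.2f})")
```

Base64 turns every 3 input bytes into 4 output characters, so uploads grow by about a third before any transport compression.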

Anyhow, just random thoughts. I'm still struggling with whether to use Drupal or build a custom app with Node.js, Couchbase, and S3.

High performance
