I'm upgrading a large Drupal 5 system to run under Drupal 7 on AWS, and am trying to figure out the best practice for setting up the public files that used to be in sites/default/files.
I'm looking at a couple of the modules available for using S3 as the storage back end, and am beginning to think there has to be a better way to do this. That, or those of you who are doing large sites are doing something seriously custom that isn't making its way back into modules like AmazonS3.
Could someone point me to what current "best practice" is? Is S3 the right solution to this problem, and if so, how are you currently setting up Drupal 7 to work with it? If it isn't, what are you doing instead?
Tutorials are very badly needed for the existing modules, if those modules are actually doing the job for Drupal 7. And if those modules are not doing the job, we need new modules.

Comments
subjective
It is kind of subjective.
As a sysadmin I tried getting S3 to be mountable under Linux, but all the systems I tried were highly unstable and slow. The web team didn't even try getting S3 to work with Drupal. Instead we built a NAS setup in AWS, using GlusterFS as the storage layer. This allowed for more flexibility in the storage area, and kept the design a little simpler for the Drupal install: no need for an extra module that is only used for file translation.
If you are worried about network speeds, use the CDN module configured with CloudFront.
However, cost-wise this setup is A LOT more expensive than S3.
Gluster seems to be the HA way to go
It does sound like the way to do a highly available file directory for Drupal would be via Gluster. I've looked a bit at the docs, and yeah, it's not for the faint of heart. But I'll probably make it a point to learn how to set it up.
My application will tolerate some level of downtime (a couple of hours in the course of a year or two would be painful, but tolerable), so we will start out using NFS, most likely with a pull-type CDN in front of it. If the site increases its traffic enough, we'll go from there to some kind of NFS failover, or to Gluster, but for now, I think even a rather small NFS server will do us fine.
NFS with Pull CDN
This is the setup we are using. Overall things have worked out quite well and we are serving a lot of traffic over 1400 domains. Haven't had any major issues. If possible try to get NFSv4.1 as this has some advantages over the older versions. We are still running NFSv3 but will be upgrading soon.
What modules are you talking
What modules are you talking about? I'm quite curious.
Acquia Cloud uses GlusterFS over Amazon EBS and barely touches S3 last I knew. Like whytewolf, we tried the trick of mounting S3 under Linux using (e.g.) s3fs and that was an absolute performance disaster. That's not the right way.
I agree that Drupal would be better without the local filesystem - the filesystem is a giant headache in Drupal hosting. I envy the folks whose web nodes share nothing but database connections. Managing GlusterFS is no picnic, especially if you're under load.
I looked at amazon_s3, AmazonS3 and CloudFront
What I really wanted was a drop-in replacement for the sites/default/files directory. I looked into:
CloudFront -- my understanding is that this module (for Drupal 6) staged image files to S3 so CloudFront could actually deliver them to users. I can't speak to how well this would work, because the author found porting the Drupal 6 file code to Drupal 7 daunting, and a bigger project than he could do himself (which I can totally understand). So there's no Drupal 7 version of this module, even in a dev state AFAICT.
AmazonS3 -- this is mainly a StreamWrapper for S3. I'm not sure what its intended use case is. You can certainly read and write files with the author's streamwrapper implementation. But I tried to get a simple file upload to a node to work, and couldn't get it to work at all. I then ran the code through the debugger for a couple of hours, and discovered that while the module can read and write files, it does not mimic directory-related functions very well, certainly not well enough to get through Drupal's file uploading routines. I haven't gotten a response from the author, but I believe she doesn't really need this functionality for the kinds of things that she does with the module, so rather than emulate the directory calls in the streamwrapper interface, she elected to return FALSE from these routines to indicate the stream does not support these functions. This might make sense for her particular use case, but it won't work for mine, and it looks like modifying her code to emulate a real file system (which S3 is not -- no directories at all in reality, and I don't think that chmod() would make sense either) would be a large project, especially since the project does not have unit tests -- you'd really need to write tests for this, since regression is a big problem in this kind of case.
amazon_s3 is a file browser and bucket manager for S3. The main author doesn't have the time currently to update it for Drupal 7, and while a couple of folks have taken a stab at helping, the D7 code is dev quality right now. In any case, while this would be a useful utility, it isn't what I'm looking for.
D6 code
These 2 modules are for D6 and make running Drupal with S3 actually possible, as they defer file creation and cache file info.
http://drupal.org/project/imageinfo_cache - Images
http://drupal.org/project/advagg - CSS/JS
We ended up switching off of S3 for other reasons (it went down more than we liked in 2010) but the above 2 modules did fix the performance issues for us.
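To illustrate why caching file info helps against a slow remote store, here is a minimal sketch in Python. It is not code from either module: `FileInfoCache`, `slow_stat` and the sample data are hypothetical names made up for the example. The idea is only that each path hits the slow backend at most once, including paths that turn out not to exist.

```python
from typing import Callable, Dict, Optional


class FileInfoCache:
    """Memoize file metadata so each path hits the slow backend at most once."""

    def __init__(self, fetch: Callable[[str], Optional[dict]]):
        self._fetch = fetch  # slow lookup, e.g. a request to remote storage
        self._cache: Dict[str, Optional[dict]] = {}
        self.backend_calls = 0  # counter kept only to make the effect visible

    def info(self, path: str) -> Optional[dict]:
        # "Not found" (None) is cached too, so missing files are not re-checked.
        if path not in self._cache:
            self.backend_calls += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]

    def invalidate(self, path: str) -> None:
        # Call this after writing or deleting a file so stale info is dropped.
        self._cache.pop(path, None)


# Hypothetical stand-in for the remote store.
REMOTE = {"styles/thumb/a.jpg": {"size": 1234}}


def slow_stat(path: str) -> Optional[dict]:
    return REMOTE.get(path)


cache = FileInfoCache(slow_stat)
```

A real setup would persist this cache (for example in the database or memcache) rather than in process memory, since PHP requests do not share state.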
There are some core patches for this as well
http://drupal.org/node/818818#comment-6417508 - Race Condition when using file_save_data FILE_EXISTS_REPLACE
http://drupal.org/node/828268#comment-3117100 - Prevent serving JS/CSS files if they have a filesize of zero
http://drupal.org/node/1762772#comment-6413002 - Notice: Trying to get property of non-object in image_style_deliver()
http://drupal.org/node/755586#comment-3733436 - Fallback for CSS/JS aggregation for non-writable directories
S3
I'm also interested in moving to S3 for my back-end - in my case with D6. Right now I'm using EBS and my challenge is that I need to keep expanding the disk as file storage grows.
I originally tried s3fs and other options for mounting it locally but, echoing other comments, the performance was abysmal. I gave up on that a long time ago.
It seems that the best way at the moment is to consider PHP's stream wrappers and just do reads/writes directly to S3. Here's a module for D7: http://drupal.org/project/amazons3.
Of course, this is not available for D6, so I had been considering a backport of stream wrapper support for D6. The fact that Mikeytown2 has migrated away from S3 is a concern to me, so now I am reconsidering this approach. Maybe it's best to stick with EBS+LVM and just continue expanding the volumes.
AmazonS3's stream wrapper is not a drop in replacement
No one has much love for s3fs, but I think that AmazonS3's streamwrapper implementation is too narrow to let you emulate the files/ directory. It simply does not implement some of the directory-related calls that much of Drupal's code needs in order to work.
This also raises the question as to how fast the stream wrapper would be, even if it did emulate the directory related calls well enough to "fool" Drupal into working with it. My sense is that S3 calls are surprisingly slow (looks like "cheap", "reliable" and "fast" don't tend to go together), and simply by wrapping the underlying AWS APIs, you'd end up with an implementation that was more stable than s3fs, but probably no faster.
If somebody has had better luck with AmazonS3, by all means chime in. I found it hard enough to configure that I'm not sure I got it right. But I have tested a good piece of the API directly, and at this point, it's clear that some calls that need to work for a full emulation of a files/ directory are returning FALSE from the wrapper, causing the higher level calls to fail.
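For anyone wondering what "emulating the directory calls" over a store with no real directories would involve, here is a rough sketch in Python. It is not the AmazonS3 module's code and does not talk to S3; `FlatStore` and its methods are hypothetical, and a dict stands in for the bucket. Directories are faked by key prefix, with an empty marker key so that an empty "directory" still exists.

```python
class FlatStore:
    """Toy flat key/value store that fakes directories by key prefix."""

    def __init__(self):
        self._objects = {}  # key -> bytes; keys are flat, e.g. "styles/thumb/a.jpg"

    @staticmethod
    def _prefix(path: str) -> str:
        # "styles" -> "styles/"; the root ("" or "/") -> "" (matches every key)
        clean = path.strip("/")
        return clean + "/" if clean else ""

    def put(self, key: str, data: bytes) -> None:
        self._objects[key.strip("/")] = data

    def mkdir(self, path: str) -> bool:
        # Nothing to create for real: store an empty marker key ending in "/".
        self._objects[self._prefix(path)] = b""
        return True

    def is_dir(self, path: str) -> bool:
        # A "directory" exists if any key (marker or file) sits under its prefix.
        prefix = self._prefix(path)
        return any(key.startswith(prefix) for key in self._objects)

    def listdir(self, path: str) -> list:
        # Return the first path segment below the prefix for every matching key.
        prefix = self._prefix(path)
        names = set()
        for key in self._objects:
            if key.startswith(prefix) and key != prefix:
                names.add(key[len(prefix):].split("/", 1)[0])
        return sorted(names)
```

Note that every `is_dir()` or `listdir()` here scans the keys, which on a real remote store means a network round trip per call. That is part of why such an emulation could be stable yet still slow.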
I use AmazonS3 for image
I use AmazonS3 for image storage. However, uploading images directly to S3 or resizing images from S3 is not practical, even with the additional caching added recently by the S3 module.
I upload images to S3 in a cron job and then change the URI in file_managed from public:// to s3://.
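The URI switch described above could look roughly like the following Python sketch. It is an illustration under assumptions, not the poster's actual cron code: it uses an in-memory SQLite table with only the `fid` and `uri` columns (Drupal 7's real `file_managed` table has more), the upload step itself is left out, and `switch_scheme` and `mark_uploaded` are hypothetical helper names.

```python
import sqlite3


def switch_scheme(uri: str, old: str = "public://", new: str = "s3://") -> str:
    """Swap the stream wrapper scheme, leaving other URIs untouched."""
    return new + uri[len(old):] if uri.startswith(old) else uri


def mark_uploaded(conn: sqlite3.Connection, fids) -> None:
    """Point the given file records at S3 once their upload has succeeded."""
    for fid in fids:
        row = conn.execute(
            "SELECT uri FROM file_managed WHERE fid = ?", (fid,)
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE file_managed SET uri = ? WHERE fid = ?",
                (switch_scheme(row[0]), fid),
            )
    conn.commit()


# Simplified stand-in for the real table (assumption: only two columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_managed (fid INTEGER PRIMARY KEY, uri TEXT)")
conn.executemany(
    "INSERT INTO file_managed VALUES (?, ?)",
    [(1, "public://images/a.jpg"), (2, "public://images/b.jpg")],
)

# Pretend file 1 was uploaded by the cron run; file 2 is still local.
mark_uploaded(conn, [1])
```

Updating the record only after a confirmed upload means new files keep being served from the local directory until the cron run has copied them.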
How about GlusterFS to keep
How about GlusterFS to keep the file folders in sync?
http://www.gluster.org/