Amazon DynamoDB

rjbrown99's picture

Interesting announcement from Amazon today: managed NoSQL via their DynamoDB: http://aws.amazon.com/dynamodb

More details:
http://allthingsdistributed.com/2012/01/amazon-dynamodb.html

The interesting part of this is the data being stored on SSD, low latency, and the ability to pre-provision capacity and usage. My initial thought about how this applies to Drupal would be along the lines of the mongodb project, i.e., a Drupal module and API that lets certain tables be stored in and queried from a DynamoDB database.

I'm wondering specifically how well this would work for cache tables. Yes you can get memcache as a service but it's pretty expensive for what you get and you are charged hourly (smallest on-demand instance would be $69/mo and that's just for the cache). With DynamoDB, it could be a very low cost way of scaling out caching - especially for sites that are already using Amazon for hosting Drupal.
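To make the cache-table idea a bit more concrete, here is a minimal Python sketch of what a cache bin needs from a key-value store: set with an optional lifetime, and get that treats expired entries as misses. The `KeyValueCache` class and its method names are hypothetical, and a plain dict stands in for the remote table; nothing here calls a real DynamoDB or Drupal API.

```python
import time


class KeyValueCache:
    """Cache-bin style wrapper over a key-value store.

    A dict stands in for the remote table (an assumption for illustration);
    a real backend would replace the dict reads and writes with network calls.
    """

    def __init__(self, store=None, clock=time.time):
        self.store = {} if store is None else store
        self.clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, cid, data, ttl=None):
        # ttl is in seconds; None means the entry never expires.
        expire = None if ttl is None else self.clock() + ttl
        self.store[cid] = {"data": data, "expire": expire}

    def get(self, cid):
        item = self.store.get(cid)
        if item is None:
            return None
        if item["expire"] is not None and item["expire"] <= self.clock():
            # Expired entries are removed and reported as a cache miss.
            del self.store[cid]
            return None
        return item["data"]
```

Every `get` and `set` would be a network round trip against a hosted store, so the latency point matters more here than it does for a local memcached.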

I'd be interested in hearing thoughts about this. I'm not planning on doing it, just throwing out an idea.

Comments

Cheaper memcached

1kenthomas's picture

Just a note-- it's far cheaper to set up memcached on your own instance. I have a few clients I run on micro instances, and I sop up any extra memory there.

~kwt

Yes

rjbrown99's picture

Yes, but you are limited by physical memory on your instance whereas this scales however large or small it needs to be. Latency would certainly be worse than local memory but likely better than EBS or database storage.

I also was curious about this

gateway69's picture

I was also curious about this announcement and what could be offloaded to a NoSQL DynamoDB. I hear (I haven't run into this myself yet) that with a lot of concurrent users the sessions table can really slow Drupal down (we are talking lots of users). Some people store this data in memcached, but if your server goes down or something happens, it can be a bit more volatile than having it in another DB that's highly optimized.

So far we have stayed away from RDS due to the slowness people are reporting, but being able to offload things to a quick NoSQL node that's always there would definitely be nice.

only for a massive scale of one/two node types

doublejosh's picture

This would only be a good idea if you have a massive scale of one/two node types or perhaps users.
You would then need a new instance for each thing you wanted to store.

For generally avoiding MySQL (and DB churn) for cache tables to increase scalability, I agree with 1kenthomas: go with Memcache. It has easy integration, more momentum, and works great for normal uses.

Also, you don't want to offload cache outside your close network (data center), because of all the cache calls made during a page build.

General NoSQL thoughts

rjbrown99's picture

Damien Tournoud has some interesting ideas published here for NoSQL in general:
http://drupal.stackexchange.com/questions/991/nosql-vs-other-sql-drupal-...

I imagine all of these would apply to DynamoDB as well. I wonder if the mongodb module could be refactored to be 'pluggable' whereby you could change the NoSQL backend.

Here's what Damien said via that link:

MongoDB can be used to store most or all of your entities in fast, document-oriented storage. This type of storage scales way better than the standard SQL-based storage we have in Drupal core (which is based on a "one table per field" schema).

In the current state of Drupal 7, you would have:

  • The base table of the entity stored on SQL (ie. the users table, the node table, etc.)
  • All fields stored in SQL
  • The properties of the entities from their base tables duplicated in MongoDB

This allows fast querying of the entities in MongoDB, and the ability to add complex indexes that no open-source SQL database supports (including indexes across tables). At the same time, you don't lose interoperability because the base table of the entity is still stored in SQL and can thus be joined by modules that are still SQL-only (like Flag).

This type of fast querying is available thanks to the EntityFieldQuery mechanism, a way to build queries on entities, their properties and their fields in an abstract manner. The default implementation in core translates those queries to SQL, but the MongoDB module has a full-featured implementation that can satisfy those queries from MongoDB directly.

Thanks to the EntityFieldQuery backend for Views, you can easily leverage this power by using the tools you are used to. The only downside is that relationships are not supported (but in practice you rarely need them anyway, and this can be worked around by pushing additional data into the entity object and exposing it as additional properties of the entity).

In a nutshell, as soon as query performance is a problem on your project, which happens as soon as you have a significant dataset (let's say starting at a few tens of thousands of entities of a given entity type), MongoDB is a net gain with very, very few drawbacks. Highly recommended.

Can you convert that into

gateway69's picture

Can you convert that into English please :) j/k

I have always had questions about NoSQL/MongoDB, but I'm not very skilled in it; I picked up a book the other day to get up to speed. My question is what from Drupal can be offloaded to a NoSQL type of DB. For instance, in CCK, when you create a new content type you get a content_type_yourcontent table, which is updated whenever a node of that type is created. Is there any way to offload that content_type_ table to a much faster NoSQL system?

Also, I'm guessing here, but everything that goes through the LAMP stack still has to talk to Apache, use PHP to communicate with MySQL, use up memory for each call, return the data, and then do it all over again. Of course, this then comes down to tuning how many concurrent users can connect through this flow. Anyhow, I'm just rambling and hoping to find some good middle ground for the game server we are working on, since we expect a few million users daily.

There's a Drupal module for

Jamie Holly's picture

There's a Drupal module for MongoDB storage to give you an idea:

http://drupal.org/project/mongodb

To get it working in D6 you would have to do some custom patching to CCK.

You do still have to make sure that Apache's maximum connections setting matches what everything behind it can handle. That usually takes some time to get right, as there are so many unknown variables at first, namely your average user activity. Having good monitoring of everything going on and keeping an eye on it is the key to getting everything tuned to perfect harmony.


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

Another idea...

rjbrown99's picture

Another potential use, which is probably much better than trying to use it as a memory cache: materialized views.

Details:
http://groups.drupal.org/node/17644
http://fourkitchens.com/tags/databases/materialized-views
http://drupal.org/project/mv

You can read more about how it works, but generally the idea is to try to eliminate tons of SQL JOINs by combining your content from different tables into a single table, and then querying that single table. My view of it is similar to that of something like Apache Solr - it indexes a bunch of data (on cron run or whenever) and then makes it searchable with good performance. If DynamoDB were used for MV tables, that would also move off any of the INSERT or UPDATE operations to maintain the MV table index. It could be an easier, cheaper way to scale out MySQL on Amazon via augmentation.
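As a rough illustration of the materialized view idea described above, the Python sketch below flattens a three-table JOIN into one table that reads can hit directly. It uses SQLite from the standard library only so it runs anywhere; the table and column names are simplified stand-ins, `rebuild_mv` and `demo` are hypothetical helpers, and this is not how the mv module itself is implemented.

```python
import sqlite3


def rebuild_mv(conn):
    """Rebuild the flattened table from its source tables (e.g. on cron)."""
    conn.execute("DROP TABLE IF EXISTS mv_article")
    conn.execute(
        """
        CREATE TABLE mv_article AS
        SELECT n.nid, n.title, u.name AS author,
               COUNT(c.cid) AS comment_count
        FROM node n
        JOIN users u ON u.uid = n.uid
        LEFT JOIN comment c ON c.nid = n.nid
        GROUP BY n.nid, n.title, u.name
        """
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, title TEXT);
        CREATE TABLE comment (cid INTEGER PRIMARY KEY, nid INTEGER);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO node VALUES (10, 1, 'First'), (11, 2, 'Second');
        INSERT INTO comment VALUES (100, 10), (101, 10);
        """
    )
    rebuild_mv(conn)
    # Reads now touch a single table, with no JOINs at query time.
    return conn.execute(
        "SELECT nid, title, author, comment_count FROM mv_article ORDER BY nid"
    ).fetchall()
```

Each row of `mv_article` is self-contained, which is the shape a key-value or document store wants: one item per node, keyed by `nid`.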

AFAIK DynamoDB can deal with

aries's picture

AFAIK DynamoDB can only deal with values up to 64 KB. That's too small; many cached objects might be much bigger than this.

If you want an out-of-the-box replacement for Memcache, try CouchDB, which understands the memcached protocol by default.

Aries

Current DynamoDB situation

mikat's picture

DynamoDB currently supports item sizes up to 400 KB. The Drupal DynamoDB module also compresses the data before storing it.
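A minimal Python sketch of the compress-then-check idea follows, assuming the 400 KB figure above is applied to the compressed payload alone (key and attribute overhead are ignored here for simplicity). The `pack` and `unpack` helpers are hypothetical and are not taken from the Drupal DynamoDB module.

```python
import json
import zlib

MAX_ITEM_BYTES = 400 * 1024  # the 400 KB item size mentioned above


def pack(value):
    """Serialize and compress a cache value, refusing anything still too large."""
    blob = zlib.compress(json.dumps(value).encode("utf-8"))
    if len(blob) > MAX_ITEM_BYTES:
        raise ValueError(
            "compressed value is %d bytes, over the %d byte limit"
            % (len(blob), MAX_ITEM_BYTES)
        )
    return blob


def unpack(blob):
    """Reverse of pack(): decompress and deserialize."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Rendered HTML and serialized arrays tend to compress well, so many cache entries that exceed the limit raw would fit after compression; already-compressed or random data would not.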
