Big Data Drupal & Open PAAS at NYC Camp

Start: 2014-04-12 02:30 - 03:30 America/New_York
Organizers: 
Event type: Drupalcamp or Regional Summit

In a follow-up to his BADCamp session on BDD (Big Data Drupal!), Niccolo will be presenting at NYC Camp at the United Nations.

Big Data Drupal & Open PAAS

A survey and demo of big data, cloud, PaaS and API integrations for Big Data & Drupal

This is going to be a deep and wide session covering:

  • Big Data Drupal Search with Nutch, Hadoop & Cloudera
  • Cloudera big data stack - Hadoop, Hue, Thrift, Impala, HBase, etc.
  • Docker Open PAAS - Flynn, Deis
  • Proxmox, Synnefo, Ganeti, Xen Server API
  • Dockering Aegir for Drupal Heaven
  • Packer, Vagrant & HashiCorp
  • Cloud PAAS - Acquia, Pantheon, Rackspace
  • 3Scale UN Data API - Womans Stats
  • Rapidminer, Pentaho, Mulesoft, Talend, Bonita

Already registered on nyccamp.org? Add this session to your schedule here: http://www.nyccamp.org/session/big-data-drupal-open-paas or register to attend NYC Camp

Comments

thanks forest


I am building out the demo open PaaS now.

I am expanding from the existing www.BigDataDrupal.org setup (5-node Cloudera cluster, 2 Solr servers, 1 Aegir BOA Drupal hosting server, 1 million page + Solr index).

UC Berkeley, Badcamp 2013, Howto Make a Big Data Drupal Nutch Search with Cloudera Hadoop
United Nations, NYCcamp 2014, Howto Make a Big Data Drupal Open PAAS
possible future talk (maybe Badcamp 2014), Howto Make a Big Data Drupal Data Processing Workflow??

it will consist of (assuming all goes well)

-multi-data center vpn i.e. Canada & California (perhaps more)
-4 bare metal servers
-proxmox cluster of 6 proxmox servers (3 bare-metal, 3 vmware)
-live migration of VMs between California & Canada
-TO DO - High Availability, i.e. auto failover (probably won't be completed before NYCcamp)
-approx 50 public dns / static ip VMs
-1 cloudera hadoop/mapreduce big data cluster (5 vm)
-3 aegir boa servers
-2 solr servers
-devshop
-drupalpro
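
One rough way to keep track of the list above is as a small inventory structure. This is only an illustrative sketch: the names are made up for the example, the counts are the ones quoted in the list, and devshop and drupalpro are assumed to be one VM each because the list does not say.

```python
# Planned demo inventory, using the counts quoted in the list above.
# Assumption: devshop and drupalpro are one VM each (not stated in the list).
PLANNED_VMS = {
    "cloudera hadoop/mapreduce cluster": 5,
    "aegir boa": 3,
    "solr": 2,
    "devshop": 1,    # assumed count
    "drupalpro": 1,  # assumed count
}

# "approx 50 public dns / static ip VMs" from the list above.
APPROX_PUBLIC_VMS = 50


def summarize(planned, total):
    """Return (VMs explicitly named, VMs left over for everything else)."""
    named = sum(planned.values())
    return named, total - named


named, remaining = summarize(PLANNED_VMS, APPROX_PUBLIC_VMS)
print(f"{named} VMs named, roughly {remaining} left for other services")
```

Under those assumptions most of the approx 50 VMs remain available for the optional site network, container and application servers.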

possible includes

drupal site network
-commons
-civicrm
-openacademy
-openatrium
-light
-panopoly
-openscholar
-openoutreach
etc etc

docker/containers
-deis
-flynn
-dokku
-docker

ganeti/kvm
-synnefo

data stores
-ceph
-nfs
-zfs

application servers
-gitlab
-discourse
-wordpress mu
-ghost
-freeipa
-kolab
-edX
-moodle
-spree
-locomotivecms

dev tools, vnc/spice enabled
NOTE: I don't think I'll have sophisticated workflow or data processing working in time for the event. It's feeling like this middleware (old-fashioned term?) will be a third talk.
-talend
-bonitasoft
-rapidminer
-mulesoft
-pentaho kettle

budget
-US$300 / month
-effort: 1-2 weeks work? (hard to say, but not as much as I should)
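
To put the budget in perspective, here is a back-of-envelope split using the figures quoted in this comment (300 USD a month, approx 50 VMs, 4 bare metal servers). It assumes, purely for illustration, that the whole budget goes to hosting and is spread evenly, which the comment does not actually state.

```python
# Back-of-envelope cost split using figures quoted in the comment.
# Assumption: the entire monthly budget is hosting cost, spread evenly.
MONTHLY_BUDGET_USD = 300
APPROX_VMS = 50
BARE_METAL_SERVERS = 4


def cost_per_unit(budget, units):
    """Evenly divide a monthly budget across a number of units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return budget / units


per_vm = cost_per_unit(MONTHLY_BUDGET_USD, APPROX_VMS)
per_server = cost_per_unit(MONTHLY_BUDGET_USD, BARE_METAL_SERVERS)
print(f"about {per_vm:.2f} USD per VM per month")
print(f"about {per_server:.2f} USD per bare metal server per month")
```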

implementation
-I will probably have basic set-ups done for most of the above, plus either a simple end-to-end spike or a comprehensive but very thin and brittle workflow integrating various elements of the site network.
-this will be a proof-of-concept

Please note: this is about the technology, not a use case. I am not working for the US government, Facebook, etc. I am an independent grassroots developer, and this is an example of what is possible for someone like me with very tight time, money & effort constraints.