Posted by forestmars on March 27, 2014 at 4:03am
Start: 2014-04-12 02:30 - 03:30 America/New_York
Organizers:
Event type: Drupalcamp or Regional Summit
In a follow-up to his BADCamp session on BDD (Big Data Drupal!), Niccolo will be presenting at NYC Camp at the United Nations.
Big Data Drupal & Open PAAS
A survey and demo of big data, cloud, PaaS and API integrations of Big Data & Drupal
This is going to be a deep and wide session covering:
- Big Data Drupal Search with Nutch, Hadoop & Cloudera
- Cloudera big data stack - Hadoop, Hue, Thrift, Impala, HBase, etc.
- Docker Open PAAS - Flynn, Deis
- Proxmox, Synnefo, Ganeti, Xen Server API
- Dockering Aegir for Drupal Heaven
- Packer, Vagrant & Hashicorp
- Cloud PAAS - Acquia, Pantheon, Rackspace
- 3Scale UN Data API - Womans Stats
- Rapidminer, Pentaho, Mulesoft, Talend, Bonita
Already registered on nyccamp.org? Add this session to your schedule here: http://www.nyccamp.org/session/big-data-drupal-open-paas or register to attend NYC Camp

Comments
thanks forest
am building out demo open paas now
am expanding from the existing www.BigDataDrupal.org setup (5 node cloudera cluster, 2 solr, 1 aegir boa drupal hosting, 1 million page + solr index)
UC Berkeley, Badcamp 2013, Howto Make a Big Data Drupal Nutch Search with Cloudera Hadoop
United Nations, NYCcamp 2014, Howto Make a Big Data Drupal Open PAAS
possible future talk (maybe Badcamp 2014), Howto Make a Big Data Drupal Data Processing Workflow??
it will consist of (assuming all goes well)
-multi-data center vpn i.e. Canada & California (perhaps more)
-4 bare metal servers
-proxmox cluster of 6 proxmox servers (3 bare-metal, 3 vmware)
-live migration of VMs between California & Canada
-TO DO - High Availability i.e. auto failover (probably won't be completed before NYCcamp)
-approx 50 public dns / static ip VMs
-1 cloudera hadoop/mapreduce big data cluster (5 vm)
-3 aegir boa servers
-2 solr servers
-devshop
-drupalpro
possible includes
drupal site network
-commons
-civicrm
-openacademy
-openatrium
-light
-panopoly
-openscholar
-openoutreach
etc etc
docker/containers
-deis
-flynn
-dokku
-docker
ganeti/kvm
-synnefo
data stores
-ceph
-nfs
-zfs
application servers
-gitlab
-discourse
-wordpress mu
-ghost
-freeipa
-kolab
-edX
-moodle
-spree
-locomotivecms
dev tools, vnc/spice enabled
NOTE: I don't think I'll have sophisticated workflow or data processing working in time for the event; it's feeling like doing this middleware (old fashioned term?) will be a third talk
-talend
-bonitasoft
-rapidminer
-mulesoft
-pentaho kettle
budget
-$300 USD / month
-effort: 1-2 weeks' work? (hard to say, but not as much as I should)
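As a rough sanity check on the figures above, here is a minimal Python sketch. It assumes, purely for illustration, that the whole monthly budget is spread evenly over the approx 50 VMs; the post does not say how the budget is actually split (bare metal, bandwidth, etc.), and all names in the code are hypothetical.

```python
# Figures are taken from the lists above; the even split is an assumption.
MONTHLY_BUDGET_USD = 300   # "300 USD / month"
APPROX_VM_COUNT = 50       # "approx 50 public dns / static ip VMs"

# VMs that the list names explicitly (hypothetical labels).
NAMED_VMS = {
    "cloudera hadoop/mapreduce cluster": 5,
    "aegir boa servers": 3,
    "solr servers": 2,
}


def cost_per_vm(budget_usd: float, vm_count: int) -> float:
    """Average monthly cost per VM if the budget were split evenly."""
    return budget_usd / vm_count


named_total = sum(NAMED_VMS.values())
unnamed_total = APPROX_VM_COUNT - named_total

print(f"explicitly listed VMs: {named_total}")
print(f"remaining VMs for everything else: {unnamed_total}")
print(f"average budget per VM: ${cost_per_vm(MONTHLY_BUDGET_USD, APPROX_VM_COUNT):.2f} / month")
```

Under that assumption the explicitly listed clusters account for only a small share of the VM count, leaving most of it for the site network, containers and application servers.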
implementation
-I will probably have basic set-ups of most of the above done, with either a simple end-to-end spike or a comprehensive, if very thin and brittle, workflow integrating various elements of the site network.
-this will be a proof-of-concept
please note: this is about the technology, not a use-case. I am not working for the US government or Facebook etc. I am an independent grassroots developer, and this is an example of what is possible for someone like me with very tight time, money & effort constraints