hi everybody,
finally, i'm finishing my master's thesis on Geocluster, a Drupal 7 module that aims to enable scalable maps with more than 1,000,000 items or nodes by clustering them based on Geohash, using MySQL or Apache Solr.
head over to the module page to check out the module and see if it fits your requirements:
http://drupal.org/project/geocluster
in addition, here's the link to the final draft of my master's thesis on Geocluster:
http://dasjo.at/files/thesis-dabernig.pdf
also check out the poster as a visual summary of Geocluster and the thesis
http://dasjo.at/files/thesis-poster-dabernig.pdf
i'm adding the abstract of my thesis to provide an overview of what Geocluster is:
This thesis investigates the possibility of creating a server-side clustering solution for mapping in Drupal based on Geohash. Maps visualize data in an intuitive way. Performance and readability of digital mapping applications decrease when displaying large amounts of data. Client-side clustering uses JavaScript to group overlapping items, but server-side clustering is needed when too many items slow down processing and create network bottlenecks. The main goals are: implement real-time, server-side clustering for up to 1,000,000 items within 1 second and visualize clusters on an interactive map.
Clustering is the task of grouping unlabeled data in an automated way. Algorithms from cluster analysis are researched in order to create an algorithm for server-side clustering with maps. The proposed algorithm uses Geohash for creating a hierarchical spatial index that supports the clustering process. Geohash is a latitude/longitude geocode system based on the Morton order. Coordinates are encoded as string identifiers with a hierarchical spatial structure. Using a Geohash-based index significantly reduces the time complexity of the real-time clustering process.
Three implementations of the clustering algorithm are realized as the Geocluster module for the free and open source content management system and framework Drupal. The first algorithm implementation based on PHP, Drupal’s scripting language, doesn’t scale well. A second, MySQL-based clustering has been tested to scale up to 100,000 items within one second. Finally, clustering using Apache Solr scales beyond 1,000,000 items and satisfies the main research goal of the thesis.
In addition to performance considerations, visualization techniques for putting clusters on a map are researched and evaluated in an exploratory analysis. Map types as well as cluster visualization techniques are presented. The evaluation classifies the stated techniques for cluster visualization on maps and provides a foundation for evaluating the visual aspects of the Geocluster implementation.
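to make the Geohash idea from the abstract a bit more concrete, here's a minimal python sketch. it is not Geocluster's actual code (the module does its clustering in PHP, MySQL or Solr), and the function names `geohash_encode` and `cluster_by_prefix` are made up for this illustration. it only shows the two basic ingredients: interleaving longitude and latitude bits (the Morton order) into a base32 string, and grouping points that share a Geohash prefix into one cluster.

```python
from collections import defaultdict

# Standard Geohash base32 alphabet (no "a", "i", "l", "o").
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a Geohash string.

    Bits are produced by repeatedly halving the longitude and latitude
    ranges in alternation (longitude first), then packed 5 bits per character.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # even-numbered bits refine longitude, odd ones latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def cluster_by_prefix(points, prefix_length):
    """Group (lat, lon) points by shared Geohash prefix.

    Returns a dict mapping each prefix to the number of points in that
    cell and their centroid. This is a simplified, assumed scheme: it does
    no merging of neighbouring cells.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        cells[geohash_encode(lat, lon)[:prefix_length]].append((lat, lon))
    clusters = {}
    for prefix, members in cells.items():
        n = len(members)
        clusters[prefix] = {
            "count": n,
            "lat": sum(p[0] for p in members) / n,
            "lon": sum(p[1] for p in members) / n,
        }
    return clusters


# Example points: Vienna, Graz, Portland (coordinates are approximate).
points = [(48.2082, 16.3738), (47.0707, 15.4395), (45.5152, -122.6784)]
zoomed_out = cluster_by_prefix(points, 1)  # coarse cells, few clusters
zoomed_in = cluster_by_prefix(points, 5)   # fine cells, more clusters
```

a shorter prefix means bigger cells, so the prefix length can follow the map's zoom level. since the prefix is just the start of a string, a database or search index can group on it without comparing every pair of points, which is where the reduction in time complexity comes from.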
thanks to everyone who has helped me along the way. i hope to see some of you at our mapping session in portland:
http://portland2013.drupal.org/session/should-have-made-left-turn-albuqu...
feel free to provide feedback on the Geocluster module and my thesis by leaving a comment or using my contact form to get in touch.
greetings, dasjo

Comments
Hello dasjo,
Thanks for your valuable contribution to Drupal mapping. Seeing such scientific work on Drupal is very exciting.
I will be trying your module as soon as I find time. Looks very promising.
One question though. I see that on the demo map, the clustered markers are evenly placed most of the time. Is that related to the data you generated, or is it usual behavior with clustering?
hi,
you are right. the even distribution stems from the fact that the demo site used devel_generate for creating test data and that the clustering algorithm is very basic.
in the thesis, i also include a practical demo scenario based on drupaljobs, see the following screenshot:
[screenshot: Geocluster demo based on drupaljobs data]
it illustrates that the clustering looks more natural when based on real-world data. still, the "viral growth" of the algorithm can be improved upon, i believe. also see this blog post on clustering points for maps:
http://web.archive.org/web/20071121140547/http://trib.tv/tech/clustering...
thanks for the feedback,
regards dasjo
This module is fantastic.
I can't wait to try it on larger data sets.
This may be a chicken-and-egg question, but is all the speed gain from your module lost when using modules like IP Geolocation Views and Maps, which looks like it only uses lat/long rather than the geohash from the geofield module field?
http://drupal.org/project/ip_geoloc
James Sinkiewicz
Drupal Site Builder and Generalist
http://MyDrupalJourney.com
Great work!
I wish more theoretical theses came with such a hands-on contribution to the community as well!
Love the one page summary poster, working my way through the thesis proper.
Look forward to meeting you at DrupalCon Portland.
Rik
thanks again for the feedback!
i'd like to add a final note from the conclusion section of my thesis:
see you in portland :)
Only PHP
Hello, I do not use Drupal, but I need a system like this.
Do you have it in "pure PHP" + Solr?
PLEASE! :D