Geocluster: Server-side clustering for mapping in Drupal based on Geohash

dasjo's picture

hi everybody,

finally, i'm finishing my master thesis on Geocluster, a Drupal 7 module that aims at enabling scalable maps with more than 1,000,000 items or nodes by clustering based on Geohash with MySQL or Apache Solr.

you can go ahead to the module page to check out the module and see if it fits your requirements
http://drupal.org/project/geocluster

in addition, here's the link to the final draft of my master thesis on Geocluster
http://dasjo.at/files/thesis-dabernig.pdf

also check out the poster as a visual summary of Geocluster and the thesis
http://dasjo.at/files/thesis-poster-dabernig.pdf

i'm adding the abstract of my thesis to provide an overview of what Geocluster is:

This thesis investigates the possibility of creating a server-side clustering solution for mapping in Drupal based on Geohash. Maps visualize data in an intuitive way. Performance and readability of digital mapping applications decrease when displaying large amounts of data. Client-side clustering uses JavaScript to group overlapping items, but server-side clustering is needed when too many items slow down processing and create network bottlenecks. The main goals are: implement real-time, server-side clustering for up to 1,000,000 items within 1 second and visualize clusters on an interactive map.

Clustering is the task of grouping unlabeled data in an automated way. Algorithms from cluster analysis are researched in order to create an algorithm for server-side clustering with maps. The proposed algorithm uses Geohash for creating a hierarchical spatial index that supports the clustering process. Geohash is a latitude/longitude geocode system based on the Morton order. Coordinates are encoded as string identifiers with a hierarchical spatial structure. Using a Geohash-based index significantly reduces the time complexity of the real-time clustering process.

Three implementations of the clustering algorithm are realized as the Geocluster module for the free and open source content management system and framework Drupal. The first algorithm implementation based on PHP, Drupal’s scripting language, doesn’t scale well. A second, MySQL-based clustering has been tested to scale up to 100,000 items within one second. Finally, clustering using Apache Solr scales beyond 1,000,000 items and satisfies the main research goal of the thesis.

In addition to performance considerations, visualization techniques for putting clusters on a map are researched and evaluated in an exploratory analysis. Map types as well as cluster visualization techniques are presented. The evaluation classifies the stated techniques for cluster visualization on maps and provides a foundation for evaluating the visual aspects of the Geocluster implementation.
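To make the Geohash idea from the abstract more concrete, here is a minimal sketch in Python. It is not Geocluster's actual code (the module is written for Drupal with MySQL or Solr backends); the function names `geohash_encode` and `cluster_by_prefix` are made up for illustration. It only shows the general principle: points whose Geohash strings share a prefix lie in the same grid cell, so grouping by a prefix of a chosen length yields one cluster per cell.

```python
from collections import defaultdict

# Geohash base-32 alphabet (digits plus letters, without a, i, l, o)
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=12):
    """Encode a latitude/longitude pair as a Geohash string (hypothetical helper)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < length:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def cluster_by_prefix(points, precision):
    """Group (lat, lon) points by Geohash cell; return count and centroid per cell."""
    cells = defaultdict(list)
    for lat, lon in points:
        cells[geohash_encode(lat, lon, precision)].append((lat, lon))
    return {
        prefix: {
            "count": len(members),
            "lat": sum(p[0] for p in members) / len(members),
            "lon": sum(p[1] for p in members) / len(members),
        }
        for prefix, members in cells.items()
    }


if __name__ == "__main__":
    # Two points in Vienna and one in Portland (coordinates are approximate).
    points = [(48.2082, 16.3738), (48.2100, 16.3700), (45.5152, -122.6784)]
    for prefix, info in cluster_by_prefix(points, precision=3).items():
        print(prefix, info)
```

A shorter prefix means larger cells and fewer clusters, which is how one index can serve several zoom levels. A real implementation would likely store the hash alongside each item and let the database or search server do the grouping instead of looping in application code.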

thanks to all the people who have helped me along the way. hopefully i'll see some of you at our mapping session in portland
http://portland2013.drupal.org/session/should-have-made-left-turn-albuqu...

feel free to provide feedback regarding the Geocluster module and my thesis by leaving a comment or using my contact form to get in touch.

greetings, dasjo

Comments

Sinan Erdem's picture

Hello dasjo,

Thanks for your valuable contribution to Drupal mapping. Seeing such scientific work on Drupal is very exciting.

I will be trying your module as soon as I find time. Looks very promising.

Sinan Erdem's picture

One question though. I see that on the demo map, the clustered markers are evenly placed most of the time. Is that related to the data you generated, or is it usual behavior for clustering?

dasjo's picture

hi,

you are right. the even distribution stems from the fact that the demo site used devel_generate for creating test data and that the clustering algorithm is very basic.

in the thesis, i also include a practical demo scenario based on drupaljobs (the screenshot originally embedded here is not available).

it illustrates that the clustering looks more natural when based on real-world data. still, the "viral growth" of the algorithm can be improved upon, i believe. also see this blog post on clustering points for maps:
http://web.archive.org/web/20071121140547/http://trib.tv/tech/clustering...

thanks for the feedback,
regards dasjo

This module is fantastic.

JSCSJSCS's picture

I can't wait to try it on larger data sets.

This may be a chicken-and-egg question, but is all the speed gain from your module lost when using modules like IP Geolocation Views and Maps, which looks like it only uses lat & long rather than the hash from the geofield module field?

http://drupal.org/project/ip_geoloc

James Sinkiewicz
Drupal Site Builder and Generalist
http://MyDrupalJourney.com

Great work!

rdeboer's picture

I wish more theoretical theses came with such a hands-on contribution to the community as well!
Love the one page summary poster, working my way through the thesis proper.
Look forward to meeting you at DrupalCon Portland.
Rik

dasjo's picture

thanks again for the feedback!

i'd like to add a final note from the conclusion section of my thesis:

I have to admit that integrating clustering into a complex stack such as the Drupal mapping stack has its advantages and disadvantages. Geocluster does a decent job at clustering data server-side, but the tight integration into the Drupal stack also comes at the cost of overhead and complex integration code. For a person that has the expertise in writing code, it might make sense to create a custom server-side clustering solution for a specific purpose without depending on a number of separate modules.

see you in portland :)

Only PHP

sanderluis's picture

Hello, I do not use Drupal, but I need that system;
Do you have it in "pure php" + Solr?

PLEASE! :D

Location and Mapping
