Exploring large sets of geodata on the web

Posted by R.J. Steinert

I'm interested in discussing various setups and techniques for exploring large sets of geodata on the web. For starters, I imported the entire dataset of US postal codes into Drupal (as nodes) to see what limitations I could find using the OpenLayers module. The OpenLayers module uses the Views module to assemble datasets and send them to the user's browser, where they are rendered by the OpenLayers JavaScript mapping library.

I found that the setup could display large amounts of data (12,500+ data points, covering the entire East Coast), but navigating the map slowed dramatically once more than 1,000 data points were displayed.

You can see my entire post titled "Benchmarking the OpenLayers module with Views and OpenLayers.js" at http://rjsteinert.com/node/64

Comments


Posted by smk-ka

What you're after is clustering of geodata. I've done this before with a custom implementation for static image maps, as well as with AJAX implementations for Google Maps and a Flash-based map (click the 2D/3D buttons to see it). Those maps display a few thousand geographical locations (I don't have current numbers, sorry), clustered in real time using a k-means-based algorithm. Depending on the size of the dataset, it may be necessary to calculate the clusters in advance.

I haven't worked with OpenLayers yet, so I can't say what's possible, but I'm interested in a solution to this problem too.

-Stefan
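For anyone who hasn't seen k-means before, here is a minimal sketch of the basic algorithm (Lloyd's iteration) applied to marker coordinates. It is not Stefan's implementation; it assumes points are (lat, lon) tuples and treats them as flat x/y values, which is only a rough approximation but usually acceptable for grouping markers over a small area. The function name and parameters are hypothetical.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Naive Lloyd's k-means on (lat, lon) tuples.

    Treats lat/lon as flat coordinates (a rough approximation).
    Returns (centroids, assignment), where assignment[i] is the
    index of the cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, (lat, lon) in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: (lat - centroids[c][0]) ** 2 + (lon - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assignment[i] == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignment
```

Each centroid can be drawn as one cluster marker, and `assignment` says which points belong to it. Because every iteration touches every point, this is the part that may need to be precomputed for big datasets.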


Posted by R.J. Steinert

Cool example! We could add k-means clustering options to the OpenLayers Views plugin and then perform the calculations in a preprocess function.


Posted by zzolo

Yeah, this is more a limitation of the OL library itself. 1,000 points is pushing it for sure, and the practical maximum seems to be about 200-300, depending on the browser. Unfortunately, I don't know of any official benchmark that would be a good reference for this.

Overall, if you want that many points/features on a map, you will have to pre-render them into a raster image. There are a few ways to do this, but none are easy. You should look into some of the work that @tmcw and Dev Seed are doing.

--
zzolo
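To make "pre-render them into a raster image" concrete, here is a sketch of the core step, assuming the usual Web Mercator tiling (256-pixel tiles, tile (0, 0) at the top left): project each (lon, lat) point to global pixel coordinates and burn it into the tile it falls in. The function names are hypothetical, and a real renderer such as Mapnik or GeoServer would draw styled symbols and write PNGs rather than a grid of counts.

```python
import math

TILE_SIZE = 256


def lonlat_to_pixel(lon, lat, zoom):
    """Project WGS84 lon/lat to global Web Mercator pixel coordinates."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon + 180.0) / 360.0 * scale
    siny = math.sin(math.radians(lat))
    siny = min(max(siny, -0.9999), 0.9999)  # avoid infinity near the poles
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * scale
    return x, y


def render_tile(points, zoom, tile_x, tile_y):
    """Burn (lon, lat) points into a TILE_SIZE x TILE_SIZE grid of hit counts."""
    grid = [[0] * TILE_SIZE for _ in range(TILE_SIZE)]
    for lon, lat in points:
        px, py = lonlat_to_pixel(lon, lat, zoom)
        # Convert global pixel coordinates to coordinates within this tile.
        col = int(px) - tile_x * TILE_SIZE
        row = int(py) - tile_y * TILE_SIZE
        if 0 <= col < TILE_SIZE and 0 <= row < TILE_SIZE:
            grid[row][col] += 1
    return grid
```

The browser then only downloads a handful of images, no matter how many points went into them.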

In the same boat

Posted by sbauch

I'm trying to run a map with 4,000+ markers. @tmcw was kind enough to explain why my current setup isn't going to do the trick. It's a bummer that none of the ways to accomplish my goal are easy, as I'm not at all an advanced user, but I'm committed to diving in and figuring this out. Does anybody know of any resources I could use? I'm leaning towards using WFS and GeoServer to render the points into a raster image. Thoughts?


Posted by R.J. Steinert

Raster images are fine if you don't need the points on the map to be interactive. By interactive I mean, for example, a dialog box that pops up when a data point is clicked, so the user can see more information on that data point and perhaps follow a link to find out more.
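One way a raster layer could keep some interactivity (a sketch, not a feature of any module mentioned in this thread): send the click position to the server and look up the nearest point there. The example assumes each feature has already been projected to pixel coordinates at the current zoom level; `hit_test` and the 6-pixel tolerance are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Feature = Tuple[int, float, float]  # (node_id, pixel_x, pixel_y)


def hit_test(features: Sequence[Feature], click_x: float, click_y: float,
             tolerance_px: float = 6.0) -> Optional[int]:
    """Return the id of the feature nearest the click, or None if none is within tolerance."""
    best_id, best_dist = None, tolerance_px
    for node_id, fx, fy in features:
        dist = math.hypot(fx - click_x, fy - click_y)
        if dist <= best_dist:
            best_id, best_dist = node_id, dist
    return best_id
```

The returned id could then be used to load the node's details into the dialog box. For thousands of points you'd want a spatial index instead of this linear scan.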

Another k-means clustering example

Posted by R.J. Steinert

My friend Max Ogden suggested k-means in response to my blog post above and gave a nice example of it in use: http://polymaps.org/ex/cluster.html

A possible solution for my requirements

Posted by R.J. Steinert

When I started this thread as "Exploring large sets of geodata on the web", I should have said what I mean by "exploring large sets of geodata". My goal is to display a large dataset (1,000+ nodes) while still letting the user click on each individual geopoint to find out more about it. As we've discussed, performance suffers when displaying more than 1,000 geopoints with the OpenLayers library, and two solutions have been proposed:

  1. Display a map of the data to the user that utilizes a k-means clustering method.

  2. Prerender the data as a raster image.

The first method does not meet my requirement of being able to interact with individual data points. That is certainly true of the example Max Ogden brought up, but the example smk-ka brought up (click on 2D) looks more promising, because the k-means clusters appear to be recalculated at each zoom level. At some point it could perhaps display the raw data, according to some "safe to display" algorithm.

The second method could perhaps display an unlimited number of data points, but it lacks the ability to interact with the data on the map.

A possible solution for my requirements might go as follows:

Create a map with two 'modes' (or perhaps displays): the default display shows the data as k-means clusters, and the second lays out the geopoints individually. The user begins their journey seeing only the k-means clusters. To drill down into the data, they use a circle tool to draw a circle where they would like to see the data points displayed (a proximity search). Before the geopoints are laid out on the map, the system tells the user how many geopoints would be rendered in the area they have defined. If there are more than 1,000, the user can draw a smaller circle, the system returns a new count, and so on, until the user has found a selection with a safe amount of data in it.
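The count-before-render step could look roughly like this. It's a sketch under assumptions: points are (lat, lon) tuples, distance is great-circle (haversine) distance in kilometres, and the 1,000-point limit comes from the post. The function names are hypothetical, and in Drupal the count would more likely come from a SQL COUNT query than from a loop like this.

```python
import math

EARTH_RADIUS_KM = 6371.0
SAFE_LIMIT = 1000  # "safe to display" threshold taken from the post


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_in_circle(points, center_lat, center_lon, radius_km):
    """Count the (lat, lon) points inside the user's circle."""
    return sum(1 for lat, lon in points
               if haversine_km(center_lat, center_lon, lat, lon) <= radius_km)


def check_selection(points, center_lat, center_lon, radius_km, limit=SAFE_LIMIT):
    """Return (count, ok); when ok is False the UI asks for a smaller circle."""
    n = count_in_circle(points, center_lat, center_lon, radius_km)
    return n, n <= limit
```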

Any thoughts?

I could start down this route by adding a count button to the OL Proximity module's exposed Views filter, but users will be choosing proximities rather blindly until it is paired with a k-means cluster display.


Posted by tmcw

The first method, k-means clustering, could easily be modified to allow per-point interaction; the examples I've seen were all built from the ground up for speed, not for interaction. With some JavaScript knowledge and time, it would be possible to, say, ungroup items when clicked or display the grouped data. How to combine this with Drupal is unclear; it will probably have to be hacked in unless you have a lot of time. The GeoJSON patch to OpenLayers would be useful.
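A sketch of the "ungroup items when clicked" idea, using simple grid clustering rather than k-means (a swapped-in technique, chosen because it's short). The key point is that each cluster keeps its member list, so a click can reveal the individual points. Points are assumed to be (id, x, y) tuples in pixel coordinates; all names are hypothetical.

```python
from collections import defaultdict


def grid_cluster(points, cell_size):
    """Group (id, x, y) points into square cells, keeping each cluster's members."""
    cells = defaultdict(list)
    for pid, x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((pid, x, y))
    clusters = []
    for members in cells.values():
        # Place the cluster marker at the mean position of its members.
        cx = sum(m[1] for m in members) / len(members)
        cy = sum(m[2] for m in members) / len(members)
        clusters.append({"x": cx, "y": cy, "members": members})
    return clusters


def on_cluster_click(cluster):
    """Hypothetical click handler: return the ids to draw as individual points."""
    return [pid for pid, _, _ in cluster["members"]]
```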

A hybrid method, which we use on Pakistan Survey and a few other sites, lets you interact with vector points. It's very fast and would scale, with interaction, to thousands of points. However, installing the stack (the stylewriter module, OpenLayers with the GeoJSON patch, and the tilelive server) requires a high level of expertise (you'll need to compile Mapnik from scratch) and can't be done on shared servers.


Posted by R.J. Steinert

Your Pakistan Survey example is nice. I found an example of k-means clustering that breaks apart nicely at http://haiti.ushahidi.com/. It's an installation of the open source Ushahidi project. I'm hoping to find time to dive into the Ushahidi code and see if I can implement similar map functionality in Drupal.


Posted by R.J. Steinert

I just saw this screenshot floating around the twittersphere. Perhaps our prayers have been answered? :) http://www.flickr.com/photos/developmentseed/5494703217/sizes/l/in/photo...

Facing the same problem

Posted by davidhk

I've been putting off looking at this, but this looks like a good chance to join in.

We're currently at around 1,300 markers and, at current rates, add a few hundred a year. Our 'map of all nodes' is already too slow to be usable.

I made a test map with several layers, each with a different number of markers. You can toggle the layers on and off to change the total number of markers and get a feel for performance. You can see it at http://gwulo.com/oltest

Some observations:

  1. We're sending 1,925 points to the browser. Building the page (i.e. the SQL query, Views, and sending the HTML to the browser) is slower than I'd like, but still bearable.

  2. The OL map handles the 1,925 points happily as long as no layers are displayed. Displaying the marker layer is what causes the performance hit.

  3. On my PC, FF 3.6.12 is a bit jumpy but still usable with 1,000 markers. IE 8 is already struggling with 250 markers.

  4. There's also a noticeable difference between the number of markers in the layer and the number of markers currently visible. For example, using IE with the default map, if I enable the 1,000-marker layer there's a long wait while it constructs and draws the markers, and any movement after that is slow. BUT if I zoom right in first and then enable the 1,000-marker layer, it draws quickly and moves smoothly. Of course, if you then zoom out, it becomes progressively slower as more markers come into view.
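Observation 4 suggests only building markers for points that are actually in view. A minimal sketch of that filter, assuming points are (lon, lat) tuples and the view is a simple west/south/east/north box; the 25% buffer (so small pans don't need a rebuild) and all names are hypothetical, and views that cross the 180° meridian aren't handled.

```python
from typing import List, NamedTuple, Tuple


class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def expand(b: BBox, fraction: float) -> BBox:
    """Grow the box by a fraction of its size on every side."""
    dx = (b.east - b.west) * fraction
    dy = (b.north - b.south) * fraction
    return BBox(b.west - dx, b.south - dy, b.east + dx, b.north + dy)


def visible_points(points: List[Tuple[float, float]], view: BBox,
                   buffer: float = 0.25) -> List[Tuple[float, float]]:
    """Keep only the (lon, lat) points inside the buffered viewport."""
    b = expand(view, buffer)
    return [(lon, lat) for lon, lat in points
            if b.west <= lon <= b.east and b.south <= lat <= b.north]
```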


Some of the maps we'll build let the user perform queries, so there's no way to know at design time how many markers will be displayed. I'd like the map to handle any number. (In practice I don't expect we'll go past 2,500 for a couple of years, so that's my initial goal.) I don't currently need to count the total number of markers across several overlay layers; I think a solution that applies on a per-layer basis should be OK.

That means there will need to be a limit, say 200, defined by the admin. If there are fewer than 200 markers to display, carry on as normal. If there are more, we need to switch to some other display, and then think of a way to reduce the number of markers as we zoom in.

The other display could be:
- No markers, and just an alert / overlay message: 'You are trying to display 1,300 markers, but maximum 200 supported. Please zoom in.'
- Show the first 200 markers and an alert / overlay message: 'Max 200 markers displayed. Another 1,100 markers cannot be shown. Please zoom in.'
- Use clustering to combine markers together, and display a smaller number (< 200) of cluster markers instead.
- Other?
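The first two options only need the layer's total count. A sketch of building those messages, assuming a hypothetical `overlay_message` helper and the example admin limit of 200:

```python
def overlay_message(total, limit=200, show_first=True):
    """Build the fallback notice for a layer with too many markers.

    Returns None when everything fits under the limit. show_first=True
    gives the "show the first N markers" wording; False gives the
    "no markers, just a message" wording.
    """
    if total <= limit:
        return None
    if show_first:
        hidden = total - limit
        return (f"Max {limit:,} markers displayed. Another {hidden:,} markers "
                f"cannot be shown. Please zoom in.")
    return (f"You are trying to display {total:,} markers, but maximum "
            f"{limit:,} supported. Please zoom in.")
```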

As we zoom in, fewer markers will be displayed. Will we be monitoring:
- The number of markers in the layer? If this is going to change, it suggests there is some sort of Ajax-type solution where we're contacting the server to re-build the list of points each time the user zooms / pans.
- The number of markers currently displayed by OL. That's much quicker if it's possible.

As the number of points in the database grows, there will be a bottleneck at the SQL / Views / HTML level, again requiring an ajaxy solution.
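A rough sketch of the server side of such an "ajaxy" solution: given the current map bounding box, return the total count plus at most `limit` points as GeoJSON, so the browser can decide whether to draw markers or clusters. It uses Python's built-in sqlite3 for illustration; the `places` table, its columns, and the extra `total` member are hypothetical (GeoJSON permits extra members, but clients won't know about this one unless you tell them).

```python
import json
import sqlite3


def features_in_bbox(conn, west, south, east, north, limit=200):
    """Return a GeoJSON FeatureCollection string for points inside the bbox."""
    where = "lon BETWEEN ? AND ? AND lat BETWEEN ? AND ?"
    args = (west, east, south, north)
    # Total number of matches, so the client knows how many it is NOT getting.
    total = conn.execute(f"SELECT COUNT(*) FROM places WHERE {where}", args).fetchone()[0]
    rows = conn.execute(
        f"SELECT nid, title, lon, lat FROM places WHERE {where} LIMIT ?",
        args + (limit,),
    ).fetchall()
    features = [
        {"type": "Feature", "id": nid,
         "geometry": {"type": "Point", "coordinates": [lon, lat]},
         "properties": {"title": title}}
        for nid, title, lon, lat in rows
    ]
    return json.dumps({"type": "FeatureCollection", "total": total, "features": features})
```

An index on (lat, lon), or a real spatial index, would keep the bounding-box query fast as the table grows.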

So every time the map is redrawn we'd want to ask: 'How many points are on display? If fewer than the limit, show them as normal; otherwise show them in another format (probably clusters).'

@rjstatic, is this the same type of solution you're looking for/working on? I'd struggle to start coding this from scratch, but can help test & debug if you get any prototype going.

Regards, David

Built-in clustering seems ok

Posted by davidhk

I've upgraded to the current dev version, enabled the 'cluster' behaviour for the 1,000-feature layer, and performance is now ok in IE. There are some visual aspects to work on, as currently an individual feature and a cluster both use the same icon. But that's fixable, and performance should be ok for now.

@rjstatic, have you found any problems with the built-in clustering that made you look for other solutions?

regards, David

Clustering only part of the answer

Posted by davidhk

Well, back to the drawing board.

Clustering makes performance a lot better when zoomed out. But, when zoomed right in it's the clustering code that becomes the new bottleneck.

Here are the times, in seconds, for a map to refresh after each change (I was just counting in my head, but it's good enough!), using IE 8.

Each row shows two times: the first from a map with clustering set to a 30-pixel radius, the second from a map with no clustering. Both maps have a layer showing 1,000 (actually 988) locations.

Starting with all layers disabled, zoomed right in to zoom level 20:

Step               Clustered (30 px)                   Not clustered
Enable 1K layer    Alert: "Stop running this script?"  1
Zoom out to 19     Alert: "Stop running this script?"  1
Zoom out to 18     Alert: "Stop running this script?"  1
Zoom out to 17     Alert: "Stop running this script?"  3
Zoom out to 16     10                                  7
Zoom out to 15     11                                  12
Zoom out to 14     11                                  17
Zoom out to 13     10                                  37
Zoom out to 12     6                                   36
Zoom out to 11     3                                   46

At high zoom, non-clustered is very fast, but clustering causes disconcerting alerts to the user.
From zoom level 13 and down, clustering gets faster, while non-clustered rapidly becomes unusable.

The quick solution is to turn off clustering above certain zoom levels. By the time we're zoomed in that far not many of the points are clustered anyway. Has anyone else worked out how to do this in code?
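A sketch of one way to do this in code: a greedy pixel-distance clustering pass that is skipped entirely above a cut-off zoom. It's not the OpenLayers Cluster strategy's code; it assumes points are already (x, y) screen positions at the current zoom, and the 30-pixel distance and zoom 16 cut-off are just example values based on the timings above.

```python
import math


def cluster_points(pixels, zoom, distance_px=30, max_cluster_zoom=16):
    """Greedy pixel-distance clustering that switches off at high zoom.

    pixels: list of (x, y) screen positions at the current zoom.
    Returns a list of clusters, each a list of indexes into pixels.
    """
    if zoom > max_cluster_zoom:
        # Zoomed right in: every point is its own "cluster", no work done.
        return [[i] for i in range(len(pixels))]
    clusters = []  # (anchor_x, anchor_y, member indexes)
    for i, (x, y) in enumerate(pixels):
        for cx, cy, members in clusters:
            if math.hypot(x - cx, y - cy) <= distance_px:
                members.append(i)
                break
        else:
            # No existing cluster is close enough: start a new one here.
            clusters.append((x, y, [i]))
    return [members for _, _, members in clusters]
```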

regards, David

PS Something else I found interesting is this clustering example on the OpenLayers site:
http://openlayers.org/dev/examples/strategy-cluster-extended.html
It's interesting not for the clustering, but because it shows 700 points and performance is still OK in IE. I wonder what's different about it?

More on clustering

Posted by davidhk

Here's a map with around 1200 markers and clustering, with performance that is faster than in the previous comment, and good enough for now:
http://gwulo.com/place-map-search-OL

I've tweaked some of the OL clustering code to make it faster, and also added a hard-coded limit so that at high levels of zoom it disables clustering.

As the number of markers grows, this will also eventually get too slow. Then I think I can get another improvement by passing the list of markers to the JavaScript already sorted by latitude. That should cut the number of checks needed when building clusters. It would add some extra work to the SQL query but reduce the load on the JavaScript, and overall it should be significantly faster.
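A sketch of why sorting helps, assuming the points arrive sorted by latitude (for example via ORDER BY on the latitude column) as (x, y) pairs, with y being latitude or screen y. Because cluster anchors are then created in ascending y order, any cluster whose anchor is more than the clustering distance behind the current point can be skipped for good, so each point is only compared against a small window. This is an illustration, not David's modified OpenLayers code; the names are hypothetical.

```python
import math


def sweep_cluster(points, distance):
    """Greedy distance clustering over points pre-sorted by y ascending.

    Returns a list of clusters, each a list of indexes into points.
    """
    anchors = []  # (x, y) of each cluster's first point, in ascending y
    members = []  # member indexes, parallel to anchors
    start = 0     # first anchor still within `distance` in y
    for i, (x, y) in enumerate(points):
        # Anchors this far behind can never match this or any later point.
        while start < len(anchors) and anchors[start][1] < y - distance:
            start += 1
        for c in range(start, len(anchors)):
            ax, ay = anchors[c]
            if math.hypot(x - ax, y - ay) <= distance:
                members[c].append(i)
                break
        else:
            anchors.append((x, y))
            members.append([i])
    return members
```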

Kinda late in the thread...

Posted by arpieb

I don't know if this would help anyone, but have you looked into the WFS module? It works in conjunction with GeoServer (a Java servlet application) to render your point data into raster tiles, has tile caching built in, and integrates a virtual clickable feature layer into OpenLayers so you can still perform feature-level lookups for your map data.

I haven't used the WFS module extensively yet, but I'm finishing up a 177-layer GeoServer implementation running with OpenLayers right now, and it's damn impressive. The WFS module provides a data source for GeoServer's raster and tile-management system, and if I recall correctly it uses Views to generate the data GeoServer needs to build its map tiles.
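For anyone curious how tiles are requested from GeoServer, here is a sketch that builds a standard WMS 1.1.1 GetMap request for one 256-pixel Web Mercator tile. The parameter names come from the WMS specification; the endpoint URL and the `drupal:places` layer name are hypothetical, and it assumes the server offers EPSG:3857. A tile cache would normally sit in front of requests like this.

```python
import math
from urllib.parse import urlencode

ORIGIN_SHIFT = math.pi * 6378137.0  # half the Web Mercator world width, in metres


def tile_bbox_3857(z, x, y):
    """EPSG:3857 bounds (minx, miny, maxx, maxy) of XYZ tile z/x/y (y counts from the top)."""
    size = 2 * ORIGIN_SHIFT / (2 ** z)
    minx = -ORIGIN_SHIFT + x * size
    maxy = ORIGIN_SHIFT - y * size
    return (minx, maxy - size, minx + size, maxy)


def getmap_url(base_url, layer, z, x, y, tile_px=256):
    """Build a WMS 1.1.1 GetMap request URL for one map tile."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:3857",
        "BBOX": ",".join(f"{v:.2f}" for v in tile_bbox_3857(z, x, y)),
        "WIDTH": tile_px,
        "HEIGHT": tile_px,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
    }
    return f"{base_url}?{urlencode(params)}"
```

Usage: `getmap_url("http://localhost:8080/geoserver/wms", "drupal:places", 0, 0, 0)` requests the single whole-world tile at zoom 0.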

Location and Mapping
