Addresses 2.x Development - Geonames requirement

codycraven

I am currently rebuilding Addresses for a 2.x release. As part of this I want to get away from using static country/province files as they easily go out of date and do not provide extensibility in data.

I am planning to overcome this hurdle by making Geonames a dependency of Addresses for 2.x. I know the licensing may not be ideal, as it would require non-commercial users to provide links to Geonames; however, I would rather have quality at a cost than garbage for free.

My reason for creating this discussion is to receive any feedback on this decision, pro or con. Please let me know what you think.

Comments

Talking details...

dahacouk

www.geonames.org seems a pretty wholesome choice for the country list. If admins want to use another source could you make the pointer be configurable so that they could choose where they get their country/province data from? I mean rather than being hard-coded in. I guess that's an advanced option.

And will admins be able to edit the country/province data once downloaded locally? I would say as an admin I'd like to be in control of this data ultimately.

And surely the issue of attribution only comes into play when one is distributing the source data, right? It's OK to use it without having to put notices everywhere, right? Not sure actually...

Cheers Daniel

Daniel, As far as

codycraven

Daniel,

As for overridable data sources, that sounds like a fantastic idea. There are some technical hurdles to the approach, but I think it would be well worth it, provided it is not insane from a development and implementation standpoint.

The issue with allowing admins to override country/province data fetched from another source is that admins often set something up once and forget about it. If a site supports a large number of countries, it is very likely that its available countries/provinces would become out of date whenever changes occur. Maybe supporting a display name override would be the best solution. That way, if a province merges, ceases to exist, etc., the new data will be available from the data source and the admin will only be responsible for setting a new name override if necessary.
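To make the display name override idea concrete, here is a minimal sketch in Python rather than Drupal's PHP. The function name, the IDs, and the sample data are all hypothetical; the only point is that the list of items always comes from the data source, while an override changes nothing but the label.

```python
from typing import Dict, List, Tuple


def build_options(
    source_names: Dict[str, str], overrides: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (id, label) pairs sorted by id.

    Which items exist is decided by the data source alone; an admin
    override only replaces the label of an item that still exists.
    """
    return [
        (source_id, overrides.get(source_id, name))
        for source_id, name in sorted(source_names.items())
    ]


# Illustrative data only: names as fetched from a data source.
fetched = {"GB-LND": "London, City of", "CA-ON": "Ontario"}
# Admin-set display names; "XX-OLD" refers to an item that no longer exists.
admin_overrides = {"GB-LND": "London", "XX-OLD": "Old Province"}

options = build_options(fetched, admin_overrides)
```

An override for an item that has disappeared from the source is simply ignored, so a forgotten configuration cannot keep a defunct province in the list.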

The Geonames license is http://creativecommons.org/licenses/by/3.0/ - under this I am unsure if simply listing countries/provinces would qualify as needing to be attributed or not.

Thanks for your input Daniel

Hi, I'd suggest that having a

alarcombe

Hi,

I'd suggest that having a pluggable architecture for multiple back-end data sources is pretty much a baseline requirement for this, e.g. I may choose to use Yahoo's Geoplanet data rather than Geonames. I'd go further and suggest that this should be configurable on a per-country basis (e.g. my local mapping agency may be more accurate or produce more frequent updates than Geonames), while Geonames may suffice for the rest of the world.
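A rough sketch of what per-country pluggable back ends could look like, written in Python for brevity. The `ProvinceProvider` interface, `StaticProvider`, and `ProviderRegistry` are hypothetical names and not part of Addresses or Geonames; a real back end would query a service instead of a fixed dictionary.

```python
from typing import Dict, List, Protocol


class ProvinceProvider(Protocol):
    """Hypothetical interface each back-end data source would implement."""

    def provinces(self, country_code: str) -> List[str]: ...


class StaticProvider:
    """Stand-in back end serving fixed data instead of calling a service."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self._data = data

    def provinces(self, country_code: str) -> List[str]:
        return self._data.get(country_code, [])


class ProviderRegistry:
    """Routes each country to its configured provider, else to the default."""

    def __init__(self, default: ProvinceProvider) -> None:
        self._default = default
        self._per_country: Dict[str, ProvinceProvider] = {}

    def set_for_country(self, country_code: str, provider: ProvinceProvider) -> None:
        self._per_country[country_code] = provider

    def provinces(self, country_code: str) -> List[str]:
        provider = self._per_country.get(country_code, self._default)
        return provider.provinces(country_code)


# Illustrative data only.
worldwide = StaticProvider({"NZ": ["Auckland", "Otago"], "GB": ["England"]})
local_agency = StaticProvider(
    {"GB": ["England", "Scotland", "Wales", "Northern Ireland"]}
)

registry = ProviderRegistry(default=worldwide)
registry.set_for_country("GB", local_agency)
```

Here `registry.provinces("GB")` is answered by the local agency, while every other country falls through to the worldwide source.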

Cheers,

Andrew

Talking more details...

dahacouk

Yes, a "display name override" would do, I guess, as you want the underlying data to be fresh. But I have a concern with all of this, and it's all to do with de-coupling data from the application: using more abstraction and services via Drupal-wide APIs. For instance, country/province data (and address format for that matter) could be used by other modules, so it makes sense that there is a master list within a given Drupal install that all modules can refer to. It makes sense to have it as an API. Or is Addresses an API?

I'm not a real coder so I'm talking at an overview level and not at code level. I'd generally like to see far more Drupal-wide services and fewer one-stop-shop contrib modules. To be honest, we need as much done on APIs and services as we do on contrib modules. The only problem is that APIs don't have the coder kudos! :-( Sorry if I'm ranting.

Anyway, it would be good to be able to choose the external source of the data, or point to a local data set in a specified standard format. Soon, in D7 hopefully, this ingesting of country/province data could be done via RDF, seeing as that's built in. But then it would be good to be able to override data. My concern is that you would be polling the Geonames server live every time you need to use the data. If not, you've got a local cache. Where and in what form is it stored? CCK/fields? How will you sync this local cache if there is a change of data at the Geonames server?

Daniel, For the Geonames data

codycraven

Daniel,

For the Geonames data I would of course be making use of the API from the Geonames module.

My plan is for Addresses 2.x to be an API with configuration screens. It could serve as a master list for other modules: admins select their data source and are then able to override whatever they wish in the UI, and this customized data would then be available for use by other modules. Currently Addresses makes no use of data on its own and requires the installation of either Addresses CCK or Addresses User. I plan to keep this separation and make Addresses 2 more flexible for use by other modules.

From the standpoint of caching data, yes, I plan to have a local cache in Addresses which stores names (United States, Canada, California, London, etc.) along with the unique identifier relating each item to the data source. Through cron or a manually initiated run (I'm not sure which yet), this cache of existing data could be checked against the data source so as to maintain the display of existing input addresses and support new changes.

For CCK (Fields once D7 is released) I am toying with the idea of storing the actual data within the Addresses module and letting the integrating module (addresses_cck, addresses_user, etc.) simply act as the UI, output, and data facilitator, storing whatever differing information it requires in its own module, keyed by a unique identifier for the address stored in Addresses.

Addresses currently supports formats. I plan to make the formats work better by parsing the format and leaving out items such as a trailing comma if a field is left blank. In addition I am working out an API that will allow other modules to provide their own custom formats so that they can be provided in XML, RDF, Adr, or whatever the implementing module provides.
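As an illustration of dropping separators around blank fields, here is a small Python sketch. It sidesteps the parsing step by assuming the format has already been parsed into lines of (separator, field) pairs; that structure, the field names, and `US_STYLE` are assumptions made for the example, not the Addresses format syntax.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-parsed format: each line is a list of (separator, field)
# pairs. A separator is emitted only when a non-blank field already precedes
# it on the same line.
Format = List[List[Tuple[str, str]]]

US_STYLE: Format = [
    [("", "street")],
    [("", "city"), (", ", "province"), (" ", "postal_code")],
    [("", "country")],
]


def render(address: Dict[str, str], fmt: Format) -> str:
    """Render an address, skipping blank fields and their separators."""
    lines = []
    for pieces in fmt:
        line = ""
        for separator, field in pieces:
            value = address.get(field, "").strip()
            if not value:
                continue  # blank field: drop it together with its separator
            line += (separator if line else "") + value
        if line:
            lines.append(line)  # lines whose fields are all blank are omitted
    return "\n".join(lines)
```

With the city filled in but province and postal code left blank, the second line renders as just the city, with no trailing comma.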

Wow!

dahacouk

Great stuff on all of this and on being an API already! ;-)

And great you're storing the datasource per item so that alarcombe's idea of choosing different data sources per country is doable.

I would enable the admin to choose if the update is via cron or manually initiated.

I question "letting the integrating module... store whatever differing information it requires" in that I may have 10 integrating modules and want to change "London City of" to "London". One way I'd have to do it 10 times in each individual integrating module and the other way I'd only make the change once in Addresses 2.x. Again I'd say implement both ways of working so that the admin can decide what's best in their particular case. So, it's another level of abstraction away from the original source but that's OK and, in fact, good.

Excellent on formats.

Excellent direction.

Good luck...

FYI, Geonames module doesn't

lyricnz

FYI, Geonames module doesn't currently support saving province data - only a list of countries, as listed in countryInfo.txt

You can query the list of regions within a country (to various levels), and this is cached, but not saved explicitly.

Are you referring to the list of country+regions listed at admin1Codes.txt or admin2Codes.txt? If so, please file a feature-request against geonames module, describing what you need, and how you would want to query it.

Simon Roberts
Taniwha Solutions

Simon, Addresses currently

codycraven

Simon,

Addresses currently functions based on the country selected, where states/provinces are just options that users can select for a specific address.

My train of thought is that this data will be handled by the data provider modules. In the case of GeoNames, the state/province data would be fetched through the GeoNames children query your module exposes. The data caching would be based on whatever the GeoNames module is set to cache the data for.

I am planning to write the data provider modules such that: if a data resource is no longer available from the data source (in this case GeoNames) then a cached value is used from when the resource was initially available.
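The fallback behaviour described above could be sketched roughly like this, in Python for brevity. `CachedNameLookup` and the `fetch` callable are hypothetical; `fetch` stands in for whatever query the data provider module would make, and is assumed to return `None` when the source no longer has the resource.

```python
from typing import Callable, Dict, Optional


class CachedNameLookup:
    """Remembers the last name seen for each ID and falls back to it."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        # fetch is a hypothetical callable: it returns the current name, or
        # None when the data source no longer has the resource.
        self._fetch = fetch
        self._cache: Dict[str, str] = {}

    def name(self, source_id: str) -> Optional[str]:
        fresh = self._fetch(source_id)
        if fresh is not None:
            self._cache[source_id] = fresh  # keep the cache up to date
            return fresh
        # Resource is gone upstream: use the value cached while it existed.
        return self._cache.get(source_id)
```

An ID that was never available returns `None`, so only addresses entered while the resource existed keep displaying their old value.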

If you think it would be better to store this as metadata, as you do countryInfo.txt, and leave this planned functionality out of the data provider module then I will certainly submit a feature request to the GeoNames module.

Thank you for your insight, it's great to have the maintainer of GeoNames involved in the discussion.

Location and Mapping
