Introduction
Taxonomy is Drupal's well-known classification system. It is both powerful and outdated. As chx put it: "Taxonomy is beyond saving." Taxonomy is slow, doesn't use modern APIs like FAPI, and needs Field integration. Updating the existing code would be a long and painful process, since so many things need to change: all sorts of code needs to be cleaned up, Field integration has to be added, the dependency on Node.module has to be removed, etc.
Proposal
I want to create an entirely new version of Taxonomy. Since terms should be applied as fields, the first step is to create the Field integration. After that all existing features and new features will be added.
- Field integration
Terms should be applicable through Field.module, to allow all fieldable content to be classified using Taxonomy.
- Terms & vocabularies
These will be the same as they are now. Terms can be nested hierarchically. They can have synonyms and related terms. It is still open to discussion whether multiple parents will make it into D7.
- FAPI integration
All term selectors will be proper FAPI widgets, defined in hook_elements().
- Better loading functions
taxonomy_get_tree() is _the_ infamous way of loading Taxonomy data. Its query suggests it sorts terms by weight and name, but it doesn't. Its successor(s) will be able to sort and will return keyed arrays for easy access, eliminating the need for slow loops every time.
- API
The existing hooks will stay the same, since new and more flexible hooks have just been added to everybody's satisfaction.
- Documentation
Documentation will be written in the form of Doxygen comment blocks, help and handbook pages. During development all code will be committed to a contrib at drupal.org and API docs will be made available at api.nederdev.com.
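To illustrate what "Better loading functions" is aiming at, here is a rough sketch of a loader that really sorts siblings by weight and name and returns the tree keyed by term ID. It is written in Python purely as a language-neutral illustration and is not Drupal code: the Term record, its field names and load_tree() are all hypothetical, and only the idea (sorted, keyed result) comes from the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Term:
    # Hypothetical term record; all field names are assumptions for this sketch.
    tid: int
    name: str
    weight: int = 0
    parent: int = 0  # 0 means "no parent" (a root term)
    depth: int = 0
    children: list = field(default_factory=list)  # child term IDs, sorted


def load_tree(terms):
    """Return terms keyed by ID, depth-first, siblings sorted by (weight, name)."""
    by_parent = defaultdict(list)
    for term in terms:
        term.children = []  # reset so repeated calls do not accumulate IDs
        by_parent[term.parent].append(term)

    tree = {}  # dicts keep insertion order, so iteration follows the tree order

    def walk(parent_id, depth):
        siblings = sorted(by_parent[parent_id],
                          key=lambda t: (t.weight, t.name.lower()))
        for term in siblings:
            term.depth = depth
            tree[term.tid] = term
            if parent_id in tree:
                tree[parent_id].children.append(term.tid)
            walk(term.tid, depth + 1)

    walk(0, 0)  # start at the root terms
    return tree


terms = [
    Term(1, "Fruit", weight=1),
    Term(2, "Apple", parent=1),
    Term(3, "Animals"),
    Term(4, "Banana", weight=-1, parent=1),
]
tree = load_tree(terms)
print(list(tree))        # term IDs in display order
print(tree[1].children)  # direct access by ID, no loop needed
```

Because the result is keyed by term ID, looking up a term or its children is a direct access instead of a loop over the whole tree.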
Road map
- Get familiar with Field API and make a plan of what features are needed. (early June)
- Make it work: write the actual code and tests. (end of July)
- Bug hunting. Usability review. Write documentation. Make it ready for a release. (early August)
Communication
First and foremost this is a rewrite of an existing system. When rewriting the code I will take feature requests into account, but they are low priority, since a proper replacement for the current Taxonomy module is more important. I want people to be involved, because this project is something to be used by those people. Tips and feature requests are most welcome. Everything I do will be posted at g.d.o so people can track my progress and comment.
Profit for Drupal
- Better use of existing APIs.
- More developer-friendly code.
- Greater extensibility.
- Every kind of content that is fieldable can be classified.
Difficulty
Medium.
Field integration will be the greatest challenge.
Mentors
- Preferably people interested in Taxonomy and Field API.
About me
I am Bart Feenstra, known to some as Xano, to others as Kaaskop, although I recently changed that IRC nickname to Xano_ to match my (groups.)drupal.org username. I am a twenty-year-old student at the University of Twente (The Netherlands), where I am a first-year bachelor student in Communication Studies.
Regarding Drupal I am mostly active in the Dutch community, helping people out on the forums and IRC and organising events (DrupalJam, the bi-annual Dutch DrupalCon; Kroegmeet, intended for socialising and expanding your network). Next to that I maintain several modules (Vocabulary Index, External links filter, External Search) and write patches for Drupal 7. I have an excellent understanding of HTML, JS and CSS. My PHP skills are good, as is my knowledge of Drupal and the API.

Comments
In the future, terms might become nodes. It would be nice to implement this in such a way that, when it happens, only some code needs to be changed.
Multiple parents? That's the only reason I used Drupal. But, really, if multiple parents are eliminated from core and instead become a module... Maybe the data structure for taxonomy relationships should be made into nested intervals?
or materialized path for small trees...
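For anyone who hasn't met the materialized path idea: each term stores the full list of its ancestors as a string, so a whole subtree can be found with a prefix comparison (in SQL, a LIKE on the path column). A minimal sketch in Python, not Drupal code; build_paths(), descendants() and the "1/4/9" path format are made up for this illustration, and it assumes the parent links contain no cycles. Note that one path per term means one parent per term, so multiple parents would need several paths per term.

```python
def build_paths(parents):
    """Map each term ID to a path such as '1/4/9' (root first, term itself last).

    `parents` maps term ID -> parent ID, with None for root terms.
    Assumes the parent links are acyclic.
    """
    paths = {}

    def path_of(tid):
        if tid not in paths:
            parent = parents[tid]
            if parent is None:
                paths[tid] = str(tid)
            else:
                paths[tid] = f"{path_of(parent)}/{tid}"
        return paths[tid]

    for tid in parents:
        path_of(tid)
    return paths


def descendants(paths, tid):
    """All terms below `tid`, found with a plain prefix comparison."""
    prefix = paths[tid] + "/"
    return sorted(t for t, p in paths.items() if p.startswith(prefix))


parents = {1: None, 4: 1, 9: 4, 5: 1, 2: None}
paths = build_paths(parents)
print(paths[9])               # path from the root down to term 9
print(descendants(paths, 1))  # every term anywhere below term 1
```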
1) As far as I know terms will not become nodes, because they are a completely different kind of data.
2) Multiple parents could be replaced by related terms. It is far from certain that multiple parents will be removed, but rest assured: if this happens there will be a proper upgrade path.
Theoretically speaking, a taxonomy can be built completely on top of the current node system, as a taxonomy is in fact nodes with relationships.
But I see that the node system will change completely in the future, as every node becomes nothing but a collection of fields (see "everything is a field").
That seems to be too far away, though (like Drupal 9 away...), so you are right, there is no need to consider it...
Kinda but not really.
Taxonomies and vocabularies are end-to-end, total pieces of knowledge, so while it might be easy to think of terms as nodes, it misses the point. Taxonomy.module's strength is managing vocabularies, and if we lose focus on this, taxonomy.module will become less relevant.
No!
Start over, because taxonomy is "beyond saving"? First, that's not fair to the people who worked a lot to improve the current taxonomy module. Second, the issues you describe are not harder to solve in the current module. Starting over is a recipe for failure, IMHO.
Have a clear plan, start with small baby steps, make a series of small reviewable patches. That's the recipe for success.
Damien Tournoud
http://drupalfr.org
start over, no, complete refactoring though, pretty much
I don't think Xano's talking about completely starting over from scratch in reality - but taxonomy as field and any storage solution we might come up with mean major refactoring. As always the problem with GSOC and core is finding stuff that won't be fixed before the time you start. Work on taxonomy vocabularies as fields has already started (although there'll doubtless be a bunch of work remaining for both widgets and display even if the initial patch is in by the time gsoc starts), there's work on storage / taxonomy_get_tree() as well - but again that's likely to be ongoing when gsoc starts even if it's a lot further on.
But this really needs to be broken down into "these are the patches I'd be planning to work on for three months, here's a bunch of extra work which would be great to do if those turn out not to be enough or if other people end up working on them first".
Some stuff which is a long way off even if taxonomy as fields and fieldable taxonomy gets in - related terms and synonyms as fields. RDF support for taxonomy -> field instance -> object relationships. Drill down CCK widgets for hierarchy (hierarchical select etc.). Loads of fun things to work on.
I want to do what's best. I don't care if it's a complete rewrite or taking small baby steps. I believe there are so many big changes that need to be made that it's practically impossible to take small steps. Field integration comes with Node independence. Voilà, two big changes that depend on each other.
I am a big fan of creating small and reviewable patches, but this... There are so many things that need to be done and with that approach I don't see them happening before September.
reactions
I think (I hope) #1 is already in progress.
#2 is fine. Multiple parents are a mystery to me, and I think it would be important to understand why this feature is in Drupal now, how people use it, and what alternatives there might be if it goes away. I would need to be convinced that taxonomy fundamentals like hierarchy should be expressed as term-fields-on-terms. (i.e. Vocabulary X is hierarchical, so just add a vocabulary X term field to terms in vocabulary X and call it "parent.") I just don't know what kind of incoherent mess this would make of the data tables. I also don't know if anyone is thinking along these lines now, so my anxiety may be misplaced.
#3 is vital. Let's do it. In fact, I think it's something we need to start on right away. We need autocomplete + add widget to replace the tagging widget. We need formatters that render links and formatters that render plain text. We need taxonomy fields to use all the widgets in options.module.
#4 is a little bit beyond me. Because taxonomy is essentially about relationships, I'd like to see a completely different model for managing and using this data. In the WebObjects world, there's a concept of "faults." Because all data is linked to other data, it's impossible to select everything in one query. "Faults" are basically decoy or mock objects representing the data that would be at the other end of a relationship, and attempting to read the fault actually triggers additional database queries to load the complete version of that object. Right now, terms tell us who their parents are, but they don't tell us who their children or siblings are. It would be nice to be able to "walk" around a vocabulary hierarchy starting from term X, trigger these "faults" and never have to explicitly call functions to load parents or load children.
And if I may add another request to this pile:
We need to be able to define term fields as more or less than a single vocabulary. I need to be able to create fields where the valid options are just the children of a particular term. I also need to create fields where valid options are terms in any of 3 different vocabularies. These are cases where taxonomy module is being used to manage huge lists of field options and not as much about classification schemes.
Standard naming in case "Faults" are a bit alien...
Look up the patterns for "Lazy loading" and "Proxy" to grasp what Faults are :)
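A tiny sketch of what such a fault / lazy-loading proxy could look like, in Python rather than Drupal code. Everything here is hypothetical: TermFault, load_row() and the in-memory ROWS dictionary (standing in for the database) exist only for this illustration. The point is that nothing is loaded until a property is read, and parents and children come back as further faults, so you can "walk" the hierarchy without calling load functions yourself.

```python
class TermFault:
    """Stand-in for a term; loads its row only when a property is first read."""

    def __init__(self, tid, loader):
        self.tid = tid
        self._loader = loader  # hypothetical callable: term ID -> row dict
        self._row = None

    def _load(self):
        if self._row is None:  # the "fault" fires here, once per object
            self._row = self._loader(self.tid)
        return self._row

    @property
    def name(self):
        return self._load()["name"]

    @property
    def parent(self):
        pid = self._load()["parent"]
        return None if pid is None else TermFault(pid, self._loader)

    @property
    def children(self):
        return [TermFault(cid, self._loader) for cid in self._load()["children"]]


# In-memory stand-in for the database (made-up data).
ROWS = {
    1: {"name": "Fruit", "parent": None, "children": [2, 3]},
    2: {"name": "Apple", "parent": 1, "children": []},
    3: {"name": "Banana", "parent": 1, "children": []},
}
loads = []


def load_row(tid):
    loads.append(tid)  # record every simulated query
    return ROWS[tid]


apple = TermFault(2, load_row)  # nothing is loaded yet
siblings = [term.name for term in apple.parent.children]
print(siblings)
print(loads)  # which rows were actually fetched, and in what order
```

In this sketch every fault object loads its own row, so the same term can be fetched twice; a real implementation would probably keep one shared fault per term ID.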
Field conversion is indeed
Also, I'd like to move Synonym Collapsing to core.
This is a fantastic, vital discussion
Which I will participate in shortly. For now I'll note that webchick wants nothing to do with mixing Summer of Code with core work, and the sheer scope of this helps explain why, so let's not have SoC be the only reason for having this thread here!
benjamin, Agaric Design Collective
benjamin, agaric
Please add to the SoC site!
Xano, please add this to the official Summer of Code site! We need to figure out who the best mentors would/could be for this, but either way you should add it to the site asap.
Alex Urevick-Ackelsberg
ZivTech: Illuminating Technology
To prevent collisions with other developers working on Taxonomy I decided it isn't such a good idea to work on core after all. I'll just continue this plan in my spare time, just like the rest of us :)