Taxonomy construction using WordNet?
WordNet (http://wordnet.princeton.edu/) is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
I would like to develop a new module for Drupal that builds the site taxonomy from the tags users enter every time they add content, combined with the semantic relations between words in WordNet. Ideally, users would just enter their tags and everything else would be automatic, but I suspect that will not work very well. So I think it will be necessary to show users a list of possible relationships/synonyms and have them choose or validate the right ones.
Basically, the final idea is to get a good taxonomy that guarantees a good organization and search results without the need of some 'expert' building it beforehand. I want the users to do that. I don't know if this will work. I just want to explore the results, so if this whole thing doesn't work it's ok too.
I have to do this project to finish my major in Computer Science (in Spain we have to do some kind of 'final project' before finishing). I've never programmed in PHP and it's my first time playing around with Drupal, so I suppose the start will be difficult. However, I think my background is good enough to be able to learn quickly.
What's your opinion about this idea? Do you think it can be done? Should I approach it differently to be able to apply it to Drupal? Any feedback or hints are very welcome, as I'm a big newbie at this =)
Thank you everyone!
Ángel.-



Interesting idea
I'm not sure I completely follow how you plan to map these concepts to Drupal (are words taxonomy terms, and links between them related terms?) or in fact quite what workflow you envision, but here are a lot of projects and issues in a similar space you might want to check out:
Feel free to create a Wiki page if you want to track resources and planned features for your project in public here in the taxonomy group!
benjamin, Agaric Design Collective
Detailed Explanation
Thank you for your response and for all the links, Benjamin =) In fact, Smart Tags is similar in some ways to this WordNet module. It doesn't use any auxiliary dictionary, but it also tries to solve some of the problems with free tagging.
Let me try to explain a little better what I would like to do. I will give you an example. Imagine there is a website built on Drupal about food recipes and dishes. The module would work as follows:
1) Bob adds a recipe for a Beef Sandwich, and tags it with the words "beef" and "sandwich".
2) The module goes to WordNet and checks the hypernym relations for both words.
3) A list of possible categories for those words is shown to Bob, and he is asked whether he thinks his tags belong in those categories:
Beef:
- Longhorn
- Santa Gertrudis
- Angus
- Meat
Sandwich:
- Snack Food
4) Bob selects "Meat" and "Snack Food"
5) The taxonomy is updated with this information
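The steps above can be sketched roughly as follows. This is only a minimal illustration in Python, not Drupal code: the `CANDIDATES` table is a hard-coded stand-in for a real WordNet lookup (it just repeats the example lists above), and the names `suggest_categories` and `update_taxonomy` are hypothetical.

```python
# Hypothetical stand-in for WordNet: tag -> candidate categories.
# A real module would query WordNet here; these entries just mirror
# the Beef/Sandwich example from this post.
CANDIDATES = {
    "beef": ["Longhorn", "Santa Gertrudis", "Angus", "Meat"],
    "sandwich": ["Snack Food"],
}


def suggest_categories(tags, lookup=CANDIDATES):
    """Steps 2-3: return the candidate categories to show for each tag."""
    return {tag: list(lookup.get(tag.lower(), [])) for tag in tags}


def update_taxonomy(taxonomy, choices, suggestions):
    """Step 5: file each tag under the category the user validated.

    taxonomy maps a category name to the set of tags placed under it.
    A choice that was never offered as a suggestion is ignored.
    """
    for tag, category in choices.items():
        if category in suggestions.get(tag, []):
            taxonomy.setdefault(category, set()).add(tag)
    return taxonomy


# Step 1: Bob tags his recipe.
tags = ["beef", "sandwich"]
suggestions = suggest_categories(tags)

# Step 4: Bob picks one category per tag.
choices = {"beef": "Meat", "sandwich": "Snack Food"}
taxonomy = update_taxonomy({}, choices, suggestions)
```

Keeping the lookup separate from the update means the user's validation step sits between them, which is the part I expect cannot be automated.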
There are various interesting points here:
Well, I hope everyone can now understand better what I have in mind. As I said before, any kind of help is very welcome! :)
Best regards,
Ángel.-
This should not be too hard in Drupal
I understand a lot more clearly now!
With hook_nodeapi on the operation 'validate' you ought to be able to catch and look at the vocabulary, or probably use that as a failsafe (now that I think of it that seems hackish, better approach anyone?) and use an AJAX 'fetch from WordNet' button to grab the information. If you just use free tagging (add more terms from WordNet but without hierarchy) all you have to do is add them to the taxonomy array that is submitted. To create hierarchical terms on the fly you need to do some more but it's possible, I know I did it for (still alpha) community managed taxonomy.
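For the free-tagging case, the "add them to what is submitted" step might look something like the sketch below. It is written in Python purely to show the logic, not as Drupal code; it assumes the free tags arrive as a single comma-separated string, and `merge_tags` is a hypothetical helper name. Terms that themselves contain commas are not handled.

```python
def merge_tags(submitted, extra):
    """Append extra terms to a comma-separated tag string.

    Duplicates are skipped case-insensitively, and the user's own
    tags keep their original order and spelling.
    """
    terms = [t.strip() for t in submitted.split(",") if t.strip()]
    seen = {t.lower() for t in terms}
    for term in extra:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            terms.append(cleaned)
            seen.add(key)
    return ", ".join(terms)


# "Beef" is already present as "beef", so it is not added again.
merged = merge_tags("beef, sandwich", ["Meat", "Snack Food", "Beef"])
```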
benjamin, Agaric Design Collective