More taxonomy related features in views + taxonomy query system

Posted by Mgccl on March 31, 2009 at 1:00am

Overview: Views currently uses only a small part of what taxonomy offers, and some of taxonomy's richer features have never been explored. This project will implement code that lets Views do more with taxonomy.

Description:
Currently, the only taxonomy filters available in Views are:

Is one of a set of terms
Is none of a set of terms
Is all of a set of terms

There is no support for term relations at all.
Suppose a node N belongs under terms A and B, where B is a child of A. The user might tag the node only with B. With the current Views, if a user selects nodes associated with A, N will not show up. This is where an "is one of all children / direct children" filter comes in handy.
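As a rough illustration of the "is one of all children" idea, here is a minimal Python sketch. The data (`HIERARCHY`, `NODE_TERMS`) and the function names are hypothetical and stand in for the real taxonomy tables; this is not how Views would implement it.

```python
from collections import defaultdict

# Hypothetical toy data: (child, parent) pairs, like a term hierarchy table.
HIERARCHY = [("B", "A"), ("C", "B"), ("D", "A")]
# Hypothetical mapping of node -> terms the node is tagged with.
NODE_TERMS = {"N": {"B"}, "M": {"D"}, "K": {"X"}}


def all_children(term, hierarchy):
    """Return every descendant of `term` (children, grandchildren, ...)."""
    children = defaultdict(set)
    for child, parent in hierarchy:
        children[parent].add(child)
    seen, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def nodes_with_term_or_children(term, hierarchy, node_terms):
    """Select nodes tagged with `term` or with any of its descendants."""
    wanted = {term} | all_children(term, hierarchy)
    return sorted(n for n, terms in node_terms.items() if terms & wanted)
```

Here `nodes_with_term_or_children("A", HIERARCHY, NODE_TERMS)` picks up node N even though N is tagged only with B.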

With this in mind, here are some features I believe would be valuable for Views:

(Is one of/Is none of/Is all of) + a term + (all parents /direct parents /all children /direct children /related)
(Is one of/Is none of/Is all of) + several terms + (Union/Intersection of) + (all parents /direct parents /all children /direct children /related)
When those are not enough, more complex taxonomy queries can be made in Views through a query language. Only a query language based on term/term relations will be implemented. For example, (i ('triangle' 'ellipse') (! (p 'circle'))) means: return those of the terms 'triangle' and 'ellipse' that are not a parent of 'circle'. The returned terms can then be used to find nodes, perhaps through tql, which is a query language based on term/node relations.

The "all parents" and "direct parents" functions can be replaced by a single nth-level parents function, which is more general.
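A minimal sketch of what an nth-level parents function could look like, in Python. The `PARENTS` mapping and both function names are assumptions made for illustration, and the hierarchy is assumed to contain no cycles.

```python
# Hypothetical term -> direct parents mapping (terms may have several parents).
PARENTS = {
    "White Loire": ["Loire", "White Wine"],
    "Loire": ["Wine"],
    "White Wine": ["Wine"],
}


def parents_at_level(term, parents, level):
    """Return the terms exactly `level` steps above `term`."""
    current = {term}
    for _ in range(level):
        current = {p for t in current for p in parents.get(t, ())}
    return current


def all_parents(term, parents):
    """Collect every level until no parents remain (assumes no cycles)."""
    found, level = set(), 1
    while True:
        layer = parents_at_level(term, parents, level)
        if not layer:
            return found
        found |= layer
        level += 1
```

With this, "direct parents" is level 1 and "all parents" is the union of all levels.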

Here is a description of the query system and what will be achieved.

The query system can use the full power of the taxonomy system and return all the terms (not nodes, though it could be extended to nodes) that meet a specific requirement. The syntax is Lisp-like, but with different keywords.

There will be 3 stages in the query system. Each stage must be completed before work on the next one starts. Users can choose which stage to activate in the module administration.

STAGE 1. Relational select.

This stage only supports queries based on taxonomy relations. The result of a query is a list of term(s).

During stage 1, there are only 2 types of data that can be manipulated: lists (of terms) and terms (or negative terms).

Here are some functions.

Basic structure

'term name' returns one term or a list of terms with "term name" as its name
('foo' 'bar') is a list. A list can contain terms or more lists. Elements are separated by spaces.
1 is an integer; in stage 1, it is the same as returning the term with id 1.
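To make the basic structure concrete, here is a small Python sketch of a reader for this syntax. It only turns query text into nested lists and evaluates nothing. The `("term", name)` tuple representation and the `parse` function are assumptions for illustration, not part of the proposal.

```python
import re

# One token: "(", ")", a quoted term name, an integer, or a bare symbol.
TOKEN = re.compile(r"\s*(?:(\()|(\))|'([^']*)'|(-?\d+)|([^\s()']+))")


def parse(text):
    """Parse query text into a list of nested Python lists."""
    text = text.strip()
    pos = 0
    stack = [[]]
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        pos = match.end()
        lparen, rparen, name, number, symbol = match.groups()
        if lparen:
            stack.append([])                      # open a new list
        elif rparen:
            if len(stack) == 1:
                raise SyntaxError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)            # close the current list
        elif name is not None:
            stack[-1].append(("term", name))      # 'term name'
        elif number is not None:
            stack[-1].append(int(number))         # integer, e.g. a term id
        else:
            stack[-1].append(symbol)              # function name such as c, p, !
    if len(stack) != 1:
        raise SyntaxError("unbalanced '('")
    return stack[0]
```

For example, `parse("('foo' 'bar')")` yields one top-level list holding the two terms, and `parse("1")` yields the integer 1.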

Selection functions

The names of these functions are kept as short as possible, because they are the ones most likely to be used.

term returns the term.
(c term) returns all direct children.
(c list) (c term term term) is the same as ((c term) (c term) (c term))
(p term) returns all direct parents.
(p list) (p term term term) is the same as ((p term) (p term) (p term))
(r term) returns all related terms.
(r list) (r term term term) is the same as ((r term) (r term) (r term))
(| list) (u list) returns the union of lists of terms; it also works as 'or' in conditions.
(& list) (i list) returns the intersection of lists of terms; it also works as 'and' in conditions.
(! list) (n list) turns every term in the list into its negative counterpart. For a term, it returns the negative term. For a non-zero integer, it returns integer * -1. For 0, it returns 1, so it can be used as 'not' in conditionals.
(lca list) returns the lowest common ancestor of all terms in the list. This is an example of a higher-level function in stage 1; more will be implemented for efficiency reasons.
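The selection functions can be sketched over an in-memory hierarchy as follows. This is only an illustration in Python: the `PARENTS` and `RELATED` data are made up, every function returns a plain set (so `c`, `p` and `r` return the merged result rather than a list of lists), and `lca` treats a term as its own ancestor. All of these are simplifying assumptions.

```python
# Hypothetical term -> direct parents mapping.
PARENTS = {
    "shape": set(),
    "polygon": {"shape"},
    "ellipse": {"shape"},
    "triangle": {"polygon"},
    "square": {"polygon"},
    "circle": {"ellipse"},
}
# Hypothetical "related terms" data.
RELATED = {"square": {"cube"}, "cube": {"square"}}


def c(*terms):
    """Direct children of the given terms."""
    wanted = set(terms)
    return {child for child, parents in PARENTS.items() if parents & wanted}


def p(*terms):
    """Direct parents of the given terms."""
    return set().union(*(PARENTS.get(t, set()) for t in terms))


def r(*terms):
    """Terms related to the given terms."""
    return set().union(*(RELATED.get(t, set()) for t in terms))


def u(*sets):
    """Union of term sets."""
    return set().union(*sets)


def i(*sets):
    """Intersection of term sets."""
    return set.intersection(*sets) if sets else set()


def ancestors(term):
    """All parents of `term`, at any level."""
    seen, stack = set(), [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lca(*terms):
    """Common ancestors that have no other common ancestor below them."""
    if not terms:
        return set()
    common = set.intersection(*({t} | ancestors(t) for t in terms))
    return {t for t in common
            if not any(t in ancestors(other) for other in common if other != t)}
```

Because terms can have several parents, `lca` returns a set: there may be more than one lowest common ancestor.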

This set of functions is enough to select any possible set of terms through term relations. Efficiency is the main focus.

Any function in stage 1 accepts multiple lists as input; it will flatten and join all of them. For example, (u (p 'foo' 'bar') 'bar2') is the same as (u ((p 'foo' 'bar') 'bar2')).
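The flatten-and-join rule could look like this in Python. The helper names are hypothetical, terms are represented as plain strings for simplicity, and only `u` is shown.

```python
def flatten(items):
    """Yield every non-list element, descending into nested lists."""
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def u(*args):
    """Union: flatten all arguments, drop duplicates, keep first-seen order."""
    return list(dict.fromkeys(flatten(args)))
```

Here `u(["a", "b"], "c")` and `u([["a", "b"], "c"])` give the same result, which mirrors the equivalence stated above.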

STAGE 2. Conditionals and loops

Stage 1 should be good enough for most people; stage 2 makes the query language expressive enough to write programs in.

Conditionals and loops

(> int int) returns 1 if the first integer is larger than the second, else 0
(< int int) returns 1 if the first integer is smaller than the second, else 0
(<= int int) returns 1 if the first integer is smaller than or equal to the second, else 0
(>= int int) returns 1 if the first integer is larger than or equal to the second, else 0
(/= int int) (!= int int) returns 1 if the first integer does not equal the second, else 0
(if int list list) if the integer is > 0, evaluate the first list, else evaluate the second list.
for and while loops.

Stage 2 functions

(is_child list list) returns 1 if every term in list 1 is a child of every term in list 2, else returns 0.
(is_parent list list) (is_parent a b) is the same as (is_child b a)
(is_related list) returns 1 only if each term is related to all other terms in the list.
(is_sibling list) (is_sibling a) is the same as (if (> (size (i (p a))) 0) 1 0)

(size list) returns the number of elements in the list.
(rsize list) returns the total number of terms in the list, counting recursively through nested lists.
(+ list) (- list) (* list) (/ list) (% list) basic arithmetic functions.
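A few of the stage 2 functions, sketched in Python. The `PARENTS` data is made up, "child" is read as direct child (matching `c` in stage 1), and `size` is read as counting top-level elements only. These readings are assumptions, not settled design.

```python
# Hypothetical term -> direct parents mapping.
PARENTS = {
    "polygon": {"shape"},
    "ellipse": {"shape"},
    "triangle": {"polygon"},
    "square": {"polygon"},
    "circle": {"ellipse"},
}


def is_child(children, parents):
    """1 if every term in `children` is a direct child of every term in `parents`."""
    return int(all(par in PARENTS.get(ch, set())
                   for ch in children for par in parents))


def is_parent(parents, children):
    """(is_parent a b) is the same as (is_child b a)."""
    return is_child(children, parents)


def is_sibling(terms):
    """1 if all terms share at least one direct parent."""
    parent_sets = [PARENTS.get(t, set()) for t in terms]
    return int(bool(parent_sets) and len(set.intersection(*parent_sets)) > 0)


def size(lst):
    """Number of top-level elements."""
    return len(lst)


def rsize(lst):
    """Number of terms, counted recursively through nested lists."""
    return sum(rsize(x) if isinstance(x, list) else 1 for x in lst)
```

The difference between `size` and `rsize` shows up on nested input: `[["a", "b"], "c"]` has 2 top-level elements but 3 terms.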

A possible example would be returning all terms under "triangle" that have more than 3 children.

It's possible that all the conditionals and loops will instead be replaced by a thin layer of syntactic sugar over SQL WHERE.

In that case, everything in the conditionals and loops list will be replaced by (w string), where string can be a SQL WHERE statement.

Stage 2 allows a new type, integer. There is no distinction between a term and an integer in representation.

STAGE 3

During stage 3, the language will support more than queries based on taxonomy relations. It will also work on other taxonomy data, such as term names and descriptions, and some further useful functions will be built into the system. This includes a new type, string, and possible uses for it. For example, returning terms that start with the letter "c" and are children of "animals" will be possible. Stage 3 could be merged with stage 2 if SQL WHERE statements prove more efficient.

Why is the query language useful?
The query language offers little for small, tree-based, or loosely structured taxonomies. But once a taxonomy has terms with multiple parents, the query language shows its power.
If a person wants a wine that is both "Loire" and "White Wine", they can select the common children of "Loire" and "White Wine". In this case, it returns any node tagged with "White Loire". Suppose a user enjoys "White Loire" and "Semillon" and wants to find another wine to try. It is reasonable to assume that some property shared by those 2 wines is what the user likes, so the user can apply the lowest common ancestor (LCA) function. LCA finds the nearest common parents of both terms, which can be read as their shared property. The LCA of those 2 terms is "White Wine", so the user can then browse the other terms under "White Wine" and might find something to like. Queries can also become more complex. A user who doesn't like "Loire" but enjoys "White Wine" can ask for only the terms that are children of "White Wine" but not of "Loire", unless a term is a child of both. That query can be written as (u (i (! (c 'Loire')) (c 'White Wine')) (i (c 'Loire' 'White Wine'))).
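The wine queries can be tried out with a few lines of Python. The `PARENTS` data below is invented to match the example ("Red Loire" is an extra made-up term), and negation is read here as plain set difference, which is one possible interpretation of negative terms rather than a fixed design.

```python
# Hypothetical term -> direct parents mapping for the wine example.
PARENTS = {
    "Loire": {"Wine"},
    "White Wine": {"Wine"},
    "White Loire": {"Loire", "White Wine"},
    "Red Loire": {"Loire"},
    "Semillon": {"White Wine"},
}


def c(*terms):
    """Direct children of the given terms."""
    wanted = set(terms)
    return {child for child, parents in PARENTS.items() if parents & wanted}


# (i (c 'Loire' 'White Wine')): terms that are children of both.
common = c("Loire") & c("White Wine")

# (i (! (c 'Loire')) (c 'White Wine')): children of "White Wine",
# with the children of "Loire" excluded (assumed meaning of negation).
not_loire = c("White Wine") - c("Loire")

# (u ... ...): join the two parts.
result = not_loire | common
```

With this data, `common` holds only "White Loire", `not_loire` holds only "Semillon", and `result` holds both.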

Rough timeline

Starting from the coding day:

1st week of coding: Complete all taxonomy query operations for views described in the proposal. This does not include the query language.

2nd week: start implementing stage 1 of the query language

1 week before mid-term: complete stage 1 and make sure stage 1 system works correctly.

just before mid-term: make the decision on how to implement stage 2 and 3.

after mid-term: implement stage 2 and 3.

1 week before August 10th: integrate the query language with views + work on documentation

August 10th to 17th: fix bugs, run tests, and set up a demo site.

Mentors:

randommentor0 - Anyone interested to be my mentor?
randommentor1 - backup mentor, will help with coding standards, etc.
ALocalMentor - I will be in the US only until the middle of July, and in China for the rest of the summer.

Difficulty: medium
