Base 36 (Vancode) - leading digit/character

I hope this is the right area for this question, which concerns the 'thread' field of the 'comments' table. I provide a non-PHP solution for converting WordPress data to Drupal. I recently became aware that the thread field is not a string of decimal numbers but a string of base 36 (vancode) numbers. I've created the necessary functions to convert to and from base 36, but I'm a little confused about the leading digit/character.

Here is an excerpt from http://api.drupal.org/api/function/int2vancode/5

Generate vancode.

Consists of a leading character indicating length, followed by N digits with a numerical value in base 36. Vancodes can be sorted as strings without messing up numerical order.

It goes: 00, 01, 02, ..., 0y, 0z, 110, 111, ... , 1zy, 1zz, 2100, 2101, ..., 2zzy, 2zzz, 31000, 31001, ...

So decimal 35 = z, which is written as 0z. How does the leading zero indicate length in this case? Likewise, decimal 36 = 10, which is written as 110 as per the excerpt above. Again, how does the leading character 1 indicate length when the length of 10 is 2?

Obviously I can see a pattern forming whereby the leading character for the first 36 numbers is 0, for the next 36 it is 1, then 2, and so on up to z. Is there an assumption that the leading character will never be greater than "z"?

Any help appreciated.

Steve
http://prime357.org

Update: Since there was no response from the initial group (Database Schema API), I'm posting to other groups I think might be relevant. A secondary question: for specific backend database questions, which group or forum is most relevant? I've got other backend database questions waiting in the wings.

Anyone. I am hoping this

superjacent - Wed, 2008-06-11 06:54

Anyone? I am hoping this is the right forum or group.

Steven Taylor
Melbourne Australia
http://prime357.org


..

dmitrig01 - Thu, 2008-06-12 03:15

Because the length count starts at 0: add 1 to the leading character to get the length.
01 - length character 0, number 1
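
A minimal Python sketch of the encoding described above, assuming the rule is "leading character = number of base 36 digits minus 1, counted from the character '0'". It is a port for illustration, not Drupal's PHP code; `to_base36` is a hypothetical helper name, and `int2vancode` simply mirrors the name of the Drupal function.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Return the lowercase base 36 digits of a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def int2vancode(n: int) -> str:
    """Prefix the base 36 digits with a single length marker."""
    num = to_base36(n)
    # One digit gives '0', two digits give '1', three give '2', and so on.
    return chr(len(num) + ord("0") - 1) + num


for value in (0, 1, 35, 36, 1295, 1296):
    print(value, int2vancode(value))
```

This prints 00, 01, 0z, 110, 1zz and 2100, which matches the sequence in the API excerpt.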


vancode?

yhager - Fri, 2008-06-13 04:38

I have never heard of vancode before, I'm curious why it is needed at all..

Anyway, just by looking at the code in the API link you sent, I see that the "length" character is just a character from the ASCII table, not a vancode number:

php > echo chr(11 + ord('0')-1);
:
php > echo chr(36 + ord('0')-1);
S
php > echo chr(75 + ord('0')-1);
z
php > echo chr(79 + ord('0')-1);
~

Also, the reverse function, vancode2int(), ignores the first character altogether.

You might want to open an issue against Drupal core, or ask on the development mailing list or IRC.
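
The decoding behaviour described above (skip the first character, parse the rest as base 36) can be sketched in Python as follows. This is an illustrative port under that assumption, not Drupal's code; only the function name is borrowed.

```python
def vancode2int(code: str) -> int:
    """Decode a vancode by dropping the length marker and parsing base 36."""
    return int(code[1:], 36)


print(vancode2int("0z"))    # 35
print(vancode2int("110"))   # 36
print(vancode2int("2100"))  # 1296
# The marker is never validated, so a wrong marker decodes to the same value.
print(vancode2int("910"))   # 36
```

Because the marker is only an ASCII offset from '0', it stays within '0' to '9' for numbers of up to ten base 36 digits; the chr() session above shows what comes after that.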


I'd never heard of Vancode

superjacent - Fri, 2008-06-13 05:36

I'd never heard of Vancode either. Even Google doesn't return any references to it, other than links back to Drupal.org, so I'm not sure whether it is a correct term for base 36 as Octal is for 8 and hex for 16.

I'll have to check how I implemented the leading digit; from memory, I think I used 'vancode'. I understand that a vancode will be a shorter string than the equivalent decimal number, so when several are strung together, separated by a slash, for a deeply threaded comment, the overall string will be shorter. That I understand.

I can't see the need for prepending a leading digit. PHP would have to do some additional string parsing to retrieve the vancode digits before determining the decimal number.

I'm thinking aloud now. Let's say a number is created for the thread field, for example "112yz3". There is no clear indication as to which part of the string is the "leading digit(s)": it could be 1, 11 or 112. I know this is a huge number that would never be reached for thread field purposes, but in the 1960s the year 2000 was not an issue either. It seems like a lot of unnecessary processing to keep track of threaded comments.

Anyway, thanks for your input.
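
On the need for the leading character: the API excerpt says vancodes can be sorted as strings without messing up numerical order. A rough Python check of that claim, comparing plain base 36 strings with length-prefixed ones, is sketched below. The helper names `to_base36` and `int2vancode` are illustrative, and the code is a port of the idea rather than Drupal's implementation.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Return the lowercase base 36 digits of a non-negative integer."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def int2vancode(n: int) -> str:
    """Prefix the base 36 digits with a marker equal to (digit count - 1)."""
    num = to_base36(n)
    return chr(len(num) + ord("0") - 1) + num


values = list(range(2000))  # already in ascending numerical order
plain = [to_base36(v) for v in values]
prefixed = [int2vancode(v) for v in values]

# Plain base 36 breaks under string sorting: "10" (36) sorts before "9".
print(sorted(plain) == plain)        # False
# With the length marker, string order and numerical order agree.
print(sorted(prefixed) == prefixed)  # True
```

The marker groups codes by digit count first, so a longer number can never sort ahead of a shorter one.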



A number could be "112yz3".

yhager - Fri, 2008-06-13 06:41

A number could be "112yz3". There is no clear indication as to which part of the string is/are the "leading digit(s)".

Well, "clear indication" is made at the code level:

  • int2vancode() uses only one character for the length, so the "correct" vancode would be: "412yz3".
  • vancode2int() wouldn't care about this error, since it would parse the number as "12yz3", regardless of the first digit (1818255 in decimal if you insist on knowing.. :)
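
The two points above can be checked with a few lines of Python. This is only a worked example of the arithmetic, assuming the marker is the digit count minus 1 offset from '0'; the variable names are illustrative.

```python
digits = "12yz3"

# Five base 36 digits, so the marker is chr(5 + ord('0') - 1), which is '4'.
marker = chr(len(digits) + ord("0") - 1)
vancode = marker + digits
print(vancode)  # 412yz3

# 1*36**4 + 2*36**3 + 34*36**2 + 35*36 + 3
value = int(digits, 36)
print(value)  # 1818255

# Decoding the "wrong" string still works, because the first character is skipped.
print(int("112yz3"[1:], 36))  # 1818255
```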

Light globe's just lit up

superjacent - Fri, 2008-06-13 07:04

Thanks, yes, it is making sense now. I was treating the first character (or characters) as the integer part of number/36, hence my confusion about how to determine the leading digits. I've re-read the initial explanation and realise the leading digit is the length of the base 36 string minus 1.

Any chance we can wipe this thread, as if it never happened......

Thanks for your help.



Any chance we can wipe this

yhager - Sat, 2008-06-14 15:27

Any chance we can wipe this thread, as if it never happened......

heh.. don't.. it was fun.. and might benefit someone sometime :)


Any chance we can wipe this

liam mcdermott - Sat, 2008-06-14 07:56

Any chance we can wipe this thread, as if it never happened......

I'd rather we didn't. I need this information to help fix a (minor) bug in vbtodrupal.module. :)


History of comment.module

mshmsh5000 - Sat, 2008-06-14 14:55

I had some time to kill over coffee:

int2vancode() and its inverse function were introduced to comment.module in revision 1.428, Thu Feb 9 08:33:36 2006, by unconed, with the CVS log comment, "#48239: Comment thread coding inefficient".

I can't find the "unconed" user, but Steven Wittens, one of the core Drupal developers, developed an old Drupal theme called UnConeD.

Still though, no indication of why you'd call this "vancode". But we can define "vancode" as base 36, or alphadecimal, with a (single) leading character indicating string length.

I suppose we could ask Steven.

Matt Holford
Helen Marie
http://helen-marie.com/


AFAIK Unconed dreamed this

Heine - Sat, 2008-06-14 16:14

AFAIK Unconed dreamed this up at the Vancouver Drupalcon in 2006.