Defining the schema data structure

Posted by bjaspan on April 16, 2007 at 8:59pm

One of the first things we to do is define the data structure that modules will use for declaring their tables. Schema module, Frando's patch, dopry's patch, and CCK fields all use different structures though their underlying approaches are basically the same. We need to decide what kind of information the data structures must contain and can contain. This will obviously include at least columns (name, type, length, nullable, default), indices (name, columns, primary vs unique vs multi), and join/relationship information.

We should be database-independent enough to support a broad range of DBMS's but not try to be so general that we go crazy. The vast majority of Drupal uses int, varchar, and longtext columns, with a float here and there. We don't have to support everything.

Defining the join relationships is critical. Eliminating CREATE TABLE is a primary goal, of course, but I believe the functionality enabled by having join information available will be the killer app for the Schema API.

So, the floor is open. What should the table-definition data structure look like?

Comments

As to db types to support,

Posted by karens on April 17, 2007 at 10:51am

As to db types to support, the event module is going to start using a datetime field, so it would be good to support that, and generally a good idea to have a datetime field. I have worked with a number of different databases and I think all of them have a datetime field in the format YYYY-MM-DD HH:MM:SS. Also, there is a current issue on CCK discussing the need for a true decimal field instead of only a float field, and I think there is a consensus that it is needed.

not true for PostgreSQL?

Posted by pwolanin on April 19, 2007 at 1:38pm

As far as I can tell (though I havent' really used it) PostgreSQL does not have a datetime format:

http://www.postgresql.org/docs/7.4/interactive/datatype-datetime.html

The type that postgres calls

Posted by karens on April 19, 2007 at 1:50pm

The type that postgres calls 'Timestamp' is in the format we want.

really?

Posted by pwolanin on April 19, 2007 at 2:37pm

timestamp values are stored as double precision floating-point numbers

But maybe the input/output is ISO format?

I know I can use SQL like "

Posted by karens on April 19, 2007 at 2:55pm

I know I can use SQL like " <= '2007-03-07 12:45:00'" on those fields, and they seem to accept that format for input, but I guess I'm not sure how they're stored internally.

Defining table keys

Posted by bjaspan on April 17, 2007 at 1:37pm

Frando, your most recent patch at http://drupal.org/node/136171 shows that table definitions (what you call a model) can have separate elements called 'primary key', 'unique key', 'keys', and 'indexes'. Questions:

The name 'unique key' makes it sound like there can only be one but a table can have multiple unique keys, no? In fact, your code supports multiple unique keys.
What is the difference between 'keys' and 'indexes'? In MySQL at least, KEY and INDEX are synonyms, aren't they? Is this a distinction that other DBMS's have that MySQL doesn't?

In the Schema module, the single 'keys' element contains all the key definitions. It is an assoc array like 'cols' that maps key name to data about the key. The name 'PRIMARY' is defined to indicate the primary key and the numeracy of the key (unique vs multi) is encoded as a positional field in the array. It looks like this:

$schema['tablename'] = array(
  'cols' => array(...),
  'keys' => array(
    'PRIMARY' => array('col1, col2'),
    'key1' => array('col3, col4', 'unique'),
    'key2' => array('col5, col6', 'multi')));

I now think the way I did this is wrong; the column names for a key should clearly be stored in an array, not as comma-separated values. Also, I think using a positional element for unique vs multi is wrong since everything else is associative.

What our approaches share is direct access to the primary key which, I've found while writing Schema, is important. However, I don't see the need for separate 'unique key', 'keys', and 'indexes' elements, unless you clue me in. So, I propose we use your structure but rename 'unique key' to 'unique keys' and combine 'keys' and 'indexes'. So:

$schema['tablename'] = array(
  'cols' => array(...),
  'primary key' => array('col1', 'col2'),
  'unique keys' => array(
    array('key1' => array('col3', 'col4')),
  'keys' => array(
    array('key2' => array('col5', 'col6')));

comments?

PostgreSQL support underway

Posted by bjaspan on April 24, 2007 at 2:43am

I started working on the postgres engine for Schema module today. I've written most of the database-inspection code. Still need to decide on the support set of types and their mappings to MysQL and PostgreSQL.

In my previous comment I wrote, "What is the difference between 'keys' and 'indexes'? In MySQL at least, KEY and INDEX are synonyms, aren't they? Is this a distinction that other DBMS's have that MySQL doesn't?"

The answer is two the last two questions are "yes" and "yes." In MySQL, an INDEX and a non-unique KEY are the same thing. In PostgreSQL, all keys are unique; indexes are a different type of entity. Therefore, I think Frando's approach of defining them separately is a good one. I suggest that each table entry can define:

$schema['table'] = array(
  'cols' => array('col1' => array(...), ...),
  'primary' => array('col1', 'col2', ...),
  'unique' => array('key1' => array('col3', 'col4', ...), ...),
  'index' => array('idx1' => array('col5', 'col6', ...), ...),     
);

Please don't forget comments

Posted by joemoraca on April 28, 2007 at 11:32am

I think its important to allow for both table and field comments. This is a great place to build in some documentation. If you are using a tool that displays the comments it makes understanding a new schema element much easier.

Joe Moraca
http://www.webdevgeeks.com

Joe Moraca
WebDevGeeks.com

Excellent suggestion.

Posted by bjaspan on April 28, 2007 at 12:49pm

Yes, descriptions should be included. How about just adding an optional 'description' element to each level?

$schema['mytable'] = array(
  'description' => t('mytable stores foo data...'),
  'cols' => array(
    'id' = array('description' => t('what I want to say about id'), 'type' => 'int'),
    ...),
);

Schema can then display these fields in appropriate admin overview pages. Particularly when we add join info, it will be able to draw out the self-documenting schema.

Defining the schema data structure

Comments

As to db types to support,

not true for PostgreSQL?

The type that postgres calls

really?

I know I can use SQL like "

Defining table keys

PostgreSQL support underway

Please don't forget comments

Excellent suggestion.

Database

Group organizers

New groups

Group notifications

Hot content this week