Time to move away from timestamp?

We encourage users to post events happening in the community to the community events group on https://www.drupal.org.
robertDouglass's picture

In my opinion, the use of timestamps for the representation of dates in Drupal core is problematic. It is fine for recording all events that happen now, and it is even fine for recording historical events, as long as they happened 1970 or later. They are utterly useless if you want to make a geneology site.

More disturbingly, the event module also relies on timestamps for representation of events. As far as I can tell, this is a huge limitation. The CCK date field uses the ISO 8601 standard which saves times as as string: 20060610T20:47:48+01:00. While there are lots of arguments about how the data should be persisted in the database, to me, it is clear that the ISO 8601 string is ideal for representing the date in Drupal code (unless you're doing archeology, then I've got no idea what you do with BC dates).

So the event module has a set of private functions (all those that start with _) that are really useful for getting bits and pieces of dates all packaged up with the appropriate time zone offsets and so forth. The functions would be ideal candidates for a date.inc file to be included in core, in my opinion, except that they use timestamps that have the 1970 limitation and not something more flexible, like ISO 8601.

I've been working on a datewidgets module for CCK to finally allow a sane interface for the date fields, and I see that as long as Drupal uses timestamps, there will be the need to convert.

Is it time to move away from timestamp?

Comments

I am in full support of

Gerhard Killesreiter's picture

I am in full support of using "real" dates for events. I have programmed a date form widget for evaluation.module and plan to use this for event.module too.

thanks Gerhard

robertDouglass's picture

I'll send you my code tomorrow. I've not done much coding with dates and have some questions to resolve.

signed timestamps

chx's picture

Before PHP 5.1.0, negative timestamps were not supported under any known version of Windows and some other systems as well. Therefore the range of valid years was limited to 1970 through 2038.

Problem already solved.

Doing complex searching on

merlinofchaos's picture

Doing complex searching on ISO 8601 timestamps is very slow and can have a serious impact on performance.

Ideally database native datestamps are the best -- the downside of course is cross database compatibility.

Unix timestamps are next best -- the downside is limited range

And ISO 8601 is the worst. Converting them takes extra time, indexing on them means using string indices (and a relatively long one) that are inefficient.

native db

gopherspidey's picture

Why is cross database compatibility an issue for people?

The "TIMESTAMP" field type is a sql standard since the SQL-92 standard. I know that not all of the standards are followed, but most DBs (all that drupal supports) support the SQL-92 date and time implementation.

http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
http://dev.mysql.com/doc/refman/5.0/en/compatibility.html
http://www.postgresql.org/about/

Is it the problem of moving from one DB to another? I am just curious.

I would think that if we would use the native storage of the DB we would find speed and range improvments over both the "ISO varchar" and the "UNIX timestamp".

Agreed

kbahey's picture

The best date is the database date. It allows us to do intervals, group by, ...etc. Think about showing page views for a site by day/week for the last X weeks/months.

As merlin said, the problem is portability and even the date functions of the database are different (yet another example where abstraction leads to lowest common denominator).

Not sure what the best way is given the choices. Each has drawbacks.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

Complex Queries?

jjeff's picture

Perhaps I'm missing something. But since the current date.module stores the dates in a varchar column, there's not any current way to do a listing of items happening later than or before a given date/time... right? For example you can't do a:

<?php
db_query
('SELECT nid FROM {node} n INNER JOIN {node_data_field_date} d ON n.vid = d.vid WHERE d.field_date_value > %d', time());
?>

The only way to do this is to load ALL of the values for ALL of the nodes and convert EACH ONE to a numeric date stamp for comparison.

I think I'm with Earl and Kalid: The best way is going to be to use the database-native date format, then a numeric (unix) timestamp, and then ISO. I like that ISO can store additional information, like timezone... but until we can manipulate the data to easily query "everything happening next week", "everything happening within 2 days of this event", etc... it's just going to be very frustrating.

...but again... perhaps I'm missing something.

-= Jeff Robbins | Lullabot | Drupalize.me =-

Torenware's picture

My major problem with ISO8601 is that the bloody thing uses offsets from UTC, rather than POSIX time zones. (look at the article here if that's Greek to you). This would perpetuate a problem that many of us are having with event dates around the time that the clocks get set forward and set back. It's bad now. ISO8601 would not make the situation better as far as I can tell.

Given the choice of ISO8601 and Unix time stamps, I'd have to go with the time stamps for now: OS support is good, there are reasonable solutions to the time zone problem, and they are fast to work with.

The range problem is not trivial, although you can always take the Java approach to the problem: use a signed double or some such.

If we choose a database-friendly solution: is there standard solution in ANSI SQL yet (or whatever they're calling the standard nowadays)?

Awesome article!

robertDouglass's picture

That really opened my eyes to the timezone problem. Has anyone reviewed your code yet?

So you haven't researched the storage problem to the same extent as the timezone problem? I'm looking forward to seeing better solutions hit core. Keep pointing out your work to other people... I don't think many are aware it is going on.

DB vs Drupal representation

robertDouglass's picture

So what if we stored them db native in the database and had a slew of helper functions to convert them to ISO 8601? If we knew we always needed to invoke an abstraction function to persist a date/time, that wouldn't be so bad:

INSERT INTO foo VALUES (db_date($time));

Representation & integer overflow

narres's picture

I think that the "1.Jan.1970" is "ground zero" isn't the problem, cause you are able to store 4.July.1962 as "FROM_UNIXTIME(-236500000)".

But that datetime values stored only as int(11) (MySQL default) may cause problems since "FROM_UNIXTIME(-1234567890)" is "1930-11-18 01:28:30" but "FROM_UNIXTIME(-12345678901)" is "1987-02-02 01:09:47". There happened an integer overflow.

The representation in strings like "20060610T20:47:48+01:00" uses more space (24 bytes) and it's harder to calculate time differences.

Using Timezones for logical timestamps doesn't make really sense. Logical timestamps should always stored as UTC/GMT. They should be represented in users or local timezone (e.g.).

The ISO8601 has the logical "ground zero" at 1.Jan. of year 0. But that's only the view of our http://en.wikipedia.org/wiki/Gregorian_calendar.
In fact you will only get more 500 hundred years to store.

This are only some thoughts, no suggestion.
The only thing I'm sure about: Time is a steady monotone growing function ;)

Time discussion is worth to have a own group!

Personally I am happy with

Owen Barton's picture

Personally I am happy with the timestamp implementation we have now. The only significant issue is the 1970 limitation, and I feel it is quite reasonable to tell people that they need to either (a) not use windows or (b) upgrade to PHP 5.1.0 if they want to use pre-1970 dates. Supporting pre1970 dates should then be as easy as adding a sign to the database field (well, and testing the events code with -ve timestamps). I definately support storing dates in UTC with a separate TZ field (as we do now), to allow easy sorting and comparison.

Using a database native date format would also be acceptable to me, but could make our database .inc files much more complex. There are a few neat things we could do with this, but the differ so much by database it could get very complex very quick.

DB Native Dates? Roll our own?

jjeff's picture

Can someone discuss the problems with using database-native dates?

I'm guessing that they are supported differently on different platforms, but what about Robert's suggestion of a series of db_datetime() -type commands to convert things around? For reference, here's the docs for MySQL's date/time data formats.

And here's a crazy, off the wall suggestion: Could we somehow improve on the current solutions by doing it ourselves? Is there a "best of all worlds" solution that incorporates everything we need, but might not be an exact replication of any standard?

-= Jeff Robbins | Lullabot | Drupalize.me =-

adodb

moshe weitzman's picture

not to pester, but the adodb db library has a wrapper around the various date functions. it is very well regarded..

i am not so versed in this area, but it seems that native date fields with a wrapper is a good direction. then again, i don't mind timestamps either, especially since the 1970 problem is solved.

adodb++; 1970 problem not solved

robertDouglass's picture

The adodb lib looks good... have people suggested using this before?

I understand that you have to use PHP 5.1 for the 1970 problem to be solved. Is that correct? And when you say solved, does this mean that I can have dates going all the way back to 0? And how can we call this solved unless we start requiring Drupal to use PHP 5.1?

Maybe with adodb we could connect to different database engines simultaneously. It would certainly increase our supported database list. And just think how the module developers would be thrilled =) They haven't had to rewrite their modules in a while, after all.

Possible solution

Owen Barton's picture

Changing the DB backend for storing dates may improve our ability to filter/extract dates from the database, but won't solve the 1970 (or 1901 on PHP 5.1.x) problem, as we will often still need to convert to a timestamp for processing the dates (e.g. calculating the length on a event).

Here is a possible solution:
We could use the ADOdb date time library (http://phplens.com/phpeverywhere/adodb_date_library) without needing to switch to using the ADOdb database layer (which is a separate matter).

This supports dates from 100 A.D. to 3000 A.D. and later, and uses the native (i.e. fast) date functions where possible, and only switches to PHP code and floating point arithmetic when the dates fall outside the 32-bit signed integer range. We can still store dates as good old integer timestamps if we want.

As it is mostly compatable with the php date functions, so it would be quite possible to push this into contrib (either event.module or a new date-extension.module), for those who needed events in this date range. It seems overboard to include 38Kb of date handling code in core, for what is a relatively rare use case AFAICS.

ADODB, Julian Day, other thoughts for historical posts

marye's picture

I hacked up Drupal a few years back with ADODB. (Wrote a thread on it in Drupal recently). Easy but scary since I was hacking into node, and changing the creation data column to a date/time field ...

I've also been following the CCK thread, perhaps I'll take a turn at this once the non-PEAR patch is submitted.

ADODB has its limitations: you can't post dates prior to 100, you may have to write your own hacks for some of the time functions in PHP (not complete coverage). It assumes the 1582 conversion for Julian. So some dates in the English speaking countries are off (and some other countries as well). I'm not sure what it does about new years when it wasn't Jan 1.

Others have mentioned the need for "circa" type dates. This takes another field (taxonomy or something) as you really can't tell what's going on in straight datetime DB fields (althought in mysql you can set the datetime field to "ALLOW_INVALID_DATES"). If you use varchars you can use something like 0 month to indicate a circa year and 0 day to indicate a circa month. And more rework around the date conversion functions, whatever is used.

Storing a date in Julian Day count (as in the astronomical calculation) form is also an appealing idea, but I have no idea how well the conversions work in php, and then there's the conversion to ISO formats and other user-friendly formats ...

Suggestion...

njt1982's picture

This is probably going to sound stupid, but the unix timestamp is an INT(11).... Why not increase it to 12 bits from 11? That would double the number of milliseconds to play with... Or is that not the issue? I personally prefer unix timestamp. Its not quite as human friendly, but its computer friendly and if you have a site with 500,000 visitors a month then thats the important thing. Its EASY to convert unix time to human time AND its easy to do maths with unix time. Its HARD to convert from human time to unix time (easy to code, slow to run) and doing maths on human time directly is not efficient in the slightest...

This would work for the DB storage

narres's picture

but try:

<?php
print date("Ymd-His", time())."\n";
print
"\n";
print
date("Ymd-His"123456789012)."\n";
print
date("Ymd-His"12345678901)."\n";
print
date("Ymd-His"1234567890)."\n";
print
date("Ymd-His"123456789)."\n";
print
"\n";
print
date("Ymd-His", -123456789)."\n";
print
date("Ymd-His", -1234567890)."\n";
print
date("Ymd-His", -12345678901)."\n";
print
date("Ymd-His", -123456789012)."\n";
?>

This results on my 4.4.2 PHP as
20060612-134725

19350326-055028
19521130-005013
20090214-003130
19731129-223309

19660202-032651
19301118-012830
19011213-214552
19011213-214552

The storage in database is easy to be solved by increasing the number of bytes, but the calculation in PHP is not really integer overflow protected.

So, my opinion is:

  • Drupal is a framework
    Since Drupal is a framework, we shouldn't use the default storage method of the RDBMS (e.g. MySQL) or the programming language (PHP), but should use explicit defined datatypes in DB (e.g. integer(16) for timestamp) and a set of Drupal owned wrapper functions (like adodb).
    So done, we are able to have a defined range of valid values for several datatypes and don't cut of the possibility to use native access.
  • Defined limitations and precision
    A datatype as timestamp doesn't make sence to be used for historical date-time information, cause for an event, which has happened 750 years ago, you normally don't need the exact time in "hours-minutes-seconds"-precision and nobody is talking about "Monday, 01.April -5004 at 12:13:59".
    Several datatypes should be defined as timestamp, date and time (e.g.).
  • Usability
    That data and time are to be stored is not really hard to be solved. What's hard is that the storage method is easy to be used.
    A representation like ISO8601 and saving as string like "20060610204748" ("YYYYMMDDhhmmss") sounds easy, but has significant disadvantages. The limitation is set to a 4 number year if you don't want to loose native DB's "order by"-clause.
  • My conclusion
    There will be no universal datatype which may be used for all data or time related problems. The reason is that the greater the range of values is, the nessesary precision is smaller.
    I strongly agree to a "helping" liibrary like integration (like adodb) to get more independ from technical environment and influences causing in errors.

Granularity

jjeff's picture

Just to toss another wrench in the works, it would be really great if there was an option for granularity.

For instance, there should be a difference between something happening on

the year 2006
January of 2006
January 1st, 2006
and 12:00am, January 1st, 2006

I'm guessing this is up to the input and output widgets. And/or there should be a separate column for granularity. But I just wanted to toss it into the mix.

-= Jeff Robbins | Lullabot | Drupalize.me =-

How about 2 date fields?

KarenS's picture

I think one of the problems is trying to think about this as a one-size-fits-all solution. As someone pointed out somewhere in one of these threads, you generally don't care about time for historical dates, and it's the time part that gets you into all the confusion with gmt and dst conversions. I think we need 2 date fields. One is a date/time field for current events that takes into account gmt and dst to be used for post-1970 dates where time is a factor, and the other is a simple date field with no time that could be used for earlier dates or any date where time is not an issue. The date/time field could be a timestamp, and the date field could be an iso date-only format (YYYY-MM-DD) which is a string, but a shorter one than a date/time field.

Then we just need to write things like the events module to accept either type of field, at the administrator's option, which could be done by re-writing the module so that all the functions that take a date are getting it from a date object rather than directly manipulating timestamps, and concentrating all the conversion/format logic in a central function or set of functions that creates and manipulates that object.

Also, another reason I like an option for an iso format date is that there are times when you want only a month and year or only year, and you can do that with an iso date. With a timestamp you have to store a real date by artificially giving it a month and day, and you then have no way of knowing if the month and day are real or fabricated values.

2 date fields sounds good

RayZ's picture

KarenS, I think this is a very reasonable compromise. Each of the approaches mentioned so far has its drawbacks, but I think with some documentation detailing the features/limitations of each, offering the option of two different date types sounds like the most reasonable solution I've heard so far.

I agree the two field types

gopherspidey's picture

Two field types makes alot of since to me. For me this hole discussion centers around the storage method of the date/time information within drupal.

My vote is still for Native database date storage and not storeing the date as a number type or a string in the database.

You gain alot of benefits to storing the date as a date in the database. You get better select and join capabilities. You have already optimized timezone handling (this will vary from db to db, but a wrapper can created). You push some if not all date calculation back to the DB, and that will help with webserver/php load.

Why recreate the wheel?

I agree the two field types

gopherspidey's picture

Two field types makes alot of since to me. For me this hole discussion centers around the storage method of the date/time information within drupal.

My vote is still for Native database date storage and not storeing the date as a number type or a string in the database.

You gain alot of benefits to storing the date as a date in the database. You get better select and join capabilities. You have already optimized timezone handling (this will vary from db to db, but a wrapper can created). You push some if not all date calculation back to the DB, and that will help with webserver/php load.

Why recreate the wheel?

Please, let's start using

rkerr's picture

Please, let's start using proper date/time fields in the database tables, and use the db_* layer to pull out specific date and time formats as required.

+1 for this suggestion...

Christoph C. Cemper's picture

+1 for this suggestion... the unix-timestamp is the most bizarre design-flaw I know in Drupal... all modern (1970+) RDBMS have a datetime type... and drupal should just use THAT. fullstop.

Christoph C. Cemper

General Manager of CEMPER.COM Internet Marketing Services
running Internet Marketing Fan Blog and a lot of other Drupal sites

Also causes a problem with Views

hickory-gdo's picture

I agree: for example, Views uses

$fieldinfo['field'] = "YEAR(FROM_UNIXTIME(node.created+$timezone))";

to select nodes from a certain year, but if node.created is negative (any node with a created date before 1970), then FROM_UNIXTIME returns NULL and you don't get any nodes.

robertDouglass's picture

Karen's patch offers the following features:

Features:

  • date.module now can create two types of dates, an iso date or a unix timestamp

  • the date.inc api creates a date object that has both db and local values for a date and timezone info. That object is then passed to all other functions that need it.

  • optional incusion of adodb date library extends the valid date range from year 100 to year 3000, without the library it is the usual 1970 to 2038. To keep overhead to a minimum, the library is only included when used.

  • actual timezones are used, not offsets, because offsets don't provide enough info to do dst adjustments. the event_timezones.inc file is used to calculate offsets. It works for dst for many common zones, but not all. If event module is enabled, the api uses the event file, if it is not, it uses the included copy of that file. To keep overhead as low as possible, it is only included when used.

  • timezone handling options include using gmt, the site timezone, a date-specific timezone (selected when the date is edited), or no timezone handling. The last option will store and display dates exactly as entered, useful in situations where timezone conversions don't work accurately, or when they are just not desired. I've omitted any option to adjust to the user's timezone since we don't know what it is, we only know their offset. That can be extended or worked on in the future if someone figures out a way to make it work.

Great post robert!!

aaantos's picture

Great post robert!!