Get the diff

Events happening in the community are now at Drupal community events on www.drupal.org.
ernestd's picture

There are two moments in the application where we need to find the difference between the old content and the new content of the textarea where the users are writing. Actually, the techniques used to do this are infinites but all of them share the same principle of delta compression (or delta encoding or differential compression). There is a lot of research in this field. Since the 70's there has been better algorithms to do this aim, improving the way to find the difference between two files (or delta). Delta compression is concerned with many files transmisions compressions, control version systems and remote file synchronization in order to increase the efficiency of data transfers, reduce space requirements on the receiving device, etc

One of the first important papers that talk about this is A technique for isolating differences between files (1978). Actually, John Resig (JQuery fame) based his javascript diff algorithm on this paper, and Cacycle from wikipedia improved it with moved blocks visualization.

A good start point is Algorithms for Delta Compression and Remote File Synchronization paper, which sumarizes perfectly all the techniques made so far. It also explains those algorithms based on LZ77 like vdelta (used in vcdiff), xdelta (xdfs), zdelta (zlib), the string-to-string correction problem (based on edit operations) and the string-to-string correction problem with block moves (based on copy operations)

To implement the collaborative editor we use a similar technique than the one used to search the longest matching prefix between two strings. This is good for our purposes since we get the difference in one text block delimited by the common difference before and after the matching part (prefix and suffix). And we use the parameters we get from this in the logic implemented on the server side (See Pending Changes). However, to achieve this aim we compare character by character the two strings to find the start and the end of the matching part, and these are too many iterations for long strings (take a look the getDiffBlock() function in the js file)

So, I'm sure that better, shorter and faster algorithms could be implemented to do the same. And they are welcome! ;)

SoC 2006: Collaborative Editor

Group organizers

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds: