Serialized Data Format Evaluation: CMIS


Status: This evaluation is currently in progress. Please add any additional information or make corrections as you are able. If correcting, please also post a comment explaining the grounds for the correction.

TL;DR: CMIS is a powerful domain model complemented with extensive bindings for XML and, recently, JSON services. Undertaking a CMIS server implementation on Drupal would be a very, very large and complicated project.

General Information

As summarized on the Alfresco CMIS Wiki:

CMIS (Content Management Interoperability Services) is a standard for improving interoperability between ECM systems. It specifies a domain model plus a set of services and protocol bindings for Web Services (SOAP) and AtomPub [and JSON, as of v1.1].

CMIS's domain model includes an extensive data model capable of representing documents, relationships, "folders", policies, repository information and other extensible object types. The specification also includes services for performing CRUD operations on objects and for querying objects using a SQL-like interface. The bindings use Web Services (SOAP) XML, AtomPub XML and, in v1.1, JSON responses combined with HTTP "multipart/form-data" requests.

Version 1.0 is mature and stable, and is used in a wide range of enterprise content management solutions and client projects. Version 1.1 is currently in draft form, and appears near completion according to current document versions and the posted planning timeline. Because v1.1 adds more flexible, extensible object types and JSON bindings, this document assumes a Drupal 8 CMIS server project would target v1.1.

There is currently a Drupal CMIS module that implements CMIS client functionality, based on the official PHP library.

CMIS is an extensive, powerful, highly domain-specific specification. Implementing a CMIS server atop Drupal would include a number of major projects including:

  • Creation of a CMIS v1.1 PHP request handling library.
  • Mapping of Drupal data model to CMIS data model, both of which are extensive and neither created with the other in mind.
  • Full CMIS implementation coding for Drupal.

Homepage and Specification

Relevant v1.1 updates, see section 1.5 of draft spec:

  • Type flexibility (better support for entity types?): flexible "item" object type and support for entity property CRUD via API calls.
  • Browser binding (JSON)
  • Bulk updates

Additional Resources and Links

JSON/XML Representations

CMIS v1.1 offers multiple bindings: WebService XML, AtomPub XML and Browser Binding JSON.

The JSON bindings are a new addition as of v1.1 and are currently specified in draft form. While the browser binding service uses JSON to deliver all object data from server to client, all service requests from client to server are sent using "multipart/form-data" HTTP requests. This has the advantage of ensuring compatibility with browser-form based submissions, but the drawback of being non-standard among API consumers such as JavaScript clients, etc.
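The request/response asymmetry described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the form field names (`cmisaction`, `propertyId[n]`, `propertyValue[n]`) follow the v1.1 browser binding draft as I read it, and the boundary string, the helper `encode_multipart` and the sample JSON response are hypothetical. No network call is made.

```python
import json
import uuid


def encode_multipart(fields, boundary=None):
    """Encode (name, value) pairs as a multipart/form-data body.

    Returns a (content_type_header, body_bytes) tuple.
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields:
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line separates part headers from content
        lines.append(str(value))
    lines.append(f"--{boundary}--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body


# Client -> server: a form-style request (field names are assumptions
# taken from the draft browser binding, not verified against a server).
fields = [
    ("cmisaction", "createDocument"),
    ("propertyId[0]", "cmis:name"),
    ("propertyValue[0]", "example.txt"),
    ("propertyId[1]", "cmis:objectTypeId"),
    ("propertyValue[1]", "cmis:document"),
]
content_type, body = encode_multipart(fields, boundary="example-boundary")

# Server -> client: object data comes back as JSON (hypothetical payload).
response_text = '{"properties": {"cmis:name": {"value": "example.txt"}}}'
created = json.loads(response_text)
created_name = created["properties"]["cmis:name"]["value"]
```

A JavaScript or other API client therefore has to speak two formats: form encoding on the way in and JSON on the way out, which is the drawback noted above.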

Format Expressiveness/Hypermedia Linking

Data and hypermedia linking is implemented through two data elements in CMIS: Relationship objects for linking between internal objects, and URI properties for referencing external objects.

Relationships are a primitive object in the CMIS data model, at the same level as documents, folders and policies. They express directional, typed relationships between any two objects. For additional documentation, see the CMIS v1.1 specification sections:

  • 2.1.6 Relationship Object documentation
  • 2.2.8.1 getObjectRelationships

Properties of the URI type hold a URI as a value, which is rendered to a string value or link tag in XML, JSON and AtomPub bindings (respectively). See Section 2.1.2.1 of the v1.1 specification for more details on properties.

In addition, internal objects might represent files or external resources. Each object can have one set of Content Stream properties, specifying the type, content and filename of any binary data associated with the object. These can be supplemented by an arbitrary number of "renditions" of each object, each presenting a variant of the base object, such as a thumbnail of an uploaded image.

Overview of resource structure

Example Hypermedia node content

Should include a local file link, local resource link, remote resource link and remote file link. Remote resource links should be YouTube, Flickr and/or other demonstrative examples.

"Webservice" XML implementation

See http://tools.oasis-open.org/version-control/browse/wsvn/cmis/trunk/cmis-... for an example XML response to a getObject request.

Remote resources might be implemented as an oEmbed URI/URL, a stream wrapper URI (which has complications), or similar. Perhaps of the form:

<propertyUri queryName="UriProp" displayName="Sample Uri Property" localName="UriProp" propertyDefinitionId="UriProp">
  <value>youtube://xIpLd0WQKCY</value>
</propertyUri>

A local file link would look similar, with either a local scheme for the URI value, such as "public://file.type" or the full URL of the file accessed via HTTP.

Links between resources would most likely be better represented as relationships on each object, though they might also be URIs. Related objects are returned through queries on each object for all objects connected via a specific relationship or any relationship type.
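To make the relationship lookup concrete, here is a minimal in-memory sketch in Python. It assumes each relationship carries a type, a source ID and a target ID, and that lookups can be filtered by direction ("source", "target" or "either") and by relationship type, in the spirit of getObjectRelationships. The class, the function signature, the object IDs and the "drupal:" type names are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    type_id: str    # relationship type, e.g. a hypothetical "drupal:references"
    source_id: str  # object the relationship points from
    target_id: str  # object the relationship points to


def get_object_relationships(relationships, object_id,
                             direction="source", type_id=None):
    """Return relationships touching object_id, filtered by direction and type."""
    if direction not in ("source", "target", "either"):
        raise ValueError(f"unknown direction: {direction}")
    matches = []
    for rel in relationships:
        if type_id is not None and rel.type_id != type_id:
            continue
        is_source = rel.source_id == object_id
        is_target = rel.target_id == object_id
        if direction == "source" and is_source:
            matches.append(rel)
        elif direction == "target" and is_target:
            matches.append(rel)
        elif direction == "either" and (is_source or is_target):
            matches.append(rel)
    return matches


# Hypothetical store: node:1 references node:2, and node:3 translates node:1.
store = [
    Relationship("drupal:references", "node:1", "node:2"),
    Relationship("drupal:translation_of", "node:3", "node:1"),
]

outgoing = get_object_relationships(store, "node:1")
incoming = get_object_relationships(store, "node:1", direction="target")
all_refs = get_object_relationships(store, "node:1", direction="either",
                                    type_id="drupal:references")
```

A real server would return full relationship objects in the binding's own format; this sketch only shows the filtering logic.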

Unfortunately, the current v1.1 specification does not include examples of relationship query responses. Jeff Potts' CMIS tutorial does include a lengthy example on page 31, in Atom form for CMIS v1.0.

"Browser Binding" JSON Implementation

The same data structure principles and questions noted for XML resources apply to JSON.

See http://tools.oasis-open.org/version-control/browse/wsvn/cmis/trunk/cmis-... for an example JSON document object, though not Drupal-specific.

Example:

"FieldYoutubeVideo": {
  "id":"UriProp",
  "localName":"FieldYoutubeVideo",
  "queryName":"FieldYoutubeVideo",
  "value":"youtube://xIpLd0WQKCY",
  "type":"uri",
  "displayName":"Sample Uri Property",
  "cardinality":"single"
},

Unfortunately, neither the current .orderly specification (http://tools.oasis-open.org/version-control/browse/wsvn/cmis/trunk/BrowserBinding/schema/cmis-schema-v0.1-browserbinding.orderly) nor the draft specification documents provide details regarding the representation of relationship object types in JSON at this time.

Example ImageCache-like Object

This is taken from the Aloha Editor specification's example of what a Flickr image may look like when exposed through a CMIS service. Note that there may be multiple rendition types, each referring to a different presentation (view mode, imagecache preset, etc.) of a given resource:

{
  id: 'gailenejane/5008283282',
  name: 'Quiet moment',
  type: 'image/jpeg',
  url: 'http://www.flickr.com/photos/gailenejane/5008283282/',
  renditions: [{
    url: 'http://farm5.static.flickr.com/4128/5008283282_f3162bc6b7_s.jpg',
    mimeType: 'image/jpeg',
    filename: '4128/5008283282_f3162bc6b7_s.jpg',
    kind: 'thumbnail',
    height: 75, width: 75
  }]
}

This is a JSON-style example, though apparently based on the CMIS v1.0 spec. In the case of XML, I believe each of the JSON property names would be implemented as an XML tag in the "cmis" namespace, containing "value" tags to store each value.
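As an illustration of how a consumer might use the renditions list, the following Python sketch picks the rendition matching a requested kind and falls back to the base object's URL when none exists. The data mirrors the Flickr example above; the helper names `find_rendition` and `display_url` are hypothetical, and mapping a rendition "kind" to a Drupal view mode or imagecache preset is an assumption, not something the specification defines.

```python
def find_rendition(obj, kind):
    """Return the first rendition of the given kind, or None if absent."""
    for rendition in obj.get("renditions", []):
        if rendition.get("kind") == kind:
            return rendition
    return None


def display_url(obj, kind):
    """URL of the requested rendition, falling back to the base object URL."""
    rendition = find_rendition(obj, kind)
    return rendition["url"] if rendition else obj["url"]


# Same data as the Flickr example, written as a Python dict.
flickr_image = {
    "id": "gailenejane/5008283282",
    "name": "Quiet moment",
    "type": "image/jpeg",
    "url": "http://www.flickr.com/photos/gailenejane/5008283282/",
    "renditions": [{
        "url": "http://farm5.static.flickr.com/4128/5008283282_f3162bc6b7_s.jpg",
        "mimeType": "image/jpeg",
        "filename": "4128/5008283282_f3162bc6b7_s.jpg",
        "kind": "thumbnail",
        "height": 75,
        "width": 75,
    }],
}

thumb_url = display_url(flickr_image, "thumbnail")
# "large" is a hypothetical kind with no rendition, so the base URL is used.
large_url = display_url(flickr_image, "large")
```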

Support for ad-hoc/configurable resource definitions

Object types include a schema indicating the properties included for each object. Property types in the current specification include (and are limited to?): string, boolean, decimal, integer, datetime and uri (which follow XML Schema types), as well as the CMIS-specific id and html property types. It appears that implementations might be able to supplement this list with additional property types as well, with the specification indicating that "Individual protocol bindings MAY override or re-specify these property-types." (Section 2.1.2.1)
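A rough Python sketch of what checking a value against a property definition could look like follows. The type names reflect the v1.1 draft as I read it (string, boolean, integer, decimal, datetime, uri, id, html); the `CHECKS` table, the `validate` helper and the dict-shaped property definition are hypothetical conveniences, and the checks themselves are deliberately loose.

```python
from datetime import datetime
from decimal import Decimal
from urllib.parse import urlparse


def _is_uri(value):
    """Loose check: a string with a URI scheme, e.g. 'youtube://...'."""
    return isinstance(value, str) and bool(urlparse(value).scheme)


# One validator per property type (assumed list; bindings may extend it).
CHECKS = {
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "decimal": lambda v: isinstance(v, Decimal),
    "datetime": lambda v: isinstance(v, datetime),
    "uri": _is_uri,
    "id": lambda v: isinstance(v, str),
    "html": lambda v: isinstance(v, str),
}


def validate(definition, value):
    """Check a value against a property definition's type and cardinality."""
    check = CHECKS[definition["type"]]
    if definition["cardinality"] == "multi":
        return isinstance(value, list) and all(check(v) for v in value)
    return check(value)


uri_prop = {"id": "UriProp", "type": "uri", "cardinality": "single"}
tags_prop = {"id": "Tags", "type": "string", "cardinality": "multi"}

ok_uri = validate(uri_prop, "youtube://xIpLd0WQKCY")
bad_uri = validate(uri_prop, "no scheme here")
ok_tags = validate(tags_prop, ["drupal", "cmis"])
```

Note that each property holds a single flat value (or a list of them), which is why multi-column Drupal fields do not map onto this model directly.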

Formatter Implementation/Handling

One challenge with hypermedia resources is the need for either formatted or unformatted values depending on use case (displaying vs. editing the URI of a video, for example). Discuss how this format might address this issue.

Internationalization

The CMIS standard does not include specifications for handling localized requests for resources nor for handling translation relationships. It appears that a combination of other internationalization standards and specific CMIS and platform configurations are used to meet I18n requirements.

Support for localized API requests

It appears that CMIS implementations may make use of the xml:lang attribute (see effulgentsia's comment) or the i18n:international element of a WS-I18N header (see IBM's CMIS for FileNet documentation, page 15).

Encoding of internationalized content

It is not clear how language and other localization settings are stored in CMIS objects. One can assume String and other property types are used, along with object relationships for translations.

NEEDS FURTHER EXPERTISE/EXPLORATION

Support for Collections

Collection Overview

CMIS supports two sorts of collections:

  • Folder object types are hierarchically organized collections of
    objects (including other folders).
  • Relationship collections are less structured, directional
    collections of objects as they relate to each other.

In addition to these object types that return collections, CMIS also implements an extensive Query language for searching the object store and returning all objects meeting the specified criteria. Queries might search on all Folder children or related objects, or on specific property values.
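For a sense of what such queries look like, here is a small Python sketch that builds query strings for the two cases mentioned above: all children of a folder, and objects with a specific property value. The IN_FOLDER predicate and the cmis:document and cmis:name identifiers come from the CMIS query language; the helper function names are hypothetical, and the backslash escaping of quotes is my reading of the specification and should be treated as an assumption.

```python
def quote_literal(value):
    """Quote a string literal, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def folder_children_query(folder_id, type_id="cmis:document"):
    """Query for objects of type_id that are direct children of a folder."""
    return f"SELECT * FROM {type_id} WHERE IN_FOLDER({quote_literal(folder_id)})"


def property_query(type_id, property_name, value):
    """Query for objects of type_id whose property equals a string value."""
    return (f"SELECT * FROM {type_id} "
            f"WHERE {property_name} = {quote_literal(value)}")


# "folder-123" is a hypothetical folder object ID.
children = folder_children_query("folder-123")
by_name = property_query("cmis:document", "cmis:name", "O'Brien.txt")
```

The resulting strings would be passed to the repository's query service through whichever binding is in use.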

Collection Example

The following example collections are provided in the CMIS v1.1 draft documentation:

UUID

Alfresco Object IDs are a required property for all object types and are UUIDs of the form

urn:uuid:0e2dc775-16b7-4634-9e54-2417a196829b
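The URN form shown above can be produced and parsed with Python's standard uuid module, which may be handy when testing a client against such IDs. This is only a sketch; the helper names `to_urn` and `from_urn` are hypothetical, and whether a Drupal implementation would expose its entity UUIDs in this form is an open question.

```python
import uuid


def to_urn(object_uuid):
    """Convert a bare UUID string to its urn:uuid: form."""
    return uuid.UUID(object_uuid).urn


def from_urn(urn):
    """Parse a urn:uuid: string back into a uuid.UUID object."""
    return uuid.UUID(urn)


urn = to_urn("0e2dc775-16b7-4634-9e54-2417a196829b")
parsed = from_urn("urn:uuid:0e2dc775-16b7-4634-9e54-2417a196829b")
```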

Versioning/Conflict Resolution/Locking

CMIS has extensive versioning support, exposed through required properties on document objects. Version 1.1 introduces support for resource locking as well.

Semantics

There is no standard for including semantic information in CMIS resource responses. The IKS project has developed some add-ons to the CMIS standard to support semantic data integration:

http://blog.iks-project.eu/stanbol-cms-adaptor-allows-you-to-transform-j...

PHP Libraries

While there are mature PHP client libraries available, it is important to note that no standard CMIS server library is available for PHP. This makes a good deal of sense, as a server implementation for CMIS is basically a full CMS with CMIS support, but it means that implementing CMIS for Drupal would require creating the entire CMIS server stack in PHP.

PHP Libraries Available

The Apache Chemistry project maintains the reference CMIS client and server implementations in Java and also maintains reference client libraries in other languages, including PHP.

Apache Chemistry PHP client: http://chemistry.apache.org/phpclient.html

This implementation is not complete, however, and does not implement features such as getting relationships, checking in and out of resources, and some folder CRUD.

There is also a library that appears to have been developed for a Summer of Code project and is no longer maintained:

http://code.google.com/p/cmis-phplib/source/list

PHP Library Development Status

The PHP client has not been developed since June of 2011. Development is managed by the Apache Software Foundation.

It is not an active community project.

JS Libraries

There is no official CMIS JavaScript client library.

JS Libraries Available

JS Library Development Status

Development of jquery-cmis seems to have stopped.

Drupal

Current Drupal Projects and Groups

Community Experts

  • Drupal Users w/ expertise: dries, cfuller, IanNorton, others.

Other Drupal Resources

TODO: List any other resources around the web, such as tutorials, case studies, etc. related to this format.

Anticipated "Lift" for Core Implementation

The lift for CMIS would be extremely heavy. The major tasks of making Drupal a CMIS server would be:

  1. Architecting: mapping Drupal concepts and entities to CMIS
    objects and designing a CMIS implementation to suit Drupal.
  2. Developing: all CMIS server code will need to be written from
    scratch, and will need to meet an extensive, demanding and rigorous specification.

Market Share

CMIS' market share is considerable, as it was developed with the support of many of the largest makers of content management systems (Adobe, IBM, Microsoft, Alfresco, etc.).

Other CMS's supporting

Taken from the CMIS Wikipedia page, implementations include:

  • Alfresco
  • IBM FileNet and other IBM products
  • eXo
  • Microsoft SharePoint Server 2010
  • Nuxeo
  • TYPO3

Client platforms supporting

There are a large number of CMIS clients, including CMSs (such as Drupal) that interoperate with other enterprise content providers on the back end. Some notable examples include:

  • Confluence
  • WordPress
  • SilverStripe
  • The Aloha HTML5 Editor

Other languages/platforms

Reference implementations are in Java.

A list of clients, as of 2010, can be found at http://blog.exoplatform.org/2010/02/18/list-of-cmis-clients/.

Comments

Might be good to wait until CMIS 2.0

effulgentsia

Crell will be posting a write-up of the recent Paris WSCCI sprint where we discussed CMIS and JSON-LD extensively. A major reason why we are currently leaning away from implementing CMIS in core is that CMIS 1.1 does not support hierarchical/complex properties. It's a candidate feature for CMIS 2.0, but I have not found any info on how far along that is. In Drupal, we commonly have fields with multiple "columns": for example, text_field_schema() defines "value" and "format" for the "text" field type and "value", "summary", and "format" for the "text_with_summary" field type. Not being able to map Drupal fields to CMIS properties makes any attempt to map Drupal entities to CMIS objects very clunky.

CMIS 1.1 also doesn't specify how to model language-specific property values and translations, but that's easier to handle outside the spec than complex properties are.

Given that CMIS 2.0 may expand to cover both internationalization and complex properties, I would recommend waiting for that to come out rather than trying to make Drupal (a square peg) fit CMIS 1.1 (a round hole).

Drupal CMIS vs PHP-CMIS

resplin

I recently discussed this with the author of the Drupal CMIS and PHP-CMIS (Chemistry) libraries. In order to meet a timeline, PHP-CMIS was copied into Drupal CMIS. PHP-CMIS hasn't received much attention because there are a number of fixes in Drupal CMIS that need to be merged back. Going forward, Drupal CMIS should probably depend on PHP-CMIS instead of having a copy of the library.