What features of JSON-LD would we use in Drupal 8?

Posted by linclark on July 23, 2012 at 2:45pm

We should review which features of JSON-LD we are likely to use in Drupal 8 and what their status is, as some are still under development.

Since many contributors do not have a grounding in linked data and haven't been involved in the JSON-LD selection process, I start with the basics, which are pretty stable. I then move on to the parts of the spec that are fluctuating.

1. Identifying things with URIs
2. Identifying properties with URIs
3. Explicit type handling
4. Language handling
5. Language version handling
6. Easy API to retrieve specific data

What's available with regular JSON?

JSON uses an entity-property-value data structure to communicate information. The entity-property-value way of modeling things should look familiar... it maps well to Drupal's Field system.

An example of entity-property-value:

Entity	Property	Value
Larry	fullname	Larry Garfield
Larry	authored	The Kernel has landed
Larry	knows	Lin

And this can be represented with regular JSON.

For example:

{
  "field_fullname": "Larry Garfield",
  "field_authored": "The Kernel has landed",
  "field_knows": "Lin"
}

The second two values in the table above (the article and the person) are actually their own entities which could have their own properties.

Entity	Property	Value/Entity	Property	Value
Larry	knows	Lin
		Lin	name	Lin Clark
		Lin	wrote	Microdata in Drupal early preview

To represent this additional information about other things, we can turn the values into objects and add properties to those objects. This way the information is connected to the thing, rather than being represented as a property of Larry.

{
  "field_fullname": "Larry Garfield",
  "field_authored": {
    "field_name": "The Kernel has landed",
    "field_about": "Drupal"
  },
  "field_knows": {
    "field_name": "Lin Clark",
    "field_wrote": {
      "field_title": "Microdata in Drupal early preview",
      "field_about": "Drupal"
    }
  }
}

What more do we need?

The structure above is a nested set of objects, also called a tree. This becomes a problem when two or more parent objects have the same object as a property, as would be the case if two nodes had entity references to the same entity.

For example, Larry and I might co-author an article. In the JSON above, there's no concept of the article as an entity independent of its tree, so it is hard to refer to it from another place. We would need to duplicate the info about the article, once under each of us.

Modeling these kinds of graph relationships between objects, particularly between objects on different web pages and sites, is the kind of use case that Linked Data is really tailored for.

Identifying things with URIs

JSON-LD uses universal identifiers to make it easy to merge objects together. Specifically, the kind of identifiers it uses are links, aka URIs.

Entity URIs:

Larry = http://www.garfieldtech.com/user/1
The Kernel has landed article = http://www.garfieldtech.com/blog/wscci-kernel-merge
Lin = http://lin-clark.com/#me
Microdata article = http://lin-clark.com/blog/microdata-drupal-early-preview

(To the SemWeb pedants, yes, I am aware that the hashless URIs above return a 200 response, and thus as identifiers of NIRs are not compliant with httpRange-14...)

Because we use URIs for entities, it is easy to take two objects and merge them based on the entities' IDs.

This allows us to separate our data into sensible chunks. The object representing Larry could have the info that Larry knows me. The object representing me could contain the info that I wrote the microdata article. Since I'm identified with the same URI in both objects, the final graph would show that Larry knows the person who wrote the microdata article.

Here's what it looks like with Larry and me each handled within our own objects, using URIs so that the info can easily be merged.

{
  "@id": "http://www.garfieldtech.com/user/1",
  "name": "Larry Garfield",
  ....
  "knows": "http://lin-clark.com/#me"
},
{
  "@id": "http://lin-clark.com/#me",
  "name": "Lin Clark",
  "authored": {
    "@id": "http://lin-clark.com/blog/microdata-drupal-early-preview",
    "name": "Microdata in Drupal early preview",
    "about": "Drupal"
  }
}
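To make the merge step concrete, here is a minimal Python sketch. It is not taken from any JSON-LD library: the function name merge_by_id is hypothetical, and it assumes every object carries an "@id" and that a later value simply overwrites an earlier one for the same key, which a real processor would handle more carefully.

```python
def merge_by_id(objects):
    """Merge node objects that share the same "@id" into one dict per entity."""
    merged = {}
    for obj in objects:
        node = merged.setdefault(obj["@id"], {"@id": obj["@id"]})
        for key, value in obj.items():
            if key != "@id":
                # Assumption: last writer wins; no multi-value handling.
                node[key] = value
    return merged


larry_doc = {
    "@id": "http://www.garfieldtech.com/user/1",
    "name": "Larry Garfield",
    "knows": "http://lin-clark.com/#me",
}
lin_doc = {
    "@id": "http://lin-clark.com/#me",
    "name": "Lin Clark",
    "authored": {
        "@id": "http://lin-clark.com/blog/microdata-drupal-early-preview",
        "name": "Microdata in Drupal early preview",
    },
}

graph = merge_by_id([larry_doc, lin_doc])
# Follow the link from Larry to the person he knows, then to what she authored.
known = graph[graph[larry_doc["@id"]]["knows"]]
print(known["authored"]["name"])
```

Because the "knows" value is the same URI that identifies the second object, the lookup works without either document having to embed the other.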

Identifying properties with URIs

Similarly, because we are using URIs for properties (and using external, widely-used vocabularies to provide those URIs), it's easy to know that my 'field_name' provides the same kind of information as Larry's 'field_fullname'.

Property URIs:

field_fullname, field_title, field_name = http://schema.org/name
field_authored = http://xmlns.com/foaf/0.1/made (I use FOAF here because Schema.org only has the inverse relationship, $this->author)
field_about = http://schema.org/about
field_knows = http://schema.org/knows

Here is the full object. Note that the context has aliases for the property URIs. This means that we can use short attribute names as we do in regular JSON, but that we still have access to the property's full URI if we need to merge data from different sites.

{
  "@context":
  {
    "name": "http://schema.org/name",
    "authored": "http://xmlns.com/foaf/0.1/made",
    "about": "http://schema.org/about",
    "knows": "http://schema.org/knows"
  },
  "@graph":
  [
    {
      "@id": "http://www.garfieldtech.com/user/1",
      "name": "Larry Garfield",
      "authored": {
        "@id": "http://www.garfieldtech.com/blog/wscci-kernel-merge",
        "name": "The Kernel has landed",
        "about": "Drupal"
      },
      "knows": "http://lin-clark.com/#me"
    },
    {
      "@id": "http://lin-clark.com/#me",
      "name": "Lin Clark",
      "authored": {
        "@id": "http://lin-clark.com/blog/microdata-drupal-early-preview",
        "name": "Microdata in Drupal early preview",
        "about": "Drupal"
      }
    }
  ]
}
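As a rough illustration of what the context aliases buy us, the following Python sketch swaps the short attribute names for their full property URIs. It is a deliberately simplified stand-in for JSON-LD expansion, not the algorithm from the spec, and expand_keys is a hypothetical helper name.

```python
def expand_keys(node, context):
    """Recursively replace aliased keys with the URIs given in the context."""
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    if isinstance(node, dict):
        # Keys without an alias (such as "@id") are kept as they are.
        return {context.get(k, k): expand_keys(v, context) for k, v in node.items()}
    return node


context = {
    "name": "http://schema.org/name",
    "authored": "http://xmlns.com/foaf/0.1/made",
    "about": "http://schema.org/about",
    "knows": "http://schema.org/knows",
}
lin = {
    "@id": "http://lin-clark.com/#me",
    "name": "Lin Clark",
    "authored": {
        "@id": "http://lin-clark.com/blog/microdata-drupal-early-preview",
        "name": "Microdata in Drupal early preview",
        "about": "Drupal",
    },
}

expanded = expand_keys(lin, context)
print(sorted(expanded))
```

Two sites that use different short names but map them to the same URIs would produce matching keys after this step.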

ISSUE

We actually need to be able to use multiple property URIs for each term.

The primary use case for the WSCCI initiative does not require merging data from different sites, but rather is about communicating server-to-server within one site (the deployment use case) or server-to-client on the same site, or with custom built apps. We want developers to be able to do this without having to figure out which external vocabulary to use, so we will need to automatically generate a vocabulary for each site. This vocabulary simply exposes the data as it is modeled in Drupal.

However, for those who do want to share data with an independent service or a network of sites using a shared vocabulary (such as Schema.org), we need to allow the facility to map additional property URIs to the same term.

Mapping one JSON attribute name to multiple property URIs is not currently possible in JSON-LD, but I have raised an issue and it is likely to be resolved soon.

Explicit type handling

Sometimes it makes sense to explicitly give a datatype for a literal value. For example, if you have an Event entity type with a start_time field, you might use xsd:dateTime. However, one thing to be aware of is that many datatypes expect specific formats. For example, xsd:dateTime expects a variant of ISO 8601.

These datatypes can be assigned at a property level, so any value for this field will automatically be typed with the datatype.

For example:

{
  "@context":
  {
    "startDate": {
      "@id": "http://schema.org/startDate",
      "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
    }
  },
  "startDate": "2012-06-20T17:45:00-04:00"
}
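A consumer could use the property-level datatype to decide how to parse a value. The sketch below is one possible way to do that in Python, assuming the context shape shown above; typed_value is a hypothetical helper, and datetime.fromisoformat only accepts a subset of ISO 8601, so it is not a full xsd:dateTime validator.

```python
from datetime import datetime

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

doc = {
    "@context": {
        "startDate": {
            "@id": "http://schema.org/startDate",
            "@type": XSD_DATETIME,
        }
    },
    "startDate": "2012-06-20T17:45:00-04:00",
}


def typed_value(doc, term):
    """Return the term's value, parsed if the context types it as xsd:dateTime."""
    definition = doc["@context"].get(term)
    raw = doc[term]
    if isinstance(definition, dict) and definition.get("@type") == XSD_DATETIME:
        return datetime.fromisoformat(raw)
    return raw


start = typed_value(doc, "startDate")
print(start.isoformat())
```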

You can also use types for objects themselves. For example, http://schema.org/Event can be used to indicate that the object is an event.

{
  "@context":
  {
    .....
  },
  "@type": "http://schema.org/Event",
  "startDate": "2012-06-20T17:45:00-04:00"
}

We would also create types in the site generated vocabulary which would correlate to the bundle (something like http://example.org/schema/types/node/article). If we want to, we could provide machine readable definitions of types at that URL, which would make it easy to reuse another site's content type definition.

Language handling

Language handling for strings is fairly straightforward. We can add the language in the @context attribute, and the language will be applied to all string values in the tree. For example:

{
  "@context":
  {
    "@language": "en",
    ...
  },
  "@graph":
  [
    {
      "name": "Microdata in Drupal early preview"
    }
  ]
}

Note that child objects can override the parent context. For example, the following snippet sets the language to German for a particular article:

{
  "@context":
  {
    "@language": "en",
    ...
  },
  "@graph":
  [
    {
      "@id": "Wolfgang",
      "authored": {
        "@context":
        {
          "@language": "de"
        },
        "@id": "http://wolfgangziegler.net/vorgeben",
        "name": "vorgeben Lin spreche Deutsch"
      }
    }
  ]
}
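The override rule can be sketched as a small tree walk in Python. This is only an illustration of the inheritance idea, not the spec's processing algorithm: the "name" value on the outer object is a hypothetical addition, and the sketch treats every string under a non-keyword property as language-tagged text, which would be wrong for properties whose values are URIs.

```python
def strings_with_language(node, language=None):
    """Yield (string, language) pairs, honouring nested "@context" overrides."""
    if isinstance(node, list):
        for item in node:
            yield from strings_with_language(item, language)
    elif isinstance(node, dict):
        # A local context replaces the inherited language for this subtree only.
        language = node.get("@context", {}).get("@language", language)
        for key, value in node.items():
            if key.startswith("@") and key != "@graph":
                continue
            if isinstance(value, str):
                yield value, language
            else:
                yield from strings_with_language(value, language)


doc = {
    "@context": {"@language": "en"},
    "@graph": [
        {
            "@id": "Wolfgang",
            "name": "Wolfgang",  # hypothetical property, added for illustration
            "authored": {
                "@context": {"@language": "de"},
                "@id": "http://wolfgangziegler.net/vorgeben",
                "name": "vorgeben Lin spreche Deutsch",
            },
        }
    ],
}

pairs = list(strings_with_language(doc))
print(pairs)
```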

Language version handling

A more complex part of Drupal's multilingual capability is the fact that a node can have different entity references based on which language is active.

For example, the German version of a node might be tagged with Term 1 and Term 2. The English version might also be tagged with Term 1, but not tagged with Term 2. Another example of this is the reference to the node's author. If we have an article about the multilingual initiative, Wolfgang might write the German version and Gábor might write the Hungarian version. We want to be able to maintain that distinction.

Because language handling in JSON-LD only applies to strings, we can't depend on the language that is defined in the context. We have to find a different way to separate the English version of an entity reference field from the German version.

The approach that I would recommend based on the current spec is named graphs. Named graphs allow you to add a 'source' column to your entity-property-value relationships. For example:

Entity	Property	Value	Source Graph
node/1	tags	term/1	graph/de
node/1	tags	term/2	graph/de
node/1	tags	term/1	graph/en

Using named graphs, this would look like:

{
  "@context": {
    "entityId": "http://example.com/node/1",
    "graph": "http://example.com/node/1/graph/",
    "title": "http://example.com/schema/fields/title",
    "tags": "http://example.com/schema/fields/tags"
  },
  "@graph": [
    {
      "@id": "graph:de",
      "@graph": [
        {
          "@context": {
            "@language": "de"
          },
          "@id": "entityId",
          "tags": [
            {
              "@id": "http://example.com/taxonomy/term/1",
              "title": "Das Kapital"
            },
            {
              "@id": "http://example.com/taxonomy/term/2",
              "title": "Schadenfreude"
            }
          ]
        }
      ]
    },
    {
      "@id": "graph:en",
      "@graph": [
        {
          "@context": {
            "@language": "en"
          },
          "@id": "entityId",
          "tags": [
            {
              "@id": "http://example.com/taxonomy/term/1",
              "title": "Capital"
            }
          ]
        }
      ]
    }
  ]
}
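To show how the nested structure relates back to the four-column table, here is a minimal Python sketch that flattens named graphs into (entity, property, value, source graph) rows. It uses the shortened identifiers from the table rather than full URIs, skips context handling entirely, and the function name quads is just a label chosen for this example.

```python
def quads(dataset):
    """Yield (entity, property, value, graph) rows from nested named graphs."""
    for named in dataset["@graph"]:
        graph_id = named["@id"]
        for node in named["@graph"]:
            for prop, values in node.items():
                if prop.startswith("@"):
                    continue
                for value in values if isinstance(values, list) else [values]:
                    # Entity references are reduced to their identifier.
                    target = value["@id"] if isinstance(value, dict) else value
                    yield node["@id"], prop, target, graph_id


dataset = {
    "@graph": [
        {
            "@id": "graph/de",
            "@graph": [
                {"@id": "node/1", "tags": [{"@id": "term/1"}, {"@id": "term/2"}]}
            ],
        },
        {
            "@id": "graph/en",
            "@graph": [{"@id": "node/1", "tags": [{"@id": "term/1"}]}],
        },
    ]
}

rows = list(quads(dataset))
for row in rows:
    print(row)
```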

ISSUE

Named Graphs were only added to JSON-LD a few months ago, and some tools don't have support for them yet. It is also possible (though unlikely) that named graphs will be removed from the final spec. In addition, some parts of the spec have not been reworked to function with named graphs yet (specifically framing, mentioned below).

Named graphs also start moving us into linked data deep magic, and Drupal would depend on the (very few) necromancers who could debug, support, and maintain it. While necromancer status is flattering, it isn't healthy for the project.

Another potential approach is language maps, a recent proposal being developed to support the Wikidata project. This matches Drupal's internal data structure more closely, as the language switching happens at the field level instead of the entity level.

This is a proposal in flux, and the latest version of the proposal doesn't fully cover our use case, so I've done a bit of extrapolation to come up with this example:

{
  "@context": {
    "entityId": "http://example.com/node/1",
    "dc": "http://purl.org/dc/terms/",
    "title": "http://example.com/schema/fields/title",
    "tags": {"@id": "http://example.com/schema/fields/tags", "@container": "dc:language"}
  },
  "@id": "entityId",
  "tags": {
    "de": [
      {
        "@id": "http://example.com/taxonomy/term/1",
        "title": "Das Kapital"
      },
      {
        "@id": "http://example.com/taxonomy/term/2",
        "title": "Schadenfreude"
      }
    ],
    "en": [
      {
        "@id": "http://example.com/taxonomy/term/1",
        "title": "Capital"
      }
    ]
  }
}

ISSUE: The language maps proposal is still in heavy development, so it's unclear whether the final version will work for us or not, and also unclear what the timeframe is for standardization and library support.

Easy API to retrieve specific data

To provide a better developer experience, we want to have functions that make it easy to manipulate the data. This is particularly important if we have to use named graphs, which have a complicated tree structure that is unfamiliar to most devs. It is less important if we can use language maps.

The JSON-LD API's approach to this is still evolving.

The first proposal had a toProjection function.
The currently spec-ed approach is framing.
An active proposal, which seems likely to either work with framing or supplant it, is objectify.

ISSUE: We need to consider how important this part of the API is to us and whether these approaches are workable for us.

Conclusion

I believe that we need to answer three questions before we move forward with JSON-LD.

Can we use multiple vocabulary terms for a field?: As mentioned above, I have filed an issue for this and it seems likely that it will be accepted. It is on the agenda for the JSON-LD community group's weekly telecon tomorrow, so we should know shortly.
How will we represent language versions?: Named graphs are currently specified, but the rest of the spec still hasn't been changed to accommodate their use. Plus, they are a pretty complicated concept. The current language maps proposal seems to fit our use case well, but it's unclear what the final proposal will look like.
How do we want to use the API?: The API for retrieving a specific subset of the data is still fluctuating. How important is this part of the API to us? Do we like the way the current proposals work?

Comments

I joined the JSON-LD call

Posted by scor on July 24, 2012 at 4:01pm

I joined the JSON-LD call today during which the multiple vocabulary terms per field issue was discussed, and tried my best to make the case for Drupal. The group is supportive of that feature and understands our needs. This resolution was taken:

Support a single term expanding to multiple IRIs when an array of @ids are associated with a single term in the @context.

So we should soon be able to express multiple terms in @context, which might look like this:

{"@context": {"title": ["dc:title", "schema:name"]}}
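For illustration only, here is how a consumer might apply such a context in Python, assuming the array syntax from the resolution quoted above ends up in the spec unchanged. The prefix table and the helper names expand_curie and expand_term are assumptions made for this sketch; the example context itself does not define the dc and schema prefixes.

```python
# Assumed prefix definitions; the dc namespace matches the one used elsewhere
# in this thread, and schema is assumed to point at http://schema.org/.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
}
context = {"title": ["dc:title", "schema:name"]}


def expand_curie(curie):
    """Expand prefix:local to a full URI when the prefix is known."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local if prefix in PREFIXES else curie


def expand_term(term, value):
    """Return the value under every property URI the term is mapped to."""
    targets = context.get(term, term)
    if not isinstance(targets, list):
        targets = [targets]
    return {expand_curie(target): value for target in targets}


expanded_title = expand_term("title", "The Kernel has landed")
print(expanded_title)
```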

So that means that, as

Posted by linclark on July 24, 2012 at 6:42pm

So that means that, as expected, Feature #2 has no more issues and we can feel comfortable about moving forward with it.

We do still have to answer the other two questions surrounding Features #5 and #6.

Language version handling
My belief is that we want to avoid using named graphs because the concept is too unfamiliar to most devs, and it makes the JSON object hard to navigate without framing or querying. We should evaluate whether the latest language maps proposal would work for us. To me it looks a lot like what fago suggested, so I think it should work.

If it does, then we should talk with the JSON-LD CG to see what the current momentum is, and how much effort it will take libraries to add support.

We should also define what our timeframe is for moving forward. I'm not sure whether the specification work will be possible in our timeframe.

Easy API to retrieve specific data
There is a discussion that just started on the JSON-LD mailing list around this topic. We need to consider whether we feel comfortable moving forward if the API isn't yet fully defined.

My gut feeling is this: for our needs in core, language maps make it easy enough to directly access the data... so if language maps are added to the spec, we can move forward without the querying API. This would leave some wiggle room for JSON-LD's filtering/querying API to fall into place later.

A big thanks to the JSON-LD

Posted by linclark on July 31, 2012 at 9:26pm

A big thanks to the JSON-LD community group for spending today's telecon reviewing our use case and the language maps proposal. The issue has been resolved.

What this allows us to do

With language maps, we will be able to have a JSON structure similar to the one that fago originally proposed. For example, to get the English version of the body field's safe value, it would look like this.

obj.body.en[0].safe_value

In JSON-LD, this would look like the following:

{
  "@context": {
    "entityId": "http://example.com/node/1",
    "dc": "http://purl.org/dc/terms/",
    "body": {"@id": "http://example.com/schema/fields/body", "@container": "@language"},
    "safe_value": "http://example.com/schema/fields/text_formatted/safe_value",
    "tags": {"@id": "http://example.com/schema/fields/tags", "@container": "@language"}
  },
  "@id": "entityId",
  "body": {
    "en": [
      {
        "safe_value": "<p>This is the body.</p>"
      }
    ]
  },
  "tags": {
    "en": [
      {
        "@id": "http://example.com/taxonomy/term/1"
      }
    ]
  }
}

The great thing about this is that it's easy to read and attribute values are easy to access without any use of the JSON-LD API, so we don't introduce a dependency on a JSON-LD parser.
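The same direct access works from any language with a plain JSON parser. As a small Python illustration, using only the standard json module and an abbreviated copy of the document above (the context is omitted because it is not needed for this kind of access):

```python
import json

raw = """
{
  "@id": "entityId",
  "body": {
    "en": [
      {"safe_value": "<p>This is the body.</p>"}
    ]
  },
  "tags": {
    "en": [
      {"@id": "http://example.com/taxonomy/term/1"}
    ]
  }
}
"""

obj = json.loads(raw)

# Equivalent of obj.body.en[0].safe_value
safe_value = obj["body"]["en"][0]["safe_value"]
# All English tag references, by their URIs.
english_tags = [term["@id"] for term in obj["tags"]["en"]]

print(safe_value)
print(english_tags)
```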

What it leaves unresolved

It wasn't clear how round tripping between the compacted and expanded versions would work. My hope is that the data will still have the same shape when it is expanded. However, processor implementers were concerned this might add too much complexity to the processing algorithms.

If maintaining the data shape when using language maps is not added to the spec, we may need to write our own term expansion. We could use this when combining data from two Drupal sites (since both data sources would be using language maps in the same way and have the same data shape). Term expansion allows us to use the full URIs to identify entities and properties, which is particularly important when combining data from multiple sites.

I don't imagine that the term expansion part of the code is too complex, so while this would be inconvenient, my gut feeling is that it isn't a dealbreaker.

EDIT: fixed minor consistency issue in JSON-LD snippet.

Great news

Posted by Crell on August 1, 2012 at 2:54am

Thank you, Lin, and Scor, for your help engaging on this! It sounds extremely promising.

So what's the next step here? Assuming the Entity/Property refactoring proceeds as planned, what do we need to do in order to have the JSON roundtripping ready for it? (Whether it lives in core or contrib at the end of the day is an implementation detail that shouldn't matter at all at this point.)

I think that the JSON-LD spec

Posted by linclark on August 1, 2012 at 4:04am

I think that the JSON-LD spec is in place enough for us to move forward. I don't think that we can wait for them to finalize the roundtripping from compacted form to expanded form. We should figure out whether that unresolved issue is a no-go for us, or whether we feel comfortable moving ahead using direct variable access as you would with regular JSON.

Once we decide that, here are a few of the different development tasks, in order of their importance.

Basic serialization

Serializing an entity to JSON-LD should be pretty straightforward once the Entity Property API is in place.

Site specific vocabulary generation

We will need to figure out the pattern we want to use in generating the site specific vocabulary.

Additionally, we could offer some really interesting information as part of the vocabulary. For example, we could have a standard Drupal vocabulary for offering a content type definition at http://example.com/vocab/entity/node/my-new-type (Crell, I believe you mentioned an idea like this). If we developed a processor to consume such a definition, then content types would be reusable between sites just by pointing to the URL. That's more of a nice to have than a need, though.

Consuming the data within Drupal

Independent of the serialization, we should flesh out how we want to use the data. I'm particularly thinking of the deployment use case here. How do we plan to merge incoming data?

For example, if a node is created at http://staging.example.com/node/1, do we store that URI with the record when it gets pushed up to live.example.com so that updates on that node on staging can be pushed to live? Or do we assume that deployment from staging is only run once?

** If Dick or any of the other folks have diagrams or issues describing the deployment use case, that would help inform the discussion.

Depending on what we need to do here and how far the JSON-LD CG work has come, we may need to implement our own term expansion for consuming JSON-LD from other Drupal installs.

Mappings to external vocabs

This isn't something we necessarily need, but is something we've talked about wanting to support. This is a larger problem.

If we want to support mappings to external vocabularies, then we'll need to find a better way to manage namespaces in core. RDF module supports prefix-based indirection, a mechanism for abbreviating URIs. A prefix is a pointer to a full URI. When the local term is combined with the namespace URI, it creates the meaningful identifier.

The Drupal implementation of this mechanism is flawed. It mistakes the prefix for a meaningful token. The mappings are stored in the database using CURIEs; however, the full namespace URIs can be changed independently. For example, someone might change Facebook's 'og' prefix to point to 'http://www.facebook.com/2008/fbml', in which case Open Graph tags would expand to the incorrect URI, starting with the new namespace. Sometimes this is the behavior you want, but in Drupal it's far more common for people to do this accidentally.

Modules can invalidate each other's RDF unknowingly, a user can override a prefix to output incorrect terms unknowingly, etc. Bad situation.
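The failure mode can be reproduced in a few lines of Python. This is a sketch of the general prefix-indirection problem, not of the RDF module's actual code: the expand helper, the stored mapping og:title, and the original namespace http://ogp.me/ns# are assumptions chosen for the example.

```python
def expand(curie, namespaces):
    """Expand prefix:local using whatever the prefix points to right now."""
    prefix, _, local = curie.partition(":")
    return namespaces[prefix] + local


# The mapping is stored as a CURIE, so its meaning depends on the prefix table.
stored_mapping = "og:title"

before = expand(stored_mapping, {"og": "http://ogp.me/ns#"})
# Someone repoints the prefix; the stored mapping itself is untouched.
after = expand(stored_mapping, {"og": "http://www.facebook.com/2008/fbml"})

print(before)
print(after)
```

Storing the fully expanded URI with the mapping, rather than the CURIE, would be one way to keep the identifier stable when a prefix is redefined.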

If we want to support mappings to external vocabs, we'll need to fix the namespace handling.

For example, if a node is

Posted by gdd on August 1, 2012 at 4:48am

For example, if a node is created at http://staging.example.com/node/1, do we store that URI with the record when it gets pushed up to live.example.com so that updates on that node on staging can be pushed to live? Or do we assume that deployment from staging is only run once?

I can prepare a more detailed response later, but I don't think we can live with that assumption. If this is going to work, then the uuid needs to be the canonical identifier for nodes. If this means they go in the URL, well, I guess a lot of sites are going to be running Pathauto. That's a pretty compelling case for Pathauto in core actually IMO.

I may be missing something

Posted by adub on August 1, 2012 at 10:24am

I may be missing something obvious but I'd assumed that uuid would always be the externally facing canonical identifier and that the hardcoded index ids would only ever be used internally, solely for performance purposes - isn't that the whole point of moving to uuid? (Replying to heyrocker's point above)

No aliases

Posted by Crell on August 4, 2012 at 3:43am

Actually for REST purposes, we concluded at the sprint that we never want to use aliases. node/5 is a canonical URL. I don't know if we want to use nids or uuid as the canonical URL, but we do not want them aliased except in user-facing parts of the browser. That just ends up too messy when aliases change (and thus per REST it becomes a different resource).

No of course. My only point

Posted by gdd on August 4, 2012 at 3:55am

No of course. My only point is that as you say, per REST, these are the same resource, and since the nid may be different between sites, you basically have to use the UUID as the canonical URL.

Opposite

Posted by Crell on August 7, 2012 at 2:43pm

No, I mean exactly the opposite. http://one.example.com/node/5 and http://two.example.com/node/5 are two different resources per REST. Similarly, http://one.example.com/node/abc123 and http://two.example.com/node/abc123 are also two different resources. That they have the same UUID doesn't matter, because in a REST-think world the full URI is the ID, not just the last substring of it.

Whether having the UUID in the URL is helpful for Drupal figuring out that those are actually aliases of each other, I don't know, but REST-wise UUIDs don't help us.

In Linked Data, there is a

Posted by linclark on August 7, 2012 at 3:01pm

In Linked Data, there is a concept of sameAs.

For example, I could say:

<http://example.org/lin-clark> <sameAs> <http://foo.com/lindsay-clark>

I haven't fully thought through how we would use this concept and whether it would solve the problem, but it's an option to consider.
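As a rough idea of how sameAs statements could be consumed, the Python sketch below groups URIs that have been declared equivalent, using a small union-find. The helper name same_as_groups is hypothetical, and the second pair of URIs (a staging node and its live counterpart) is an assumed example, not something the deployment workflow currently produces.

```python
def same_as_groups(pairs):
    """Group URIs connected by sameAs statements into equivalence sets."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in pairs:
        parent[find(left)] = find(right)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


statements = [
    ("http://example.org/lin-clark", "http://foo.com/lindsay-clark"),
    ("http://staging.example.com/node/1", "http://live.example.com/node/1"),
]

groups = same_as_groups(statements)
print(groups)
```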

I think we are mixing our

Posted by gdd on August 7, 2012 at 4:07pm

I think we are mixing our definitions of the word 'resource'. When I was thinking of a resource, I was thinking of a thing - a piece of data. Both of these are the same piece of data. When I said 'for REST purposes' I was also mainly thinking of 'For the purposes of identifying a piece of content in a REST service.' I can see what you're getting at though. I need to watch my terminology a little more carefully.

Apple oranges

Posted by scor on August 7, 2012 at 4:42pm

Resource has a different meaning depending on what context you're in:
- content staging across Drupal instances for the same site (whether it's test-stage-prod or pushing data across web nodes of a production site)
- content sharing on the Web

I'm not sure we can find a workflow that will work perfectly for both of these use cases at the same time. In the first use case you have good control of the code base and the structure of your data, and you know where the data is located and coming from, but because the URIs of your entities vary, it's challenging to find which resources are equivalent and matching. UUIDs are a better fit here IMO. In the second use case, you have no control over the site, and you also need a way to dereference the resource, in which case URIs are essential, and UUIDs are not such a requirement since the URI (which might contain a UUID) serves as a unique identifier on the Web.

I think trying to apply linked data URIs to the first use case is counterproductive, and UUIDs are easier to work with in this case. The situation is reversed in the second use case (that's similar to what I wrote in the other thread). HTTP URIs act as UUIDs on the Web and include a dereference mechanism, but in a controlled workflow such as content staging, where your dereferencing mechanism is pretty much set by the deployment plan, all you need is UUIDs, and URIs tend to get in the way.

Right, this relates to my

Posted by linclark on August 7, 2012 at 5:02pm

Right, this relates to my question about whether we should support both use cases from the other thread.

It doesn't have an impact on the JSON-LD features we use though. The full URI will always be used for the "@id", we just might have a "uuid" attribute as well, or a "sameAs" property with an array of sameAs URIs. Or both.

Since the separation (or not) of these two use cases doesn't impact the JSON-LD features we use, I'd suggest we discuss them in the other thread.

Sweet!

Posted by Crell on July 29, 2012 at 5:13pm

Great writeup, Lin! And definitely good news that it looks like the multi-vocabulary stuff will be moving forward, as that neatly solves that problem for us.

My assumption to date, and feel free to correct me if this is dumb, is that we'd be using JSON-LD as strictly a wire-format. That is, a receiver of a JSON-LD-ified Entity would first convert it back into a PHP Entity object (Node object, User object, whatever) and then manipulate it in PHP. That means we wouldn't need an in-JSON API at all.

Although, now that I think about it, any client-side code we have WOULD need to do that, say for client-side theming. Hm. That may be an area to ask Bergi about, since Create.js is a key JSON-LD consumer and a client tool I would want to support. He may have some input here.

I don't have much opinion on the language front, other than it being something we should try to hide from module developers as much as possible either way.

Great write-up Lin. Please

Posted by lanthaler on July 30, 2012 at 9:20am

Great write-up Lin. Please correct me if I'm wrong but I don't think the language map you used in your example solves your problem. Let me try to explain why.

In your example you model the data like this:

{
  "@context": {
    "entityId": "http://example.com/node/1",
    "dc": "http://purl.org/dc/terms/",
    "title": "http://example.com/schema/fields/title",
    "tags": { "@id": "http://example.com/schema/fields/tags",
              "@container": "dc:language"}
  },
  "@id": "entityId",
  "tags": {
    "de": [
      {
        "@id": "http://example.com/taxonomy/term/1",
        "title": "Das Kapital"
      },
     ...
    ],
    "en": [
      {
        "@id": "http://example.com/taxonomy/term/1",
        "title": "Capital"
      }
    ]
  }
}

This would be equivalent to

{
  "@context": {
    "entityId": "http://example.com/node/1",
    "dc": "http://purl.org/dc/terms/",
    "title": "http://example.com/schema/fields/title",
    "tags": { "@id": "http://example.com/schema/fields/tags" }
  },
  "@id": "entityId",
  "tags": [
    {
      "@id": "http://example.com/taxonomy/term/1",
      "title": "Das Kapital",
      "dc:language": "de"
    },
    ...
    {
      "@id": "http://example.com/taxonomy/term/1",
      "title": "Capital",
      "dc:language": "en"
    }
  ]
}

This makes it impossible to distinguish whether the English and the German tags were both applied to "entityId", or whether there are really two versions of "entityId": an English one with the English tag applied and a German one with the German tag.
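The equivalence described above can be sketched in Python as a flattening step. This follows the interpretation given in this comment, where the container key becomes a dc:language property on each value; flatten_container is a hypothetical name, and this is not the behaviour of any finished spec.

```python
def flatten_container(language_map, key="dc:language"):
    """Turn {"de": [...], "en": [...]} into one list, tagging each item."""
    flat = []
    for language, items in language_map.items():
        for item in items:
            # The language lands on the term itself, not on the
            # node-to-term relationship it was meant to describe.
            flat.append({**item, key: language})
    return flat


tags = {
    "de": [{"@id": "http://example.com/taxonomy/term/1", "title": "Das Kapital"}],
    "en": [{"@id": "http://example.com/taxonomy/term/1", "title": "Capital"}],
}

flat_tags = flatten_container(tags)
print(flat_tags)
```

After flattening, both items hang off the same "tags" property of the same entity, so the information about which language version of the node carried which tag is no longer recoverable.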

You will run into the same problem with the author - you do not translate the author's name but apply it to the translated version of the article.

A potential alternative to the use of named graphs would be the use of different entity IDs for each version of an article. Something along the lines of

{
  "@context": { ... },
  "@graph": [           <-- this is the default graph, not a named graph
    {
      "@context": { "@language": "de" },
      "@id": "entityId#de",
      "tags": {
        "@id": "http://example.com/taxonomy/term/1",
        "title": "Das Kapital"
      }
    },
    {
      "@context": { "@language": "en" },
      "@id": "entityId#en",
      "tags": {
        "@id": "http://example.com/taxonomy/term/1",
        "title": "Capital"
      }
    }
  ]
}

Of course you could also just have a generic top-level entity for the article and then mint a "translations" property to which you associate all localized data.

{
  "@id": "entityId",
  "translations": [
    { "@id": "entityId/de", "other": "properties" },
    { "@id": "entityId/en", "other": "properties" }
  ]
}

If we are going to introduce subject maps as proposed by Niklas you could even access the language specific properties directly

{
  "@context": {
    ..
    "translations": { "@id": "...", "@container": "@id" }
  },
  "@id": "entityId",
  "translations": {
    "entityId/de" : {
      "other": "properties"
    },
    "entityId/en": {
      "other": "properties"
    }
  }
}

I think the key point here is that you do not just translate some values but have completely (well, almost) different data for the various languages (different authors, different tags and not just translations of their labels, different publication dates, etc.), and you might be better off treating them as such. How is this modelled internally in Drupal?

Hope this helps

Markus Lanthaler
@markuslanthaler

Thanks for your feedback on

Posted by linclark on July 30, 2012 at 1:50pm

Thanks for your feedback on this :)

You are right that we do have different data per language version, which I needed clarification on as well. I asked questions about this in the WSCCI Web Services Format Sprint Report and in one of our meetings with the D8 multilingual initiative lead.

Regarding Drupal-internal modeling, we are changing the way we model things from D7 to D8. In D7, each language version was handled as its own entity. I haven't been involved in the Multilingual initiative, so others can speak better to the current plan than I can. However, my understanding from others' replies is that we will rely on field-level translations.

Here is an example node with field-level translations. Note that field_tags is an entity reference, so it refers to different resources by their (internal) IDs.

<?php
array(
  'title' => array(
    'en' => array(
      array(
        'value' => 'Title in English',
      ),
    ),
    'de' => array(
      array(
        'value' => 'Titel in Deutsch',
      ),
    ),
  ),
  'field_tags' => array(
    'en' => array(
      array(
        'tid' => 1,
        // Placeholder for the loaded term object.
        'taxonomy_term' => new stdClass(),
      ),
    ),
    'de' => array(
      array(
        'tid' => 1,
        'taxonomy_term' => new stdClass(),
      ),
      array(
        'tid' => 2,
        'taxonomy_term' => new stdClass(),
      ),
    ),
  ),
);
?>

Larry has stated a preference for using a single entity ID, rather than different IDs for different language versions. We'll have to see how it works out, but I'm hoping that we can make it work without minting different IDs per language version.

Handling each field value as a blank node (as Niklas L's proposal does) allows us to associate a language with that field value, even if the value is a reference and not a literal. This more closely matches the way language is handled internally.
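As a rough illustration of the blank-node idea (a sketch only, not Niklas L's actual snippet), the field_tags values from the PHP example above might be serialized as shown below. Each field value is an object without an "@id", so it becomes a blank node that can carry "dc:language" alongside the reference. The http://example.com/vocab/ property URIs, the node URL, and the "value" key are assumptions made up for this example.

```json
{
  "@context": {
    "dc": "http://purl.org/dc/terms/",
    "field_tags": "http://example.com/vocab/field_tags",
    "value": { "@id": "http://example.com/vocab/value", "@type": "@id" }
  },
  "@id": "http://example.com/node/1",
  "field_tags": [
    {
      "dc:language": "en",
      "value": "http://example.com/taxonomy/term/1"
    },
    {
      "dc:language": "de",
      "value": "http://example.com/taxonomy/term/1"
    },
    {
      "dc:language": "de",
      "value": "http://example.com/taxonomy/term/2"
    }
  ]
}
```

Because the language sits on the blank node rather than on the referenced term, the same term (term/1) can appear once per language without the languages being merged.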

You're right that my version doesn't really do this, since I use the "@id" key. When transformed to other RDF formats, the field value will simply be a pointer to that other ID and we will lose the language versioning for reference fields. However, when playing with Niklas L's expanded snippet, the language versioning (using dc:language) does seem to be maintained within JSON-LD expansion and compaction. We should consider whether maintaining the language separation in other RDF formats matters to us as a community.

It does also make it more difficult to reference a specific language version of an entity from another entity (for example, to create a reference from Gábor's user specifically to the Hungarian version of a node). Since we plan to drop translation IDs, I assume we have another way of handling this internally. Can anyone involved in the D8MI explain how these types of references will work?

Collections/Indexes

Posted by ethanw on August 22, 2012 at 4:39pm

A few notes from the DrupalCon Munich JSON-LD conversation, specifically re: handling indexes, collections of nodes (such as Views output), etc.

The two proposals raised were (a) something similar to or derived from AtomPub (an option discussed at the serialization format sprint) and (b) using JSON-LD graphs to represent the dataset, likely storing the individual list items as data or links in an array property of the collection "item".

This would allow additional data relating to the collection to be included, such as pager status, view parameters, etc.
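One possible shape for option (b), purely as a sketch: the collection is itself a resource with an array property holding links to the items, plus pager information. Every term under http://example.com/vocab/ (Collection, currentPage, totalPages, items) and the Views URL are hypothetical and not part of any agreed vocabulary.

```json
{
  "@context": {
    "ex": "http://example.com/vocab/",
    "items": { "@id": "ex:items", "@container": "@list" }
  },
  "@id": "http://example.com/views/articles?page=1",
  "@type": "ex:Collection",
  "ex:currentPage": 1,
  "ex:totalPages": 5,
  "items": [
    { "@id": "http://example.com/node/1" },
    { "@id": "http://example.com/node/2" }
  ]
}
```

The "@list" container is used here on the assumption that the order of results matters, as it usually does for Views output.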

References vs. Embedded Item Serialization in Collections

Posted by ethanw on August 22, 2012 at 5:02pm

I noted that it's important to be able to request collections of items with each item serialized, as performance and the number of requests are repeated concerns of those in the Backbone sessions.

Fago felt that it's not advisable to have the complete serialization for linked entities within the referencing entity or collection container by default.

One option would be to "post-process" via the Property API, adding the serialization for any linked items for which full serialization is desired.

Use cases for this include bootstrapping JSON-based applications by including a large number of serialized items in the HTML page and batch loading Views or other query results.
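To make the distinction concrete, here is a small sketch in which one tag is given as a reference only and another has some of its data embedded. The field_tags property URI and the node URL are assumptions for illustration; "title" is mapped to the Dublin Core title term, and the taxonomy term URL pattern follows the earlier examples in this thread.

```json
{
  "@context": {
    "title": "http://purl.org/dc/terms/title",
    "field_tags": "http://example.com/vocab/field_tags"
  },
  "@id": "http://example.com/node/1",
  "title": "The Kernel has landed",
  "field_tags": [
    { "@id": "http://example.com/taxonomy/term/1" },
    {
      "@id": "http://example.com/taxonomy/term/2",
      "title": "Drupal"
    }
  ]
}
```

A client receiving the first form needs a second request to get the term's data, while the second form trades a larger payload for fewer requests.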

Versioning/Revisions

Posted by ethanw on August 22, 2012 at 4:42pm

JSON-LD is agnostic about version tracking. We can track versions via VUUID, with the possibility of including links to each version resource.
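A sketch of what this could look like, assuming a hypothetical "vuuid" property and a "revisions" link property under http://example.com/vocab/. The revision URLs and the all-zero UUID value are placeholders, not an agreed Drupal URL scheme.

```json
{
  "@context": {
    "ex": "http://example.com/vocab/",
    "vuuid": "ex:vuuid",
    "revisions": { "@id": "ex:revisions", "@type": "@id" }
  },
  "@id": "http://example.com/node/1",
  "vuuid": "00000000-0000-0000-0000-000000000000",
  "revisions": [
    "http://example.com/node/1/revisions/1",
    "http://example.com/node/1/revisions/2"
  ]
}
```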

JSON data and Backbone JS

Posted by aswinvk28 on October 16, 2012 at 4:23pm

I have used Backbone JS for templating the site I'm currently working on. The JSON data for it is loaded into the page via script tags.

I suppose the links generated by Backbone and Underscore templates won't be indexed by search engines.

Is there any way to get the links in the HTML created by Backbone indexed? Or, when someone searches in a search engine, will they be taken to the page where the JSON content containing the search term is present?