Remote Field Revision

This discussion covers issues with Remote Field Revisions: populating fields from single, multiple, and unknown sources; sending data to other sites, and how those sites can use, modify, and potentially change that data from its source; and race conditions, the origination of data, and trusted sources for field revision.

field flowing on servers recursively A -> B -> A -> B

neoliminal - Wed, 2008-03-05 20:36

If a field is replicated on two machines and each points to the other, there is potential for recursion.

Example:

  1. Server Alice creates a Title Field.
  2. Server Alice sends Title Field to Server Bob.
  3. Server Bob accepts Title Field.
  4. Server Bob sends the same Title Field to Server Alice.
  5. Server Alice, with a revision to Title, then sends this revision to Server Bob...

Possible solution sets:

  • Hash attribute on field. A server checks the hash to see whether it has already received the current revision and, if so, does not send that revision out to other servers.
  • Originating Server attribute on field. The originating server (Alice) could be recorded on the field, and when the originating server receives the field back it ignores the revision. (This has problems with more complicated server node patterns, e.g. A -> B -> C -> B -> C.)
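
The hash idea above could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the `Server` class, `revision_hash`, and the in-memory peer list are all hypothetical names made up for the example, and hashing the field name plus value is an assumption about what the hash would cover.

```python
import hashlib


def revision_hash(field_name, value):
    """Hypothetical helper: identify a revision by hashing its name and content."""
    return hashlib.sha256((field_name + "\0" + value).encode("utf-8")).hexdigest()


class Server:
    """Hypothetical replication node, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.peers = []           # servers this one forwards revisions to
        self.seen_hashes = set()  # hashes of revisions already accepted
        self.fields = {}
        self.received = 0         # counts deliveries, to show the loop stops

    def receive(self, field_name, value):
        self.received += 1
        digest = revision_hash(field_name, value)
        if digest in self.seen_hashes:
            # Already have this revision: do not store or forward it again.
            return False
        self.seen_hashes.add(digest)
        self.fields[field_name] = value
        for peer in self.peers:
            peer.receive(field_name, value)
        return True


alice, bob = Server("alice"), Server("bob")
alice.peers.append(bob)
bob.peers.append(alice)

# Alice creates a Title Field; Bob echoes it back once and the cycle ends.
alice.receive("title", "Hello")
```

Because each server checks only its own history, this also terminates on longer cycles such as A -> B -> C -> B. One limitation of hashing content alone: a revision that returns a field to an earlier value would be treated as already seen, so a real design would probably need a revision counter or timestamp in the hash.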

Local revisions control of fields. A -> B(b) !-> C

neoliminal - Wed, 2008-03-05 20:37

If a server sends a field, the secondary server should have the ability to revise the field and not forward on the revised field (keeping it as a local change only.) It should still forward on the original field.

Example:

  1. Server Alice creates Title Field(a).
  2. Server Alice sends Title Field(a) to server Bob.
  3. Server Bob revises Title Field(a) locally, and creates Title Field(b). It does not want this revision to be forwarded.
  4. Server Bob sends Title Field(a) to server Carol.

Possible Solution:

  • Server Bob needs the ability to mark a revision as local only, so that its revised content does not spread past its own server while the original content spreads as it normally would. This is likely a content manager control panel where a toggle would allow a local-only revision of any particular field.
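
One way to picture the local-only toggle is to store the received value and the local revision separately. The sketch below assumes that approach; `FieldState` and its attribute names are hypothetical and not taken from any existing module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldState:
    """Hypothetical per-field record kept on the receiving server (Bob)."""

    upstream: str                         # value as received from the source, e.g. Title Field(a)
    local_override: Optional[str] = None  # local-only revision, e.g. Title Field(b)

    def display_value(self) -> str:
        # What Bob's own site shows: the local revision if there is one.
        if self.local_override is not None:
            return self.local_override
        return self.upstream

    def outbound_value(self) -> str:
        # What Bob forwards to Carol: always the original content.
        return self.upstream


state = FieldState(upstream="Title a")
state.local_override = "Title b"  # Bob revises locally, marked local only
```

Here `state.display_value()` gives Bob's revision while `state.outbound_value()` still gives the original that Carol should receive.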

Multiple Field sources that don't agree. A -> C(a), B -> C(b)

neoliminal - Thu, 2008-12-11 19:02

A server can aggregate a Field from multiple sources. A system needs to be created to prioritize source input should these Fields disagree.

Example:

  1. Server Alice creates Title Field(a).
  2. Server Bob creates Title Field(b).
  3. Title Field(a) != Title Field(b).
  4. Server Alice sends Title Field(a) to Server Carol.
  5. Server Bob sends Title Field(b) to Server Carol.
  6. Server Carol decides which source is more authoritative for this Field. Server Carol chooses Title Field(a).

Possible Solutions:

  • Allow servers to prioritize sources in an order that puts any given source above another. Server Alice is above Server Bob in priority and takes precedence as authoritative, replacing a previous field.
  • If Fields arrive out of priority order and do not agree, there should be a configuration option either to automatically use the authoritative Field from a higher-ranked server as a revision to the Field, or to retain the older (less authoritative) data, leaving promotion to be done manually.
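
A rough sketch of how Server Carol might apply such a priority list follows. Everything here is assumed for illustration: the `Candidate` tuple, the `resolve` function, the `auto_promote` flag standing in for the configuration choice, and the rule that unlisted sources rank last.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    source: str
    value: str


def rank(source, priority):
    """Lower rank means more authoritative; unlisted sources rank last (an assumption)."""
    return priority.index(source) if source in priority else len(priority)


def resolve(current, incoming, priority, auto_promote=True):
    """Return (active, held): the Field in use, and one awaiting manual promotion, if any."""
    if current is None:
        return incoming, None
    if current.value == incoming.value:
        return current, None  # the sources agree, nothing to decide
    if rank(incoming.source, priority) <= rank(current.source, priority):
        if auto_promote:
            return incoming, None  # the more authoritative source replaces the field
        return current, incoming   # keep the older data, hold the new for manual promotion
    return current, None           # a less authoritative source does not replace the field


priority = ["alice", "bob"]  # Alice is above Bob
from_alice = Candidate("alice", "Title a")
from_bob = Candidate("bob", "Title b")

# Bob's Field arrives first, then Alice's: Carol ends up with Title Field(a).
active, held = resolve(None, from_bob, priority)
active, held = resolve(active, from_alice, priority)
```

With `auto_promote=False` the same sequence would leave Bob's value active and return Alice's as the held candidate.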

Receiving Servers find erroneous Field Data A -> B(x) -> A(x)

neoliminal - Wed, 2008-03-05 18:43

If a Field is sent from a server and its data is either NULL, inaccurate, munged or otherwise unusable, there should be a mechanism for the receiving site to notify the first server that the data is invalid.

Example:

  1. Server Alice creates Title Field.
  2. The Title Field(a) from Alice is invalid data resulting in Title Field(x).
  3. Server Alice sends Title Field(x) to Server Bob.
  4. Server Bob rejects Title Field(x) as invalid data.
  5. Server Bob sends back an error flag to Server Alice. Title Field(x)(invalid).
  6. Server Alice stops sending Title Field(x), deletes/modifies/revises Title Field(x) to correct data Title Field(a).

Possible Solutions:

  • A server that sends invalid Field data should be able to receive an error message from the receiving servers.
  • Like user moderation modules, there should be a threshold configuration so that servers that have 10 feeds may get a message with one error, but a server with 1000 feeds may have a threshold of 45.
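
The threshold idea might look something like the sketch below, where the sending server is only alerted once enough distinct receivers have flagged a field. The class name, the method, and the choice to count each reporting server once are all assumptions made for the example.

```python
class InvalidFieldReports:
    """Hypothetical tally of invalid-data flags kept by the sending server."""

    def __init__(self, threshold):
        self.threshold = threshold  # e.g. 1 for a small server, 45 for a large one
        self.reporters = {}         # field id -> set of servers that flagged it

    def report(self, field_id, reporter):
        """Record a flag; return True once the threshold is reached."""
        flagged_by = self.reporters.setdefault(field_id, set())
        flagged_by.add(reporter)    # repeat flags from one server count once
        return len(flagged_by) >= self.threshold


small = InvalidFieldReports(threshold=1)
large = InvalidFieldReports(threshold=3)

small_alert = small.report("title-x", "bob")  # a single error is enough here
large_alerts = [
    large.report("title-x", "bob"),
    large.report("title-x", "bob"),    # duplicate, still one reporter
    large.report("title-x", "carol"),
    large.report("title-x", "dave"),   # third distinct reporter
]
```

Counting distinct reporters is one possible answer to the weighting question; whether relayed errors should count more is left open.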

Questions:

  • Should errors filter back through multiple servers? For example, if Server Alice feeds Server Bob, and Server Bob feeds Servers Carol, Dave, Eve, Ivan, and Justin, and these servers find the field invalid, should the errors sent back to Server Bob be passed back to Server Alice? Should multiple errors sent to Bob carry more weight when they are passed on to Alice?

We discussed this a bit at

KarenS - Thu, 2008-03-06 15:49

We discussed this a bit at DrupalCon and I asked John to post these issues to be sure we have them documented. These are the kinds of things we'll need to keep in mind as we work out the details.


I see little discussion

Chris Johnson - Thu, 2008-12-04 18:04

I see insufficient discussion about anything but local, database-stored fields in all of the discussion. We will be severely limiting what Drupal can do if we don't consider and architect for remote data sources for fields.

We should be able to provide something like CCK-based content types which have some fields which are local to the database, and others which may be drawn from other APIs (e.g. web services). The demand for such things is already there (see Robert Douglass's posts regarding proxy-node sites, Larry Garfield's posts on 3rd party data, etc.).


Web Services fields.

neoliminal - Thu, 2008-12-11 18:47

Are you considering relying on external APIs from other sites on each load of a field? Otherwise we need local copies, at least in cache, in order not to run into speed issues. We also want to be respectful of others' bandwidth/processors and not request fields from other sites on an ad-hoc basis.

This all points to local versions of these fields in the Drupal database, while they will be getting the data originally from some other source.

If this isn't what you're talking about then I'm missing something.

--
John Kipling Lewis


Need both / all of the above

Chris Johnson - Thu, 2008-12-11 22:19

Thanks for your reply.

I think we need both the ability to pull cached copies from remote sites (with several methods of expiring and refreshing cache, e.g. cron or age), and the ability to hit other sites on load of the field.

I would agree that in most cases, the former caching solution would be preferred, exactly for the reasons you state (speed, respect).

However, there are some situations where immediate access from another site is feasible and desirable. When the other site is designed for such use, and the currency of the displayed data takes priority over display speed, then non-caching solutions might be the choice.
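
To make the two modes concrete, here is a minimal sketch of a remote field that serves a cached copy until it ages out, with an option to force a live fetch on load. The `RemoteField` class, its parameters, and the injected `fetch` callable are hypothetical; a real version would call the remote site's API, and could also be refreshed from cron.

```python
import time


class RemoteField:
    """Hypothetical wrapper around a field whose data lives on another site."""

    def __init__(self, fetch, max_age_seconds, clock=time.monotonic):
        self.fetch = fetch              # callable that retrieves the remote value
        self.max_age = max_age_seconds  # age-based cache expiry
        self.clock = clock              # injectable so the example can be tested
        self._value = None
        self._fetched_at = None

    def get(self, live=False):
        now = self.clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self.max_age
        if live or stale:
            # Hit the other site: on first load, on expiry, or when currency matters most.
            self._value = self.fetch()
            self._fetched_at = now
        return self._value


# Stand-ins for a remote API and a clock, so the sketch runs without a network.
calls = []

def fake_fetch():
    calls.append(1)
    return "revision %d" % len(calls)

now = [0.0]
title = RemoteField(fake_fetch, max_age_seconds=60, clock=lambda: now[0])

first = title.get()             # fetched from the remote site
second = title.get()            # served from the local copy
now[0] = 61.0
third = title.get()             # cache expired, fetched again
fourth = title.get(live=True)   # forced live fetch despite a fresh cache
```

The `live` flag covers the case where currency beats speed; everything else goes through the local copy.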

I'm working on a situation more similar to the latter than the former right now. I have a very light-weight REST API to retrieve a representation of a resource the other system has. I could do a direct connect to the other system's database, but that solution does not scale or manage very well from a complexity and security point of view.

Still, I do agree local versions of remote fields in the Drupal database are something we really need.