Hi there.
For a local organizational newspaper I have been helping with over the years, I investigated NewsML extensively as a standard format for archiving and editing content for hard copy production. The goal was to guarantee that the archive would never become isolated and impossible to translate (via XSLT or other means) into a newly adopted format.
NewsML (see http://www.newsml.org ) is used by Reuters, AFP, and dozens of the world's top newspaper publishers (see http://www.newsml.org/pages/whouse_main.php ).
In the case in point, once they are done producing the hard copy for the printing press, a function in their custom-built production environment produces HTML for publication on their website.
But wouldn't it be neat to funnel their existing NewsML-based edition right into Drupal (import/export) or, better yet, simply aggregate the existing text files?
This could be achieved either by adding NewsML support to the aggregator alongside Atom and RSS (does it have support for Atom yet?), or by writing a separate aggregator module for NewsML-based content.
Anyone interested?
Victor Kane
http://awebfactory.com.ar

Comments
Nodes
Aggregator may not be the right tool; it would be better to load NewsML data as full Drupal nodes.
We haven't done any work with NewsML, as we're not Reuters subscribers. AP uses NITF, and our various newsroom content management systems claim to export NITF (they exaggerate). We have a custom NITF loader for 4.6 but it's not a releasable module.
For those who aren't acquainted with either of these formats: NITF is "News Industry Text Format," an XML standard that focuses on text but supports references to media objects such as audio, video, etc.
NewsML is conceptually a wrapper around NITF with a much more sophisticated approach to bundling multiple resources of multiple types into a package.
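To make the "wrapper" idea concrete, here is a minimal sketch in Python (standard library only) that walks a NewsML-style package and lists what each content item holds. The sample markup is a heavily simplified, made-up package loosely modeled on NewsML 1.x element names (`ContentItem`, `MediaType`, `DataContent`); real documents carry far more metadata, and the function name `list_content_items` is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up NewsML-style package: one inline NITF story plus one
# photo that is only referenced by Href. Real packages are much richer.
NEWSML = """\
<NewsML>
  <NewsItem>
    <NewsComponent>
      <ContentItem>
        <MediaType FormalName="Text"/>
        <DataContent>
          <nitf>
            <body>
              <body.head><hedline><hl1>Council approves budget</hl1></hedline></body.head>
              <body.content><p>The council voted on Tuesday.</p></body.content>
            </body>
          </nitf>
        </DataContent>
      </ContentItem>
      <ContentItem Href="photo-01.jpg">
        <MediaType FormalName="Photo"/>
      </ContentItem>
    </NewsComponent>
  </NewsItem>
</NewsML>
"""


def list_content_items(newsml_text):
    """Return one dict per ContentItem: media type, external href, inline headline."""
    root = ET.fromstring(newsml_text)
    items = []
    for item in root.iter("ContentItem"):
        media = item.find("MediaType")
        # The headline only exists when an NITF document is embedded inline.
        headline = next(item.iter("hl1"), None)
        items.append({
            "media_type": media.get("FormalName") if media is not None else None,
            "href": item.get("Href"),
            "headline": headline.text if headline is not None else None,
        })
    return items


items = list_content_items(NEWSML)
for entry in items:
    print(entry)
```

The point of the sketch is only that the package, not the NITF story, is the unit being handled: the text and the photo travel together, and the NITF document is just one of the items inside.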
Node is interesting
Right, NITF is one of several ways of including the data section of the NewsML document (others are XHTML, etc.). The metadata is translated into NewsML tags and attributes via XSLT.
Fascinating. Yes, a node could be interesting, in the sense that the NewsML (and NITF) metadata could be translated to and from XML using CCK (Content Construction Kit) fields or fields added by a custom node module. Alternatively, or in addition, the full XML could be stored.
This would involve an import process from the NewsML or NITF repository, if that is kept separately (in my case I am using a native XML database, Xindice, with excellent results).
With NewsML stored as nodes, a separate archive would not be necessary: Drupal would serve as the archive repository, and I am confident that a production workflow could be set up (Action and Workflow modules..., status metadata). This is a painful admission for me, since I developed a CMS in Java J2EE that would now be much better off using Drupal directly as a framework.
The idea behind using Aggregator is that, given the preexisting NewsML files (and/or an existing repository) as an act of god :), it would be redundant to store them again in Drupal; better to just aggregate them into a page, especially since they are in a state of flux.
Depends on the use case.
Fascinating discussion, I am sure there is room for a lot of creativity here.
As for the added value of NewsML, it would mainly lie in the inclusion of automatic topic categories (which could be easily translated into the Drupal category system), the recursive bundling of multiple resources and types, as you point out, and the built-in version control system, which is ideal for data transmission.
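As a rough illustration of the category translation, the Python sketch below reads subject codes from a NewsML-style metadata fragment and looks them up in a table of category names. Everything here is an assumption made for the example: the fragment is simplified, the `CODE_TO_TERM` table and the category names are invented, and a real import would create or match terms in the Drupal category system rather than return a list.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup from IPTC-style subject codes to site category names.
CODE_TO_TERM = {
    "04000000": "Business",
    "11000000": "Politics",
    "15000000": "Sports",
}

# Simplified, made-up metadata fragment; the last code is deliberately unknown.
METADATA = """\
<DescriptiveMetadata>
  <SubjectCode><Subject FormalName="11000000"/></SubjectCode>
  <SubjectCode><Subject FormalName="04000000"/></SubjectCode>
  <SubjectCode><Subject FormalName="99999999"/></SubjectCode>
</DescriptiveMetadata>
"""


def category_terms(metadata_text, code_to_term):
    """Return category names for known subject codes, in document order."""
    root = ET.fromstring(metadata_text)
    terms = []
    for subject in root.iter("Subject"):
        term = code_to_term.get(subject.get("FormalName"))
        # Skip codes with no mapping and avoid listing a category twice.
        if term is not None and term not in terms:
            terms.append(term)
    return terms


terms = category_terms(METADATA, CODE_TO_TERM)
print(terms)
```

Unknown codes are silently skipped here; whether to skip them, log them, or create new categories on the fly would be a design decision for the actual import.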
The other added value is the guarantee that your archive is resistant to time, i.e., it can be translated into the formats of tomorrow. With NITF you have plenty of protection for that: there is an XSLT that, for example, transforms NITF to NewsML proper, or bundles a series of NITF documents into NewsML.
I believe the new features of NewsML 2, soon to be released, will make it much easier to work with.
The use of Drupal as a NewsML platform would, in one fell swoop, do away with the need for expensive, gobbledygook solutions.
By the way, it is not just AFP and Reuters; NewsML is fairly prevalent in Asia, both in China and Japan.
Victor Kane
http://awebfactory.com.ar
Victor Kane
http://awebfactory.com
Node is a requirement
At least it would be to me. The advantage of importing NewsML into Drupal is that you get all the benefits of nodes (comments, taxonomy, ratings, views, et al.).
In similar cases, I've had success using SimpleXML parsing (http://us2.php.net/manual/en/ref.simplexml.php) and the various node_insert(), node_update(), node_save() hooks to import XML into Drupal.
Aggregator2 used that approach. I've also done it for REST interfaces.
Simply^ use SimpleXML to parse the data into manageable parts, then construct the node and insert it. In most cases, you also have to create an additional table to handle data associated with the node. In the case of NewsML, you might need to store multiple authors (not supported in core) or a sidebar.
^Of course, using SimpleXML creates a PHP 5 dependency....
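The parse-then-construct flow can be sketched as follows. This is a Python stand-in for the SimpleXML approach, not Drupal code: the `node` dictionary and the separate `authors` list are hypothetical placeholders for the node record and the additional per-node table, and the sample NITF is a simplified, made-up document.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up NITF story with two bylines.
NITF = """\
<nitf>
  <head><title>Council approves budget</title></head>
  <body>
    <body.head>
      <hedline><hl1>Council approves budget</hl1></hedline>
      <byline>Ana Perez</byline>
      <byline>Luis Gomez</byline>
    </body.head>
    <body.content>
      <p>The council voted on Tuesday.</p>
      <p>The budget takes effect in January.</p>
    </body.content>
  </body>
</nitf>
"""


def text_of(element):
    """Collapse an element and its children to plain text."""
    return "".join(element.itertext()).strip()


def nitf_to_node(nitf_text):
    """Split a story into a node-like dict plus a list of authors.

    The authors are returned separately because they would live in an
    additional table keyed by node, not in the node record itself.
    """
    root = ET.fromstring(nitf_text)
    headline = next(root.iter("hl1"), None)
    node = {
        "type": "story",  # hypothetical content type name
        "title": text_of(headline) if headline is not None else "",
        "body": "\n\n".join(text_of(p) for p in root.iter("p")),
    }
    authors = [text_of(byline) for byline in root.iter("byline")]
    return node, authors


node, authors = nitf_to_node(NITF)
print(node["title"], authors)
```

In a real module the last step would hand the constructed node to Drupal for saving and write one row per author to the extra table; that part is left out because it depends on the Drupal version and the schema chosen.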
--
http://ken.therickards.com/
Sort of a dependency ...
The alternative to SimpleXML is the PEAR XML Serializer package, which gives you similar functionality that will work in PHP4.
Import XML into Drupal
Is it possible to import XML (not RSS) into Drupal and create nodes with some Drupal module? I tried the Import/Export API, but it wants too much data to be imported. Would such a module work with PHP 4?