Integrating Views output into Feeds Module for further processing - as suggested by @dereine

adityakg's picture

Hi all!

Overview:

This project aims to let Drupal import data from existing web APIs and formats (e.g. Twitter, Facebook, XML, RDF) and store it as Drupal elements (nodes, users, database records, etc.).

This is achieved by combining the Views and Feeds modules.

The process of importing external data from WebAPIs will be:
external WebAPIs -> Views -> Feeds -> Drupal (nodes, users, database data, etc..)

About Me:

I am Aditya (you can call me Adi for short), a 3rd-year Electrical Engineering student at the National University of Singapore. I have some experience in programming contests (IOI, National Science Olympiad - Gold Medalist, BNPC-HS National Scale Competition - Withstanding Champion, ACM-ICPC Regional Contest). While that is not directly related to Drupal, the experience helps me quickly read and understand code written by other people. I have gained experience in PHP and in database systems (MSSQL and MySQL) from developing various software and intranet applications using symfony (PHP framework, ongoing, so no showcase yet) and .NET (details here).

As for Drupal experience, I have developed several sites in Drupal (details here). Honestly, because Drupal is so awesome, so far I have only needed a little bit of coding to make Drupal work the way I wanted. To help myself learn Drupal (especially Feeds, Views and Drupal core), since last week I have been submitting minor patches and documentation to Feeds, Views and Drupal core.

Description:

Inspired by: http://groups.drupal.org/node/57168 by dereine

Create a generic Feeds importer that fetches data from Views output and passes it to the Feeds module for further processing into Drupal (e.g. inserting it into nodes, users, the database, etc.). The data in the view can come from any source for which a Views query backend is implemented.

The process of importing external data from WebAPIs into Drupal will be (the focus of GSOC Project is in BOLD):

external Web API -> Views Backend Query (filters, arguments, fields) -> Views -> Feeds -> Drupal (database, node, user, etc..)

If time permits, during GSoC or beyond, I would like to further develop or co-develop some Views query backends to add support for common formats (e.g. XML, RDF, Facebook, YQL, Google Codebase API, etc.). Once the Views query backends are more established, this package will be awesome! :)

Use Case:

  • Combined with an appropriate Views query backend, Drupal can easily fetch data from an existing web application API (e.g. Twitter, Facebook, XML, RDF) without any coding, including from your own web application!
  • After the data is fetched from the web application API, it can be processed further into Drupal with the Feeds module (e.g. stored in the database, in Drupal nodes, etc.)

Implementation:

Implementation from Views side:

  • Views type for common formats (containing the Views query backend; an addition if time permits, or an extension after GSoC)
  • Views display (to output the data as a PHP array that can be used by the Feeds importer)

Implementation from Feeds side:

  • Feeds Importer
    • Fetcher: implement a dummy fetcher, because the data fetching has already been done in Views
    • Parser: execute the view and, if a batch import is needed (e.g. for multi-page Views output), prepare the data from Views and initialize it for batching.
    • Processor: no additional implementation needed; use the existing processors (to output to the database, nodes, users, etc.)
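
The three stages listed above can be sketched roughly as follows. This is only an illustration in Python, not Feeds or Views code: the class names, `run_view`, `fake_view` and `PAGES` are all made up for the example, and the real plugins would be PHP classes in the Feeds plugin system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


class DummyFetcher:
    """Fetches nothing: in this design the view already pulled the data."""

    def fetch(self) -> None:
        return None


@dataclass
class ViewsParser:
    """Runs a (hypothetical) view page by page and yields one batch per page."""

    run_view: Callable[[int], List[Row]]  # page number -> rows, [] when done

    def parse(self) -> Iterator[List[Row]]:
        page = 0
        while True:
            rows = self.run_view(page)
            if not rows:
                break
            yield rows
            page += 1


@dataclass
class NodeProcessor:
    """Stand-in for an existing processor (node, user, database table)."""

    saved: List[Row] = field(default_factory=list)

    def process(self, batch: List[Row]) -> None:
        self.saved.extend(batch)


def run_import(fetcher: DummyFetcher, parser: ViewsParser, processor: NodeProcessor) -> int:
    fetcher.fetch()  # no-op, kept so the three-stage shape stays intact
    for batch in parser.parse():
        processor.process(batch)
    return len(processor.saved)


# A fake two-page view standing in for real Views output.
PAGES = [
    [{"title": "first"}, {"title": "second"}],
    [{"title": "third"}],
]


def fake_view(page: int) -> List[Row]:
    return PAGES[page] if page < len(PAGES) else []


processor = NodeProcessor()
count = run_import(DummyFetcher(), ViewsParser(fake_view), processor)
print(count)
```

The point of the sketch is that only the parser stage is new work; the fetcher is a placeholder and the processor is reused as-is.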

Related modules: Views, Feeds

Timeline

I can start working after my last exam, on 4 May 2010. I have nothing else planned and will dedicate almost all my time to GSoC (40+ hours/week) during the 3-month vacation break. There might be a 3-4 day holiday trip in the middle.

  • Until Week 1: Familiarising myself with Views and Feeds and keeping in touch with the communities. Basically continuing what I am doing now.
  • Week 1-3: Finishing Views Display (output into PHP Array) + Testing
  • Week 4-9: Finishing Feeds Importer - Parser + Testing
  • Week 10-12: Cleaning up code + Docs + if there is extra time, start building on Views Query Backend

Proposed Mentors: (the people named here have already been contacted and have agreed to mentor if the proposal goes through)

Main mentor: dereine (Views)

Co-mentor (help with technical questions only): alex_b (Feeds)

Contact Details:

Difficulty:

Medium-Hard

Comments

Any comments?

adityakg's picture

Oh yea, btw.. Any comments/suggestions? Feasibility of the project? :)

At first I thought that the user should be able to format the output freely. E.g. say you have this data structure (structure taken from Wiki):

<quiz>
  <question>which one comes first, egg or chicken?</question>
  <answer>chicken</answer>
</quiz>

Giving this template to the Views module:

Question: [xml-quiz-question]
Answer: [xml-quiz-answer]

will output

Question: which one comes first, egg or chicken?
Answer: chicken

This is just what the output replacement function in Views does. It gets trickier if there are multiple elements at the same level of the hierarchy. So I don't dare say anything more about this without digging deeper into the Views code and the specifications of the formats (XML, RDF, etc.).
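
To make the idea concrete, here is a minimal Python sketch of that kind of token replacement. It is not how Views implements it: the token shape `[xml-<root>-<child>]` is just read off the example above, `render` is a made-up helper, and joining repeated elements with a comma is one arbitrary answer to the "multiple elements" problem.

```python
import re
import xml.etree.ElementTree as ET

XML = """<quiz>
  <question>which one comes first, egg or chicken?</question>
  <answer>chicken</answer>
</quiz>"""

TEMPLATE = "Question: [xml-quiz-question]\nAnswer: [xml-quiz-answer]"


def render(template: str, xml_text: str) -> str:
    root = ET.fromstring(xml_text)

    def replace(match: "re.Match[str]") -> str:
        # Assumed token shape: [xml-<root tag>-<child tag>]
        root_tag, child_tag = match.group(1), match.group(2)
        if root_tag != root.tag:
            return match.group(0)  # unknown root: leave the token untouched
        # findall returns every matching child, so repeated elements
        # are joined here rather than silently dropped.
        values = [el.text or "" for el in root.findall(child_tag)]
        return ", ".join(values) if values else match.group(0)

    return re.sub(r"\[xml-(\w+)-(\w+)\]", replace, template)


rendered = render(TEMPLATE, XML)
print(rendered)

# Two <answer> elements at the same level: both end up in the output.
repeated = render("[xml-quiz-answer]", "<quiz><answer>a</answer><answer>b</answer></quiz>")
print(repeated)
```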

dawehner's picture

I'm not really sure what you have in mind to build:

Importing data from several standard format for Views field

I suggest writing an import plugin for the Feeds module that can take any kind of Views data and import it into Drupal. You don't have to write all this import code; it's already in Feeds.

So your task would be:
* Write the import plugin for feeds
* Write some query backends

I guess there should be some discussion first at http://groups.drupal.org/node/57198#comment-162118

dawehner's picture

Ah, what I forgot: I hope you plan to write a contrib module, not a patch for Views. For me this definitely shouldn't be part of Views.

Project Scope Modification and Clarification

adityakg's picture

Thanks @dereine for the feedback. I was not aware of the Feeds module when I wrote the proposals above. I have just played around with the Feeds module, skimmed through the Feeds developer guide and skimmed through some of the Feeds source files.

I changed the Title, Description and the Implementation part to clarify the project scope. Please comment on it :)

dawehner's picture

By using the existing Feeds module, I planned to create Feeds Parser for the formats mentioned above.

I'm not sure whether I got you right, but you want to write a feed parser for each type?
I would suggest building one generic feed parser, with the view returning the data. Then any parser logic could be moved to the query plugin, which is needed anyway. I hope you understand this :)

So the filtering can be done fully by Views. I know that writing Views query backends is not that easy.

So here is an attempt at a chart

External source -> views -> feeds -> database -> views for displaying
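
The split suggested here, with format-specific logic in the query plugin and a single generic parser on the Feeds side, might look something like this in outline. This is a Python illustration only: `xml_backend`, `json_backend`, `BACKENDS` and `generic_parser` are hypothetical names, and real Views query plugins are PHP classes with a much richer interface.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Row = Dict[str, str]


def xml_backend(source: str) -> List[Row]:
    """Hypothetical query backend: each child of the root becomes one row."""
    root = ET.fromstring(source)
    return [{col.tag: (col.text or "") for col in item} for item in root]


def json_backend(source: str) -> List[Row]:
    """Hypothetical query backend: a JSON array of flat objects."""
    return [{key: str(value) for key, value in item.items()} for item in json.loads(source)]


BACKENDS: Dict[str, Callable[[str], List[Row]]] = {
    "xml": xml_backend,
    "json": json_backend,
}


def generic_parser(view_rows: List[Row]) -> List[Row]:
    """The one generic parser: it only ever sees rows, never the source format."""
    return [dict(row) for row in view_rows]


xml_rows = generic_parser(BACKENDS["xml"]("<items><item><title>a</title></item></items>"))
json_rows = generic_parser(BACKENDS["json"]('[{"title": "a"}]'))
print(xml_rows == json_rows)
```

Adding a new format then means adding one backend; the parser and everything after it stay unchanged.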

@aditya_kristanto
Do you have IRC? We could discuss the idea better there.

fago's picture

For querying RDF with Views, implementing a SPARQL backend would be awesome.

Refined Description and Implementation

adityakg's picture

Changed the Description and Implementation based on yesterday's chat with @dereine. Thanks @dereine! Also added a (very) rough timeline. Please comment on it! :)

dawehner's picture

Mh:

Generic feed parser will need to be implemented that convert from the Formats listed above into data structure that can be used by Views

I still think we have something else in mind :) I would have written:

Generic importer which takes the data from a view and makes it available for Feeds import. The data of the view can come from the data types listed above.

adityakg's picture

@fago: thanks for the lead :) I was not very familiar with RDF and only started looking into it these past few days. I will look at SPARQL, ARC2 and all the dependent libraries to understand more about it before saying much. But I guess it is (and if it isn't, it will be) one of the important formats among web standards, so I will put priority on it. Sorry I didn't reply to your comment earlier. I swear I didn't see it when writing my comments :D

@dereine: ok, changing it :)

dawehner's picture

Some lines with alex_b

[21:05] <dereine> alex_b: did you already saw the gsoc idea of writing a FeedsPlugin to use any data from views (which can come from any external service) and import it with feeds?
[21:05] <alex_b> dereine: I haven't
[21:05] <alex_b> dereine: that sounds interesting.
[21:06] <dereine> alex_b: http://groups.drupal.org/node/57223
[21:06] <dereine> alex_b: i would like to know what do you think about this in general. i'm not sure whether you have planned something similar already
[21:07] <alex_b> dereine: I haven't.
[21:07] <alex_b> hm.
[21:08] <alex_b> but just yesterday I was thinking of a views style plugin that would write its output into a MYSQL table that you would have automatically generated views integration.
[21:08] <alex_b> dereine: *SQL table
[21:09] <alex_b> dereine: the use case being complex analysis scenarios (control schema for complex queries) or caching
[21:11] <alex_b> dereine: http://groups.drupal.org/node/57223 would essentially be a Views based Feeds Parser.
[21:11] <Druplicon> http://groups.drupal.org/node/57223 => XML, RDF, JSON, and other formats Parser using Views Module - as suggested by @dereine => 9 comments, 2 IRC mentions
[21:11] <alex_b> dereine: Feeds distinguishes between a fetching, parsing and processing stage on import
[21:11] <dereine> alex_b: whats fetching exactly?
[21:12] <dereine> alex_b: the cool thing is that he also wants to write some views query plugins so feeds could use a lot of new data without extra effort
[21:12] <alex_b> between parsing and processing there sits a mapper that contains configuration on what field to map where in Drupal.
[21:12] <alex_b> dereine: the fetcher would be e. g. a HTTP fetcher fetching a document via HTTP GET request
[21:12] <dereine> ah ok
[21:12] <alex_b> dereine: in the GSoC scenario fetching would also be the responsibility of Views
[21:13] <alex_b> dereine: I would solve that by providing a dummy fetcher.
[21:14] <alex_b> dereine: the result of the view would be exposed to the processor via the mapping API (FeedsParser::getSourceElements())
[21:14] <dereine> alex_b: which plugin type would do the "pager"-ing
[21:15] <alex_b> dereine: you mean aggregating a multi-page source?
[21:15] <dereine> alex_b: yes
[21:15] <alex_b> dereine: that would be responsibilty of the parser plugin again
[21:15] <dereine> ah fine, so its possible, thats nice
[21:15] <alex_b> dereine: this would be basically "batching" parsing
[21:15] <alex_b> there is batching support for processing right now
[21:15] <alex_b> there are already feature requests for batching on parsing
[21:16] <alex_b> it can then be up to the parser to decide on how to batch.
[21:16] <alex_b> e. g. : batch chunks of 1000 lines from a CSV file or batch page by page from a remote resource = same thing
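
Two points from this log, batching inside the parser and the mapper that sits between parsing and processing, can be sketched as below. This is a rough Python illustration, not the Feeds API: `batches`, `map_row`, `MAPPING` and the CSV content are invented for the example, and the chunk size of 2 stands in for the 1000 lines mentioned in the chat.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, str]


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield fixed-size chunks; a page from a remote source would be one chunk too."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical mapper configuration: source field -> target field in Drupal.
MAPPING = {"headline": "title", "text": "body"}


def map_row(row: Row, mapping: Dict[str, str]) -> Row:
    return {target: row[source] for source, target in mapping.items() if source in row}


CSV_TEXT = "headline,text\nA,first\nB,second\nC,third\n"
reader = csv.DictReader(io.StringIO(CSV_TEXT))

mapped_batches = [
    [map_row(row, MAPPING) for row in chunk]
    for chunk in batches(reader, 2)
]
print([len(batch) for batch in mapped_batches])
```

Because `batches` only needs an iterable of rows, the same function covers both cases alex_b mentions: lines from a file or pages from a remote resource.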

Revamped Proposal

adityakg's picture

One week's worth of researching, reading and talking with the community :) Special thanks to @dereine and @alex_b for the help.

Updated: almost all aspects of the proposal, making it more concise and elaborating further on the implementation and description.

Please do comment! :)

dawehner's picture

The new timeline is focused much more on the Feeds part. I like this.

For me this proposal looks fine.

adityakg's picture

Minor revision to the implementation and timeline, based on the implementation approach discussed with alex_b for the Feeds part. Confirmed the mentor and co-mentor (i.e. providing time for technical questions only) if the proposal goes through :)

GSOC Proposal

its private

publicmind's picture

The URL you provided is visible only to you or the mentors. Here is the public one:

http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/ad...

adityakg's picture

Thanks publicmind for pointing that out :)
