Hi all!
Overview:
This project aims to let Drupal import data from various existing web APIs and formats (e.g. Twitter, Facebook, XML, RDF, etc.) and store it in Drupal's own elements (nodes, users, database data, etc.).
This is achieved by combining the Views and Feeds modules in Drupal.
The process of importing external data from WebAPIs will be:
external WebAPIs -> Views -> Feeds -> Drupal (nodes, users, database data, etc..)
About Me:
I am Aditya (you can call me Adi for short), a 3rd-year Electrical Engineering student at the National University of Singapore. I have some experience in programming contests (IOI, National Science Olympiad - Gold Medalist, BNPC-HS National Scale Competition - Withstanding Champion, ACM-ICPC Regional Contest). While this is not directly related to Drupal, the experience helps me quickly read and understand code written by other people. I have also gained experience in PHP and database systems (MSSQL and MySQL) from developing various software and intranet applications using symfony (PHP framework, ongoing, so no showcase yet) and .NET (details here).
As for Drupal, I have developed several sites with it (details here). Honestly, because Drupal is so awesome, so far I have only needed a little coding to make Drupal work the way I wanted. To help myself learn Drupal (especially Feeds, Views and Drupal core), since last week I have been submitting minor patches and documentation to Feeds, Views and Drupal core.
Description:
Inspired by: http://groups.drupal.org/node/57168 by dereine
The goal is to create a generic Feeds Importer that fetches data from Views output and passes it to the Feeds module for further processing into Drupal (e.g. inserting into nodes, users, database, etc.). The data in the view can come from any source, provided a Views Backend Query is implemented for it.
The process of importing external data from WebAPIs into Drupal will be (the focus of GSOC Project is in BOLD):
external Web API -> Views Backend Query (filters, arguments, fields) -> Views -> Feeds -> Drupal (database, node, user, etc..)
If time permits, during GSOC or even beyond it, I would like to further develop or co-develop some Views Backend Queries to add support for common formats and services (e.g. XML, RDF, Facebook, YQL, Google Codebase API, etc.). Once the Views Backend Queries are well established, this package will be awesome! :)
Use Case:
- Combined with an appropriate Views Query Backend, Drupal can easily fetch data from an existing web application API (e.g. Twitter, Facebook, XML, RDF) without any coding required (including your own web application!).
- After the data is fetched from the web application API, it can be further processed into Drupal by the Feeds module (e.g. stored in the database, saved as nodes, etc.)
Implementation:
Implementation from Views side:
- Views Type for common Formats (containing the Views Query Backend - addition if time permits/extend after GSOC)
- Views Display (to output the data in PHP array that can be used by Feeds Importer)
Implementation from Feeds side:
- Feeds Importer
- Fetcher: implement a dummy fetcher, because the data fetching has already been done in Views
- Parser: execute the view and, if batch import is needed (e.g. when the view output spans multiple pages), prepare the data from Views and initialize it for batching.
- Processor: no additional implementation needed. use existing processor (to output to database, node, user, etc)
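
The division of labour above can be sketched roughly as follows. This is only a language-neutral illustration written in Python rather than Drupal PHP, and every name in it (`DummyFetcher`, `ViewsParser`, `NodeProcessor`, `run_view`, `run_import`) is hypothetical; none of them is the actual Feeds or Views API.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


class DummyFetcher:
    """Fetches nothing: in this design the view has already retrieved the data."""

    def fetch(self) -> None:
        return None


class ViewsParser:
    """Executes a view and hands its rows on as plain dictionaries."""

    def __init__(self, run_view: Callable[[], List[Row]]) -> None:
        # run_view is a hypothetical stand-in for executing a Views display
        # that outputs its result as an array of rows.
        self.run_view = run_view

    def parse(self, fetched: None) -> List[Row]:
        # The fetcher result is ignored on purpose; the view is the data source.
        return list(self.run_view())


class NodeProcessor:
    """Stand-in for an existing processor; it only records the rows it receives."""

    def __init__(self) -> None:
        self.saved: List[Row] = []

    def process(self, rows: List[Row]) -> int:
        self.saved.extend(rows)
        return len(rows)


def run_import(fetcher: DummyFetcher, parser: ViewsParser, processor: NodeProcessor) -> int:
    """Run the three stages in order and return the number of processed rows."""
    return processor.process(parser.parse(fetcher.fetch()))


def example_view() -> List[Row]:
    # Made-up sample rows, standing in for data a query backend would return.
    return [{"title": "First item"}, {"title": "Second item"}]


processor = NodeProcessor()
imported = run_import(DummyFetcher(), ViewsParser(example_view), processor)
print(imported)
```

The point of the sketch is that only the parser knows about Views; the fetcher is a no-op and the processor is reused unchanged.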
Timeline
I can start working after my last exam on 4 May 2010. I have nothing else planned and will dedicate almost all my time to GSOC (40+ hours/week) during the 3-month vacation break. There might be a 3-4 day holiday trip in the middle.
- Until Week 1: Familiarising with Views and Feeds, keep in touch with the communities. Basically continuing what I am doing now.
- Week 1-3: Finishing Views Display (output into PHP Array) + Testing
- Week 4-9: Finishing Feeds Importer - Parser + Testing
- Week 10-12: Cleaning up code + Docs + if there is extra time, start building on Views Query Backend
Proposed Mentors: (the people named here have already been contacted and have agreed to mentor if the proposal goes through)
Main mentor: dereine (Views)
Co-mentor (help with technical questions only): alex_b (Feeds)
Contact Details:
- aditya.kristanto@gmail.com (Google Talk/Primary e-mail, most of the time online)
- aditya_kristanto@hotmail.com (MSN, rarely online but will be online on request)
- http://drupal.org/user/309310
- Location: Singapore, UTC+8
Difficulty:
Medium-Hard

Comments
Any comments?
Oh yea, btw.. Any comments/suggestions? Feasibility of the project? :)
At first I have the thought that the user should be able to format the output freely. E.g. you have this data structure (structure taken from Wiki):
<quiz><question>which one comes first, egg or chicken?</question>
<answer>chicken</answer>
</quiz>
By giving this code into the Views module:
Question: [xml-quiz-question]Answer: [xml-quiz-answer]
will output
Question: which one comes first, egg or chicken?Answer: chicken
This is just what the output replacement function in Views does. It gets trickier if there are multiple elements at the same level of the hierarchy. Thus, I don't dare to say anything more about this without delving deeper into the Views code and the specifications of the formats (XML, RDF, etc.).
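
A minimal sketch of the replacement idea, in Python rather than Drupal PHP. The `[xml-<root>-<child>]` token syntax is taken from the example above; the `render` function and its rule of leaving ambiguous tokens untouched are assumptions made for illustration, not how Views itself behaves.

```python
import re
import xml.etree.ElementTree as ET

QUIZ_XML = """<quiz><question>which one comes first, egg or chicken?</question>
<answer>chicken</answer>
</quiz>"""


def render(template: str, xml_text: str) -> str:
    """Replace [xml-<root>-<child>] tokens with the text of the matching element."""
    root = ET.fromstring(xml_text)

    def replace(match):
        root_tag, child_tag = match.group(1), match.group(2)
        if root_tag != root.tag:
            return match.group(0)  # token refers to a different document root
        children = root.findall(child_tag)
        if len(children) != 1:
            # Zero or several siblings with the same tag: the token is
            # ambiguous, so it is left untouched (an assumed policy).
            return match.group(0)
        return children[0].text or ""

    return re.sub(r"\[xml-(\w+)-(\w+)\]", replace, template)


result = render("Question: [xml-quiz-question]Answer: [xml-quiz-answer]", QUIZ_XML)
print(result)
```

With two `<answer>` elements under `<quiz>`, the `[xml-quiz-answer]` token stays as it is, which is exactly the multiple-elements case that needs a real design decision.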
I'm not really sure what you have in mind to build:
Importing data from several standard format for Views field
I suggest writing an import plugin for the Feeds module that takes any kind of Views data and imports it into Drupal. You don't have to write all this import code; it's already in Feeds.
So your task would be:
* Write the import plugin for feeds
* Write some query backends
I guess there should be some discussion first at http://groups.drupal.org/node/57198#comment-162118
Ah, what I forgot: I hope you plan to write a contrib module, not a patch for Views. For me this definitely shouldn't be part of Views.
Project Scope Modification and Clarification
Thanks @dereine for the feedback. I was not aware of the Feeds module when I wrote the proposals above. I have just played around with the Feeds module, skimmed through the Feeds developer guide and skimmed through some of the Feeds source files.
I changed the Title, Description and the Implementation part to clarify the project scope. Please comment on it :)
By using the existing Feeds
I'm not sure whether I got you right, but do you want to write a feed parser for each type?
I would suggest building one generic feed parser, with the view returning the data. Then any parser logic could be moved to the query plugin, which is needed anyway. I hope you understand this :)
That way the filtering can be done fully by Views. I know that writing Views query backends is not that easy.
So here is a try of a chart
External source -> views -> feeds -> database -> views for displaying
@aditya_kristanto
Do you have irc, we could discuss the idea better there.
For querying RDF with Views implementing a sparql backend would be awesome.
Refined Description and Implementation
Changed the Description and Implementation based on chat yesterday with @dereine. Thanks @dereine! Along with the addition of (very) rough timeline. Please comment on it! :)
Mh: Generic feed parser will
Mh:
I still think we have something else in mind :) I would have written:
@fago: thanks for the lead :) I am not very familiar with RDF and only started looking into it these past few days. I will look at SPARQL, ARC2 and all the dependent libraries to understand more before saying much. But I guess it is (and if it isn't, it will be) one of the important formats among web standards, so I will put priority on it. Sorry I didn't reply to your comment earlier. I swear I didn't see it when writing my comments :D
@dereine: ok, changing it :)
Some lines with alex_b
[21:05] <dereine> alex_b: did you already saw the gsoc idea of writing a FeedsPlugin to use any data from views (which can come from any external service) and import it with feeds?
[21:05] <alex_b> dereine: I haven't
[21:05] <alex_b> dereine: that sounds interesting.
[21:06] <dereine> alex_b: http://groups.drupal.org/node/57223
[21:06] <dereine> alex_b: i would like to know what do you think about this in general. i'm not sure whether you have planned something similar already
[21:07] <alex_b> dereine: I haven't.
[21:07] <alex_b> hm.
[21:08] <alex_b> but just yesterday I was thinking of a views style plugin that would write its output into a MYSQL table that you would have automatically generated views integration.
[21:08] <alex_b> dereine: *SQL table
[21:09] <alex_b> dereine: the use case being complex analysis scenarios (control schema for complex queries) or caching
[21:11] <alex_b> dereine: http://groups.drupal.org/node/57223 would essentially be a Views based Feeds Parser.
[21:11] <Druplicon> http://groups.drupal.org/node/57223 => XML, RDF, JSON, and other formats Parser using Views Module - as suggested by @dereine => 9 comments, 2 IRC mentions
[21:11] <alex_b> dereine: Feeds distinguishes between a fetching, parsing and processing stage on import
[21:11] <dereine> alex_b: whats fetching exactly?
[21:12] <dereine> alex_b: the cool thing is that he also wants to write some views query plugins so feeds could use a lot of new data without extra effort
[21:12] <alex_b> between parsing and processing there sits a mapper that contains configuration on what field to map where in Drupal.
[21:12] <alex_b> dereine: the fetcher would be e. g. a HTTP fetcher fetching a document via HTTP GET request
[21:12] <dereine> ah ok
[21:12] <alex_b> dereine: in the GSoC scenario fetching would also be the responsibility of Views
[21:13] <alex_b> dereine: I would solve that by providing a dummy fetcher.
[21:14] <alex_b> dereine: the result of the view would be exposed to the processor via the mapping API (FeedsParser::getSourceElements())
[21:14] <dereine> alex_b: which plugin type would do the "pager"-ing
[21:15] <alex_b> dereine: you mean aggregating a multi-page source?
[21:15] <dereine> alex_b: yes
[21:15] <alex_b> dereine: that would be responsibilty of the parser plugin again
[21:15] <dereine> ah fine, so its possible, thats nice
[21:15] <alex_b> dereine: this would be basically "batching" parsing
[21:15] <alex_b> there is batching support for processing right now
[21:15] <alex_b> there are already feature requests for batching on parsing
[21:16] <alex_b> it can then be up to the parser to decide on how to batch.
[21:16] <alex_b> e. g. : batch chunks of 1000 lines from a CSV file or batch page by page from a remote resource = same thing
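
The "same thing" remark in the log can be illustrated with a small sketch, written in Python rather than Drupal PHP. Both helpers below are hypothetical and are not part of Feeds; they only show that chunking lines of a local file and walking a remote source page by page yield the same kind of result: a sequence of batches.

```python
from typing import Callable, Iterator, List


def batch_lines(lines: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` lines (e.g. from a CSV file)."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def batch_pages(fetch_page: Callable[[int], List[str]]) -> Iterator[List[str]]:
    """Yield one batch per page until the source returns an empty page."""
    page = 0
    while True:
        rows = fetch_page(page)
        if not rows:
            return
        yield rows
        page += 1


# Made-up paged source: two pages of rows, then nothing.
PAGES = [["row 1", "row 2"], ["row 3"]]


def fake_fetch_page(page: int) -> List[str]:
    return PAGES[page] if page < len(PAGES) else []


line_batches = list(batch_lines(["a", "b", "c", "d", "e"], 2))
page_batches = list(batch_pages(fake_fetch_page))
print(line_batches, page_batches)
```

Either generator could feed the same processing loop, which is why the batching decision can be left to the parser.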
Revamped Proposal
One week's worth of researching, reading and talking to the community :) Special thanks to @dereine and @alex_b for the help.
Updated: almost all aspects of the proposal, making it more concise and elaborating further on the implementation and description.
Please do comment! :)
The new time line is focused much more on the feeds part. I like this.
For me this proposal looks fine.
Minor revision in implementation and timeline, based on implementation approach discussed with alex_b for the feeds part. Confirmed mentor and co-mentor (i.e. providing time for technical qs only) if the proposal goes through :)
GSOC Proposal
The submitted GSOC proposal is up at http://socghop.appspot.com/gsoc/student_proposal/private/google/gsoc2010...
its private
The URL you provided is visible only to you, or mentors...here is the public,
http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/ad...
Thanks publicmind for pointing that out :)