[GSoC Proposal] Repo Families for Better Collaboration

Posted by rabisg

Abstract

Although Drupal switched to Git for its version control needs about a year ago, the workflow is still by and large patch-based when it comes to collaborating. While some contributors create sandboxes and then open an issue in the main project, we are still far from using the power of the distributed workflow that Git offers us. This proposal aims to provide the basis for a Git-like workflow when collaborating on projects.

Previous Discussion (Summary)

A detailed description of repo families can be found in [2].
Following that discussion and its follow-up [4], the conclusion is that the community favors 'one repository per issue'.
Even [1] points in the same direction, though the UI is slightly different from what @Cornil explained in [4].
Summarizing the approach:

  1. To have a Git-like workflow, we would start by creating Per-Issue Repositories.
  2. Each issue will have its own repository. Everyone works in the same place, except when two people approach the same problem differently and decide to branch.
  3. When a user has a fair amount of code to push, they can comment on the main issue queue and optionally specify a branch; alternatively, the maintainer can look at the related sandboxes view and decide to merge the tips of branches.
  4. Each per-issue repo is maintained by a set of maintainers who can push commits to the issue repo; the maintainer of the project pushes the result into the main repo.

Access Permissions

  • The power to create a new repo for an issue can lie with the maintainer of the project or be open to everyone. The former would eliminate issue duplication, though.
  • Each maintainer can appoint a set of co-maintainers who have commit access to the various issue repositories (but not necessarily all of them, or the main repository). Adding maintainers per repo can also be an option.
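
The access rules above could be modelled roughly as follows. This is only a sketch under stated assumptions: the `Project` class, its method names and the `open_creation` flag are hypothetical and do not correspond to any existing Version Control API code.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Project:
    """Hypothetical model: one main repo plus any number of per-issue repos."""
    name: str
    maintainer: str
    # Hypothetical: issue number -> co-maintainers allowed to push to that issue repo.
    issue_repo_committers: Dict[int, Set[str]] = field(default_factory=dict)

    def create_issue_repo(self, user: str, issue: int, open_creation: bool = False) -> None:
        """Create a per-issue repo; restricted to the maintainer unless creation is open."""
        if not open_creation and user != self.maintainer:
            raise PermissionError(f"{user} may not create a repo for issue {issue}")
        self.issue_repo_committers.setdefault(issue, set())

    def appoint(self, issue: int, user: str) -> None:
        """Give a co-maintainer commit access to one issue repo only."""
        self.issue_repo_committers[issue].add(user)

    def can_push_issue_repo(self, user: str, issue: int) -> bool:
        """The maintainer and the appointed co-maintainers may push to an issue repo."""
        return user == self.maintainer or user in self.issue_repo_committers.get(issue, set())

    def can_push_main_repo(self, user: str) -> bool:
        """Only the project maintainer pushes into the main repo."""
        return user == self.maintainer
```

Keeping the co-maintainer set per issue repo reflects the option of adding maintainers per repo; a single project-wide set would cover the other option.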

Approach

Phase 1: VC_API, VC_Git, VC_Project for D7

The goal is to understand the Version Control* projects by brainstorming with mentors and simultaneously porting VC_Project to D7 (if it's not done already).
This phase would involve writing/editing simpletests for the given modules, as this would help me better understand how the modules work.

Phase 2: Introduce the concept of Repo Families in Version Control API, Version Control Git and Version Control Project

We would have to implement a new table mapping repositories to issues (the unique key in this case being the issue number) and provide an API for it. Since these are normal repositories, the branch, merge and commit actions would not change.
Version Control and Version Control Git would then be changed (according to the API mentioned above) to allow for repo families.
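
A minimal sketch of what such a mapping table and its API could look like, using SQLite only for illustration. The table names, column names and function names are all assumptions; which module would own the table is left open here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repository (
    repo_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- Hypothetical mapping table: the issue number is the unique key.
CREATE TABLE issue_repository (
    issue_nid      INTEGER PRIMARY KEY,
    repo_id        INTEGER NOT NULL UNIQUE REFERENCES repository(repo_id),
    parent_repo_id INTEGER NOT NULL REFERENCES repository(repo_id)
);
""")


def create_issue_repo(conn, issue_nid, parent_repo_id, name):
    """Register a new per-issue repo in the family of parent_repo_id."""
    with conn:  # commit on success, roll back if the issue already has a repo
        cur = conn.execute("INSERT INTO repository (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT INTO issue_repository (issue_nid, repo_id, parent_repo_id) "
            "VALUES (?, ?, ?)",
            (issue_nid, cur.lastrowid, parent_repo_id),
        )
    return cur.lastrowid


def repo_family(conn, parent_repo_id):
    """Return (issue_nid, repo_id) pairs for every issue repo of a main repo."""
    rows = conn.execute(
        "SELECT issue_nid, repo_id FROM issue_repository "
        "WHERE parent_repo_id = ? ORDER BY issue_nid",
        (parent_repo_id,),
    )
    return rows.fetchall()
```

Because the issue number is the primary key, a second attempt to create a repo for the same issue fails with `sqlite3.IntegrityError` and leaves no orphan repository row behind.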

Phase 3: Branches on per-issue Repositories

The first part would involve providing help text on the issue page (like the Version Control tab on project pages) with step-by-step instructions on how to create branches. The more important part would be to allow users to specify the branch they are currently working on in comments (much like users attach patches now). We can also look at generating patches when they specify the branch.
Another aspect of this phase would be to combine the work done in Phase 1 and Phase 2 to create access permissions for these per-issue repos (the access permissions are described above).
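
One way to picture the "branch in a comment" idea is sketched below. The `branch: <name>` marker syntax, the function names and the example branch names are purely hypothetical; only `git diff <base>...<branch>` (diff from the merge base to the branch tip) is an existing Git command.

```python
import re
from typing import List, Optional

# Hypothetical marker syntax: a comment line of the form "branch: <name>".
BRANCH_MARKER = re.compile(
    r"^\s*branch:\s*([A-Za-z0-9._/-]+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def branch_from_comment(body: str) -> Optional[str]:
    """Return the branch named in a comment body, or None if there is no marker."""
    match = BRANCH_MARKER.search(body)
    return match.group(1) if match else None


def patch_command(base: str, branch: str) -> List[str]:
    """Build the git command that would yield a reviewable patch for the branch."""
    return ["git", "diff", f"{base}...{branch}"]
```

For example, a comment containing the line `branch: 1234-alt-approach` would resolve to that branch name, and `patch_command("7.x-1.x", "1234-alt-approach")` would describe the diff a reviewer could be shown.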

Phase 4: Creating views handlers for new database tables/fields on versioncontrol_project and change/add default views to it.

Presently, Version Control Project has views for commits (user/global Git). A standard view showing the sandboxes for the different issues of a specific project would have to be added. The different branches could also be represented as a tree, with the tip of each branch being a patch (this needs further discussion), so that the maintainer can review the various issues and merge the best branch (in short, auto-generation of patches).
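
The data such a view would present can be sketched as a simple grouping of branch tips by issue. The row layout `(issue number, branch name, tip commit)` and the function name are assumptions made for illustration; the real work would be done by views handlers on the actual tables.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def branches_by_issue(
    rows: Iterable[Tuple[int, str, str]]
) -> Dict[int, List[Tuple[str, str]]]:
    """Group (issue, branch, tip_commit) rows into a per-issue overview."""
    grouped = defaultdict(list)
    for issue, branch, tip in rows:
        grouped[issue].append((branch, tip))
    # Sort issues and branches so the overview is stable for the maintainer.
    return {issue: sorted(branches) for issue, branches in sorted(grouped.items())}
```

A maintainer could then walk the result issue by issue and compare the tips of the competing branches before merging one.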

Timeline

May 2: End-semester exams end.
Until May 5: Initial reading phase and brainstorming with mentor.
May 6-May 22: Phase 1: Writing simpletests for the given modules and porting VC_Project if necessary.
May 23-June 15: Phase 2
June 16-July 5: Phase 3
July 6-July 9: Testing the code written so far and giving it final touches. Writing test cases if time permits.
Mid-Term Evaluation
July 10-July 17: Phase 4
College Re-Opens on July 25
July 18-August 13: Writing simpletests (contributing to the test suite of the VC projects).
Final Evaluation

Who am I?

I am Rabi Shanker Guha, an undergraduate student at the Indian Institute of Technology, Kanpur, currently pursuing my B.Tech in Computer Science and Engineering. I am an open source enthusiast who loves to code, sleep and eat, in that order :)
I have been working on a variety of projects ranging from OpenCV to app/website development, and participating in challenges like Algorithmic Challenge, Unknown Language Challenge, development competitions, etc.
I am familiar with C/C++/Java/Python/MIPS and PHP/MySQL/JavaScript/jQuery.

Why Drupal?

I have been working with Drupal for quite some time now (mostly on understanding the API and learning to write modules, etc.). I created my first website using Drupal at a hackathon and am currently working on an intranet portal for my college. Though I am highly motivated by the OSS ideology, I haven't really contributed much except for a few patches to Statuses under the guidance of IceCreamYou. I want to make the most of this summer by contributing to Drupal.
Frankly, the success stories of past GSoC participants at Drupal motivated me a lot.

How will the community gain?

If the project goes through and we are able to come up with a code base for repo families, it will radically change the way Drupal contributors collaborate and lead to a better contribution workflow. This was discussed recently at a BoF at DrupalCon Denver [3] and many other times. I pledge to put my full effort into achieving this.

Projects

The following projects would serve as the base for my project:

  • Version Control API
  • Version Control Git
  • Version Control Project

Comments

    Links

    Comment by rabisg:

    Couldn't submit it in the main proposal without being considered a spammer.

    References

    This proposal is very strongly based on:

    1. Provide better integration of sandboxes and parent projects
    2. Personal sandboxes/repos/branches for issues for git
    3. BoF Discussion notes from Denver 2012 DrupalCon
    4. A detailed description of the workflows after we implemented phase 3
    5. Interactions with @sdboyer and other gurus over IRC and email

    Comment by marvil07:

    In the abstract you do not differentiate between GitHub and Git, which could be a major problem. In general, you need to assume GitHub is not always right, and in the Drupal community we may need different ways of interacting, based on our own community history.

    1. To have a fully functional VC_API and VC_Git for D7.

    I guess that a D7 version will be ready when GSoC coding time starts, at least that's the plan. I have seen that you are planning that point to happen before coding time starts, so it's ok.

    2. To facilitate creation of per-issue repo.

    I would really like to see that happen, and it has a good chance of ending up on drupal.org. But I think you have not yet considered all sides of it, which I understand, since I guess you have not yet played enough with the related modules.

    This feature will need basically:

    • Introduce the idea of repository families in versioncontrol. Please note that, IMHO, this will only be helpful for DVCSs.
    • Make changes in versioncontrol_git according to the previous change.
    • Change versioncontrol_project to support the same idea of a family, but for projects. I guess that is one of the places where the feature will need more code.

    3. Allow users to create and work on separate branches when a reviewer/maintainer decides whose patch to commit.

    Another good feature would be autogenerating patches for review based on the tips of branches, but maybe the Git web viewer is enough (a lot of them offer diffs between arbitrary hashes).

    4. Provide an interface for creating pull requests allowing users to specify branch in comments.

    This could be a good feature, but I guess it will not be on drupal.org because, IMHO, per-user project repositories will not work.

    Based on what, IIRC, sun mentioned somewhere, per-user per-project repositories will be a nightmare to follow. It's already tricky to follow what is happening on big projects like Drupal core, but it will be trickier if things happen in sandboxes for a long time without reviews (then big patches will be rejected: think burnout).

    Since this will not be so interesting for drupal.org, I would say that you could write/improve the test suite for the versioncontrol* modules; we definitely need help there.

    5. Create views for related sandboxes.

    Yes, that's definitely part of the work.

    I would suggest taking a deeper look at the versioncontrol* related projects to understand a little more of what you will be doing, and to make a great proposal.

    Comment by rabisg:

    This could be a good feature, but I guess it will not be on drupal.org because, IMHO, per-user project repositories will not work.

    Actually, what I was suggesting is that people will work on different branches of the same repo (per issue per project) and have the option to specify the branch they are working on in the comments, as @webchick shows in https://img.skitch.com/20120223-kmpfw3uqmkwbuik65t755uee6j.png.
    So it's not per user per project, but per issue per project, with a per-user branch (if they want).

    Correct me if I am wrong.

    Comment by rabisg:

    @marvil: I hope I have addressed all the points you have mentioned.

    Comment by marvil07:

    The following proposal aims at providing the base of using Git-like workflow while collaborating on projects.

    Git does not enforce any workflow. On the other hand, a patch-based workflow is not bad either; e.g., the Git project itself uses a mailing list to exchange patches.

    Phase 1: VC_API and VC_Git for D7

    The goal is to understand Version Control* projects by brainstorming with mentors and simultaneously porting it to D7.
    ...
    May 3-May 22: Phase 1 + Testing VC_API and VC_API Git

    As mentioned, it's highly probable that the D7 versions of the versioncontrol and versioncontrol_git projects will be ready by May 3rd. The other one you will really need is versioncontrol_project; I guess you could work on that port if it is not ready by that date.

    On the other hand, you mention testing; I guess you mean trying the modules out. But actually I would really appreciate it if you could work on the tests (adding/editing simpletests) for versioncontrol, versioncontrol_git and versioncontrol_project in this phase 1.

    I would say that it will help you a lot if by May 3rd you already know how all three mentioned projects work together (reading code, testing how it works). It's a good amount of code, which is why I strongly suggest that you read it/test it pretty early in the schedule; I would say before phase 1.

    Phase 2: Introduce the concept of Repo Families on VersionControl API and VersionControl Git
    ...
    In addition to this we would have to implement a new table for referencing repositories to Issues(the unique key in this case being the 'Issue Number') and provide an API for the same.

    I would say that's not a good idea. Again, please note that versioncontrol does not know anything about projects, so it definitely does not want to know about issues.

    versioncontrol_project is the module that lets you map projects to versioncontrol repositories, so that would be the place to store that mapping (or maybe a submodule of versioncontrol_project, since it is actually a separate module, project_issue, which provides the issues feature).

    Phase 3: Branches on per-issue Repositories
    ...
    The more important part would be to allow users to specify the current branch they are working on in comments (much like users attach patches now).

    Please note that we do not want to promote per-user branches, since we would like to encourage collaboration; that's one of the reasons why we are implementing per-issue repositories instead of per-user ones. Instead, I guess the community will embrace more the idea of one branch when multiple people add commits, and only if necessary someone could add another branch if a new way of do it(tm) is needed.

    Phase 4: Creating Views for related sandboxes

    I guess you mean creating views handlers for new database tables/fields on versioncontrol_project and change/add default views to it.

    About the timeline for this phase, you are proposing basically one month; that should not take more than 1 week, IMHO.

    Finally, you are not yet mentioning versioncontrol_project, and that's where most of the code will live. Are you sure you reviewed it?

    Comment by rabisg:

    I would say that it will help you a lot if by May 3rd you already know how all the three mentioned projects work together

    I understand your concern, but my coursework will be in full swing before that, and hence I do not want to make a promise I might not be able to keep. I'll try my best, but I have still kept 3 days as a buffer, during which I can dedicate as much time as required.

    only if necessary someone could add another branch if a new way of do it(tm) is needed.

    That's what I meant, but looking back I think it was not clear. Branching was meant only for when it is required; otherwise everyone would work on the same branch.

    I guess the community will embrace more the idea of one branch when multiple people add commits

    I had formed a different opinion based on http://groups.drupal.org/node/50438. I was thinking that people would issue pull requests (sort of) and only the maintainer would have commit access. IMHO, otherwise there would be confusion. I am keeping it as the maintainer for now. I'll change that after discussing it with you (not on Melange, though).

    I guess you mean creating views handlers for new database tables/fields on versioncontrol_project and change/add default views to it.

    Frankly, I am not very experienced with this.

    About the timeline for this phase, you are proposing basically one month; that should not take more than 1 week, IMHO.

    Yes, I understand it will not take 1 month in any case. The idea was to keep this as a buffer period to add simpletests, but I have modified it now.