There have been a couple of issues raised this week which suggest we may need to have a generic discussion on module duplication and the code review process. As a result, I'm starting this thread, with the intent of 'genericizing' the discussion a bit ... stepping back from the details of the existing examples (and hopefully filtering some of the emotion out of the conversation as a result). As one of the 'new guys' in the house, and with an outstanding application in the queue, I have some strong opinions regarding this subject - and as a result, my arguments are going to strongly favor one side of the debate ... please take this as an invitation for dialogue, as opposed to an attack on the opposing viewpoint.
Over the last couple days, I've read over a number of d.o. threads on this topic, and have attempted to incorporate them into the (admittedly biased) summary below. The first post discusses the 'purpose' for the Module Duplication check, while the rest of my posts deal with individual issues that may arise (or already exist) with the application of module duplication rules in the code review process.
I've broken things up into multiple posts in an attempt to maintain context within any comments which might follow ... please try to stick with this model while responding (breaking up responses into different comments for each thread), and ensure you are actually replying on the conversation thread that you are commenting on.

Comments
Purpose of the 'Module Duplication' Check
Generically speaking, I would propose that the Goal of the 'Module Duplication' check in the Code Review process can be worded as follows:
From the How to review Full Project Applications page:
My take on the motivation behind this statement is that we want to prevent having 17 different ways of doing the same thing, as multiple options cause confusion for new Drupal users as they try to sort out how best to approach a given problem. In the end, the desire is to prevent the need for module comparison pages like Comparison of rotator/slider modules and Comparison of Content and User Import and Export Modules. Addressing this module duplication concern is a community-wide initiative, not limited to the folks doing Code and Project Application Reviews. (There's actually a whole other group dedicated to doing comparisons of contributed modules.)
While this check is an important part of the Code Review process, it is important to recognize that the module duplication check is just one of many Code Review (CR) goals ... another widely touted goal of the process is to provide a 'mentorship' period for new developers; introducing them to the system, teaching them best practices, and helping guide their transition into the role of 'Drupal Contributor'. Unfortunately (in my rather limited experience), the 'module duplication check' can often end up in direct conflict with this second goal.
Sandboxes
I'd suggest an alteration to the end of your summary, "being contributed to the Drupal.org repositories." Duplicate projects in sandboxes don't really hurt anything. It starts to become an issue if people are using and contributing to a sandbox project, but most sandbox projects are solo efforts, so there is no impact (neither negative nor positive) on the larger community. Especially in the context of code reviews, we only need to worry about duplication in the context of full projects. So maybe something like "being contributed as full projects on Drupal.org" would be a better ending to that summary. I think this difference has some subtle implications for how we approach more specific questions below.
Excellent point, which more accurately captures my original intent. If I could edit the post, I would. ;)
When "Module Duplication" Checks Occur Late in the Process
In a perfect world, the module duplication check would occur before the contributor has written a single line of code. This allows the opportunity to redirect a contributor towards a collaborative effort (building on an existing module) before they have invested any time or effort into creating their own solution. This helps to eliminate wasted effort ... and more importantly, takes place before the applicant develops a strong 'emotional' attachment to their own code/contribution/approach.
Of course, this ideal situation does not fit within the Code Review process, which requires a 'completed' module/theme before an applicant can even submit a full project application. In the code review case, the recommendation becomes that a duplicate module search should be the first thing that a reviewer looks at before considering an application. However, I would argue that even then is too late ... once the code is written, the emotional attachment to the code exists, and any rejection of the contribution is going to induce at least some sense of 'letdown' (and thus serves as a de-motivator for the applicant, instead of a motivator supporting the applicant's growth into a long-term contributor). Rule #1 should always be: Don't kill a new contributor's spirit before they even get a foot in the door!
Now, I'm not suggesting that a duplicate module check should not be part of the code review process ... only that a 'proper' long-term solution would address duplication long before an applicant reached the 'code review' stage. As such, the Code Review group should avoid taking on sole accountability for the 'duplicate module' check ... instead, the solution we need to develop should focus on preventing duplicate modules from ever getting into the Project Applications queue in the first place; and then, if one does get through, on minimizing the letdown when that application is rejected.
Proposed Solutions:
Much of the code review discussion over the last few weeks has revolved around the formalization of a 'Gate' structure for Code Reviews, which, while possibly diverging from the desire to provide 'end-to-end reviewer/mentorship' to an applicant, may enable less experienced reviewers to chip in on some of the more 'standard' chunks of the review process (such as running through Coder, checking for 3rd party licenses, etc). The first 'Gate' in any such process should be the 'duplicate module check'; ensuring that the applicant does not put any additional effort into addressing other gates if the end module is unlikely to be approved ... thus reducing (as much as possible) the sense of frustration associated with running through the approval process only to be denied at the last stage.
But, as mentioned, this only addresses modules which reach the Code Review queue ... it does not help prevent them from reaching the queue in the first place. I believe a good first step towards this goal would be to emphasise the desire to reduce duplicate modules at a number of other entry points into the 'Drupal development' process. To this end, I'd suggest adding a simple 'Reduce Module Duplication' plea to the following pages ... or, if the motivation cannot be expressed in a concise enough statement, then create a dedicated 'Reduce Module Duplication' article and link to it on each of the following:
A good starting point for the text for such a plea is dman's 'Open Letter - A Proposal to Consolidate' from the 'Similar Module Review' group on g.d.o.
yes, first step
This is a really good point.
I would hope that the first thing an applicant does is write a full description of their module/theme/project and describe how it compares to similar modules (or doesn't at all), and how it might integrate with other modules.
This is really important because it puts the duplication review on the applicant, at least before the real review happens. It also shows that the applicant has an understanding of the contrib space. This was being done well with the CVS applications but has fallen by the wayside with the sandboxes (including a full project description).
Then, yes, the first step for the reviewer is to do a duplication and description check.
--
zzolo
Documentation Updates Complete
Just as an FYI, I've updated many of the documents in my list above to identify the 'module duplication' concern ... with any luck, future applicants should be more aware of this component of the review process.
Why create a module?
If you're writing a module, that presumably means you want some new functionality on your site. The first thing I do in that situation is search for existing modules that do what I want. If I find such a module, I won't write my duplicate module in the first place.
So why are duplicate modules being written?
* Are people not searching for modules before writing their own?
* Are people not finding the right modules in their search, even though they exist?
* Are people finding modules that are almost what they are after, but not quite? If so, do these modules need to be more extensible?
* Are people aware of existing modules, but choose to try a different approach?
* Is there another reason?
If the first two are happening, then that suggests improvements to drupal.org and developer documentation are needed.
If it's the third, then we should find ways to help people collaborate.
If it's the fourth, then I think we should be encouraging that (e.g. CCK Fields), maybe with a caveat that it needs to show benefits over the older module before moving out of alpha.
'Module Duplication' enforcement can not be 'Black & White'
Of course, there will always be cases where an applicant does their research, determines that their project doesn't significantly duplicate the functionality of any existing modules on Drupal.org, and goes away to happily start coding ... only to return months later to discover that, while they were building their project, another user has since contributed 'Module X', which duplicates 80% of the functionality of their own project. (Or, even worse, imagine that a 'pre-approved' user has contributed 'Module X' while the applicant's own version sat idle in the Code Review backlog queue!)
According to the current implementation, that applicant is out of luck ... even if their module meets all the other code review criteria, makes more efficient use of the Drupal APIs, provides a more flexible roadmap for future features, and is generally just a better-coded project ... their contribution/code is blocked by the 'Module Duplication' rule, and they are sent away with a suggestion to go collaborate on the 'Module X' project instead. shelleyp summed this up nicely in one of her posts:
Admittedly, the above is a contrived example, carefully constructed to support my point (and as a means to work the above quote into my post). That said, don't let that detract from the point itself ... Application of the 'Module Duplication' rules needs to be flexible enough to accommodate the uniqueness of every situation; and recognize up front that every situation is unique.
Proposed Solutions:
In phrasing the "purpose of the 'Module Duplication Check'" at the top of this page, I chose my words carefully ... The goal is to 'Reduce' the proliferation of duplicate modules, not 'Eliminate'. We want to prevent having 17 different ways of doing the same thing ... but I see nothing wrong with having 2 or 3 slightly different approaches to the same use case. In the quest to eliminate the confusion introduced by a proliferation of similar modules, we must be conscious that we do not also eliminate the benefits introduced with 'flexibility of choice'.
still needs to be first step
Just for reference, here is a great example of module duplication in the application process and a positive outcome: http://drupal.org/node/729206
I think @shelleyp's point is very valid. But, again, this comes down to the duplication check being the first thing done. Yes, the idea is to reduce the proliferation of code duplication, but the actual mechanism in the application process is a demonstration that the applicant has an understanding of the current contrib space.
Things will always be overlooked by both applicant and reviewer. That is why it is really important to ensure that this is the first step going forward, and that if an issue comes up in the middle of review, then it is simply unfortunate and things can be worked out from there, and there would be no need to stop an application.
--
zzolo
"Module Duplication" as a barrier to Innovation
The desire to prevent a proliferation of modules needs to be carefully balanced against the need to support/encourage innovation. In my research for this post, I came across a number of community quotes which illustrate this much better than I could alone ...
Proposed Solutions:
Enforce the duplicate module check ... but be flexible in that enforcement; support the exploration of new or innovative approaches to solving user problems.
Module duplication is definitely subjective. The review should be more about the demonstration from the applicant and encouragement from the reviewer that the applicant has looked at the contrib space and can say what other modules are similar or not.
If there already exists a very similar module, then the applicant should be able to say why his/her project is valuable.
Here is what should never happen. An applicant comes with a project and has clearly not looked at any possible similar solutions, and the project itself basically does the same things as another existing module. Also bad: a new project that basically duplicates an existing module, but with a new feature or two. This is bad for lots of reasons. Patches are good.
Overall, I believe it's good to think of the Drupal contributed space as a single project. On any code project, a good coder would not duplicate code, and would abstract things that would otherwise be duplicated. This is the same principle with the contributed (and core) space. Of course, once you put in the social/community layer it gets pretty fuzzy. So, the questions are more about what the status of a contributed module is and if/when patches can be applied.
--
zzolo
Duplication providing a 'Lighter version' of Existing Features
In some cases, a proposed module comes along which is a strict, lightweight subset of functionality already provided by another larger, heavier module. As in the 'Barrier to Innovation' topic above, I see this as providing 'flexibility of choice' to end users ... sometimes all that is needed is a 'simple' solution; where end users don't need all of the additional overhead and complexity built into the heavier module. The following comments come from different sides of the debate on dman's 'Open Letter' post:
Proposed Solutions:
Accept 'lightweight' and 'low overhead' options for existing 'heavy' modules ... but only the 'first' one. (And validate that it's actually the applicant's code, not simply ripped from the heavier module!!!)
The KISS Principle
I'm an advocate of the KISS principle (Keep It Simple, Stupid). I agree 100% with #3, a simple, specific solution is generally preferable to a complex, generalized one. This concept is the heart and soul of modular and object-oriented systems. This approach has a lot of advantages:
Smaller, special purpose modules can be a legitimate alternative to more general, single solution software. There is, however, a need to strike a balance between choices and flexibility and module overload in Drupal. This requires an exercise of judgement. The question is, who is empowered to make these judgements?
light implies bad architecture
A "lighter" version is duplication of code. Whether it should stop an application is kind of subjective, as it could be different enough to allow someone to have full project access.
When a "lighter" solution comes along, it is an implication that the "full" solution was not architected well. Things should indeed be simple and modular. This sort of situation almost always means that an API layer should be provided, with features and interface layers built on top.
The perfect, positive example of a "lighter" version is Views and SimpleViews. Views provides an API module and separate interface modules, allowing a much simpler interface, SimpleViews, to be built on top of it.
This is the kind of thing that should be encouraged.
In the context of an application, a new contributor should be looking to work with the existing module and become a co-maintainer, changing its architecture so that a "lighter" version actually makes sense; this would probably provide a simpler solution in the end.
--
zzolo
Duplicating 'Module' functionality within a 'Theme'
This was an interesting one ... where does one draw the line between a 'feature' included in a theme, and a 'theme feature' invoked as a stand-alone module? While the Noggin module was the trigger for these discussions, it is not a unique case ... consider the photo slideshow feature built into the Danland theme, or the Rotating Image Showcase Block built into the Fever theme ... especially in the context of webchick's 'How many ways are there to build a Rotating Slideshow' slide from her Drupal Copenhagen Pre-Conference presentation.
Yes, it's blatant duplication of functionality that's available in an existing module. We want to make things easier for the end users by reducing module duplication. And so, we tell the themer that they should remove the duplicated code from their theme.
The net result? Our effort to 'simplify things' for the end users means that instead of simply installing the theme, they now have to install a theme AND A MODULE? How is this possibly better for the end user?
Proposed Solutions:
My thought on this one is that, when you consider it from the end user's perspective, the duplication does provide value and should therefore be allowed, on two conditions:
1) The functionality in question is definitively 'theme' related, and
2) The duplicated code doesn't infringe on the module version's namespace.
Duplicating Module Functionality within a Theme
In this case, I agree with the decision that was made by the reviewers, but not necessarily for the reasons given. We're really talking about the Jalapeno MDB theme that incorporated the code from the Noggin module. There are three issues here.
another fuzzy line
It is definitely a fuzzy line between what should live in a module or in a theme. Overall, a theme is providing a different way for Drupal to display its default output; plain and simple. A good rule of thumb is that if it can live comfortably in a module, it should. If something is in a module, it can actually be used on all sites, instead of just sites that want to use that theme.
But, an image slider could be a broader module for Views, or it could be just a home page nicety. It's hard to say.
Still, there is no practical reason to duplicate code that is in a module into a theme. On the flip side, there are community issues that could warrant it, like lack of response from the maintainer or an otherwise abandoned module.
I am really against the argument about the user having to install more than 1 module. I do fully understand that, given the current system, this can get to be annoying for a user. But 1) a typical Drupal site has dozens of modules installed, one more is not that big of a deal, and more importantly 2) this is an issue of Drupal dependencies and the installation system, and not an argument for code duplication. When you install a package on a Linux system, you'll probably end up installing 5 more packages with it because of dependencies; the community would be pretty pissed if a package just duplicated the code they needed in their package (without a really good reason).
--
zzolo
module installation...
I've commented in a couple of related threads recently on the "1 more module" issue. Just to clarify, it's not the actual process of installing or maintaining a module that I dislike; it's when the module functionality "could" reasonably be thought of as a theme feature. Compared with other leading CMSs, Drupal is less competitive when its themes cannot carry this type of functionality. Then there's always the difference between new contributors and "full project maintainers", in that the latter can release pretty much any functionality without having to provide the justifications.... which new contributors could deem unfair.
To my thinking, usability should be prioritised above a duplication consideration.. ie. does a particular functionality add value in the proposed implementation. That said, I am totally behind having functionality that is module based and able to be used regardless of theme. But I do believe that it should not be an either/or discussion, rather a "does it add value".
Dreamleaf Media
Hi @dreamleaf, thanks for the response. Once someone has the Full Project access, they can create any themes, modules, or install profiles they want, and people without this can only create sandboxes. I think I am a little confused on why you bring this up or say it could be viewed as "unfair".
I am not sure how competitiveness comes into this. Whether something lives in a module or theme does not change the fact that Drupal "offers" that functionality.
Yes, there are things that could reasonably live within a theme or module, but either way, if it exists in one or the other already, then duplicating it is still duplicating it.
Usability is a high priority, and yes, this is about usability, but the way to actually solve the problem is to fix the system, not to duplicate a project. And to say one is more important than the other is very subjective; project/feature duplication causes a lot of aggravation to everyone in the community, from people just starting to veteran Drupalers.
--
zzolo
Consider someone new to the druposphere, seeing established maintainers release features within modules and themes without a review process, while simultaneously being informed, during their own review process, that xyz feature cannot be included because it is already available.
Personally I've been lurking around Drupal long enough to understand where the motivation is coming from, and this is where the "appearance" of unfairness creeps in. Even in my own review process I highlighted several projects that offer similar functionality and other examples of crossover projects that have been created. Whilst I can take the hit and remove the feature from my submission, others may not be as willing and may think that there is a culture like an old boys club.
On the competitive point, this is mainly thinking about where new contributors will come from... my guess is that a lot will come from other CMS's as opposed to people learning to write code so they can be a part of Drupal. However, the more barriers there are to entry, the higher the rate of contributor dissatisfaction there will be. Although, I do agree there has to be a level of "your work is cack, please work on it and then resubmit" to maintain a good experience within Drupal.
And usability is a whole conversation on its own, but I think it's important to remember that usability for those submitting projects is entirely different to that of a "user" who wants as much as possible to work easily without having to learn too much. Installing a module (especially in D7) is child's play, but is still several clicks, an upload and then activation... compared to a single theme install and navigating to a theme settings page (in a specific use case).
I think overall, the main issue is that the review process has to be fair to all, yet there are waaay too many variants to be able to address each in a broad brush approach. But what I wouldn't like to see flourish is a disparity between reviewers, so that how a submission gets reviewed is dependent upon the reviewer you get on the day.
Dreamleaf Media
agreed
Though I could argue with the other points, this is dead on. We need a consistent, efficient, and valuable experience for applicants and reviewers. And right now, it is pretty far from that. This is why I have taken up the torch (or as best I can in my limited time) and started documenting the process and creating this group, so that we can define a more standard process around reviews so that they are as objective and efficient as they can be.
But, in reality, this will take some time, and there is lots of balancing between ensuring objectivity and recruiting volunteers. And both take lots of time. This is why I just proposed that by DrupalCon London, we have a rough draft of all this laid out, and a set period to review it in. We aren't going to get it perfect, but the sooner we can get a more stable approach, the more effective it will be to get others involved and make the process so much better.
--
zzolo
I'm gutted that I can't get to DrupalCon London... even though it's only an hour from where I live! Sadly my wife is due to be popping out a baby at exactly the same time as Dries provides his keynote (ish!). It's baby #3 though, so personally I don't see the conflict... she disagrees.
Dreamleaf Media
'Module Duplication' as the only 'Full Project Access' block
I see a huge limitation in the current process, in that granting an applicant the 'Full Project Access' role is currently tied to acceptance of their proposed module. If an applicant fulfills all of the requirements of the code review process, but it is determined that their module doesn't pass the 'duplicate module' test, then they are not granted the 'full project access' role. Given the current backlog in the project application queue, this serves as a serious demotivator for potential contributors ... and being denied for no other reason than 'process' does not give the contributor the impression of a welcoming community that is both easy and fun to work within.
One has to look at the motivation for the code review process, and ask whether it's about the 'Code', or the 'Person'. Based on my research prior to composing this post (and supported by a few posts by webchick), I'd say it's at least 70% about the Person; and wanting to develop them into a valuable member of the contributing community ... As a result, the application decision should be weighted equivalently.
Proposed Solution:
Provide the flexibility to allow the granting of the 'full project access' role, without tying it to approval of the 'module'. I'm not suggesting that module approval should not be a consideration in the decision to grant full project access (and I'd expect that, for 98% of cases, they would be linked) ... but the process should be flexible enough to accommodate the other 2% as well.
/groupspam off. ;)
duplicate check first
It's actually about the community (not the person or project). Reviewing a project/module/etc is one way of getting Full Project access. The other way is to be a co-maintainer or to take up an abandoned module. But all these are about the community as a whole.
We initially talked about every project needing a review before being a full project, some people even talked about every release needing a review. But this is simply not feasible given current resources. So, yes, we are vetting the person, but through a single project review.
But avoiding code duplication is a community value that all of us usually share. It is unfortunate that people don't catch this until the end, but that doesn't change the underlying value or the effect on the community.
This is why it's really important to do the duplication check first, and for the applicant to do this as well, right up front.
--
zzolo
Module Duplication vs. GPL Licensing
This is another aspect of this issue. After all, the whole purpose of open source licensing is to guarantee that others will have access to the source code so that they can use it and modify it as they see fit. A few relevant quotes from the GPL text:
Publishing code under GPL doesn't mean that the author gives up his rights. He still holds the copyright, so if you use somebody else's code, under the terms of the license you have to include their copyright notice, and a notice that you changed it. If you do that you are free to use the code in your own modules and there's no problem. But if you want to distribute it as a Drupal project, that is another issue entirely.
Module Duplication and Author Motivation
This is another aspect of the problem. In a lot of cases, an author wants to get his/her project on d.o. mostly to beef up his/her resume. They need something that they can point to and say, "I did that". Sandbox projects take care of that problem to a certain extent, but not to the extent of full project status. So if an author creates a project that duplicates what somebody else has already done, that actually says something about their level of skill. Apparently they do not yet understand that an important part of any project is researching what is already out there, seeing how things have been done by others, and deciding whether starting a new project from scratch is really warranted. There are instances when a new project can be justified, even when there is already something out there that does something similar. In those cases that author really needs to provide some good reasons why the project is necessary.
agreed
I totally agree with this. It's about the author demonstrating that they have looked for existing solutions, and even if there are similar solutions, describing why their solution has value.
It's not so much about strict duplication of code (copying line for line without attribution); it's about being a responsible community member.
--
zzolo