Exploring solutions: Managing Duplication


We want to reduce duplication in the issue queue but we want to do it in a friendly way.

There are (at least) two parts to this:

  • getting rid of existing duplication - is there an easy way to flag issues as duplicates, and who gets to close them if they are? I know little or nothing about this process at the moment (I assume there is one, though)

  • reducing incoming duplication - wouldn't it be nice if it worked like Get Satisfaction (and probably many others), where it searches on your title and suggests possible existing duplicates that you have to check and confirm are not the same before you create your own new issue? Not perfect, but it could be very helpful.
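The title check described in the second point could start as simple word overlap. The sketch below is a minimal illustration, not how Get Satisfaction or any drupal.org module actually works: it scores a proposed title against existing titles with Jaccard similarity over lowercase words. The sample titles, the stopword list and the 0.2 threshold are made up for illustration.

```python
import re

# Hypothetical sample data; a real site would query its search index instead.
EXISTING_TITLES = [
    "Redirect back to the slave site's destination",
    "Login redirects to user profile instead of previous page",
    "Search results ignore project filter",
]

# Assumed stopword list: very common words that carry no topic information.
STOPWORDS = {"a", "an", "the", "to", "of", "on", "in", "is", "me", "i", "my"}


def tokens(title: str) -> set[str]:
    """Lowercase word tokens of a title, minus the stopwords above."""
    words = re.findall(r"[a-z0-9_]+", title.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all distinct words (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_duplicates(new_title: str, existing: list[str],
                       threshold: float = 0.2, limit: int = 5) -> list[str]:
    """Return up to `limit` existing titles scoring at least `threshold`."""
    new = tokens(new_title)
    scored = [(jaccard(new, tokens(t)), t) for t in existing]
    matches = [(s, t) for s, t in scored if s >= threshold]
    matches.sort(key=lambda st: st[0], reverse=True)
    return [t for _, t in matches[:limit]]
```

For example, `suggest_duplicates("Login should redirect to previous page", EXISTING_TITLES)` returns only the "Login redirects..." title. Plain word overlap is cheap but misses duplicates phrased in different words, so a search backend with stemming and relevance scoring (such as Solr) would likely do better in practice.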



Re: Managing duplication

Posted by Senpai

The Duplicate Issue Alert System functionality would help alleviate some of the duplication by warning the issue queue's users that there are n other projects which also change at least some of the same things this particular project does. That would be super cool.

As to the incoming duplication, well, that's always been a problem, though it's gotten better little by little with the current tweaks. I do like the idea of a forward-looking title searcher that would warn you if the words in your proposed title were substantially similar to those of other existing issues, but only if it gives you a way to preview those other issues without leaving your current issue-in-progress, because I know I'm not the only one who types their bug text first, then goes back to fill in the title... am I?

Joel Farris | my 'certified to rock' score

related posts

Posted by catch

Quick plug for http://drupal.org/node/1060798

The related posts block already exists in the apachesolr module and works very well (at least for non-technical content, though I reckon it could do a decent job on the issue queue too).

There may be barriers to getting it deployed on d.o but none that I know of, so it's potentially a very quick fix for existing duplication - which is always going to exist, and something we have a massive backlog of.
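Assuming the related posts block is backed by Solr's MoreLikeThis component (an assumption; the apachesolr module may wire it up differently), a "similar issues" lookup boils down to a query like the one this sketch builds. The endpoint URL, field names and the minimum term/document frequencies are hypothetical; `mlt`, `mlt.fl`, `mlt.count`, `mlt.mintf` and `mlt.mindf` are standard Solr MoreLikeThis parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; drupal.org's real Solr host, core and fields differ.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def more_like_this_url(issue_id: str, fields=("title", "body"), count: int = 5) -> str:
    """Build a Solr query asking for documents similar to one issue."""
    params = {
        "q": f"id:{issue_id}",         # the issue we want look-alikes for
        "mlt": "true",                 # enable the MoreLikeThis component
        "mlt.fl": ",".join(fields),    # fields used to measure similarity
        "mlt.count": count,            # similar documents to return
        "mlt.mintf": 1,                # min term frequency in the source doc
        "mlt.mindf": 2,                # min number of docs a term must appear in
        "fl": "id,title,url",          # fields to return for each match
        "wt": "json",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"
```

The appeal is the one dww raises further down the thread: the work is done by Solr servers that already exist, so the site itself only has to render the returned titles.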

Edit: for new issues, there is a very old feature request for the project module to do this; if I see it again I'll try to remember to come back here and link to it. I don't think there's been much, if any, activity on it since we added apachesolr to Drupal.org. I'm not aware of anything drop-in that would do this, so it'd be a step beyond just enabling the related posts block, but yes, it would definitely be nice to have something like this.

Two more, much older issues

Posted by catch

Two more, much older issues about this:


and http://drupal.org/node/19386

current process

Posted by catch

I missed the question in the opening post; the current procedure is this:

  1. Come across two issues that are duplicates of each other.

  2. Usually the more recently posted issue is marked as the duplicate; this is sometimes ignored if it has an up-to-date patch, etc.

  3. Close the duplicate issue by changing its status to duplicate.

  4. (Optional, but I wish it were done more often) Cross-link the two issues in both directions; this has to be done manually in a comment.
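The four steps above can be sketched as code. The `Issue` class and its fields are hypothetical stand-ins, not the project module's data model, and the comment wording is made up; the logic follows the convention described: the newer issue becomes the duplicate unless only it carries an up-to-date patch, and both issues get a cross-linking comment.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Issue:
    """Hypothetical minimal issue record for illustration only."""
    nid: int
    title: str
    created: date
    status: str = "active"
    has_current_patch: bool = False
    comments: list[str] = field(default_factory=list)


def pick_duplicate(a: Issue, b: Issue) -> tuple[Issue, Issue]:
    """Return (duplicate, canonical) for two issues reporting the same thing."""
    older, newer = (a, b) if a.created <= b.created else (b, a)
    # Step 2 exception: keep the newer issue if only it has an up-to-date patch.
    if newer.has_current_patch and not older.has_current_patch:
        return older, newer
    return newer, older


def close_as_duplicate(a: Issue, b: Issue) -> tuple[Issue, Issue]:
    duplicate, canonical = pick_duplicate(a, b)
    duplicate.status = "duplicate"  # step 3
    # Step 4: cross-link in both directions with a comment on each issue.
    duplicate.comments.append(f"Duplicate of #{canonical.nid}: {canonical.title}")
    canonical.comments.append(f"#{duplicate.nid} was marked as a duplicate of this issue.")
    return duplicate, canonical
```

Automating step 4 alongside step 3 is the point of the sketch: if closing and cross-linking were a single action, the optional step would stop being skipped.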

After typing this I realised I had made a big edit to http://drupal.org/patch/review that tried to document this; that page is supposed to be a more or less step-by-step guide to evaluating an issue.

an example, if I may....

Posted by leisareichelt

so, this is the kind of thing we need to overcome somehow (and this is not the first time this has happened to me - I'm sure you've seen examples like this all the time.)

That thing where you log in to the issue queue, get bounced to your profile page and then, if you hit the back button, find you're not logged in, so you have to somehow re-find what you wanted to respond to in a thread in a group somewhere (which, if it's a busy thread and you're trying to jump to it from email, is pretty much impossible). That's the issue I wanted to log.

I was pretty sure it must have been logged before, so I searched, using all the terms I could possibly think of. Couldn't find anything.

So I logged an issue here: http://drupal.org/node/1103302

compare my issue: 'On login, take me back to where I came from pls'
to the 'real' issue: 'Redirect back to the slave site's destination'

there is NO WAY I was going to get a match, was there?
especially since I had no idea that it was part of the 'bakery single sign on project'

how do we solve this?

leisa reichelt - disambiguity.com

Yeah, all the funny code

Posted by yoroy

Yeah, all the funny code names and jargon make finding stuff harder. (https://twitter.com/#!/jeffnoyes/status/33538860986142720 -> We need a glossary)

Should we require some sensible tags when opening a new issue? Do free-tagging for a bit, then see if we can extract a fixed set of terms from it to ehm, prevent duplication :)

Starting some glossary page

Posted by tvn

Starting some glossary page might be indeed a good idea. I remember it actually took me some time to figure out what "d.o" and "g.d.o" mean :)

Requiring tags: better not. It's already overwhelming for a newbie to figure out the correct choices for all the required fields when creating an issue. Let's not add more of those.

As for the list of tags, it would indeed be good to have it predefined. We could start gathering them right now on a wiki page here. Later there should be some place (an issue, I guess) where people can propose tags to be added to the list as the need arises. (And is there free tagging on d.o now? I don't really know, since unfortunately I haven't used tags at all.)

I have a page for that -

Posted by davidhernandez

I have a page for that - http://groups.drupal.org/node/133634 . Maybe people can post here instead, to reduce duplication. :-) This is the list I've been working with. All additions and suggestions are welcome.

technical or human solution...?

Posted by leisareichelt

perhaps what we need to do is provide some guidance on how to describe the problem when you're describing a user-facing issue (thinking to self: is anything ever not facing some kind of user in some way...?), so that next to the text box where you enter the issue title there is a prompt reminding you to describe it in terms of the PROBLEM you want to solve and not the solution you're proposing?

that's got to get us quite a bit closer without having to go down the route of defining an extensive taxonomy (not that having this taxonomy wouldn't be helpful but taxonomies tend to be more difficult to get compliance on than general principles that mostly make sense) - and this general principle would go some way to achieving some general UX education objectives too, right?


Yes, it's a pet peeve of mine

Posted by yoroy

Yes, it's a pet peeve of mine to see an issue title that (sorta) describes a solution without ever pointing out the problem in the body of the message. Some careful copywriting of the form labels would be a good start there.

Personally, when I start an issue myself, I try to refrain from talking solutions because you can't edit the original post of an issue. Given the way many issues evolve, those initially proposed solutions quickly stop making much sense. Instead, I post the issue stating the problem, then quickly add the first comment (which can be edited) myself, where I start talking solutions. That's hacking around the limitations, but there you go.

There are glossaries ... 3 of them

Posted by lisarex


I don't think we need 3 glossaries, but the info is there... probably needs updating.

Interestingly, in http://drupal.org/node/303613 (a forum issue) someone said they tried typing in http://drupal.org/glossary but got a 404.


Oh look what I've found

Posted by tvn

Oh look what I found by accident last week: http://drupal.org/getting-started/before/terminology Glossary!

The Glossary "situation" is

Posted by lisarex

The Glossary "situation" is now resolved, yay! http://drupal.org/node/1001316


A simple approach

Posted by mgifford

I don't know how many error messages I've posted in the past like this one http://drupal.org/node/1103602

Which essentially boils down to:
Notice: some_function() (line 847 of some_contrib_module.module).

It would be nice if, when entering that initial issue, a basic search were run on drupal.org for errors mentioning at least that line number in some_function().
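A minimal sketch of pulling search terms out of a notice like the one above. It assumes messages contain the `function() (line N of file)` pattern shown; real Drupal messages may vary, and the helper name and the format of the returned query are hypothetical.

```python
import os
import re
from typing import Optional

# Matches the "some_function() (line 847 of some_file.module)" part of a notice.
NOTICE_RE = re.compile(
    r"(?P<function>\w+)\(\)\s+\(line\s+(?P<line>\d+)\s+of\s+(?P<file>[^)]+)\)"
)


def search_terms_from_notice(message: str) -> Optional[str]:
    """Turn a PHP notice into search terms, or None if the pattern is absent."""
    match = NOTICE_RE.search(message)
    if match is None:
        return None
    # Keep only the file name: full paths differ from site to site.
    filename = os.path.basename(match.group("file").strip())
    return f'{match.group("function")} "line {match.group("line")}" {filename}'
```

Dropping the site-specific path is the key design choice: the function name, line number and file name are what two people hitting the same bug will share.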

That's the easiest way to pull up some duplicates.

There are certainly enough other sites out there that let you write your post and then give you an opportunity to verify that it isn't redundant. I'm not sure what the best way to create that Digg-type workflow would be, but it would be nice.

This uniqueness module -

Posted by davidhernandez

This uniqueness module - http://drupal.org/project/uniqueness - aims to do that. I was going to start playing with it for the support.d.o site.

Agreed: uniqueness.module seems perfect

Posted by dww

I haven't looked at the code for uniqueness yet, but based on the description and a tiny bit of research, it seems ideal for stemming the tide of new duplicate issues. Implementation-wise, it's already talking to Solr, which is great from a d.o infrastructure deployability standpoint (our solr servers are not resource constrained at all, so anything we can serve via solr is a win for us). See #1128044: Consider deploying uniqueness.module on drupal.org (to help prevent duplicate issues before they're submitted) for more on this...

Sweet, on the Solr bit. I

Posted by davidhernandez

Sweet, on the Solr bit. I didn't know it could work with Solr. I was just talking with Peter Wolanin about trying to accomplish the same functionality with Solr, so score one for less work (or at least somewhat less).

Haven't read through

Posted by yoroy

Haven't read through everything, but just a quick suggestion: standardize and crowd-source the "Similar modules" listing. We're all about reducing duplication of effort on d.o, but we don't have a simple way for maintainers and developers to alert one another when this happens.

Right now, I monitor the New Modules RSS feed, and if something comes up that sounds familiar, I search it out, and post a boilerplate heads up in each issue queue. I don't dig in-depth into each module, so sometimes I'm wrong, but at least they now know about one another. What I find frustrating is that while I often recommend the maintainers add something to the project description, many fail to do so. I'm not sure why. Maybe they forget, or maybe they disagree about similarities, but I feel this conclusion should be up for dispute. If the maintainer (knowing the code) thinks it's different, but the users (knowing only their use-cases) feel they're similar, then other users should still have a prominent place to show this.

From @patcon in http://groups.drupal.org/node/137914#comment-467379

Where are we at now?

Posted by mgifford

I think the most concrete steps are in: