Wiki: contributing to content quality on DrupalLadder.org

Purpose of this document

This wiki is an open invitation to help solve challenges around content quality on DrupalLadder.org. Let's focus on concrete processes that volunteers can follow to improve this community resource.

As noted in the comments below, volunteers have applied security updates and added the Honeypot module in the hope of reducing the creation of new spam.

Volunteers have also spent several hours removing existing spam and spam users, but the quantity of each is vast. The next pressing need is a plan for using volunteers and software solutions to speed up that process. To that end, let's spell out a plan here and confirm agreement on the details so that more people can join the effort.

Distinguishing valuable content from spam is essential to the effort to involve lesson writers. The quality of the Drupal Ladder will be damaged if their materials are inadvertently deleted along with spam.

Please add priorities and recommended best practices here. Feel free to add rough details to allow other interested people to provide their refinements.

Spam deletion

(add steps for distinguishing spam content from valuable content, and for then removing only spam)

Most of the spam is easy to identify at a glance. Manual approaches involving a large number of volunteers may be the most effective way. Please improve on the following manual approach or propose automated alternative(s):

  1. Create a backup of the database before making changes so they can be reverted if an issue is discovered. (The unexpected can always happen, and we can all make mistakes!) Please use backup_migrate via the UI to save a manual backup first.

  2. Visit the content page and...

Filters

  1. On accounts used to create spam, the comment field from the user registration page often contains spam itself, and it is generally easy to spot. The filter at http://drupalladder.org/admin/spamspotter is available to people with permission to administer users. It is connected to a Views Bulk Operations action that cancels accounts and removes the related content. The filter is still labor intensive, because legitimate users may also have added comments. An ambiguous comment should not be the sole reason for deleting an account; gibberish comments and comments containing links, however, can be dealt with quickly.
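As a rough illustration of the triage rule described above (links or gibberish can be handled quickly, ambiguous comments need a person), here is a minimal sketch in Python. It is not the spamspotter filter itself; the function names, the vowel-ratio threshold, and the three labels are assumptions made up for this example, and the gibberish check is deliberately crude.

```python
import re

# Assumption: a comment "contains a link" if it has http(s):// or www.
LINK_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)


def looks_like_gibberish(text: str) -> bool:
    """Crude, hypothetical check: at least 8 letters and very few vowels."""
    letters = [c for c in text.lower() if c.isalpha()]
    if len(letters) < 8:
        return False
    vowels = sum(1 for c in letters if c in "aeiou")
    return vowels / len(letters) < 0.2


def triage_registration_comment(comment: str) -> str:
    """Sort a registration comment into one of three hypothetical buckets."""
    text = comment.strip()
    if not text:
        # An empty comment tells us nothing either way.
        return "no_signal"
    if LINK_PATTERN.search(text) or looks_like_gibberish(text):
        # Links and gibberish are the cases that can be handled quickly.
        return "likely_spam"
    # Anything else is ambiguous and should not, by itself, justify deletion.
    return "needs_human_review"
```

Only the "likely_spam" bucket would be a candidate for a bulk action; everything in "needs_human_review" would still be looked at by a volunteer before any account is cancelled.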

Tools for automation:

  1. Views Bulk Operations can be added to the site. Create a view that...

Additional considerations regarding spam control:

  • One of the most significant challenges faced by the ladder is to encourage contribution. Before we consider introducing hurdles to contributing, let's explore 'frictionless' solutions; they are out there. Please help identify and vet them. For the short term, we'll leave the setting in place requiring admin approval of user accounts; however, this setting needs to be changed at the earliest possible opportunity.

  • When a user is deleted, all ladders and lessons posted by that user are also deleted. There are hundreds of spam user accounts. Anything but a bulk approach to identifying and removing these accounts will be extremely tedious and time consuming.

  • Previous spam protection was provided with Mollom by itself. New spam users appeared at a rapid rate. The current codebase adds the Honeypot module. People are invited to explore additional tools and techniques, providing concrete recommendations of solutions.

Confirming contributing user accounts

(related to above, confirming who is known to contribute value (ham) vs spam can make spam cleanup a little easier in subsequent rounds)

  • Add a role for 'real person'; when administrators see that a user is a real person, they can set this role; the account could then be filtered from future reviews.

  • There must be an approach to confirming that uses other actions in relation to lessons/ladders, e.g. recommendations, maintainer actions, reported abuse, etc. Can someone propose solution(s)?
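To make the 'real person' role idea concrete, here is a minimal Python sketch of how a review queue could skip accounts that already carry that role. The Account class and the accounts_needing_review function are hypothetical names invented for this illustration; on the site itself this would presumably be a role filter on a view rather than standalone code.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Role name taken from the proposal above; everything else is hypothetical.
VERIFIED_ROLE = "real person"


@dataclass
class Account:
    uid: int
    name: str
    roles: Set[str] = field(default_factory=set)


def accounts_needing_review(accounts: List[Account]) -> List[Account]:
    """Return only the accounts that no administrator has marked as verified."""
    return [a for a in accounts if VERIFIED_ROLE not in a.roles]
```

Each review round then shrinks the queue: once an administrator sets the role on an account, that account no longer appears in later rounds.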

Adding lessons

(describe criteria and how they might better be communicated on the drupalladder.org site)

  • Proposals have been introduced requiring approval of content by admin. This is the setting currently on the site. Let's explore how we can accomplish our objectives without introducing new hurdles to contributing, and allow users to create accounts without requiring approval.

Peer reviewing lessons

  • If possible, we can have tags on ladders similar to those in the issue queue on d.o. A tag like 'Needs Review' would mean that the ladder is still in the review phase and is visible only to admins and reviewers, not to the general public. Only once it is marked as 'Approved' would it be published.

Recruiting volunteers

Organizing volunteers and providing appropriate levels of access

(describe guidelines for granting access)

  • Alongside the admin role, we can also add a reviewer role. Reviewers will be able to see a ladder that is still in the review phase and help polish it until it's ready for publishing. An admin can then approve the ladder once it has been reviewed.

Increasing awareness of the Drupal Ladder and its value

A previous proposal included the following suggestion: "To increase visibility of the ladders, we can have featured (promoted might be the better word here) ladders on the homepage similar to gdo homepage." Discussion invited.

Comments

Delete current spam

Slurpee's picture

DrupalLadder.org has a ton of spam. Earlier this year in the month of May a Google Summer of Code student spent time cleaning out spam content/comments/users. It is safe to assume that some time between May 15th-June 1st is when the site was good in regards to spam. I think a good solution is to utilize Views Bulk Operation to unpublish the nodes + comments and disable the users as of June 1. A bit of manual spam clean up might still be required but VBO will drastically help save time. I currently have admin access and would accomplish the task personally, but VBO is not a module available on production and I don't have server access to add it.

Who has server access, can upload VBO, and run a backup on the database in case we unpublish legit content with the ability to retrieve it?

Anyone with server access??

Slurpee's picture

It has been almost a week with no response about who has server access. Whoever has access, please respond. This is a priority, as DrupalLadder.org is currently hard to use because of the amount of spam. Google Code-In students will start working on tasks related to DL on Dec 1st.

Maybe you want to contact

penyaskito's picture

Maybe you want to contact offsite Addison (https://www.drupal.org/user/65088), I know she has access.

--
Christian López Espínola (@penyaskito)

access to code has been granted

kay_v's picture

Hi all,

Thanks for your ongoing focus on the Drupal Ladder. The OwnSourcing team has just received access to the server.

Rough plan at this point (to be updated as details become clearer):

  • beef up spam protection
  • apply security updates
  • reach out individually to volunteers for help on specific tasks (first of which is clearing out spam users and posts)

Please continue to incorporate your recommendations into the above wiki.

Kay

ownsourcing.com - strategy, training, documentation

tracking quick changes to codebase

kay_v's picture
  • create new branch 7.x-1.x-restart and cut date tag (currently dev, stage and prod are on alpha19-E; note there are alpha20 tags also on the server)
  • apply all security updates (done; no issues locally)
  • beef up spam protection (done; no issues locally)
    • mollom.module is already in place but appears not to be stopping problems (creation of spam user accounts, spam ladders and spam comments)
    • add honeypot.module (other people report that the 2 modules are stronger in combination; note that mollom also has its own honeypot approach, so if we find it's not helping we'll want to explore an alternative)
  • push to dev for quick review on server; will update here when ready to deploy to production

ownsourcing.com - strategy, training, documentation

please help review dev

kay_v's picture

The above updates have been applied to dev.

Since the issues on production are significant, we will open dev for review for only a short period before deploying to production. If you are able to review on dev, please visit http://drupalladderdev.prod.acquia-sites.com/

The full gamut of tests is appropriate, especially confirming that you can follow as well as author lessons on a ladder.

Dev has a copy of the database and files from production with an updated codebase (core and contrib).

Thanks in advance for your help!

ownsourcing.com - strategy, training, documentation

Please include

Slurpee's picture

Please review comment @ https://groups.drupal.org/node/448198#comment-1071618 - it would be wonderful to remove the spam during this testing period.

yes - spot on @slurpee

kay_v's picture

:)

ownsourcing.com - strategy, training, documentation

Try Botcha

cs_shadow's picture

The Botcha module (https://www.drupal.org/project/botcha) has helped in similar situations before. It uses a variety of recipes instead of just one, so I suppose it's worth a try. AFAICT, it can be used alongside Mollom, but Honeypot is not required since it's a subset of Botcha anyway.

That being said, let's clean up spam once, and then explore such solutions.

I recommend the Spambot

alarez's picture

I recommend the Spambot module, which I have used in the past to clean up spam users: https://www.drupal.org/project/spambot

But the first step will be to install and configure the Botcha module as recommended by @cs_shadow, which will definitely help stop the spammers from continuing.

some success --looks like mollom and honeypot stem tide

kay_v's picture

Seems spam removal is the most pressing priority for making the Ladder usable again.

Thanks for the suggestions on beefing up spam prevention. Logs show changes made yesterday have virtually stopped the rapid creation of spam users and posts.

Let's pull together proposals and volunteers for the task of cleaning things up. Can people recommend concrete cleanup steps? Great if they can be incorporated into the wiki above so volunteers can follow clear steps.

ownsourcing.com - strategy, training, documentation

adding url field

kay_v's picture

adding url field module so users can link to their existing d.o. accounts; ideally this gives another vector for confirming non-spammers.

ownsourcing.com - strategy, training, documentation

kay_v's picture

let me know if you're available to help review user accounts. volunteers needed.

ownsourcing.com - strategy, training, documentation

I'm available

cs_shadow's picture

I can help review user accounts. Since GCI is about to start, we'll be getting lots of requests soon and I can help with that because I'm also involved with GCI.

more work needed --cleared out thousands of spammers so far

kay_v's picture

Thanks for the offers of help!

So far we've cleared out thousands of spammers, fortunately deleting their content with them. Unfortunately there is still a stunning amount there.

Discovering patterns in the spam and building views bulk operations to help lump the spam users together has been the most helpful thing so far. Doing so with regular comments would likely be very useful as a next step; so far our attention has been on patterns in nodes and users.

An example pattern might be someone posting an unusual number of comments; glancing at content of the comments quickly gives a sense of whether it is spam.

There are also roughly 1000 users who have created accounts since an unknown admin changed the account activation setting to require admin approval. It would be great to devise a good way of vetting these currently blocked user accounts so people wanting to use/contribute to the site legitimately are able to do so.

ownsourcing.com - strategy, training, documentation

@kay_v - Did you see all the comment details?

Slurpee's picture

@kay_v - did you see all of the details from my previous post? The GSoC student is the "unknown admin" who changed the account activation setting to require admin approval. I can build the VBO if needed. Let me know.

Posted by Slurpee on November 13, 2014 at 8:54pm
"Earlier this year in the month of May a Google Summer of Code student spent time cleaning out spam content/comments/users. It is safe to assume that some time between May 15th-June 1st is when the site was good in regards to spam. I think a good solution is to utilize Views Bulk Operation to unpublish the nodes + comments and disable the users as of June 1. A bit of manual spam clean up might still be required but VBO will drastically help save time. I currently have admin access and would accomplish the task personally, but VBO is not a module available on production and I don't have server access to add it."?

documenting our processes

kay_v's picture

@slurpee - thanks for quoting the earlier post. I'd missed it in my rush to get things moving.

One refinement I hope we can adopt (that I also need to be better at) is to spell out our plans in the wiki before implementing. That way we can come up with a repeatable approach to resolving issues and minimize the risk of mistakes. As I've worked, I've certainly been concerned about the possibility I would inadvertently delete valuable content or introduce new hurdles.

Can folks spend a bit of time refining processes in the wiki? It should probably also define a deployment plan, so we're not experimenting on production.

Thanks!

ownsourcing.com - strategy, training, documentation

status update on spam

kay_v's picture

Honeypot and mollom together are blocking quite a lot of spam. That's good news, though there is still a lot of spam to clean up, and still some new spam that gets introduced.

We are also currently requiring admin approval for a user to create an account. As far as I can tell, we don't have a reliable way to distinguish between a real contributor and a spam contributor. When spam values are entered in the user registration form, someone can of course spot those, but someone will need to review them regularly.

Very interested to get others' recommendations. Ideally we'll not hold contributors back once they create an account.

ownsourcing.com - strategy, training, documentation

A Sweeter Honeypot?

firstlut's picture

My friend Bryan made a better version of honeypot that did me well at my old part-time church site.

next: encouraging contribution & improving features

kay_v's picture

It's time to declare a small victory, say thank-you to volunteers for their efforts to date and refocus our energies on meeting the Ladder's original purpose of encouraging participation.

Status

The user creation setting is changed back to allowing visitors to sign up and be authenticated/activated automatically. This setting best matches the site's ethos of enabling learning/contribution and removing hurdles.

Spam finally seems to be largely cleared from the site thanks to a concerted push from a few volunteers over the last couple of months. Armed with VBO, they spent scores of hours obsessively reviewing the thousands of accounts on the site for any sign of intentions to spam vs contribute value. The user table is pared back to roughly 500 users with a good percentage of them positively identified as learners and contributors vs hucksters and vandals.

Visitors are once again free to create an account, log their progress on the ladder and submit comments. However, until we have better tools/practices in place to keep the spam from once again overtaking the site, comments will require review and admin approval will be required on an account before the account holder can create lessons or ladders. Where dozens of spam users were created per day in preceding months, new user creation has dropped to only a handful per day.

As of last night, people applying for accounts must also provide a link to an existing drupal.org account. The considerable downside of this requirement is that drupalladder.org is meant as an entry point into the community; requiring an existing d.o. account is a counter-productive prerequisite.

Next steps

Let's develop methods for giving users appropriate roles like community manager, lesson maintainer and the like. The need goes beyond the traditional Turing test; we seem to have a workable way of separating bots from humans as things now stand.

Let's better discover the needs of the various users of the ladder and set out to meet them more fully.

Two needs known already that could use attention right away:

  • more lessons on a broader set of topics
  • better tools than traditional comments for noting needed changes to content

Share your thinking / recommendations

If you'd like to be more involved going forward, please share your thinking in the wiki above and use the comment thread on this page to discuss.

ownsourcing.com - strategy, training, documentation