This message has been cross-posted to Drupal.org Improvements, Documentation Team, and Drupal.org Testing Infrastructure.
I want to use automation to improve d.o content quality. I do not have the access needed to do this, and I have been unable to get anyone with access interested enough to get the project moving.
The long version
My vision for improving the Drupal Documentation and drupal.org can be broken into three equally important ideas:
Incremental improvements - Small, frequent improvements keep the audience happier than long waits for major changes. People have short attention spans, so to keep mindshare we have to show up on the radar more often.
Exploring the solution space - Encourage people to think outside the box and test new things out. Make it easier for people to tinker with the docs as a whole. The continued growth of the site requires new scalable approaches to solving problems.
Not wasting the volunteer’s time - “Producing Open Source Software” by Karl Fogel states: “Try not to let humans do what machines could do instead. As a rule of thumb, automating a common task is worth at least ten times the effort a developer would spend doing that task manually one time. For very frequent or very complex tasks, that ratio could easily go up to twenty or even higher.” I firmly believe there are more important tasks, still approachable for beginners, than fixing basic spelling and grammar mistakes. Everyone’s time is important, and trivial tasks that can be automated away will be automated away. Automation gives more consistent quality, because the computer applies the same check the same way every time, and it frees the volunteer for tasks that computers cannot simply solve, such as evaluating style and determining whether information is missing.
With those three ideas in mind, I started thinking about what I could personally do to achieve these goals. I came up with the following tasks:
Create an automated process for checking page titles on drupal.org - https://drupal.org/node/1441074 . This should simply be a page listing all the page titles that still need to be fixed. Having users hunt around for stuff that is broken is a waste of time.
Have an automated test check the encoding of all the Drupal docs to make sure everything is uniform. This can also extend to character usage and proper font selection for other media.
Do some analytics on which docs get viewed the most. Hold those docs to a higher quality standard, but not by simply locking the page and making users bounce around the issue queue to get it updated. That always leads to delays and volunteers losing interest.
Increase the use of visual aids in documentation. There is an entire field of information visualization research we can use to better communicate information. Plain text is not the best method in all cases for everyone.
Start a process of automated spell checking, grammar checking, and style guide checking. This guarantees a baseline level of quality.
- Automate the collection of comments that should be approved for removal. Basically find the “me too” and spammy comments so a person can quickly verify them.
- https://drupal.org/node/1426262 - Find comments that should be in the documentation and set them up for beginners to work with.
Software has been built to check comments and has been tested on YouTube comments.
Automate checking the reading level, verb tenses, and other key metrics that determine quality writing. These natural language processing (NLP) tasks are solved by existing, well-tested libraries. None of this is even remotely groundbreaking.
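To make a few of the tasks above concrete, here is a minimal sketch in Python covering three of them: the encoding check, flagging "me too" comments for human review, and a reading-level score. It is not the software mentioned above, and it assumes page bodies and comments are already available as raw bytes and strings from a database dump. The function names, the comment patterns, and the `MIN_READING_EASE` threshold are all hypothetical. The reading score uses the standard Flesch Reading Ease formula with a crude vowel-group syllable count, so a well-tested NLP library would be more accurate.

```python
import re

# Hypothetical threshold: Flesch scores below roughly 30 are usually
# described as very difficult to read.
MIN_READING_EASE = 30.0

# Hypothetical patterns for comments that add nothing on their own.
# A real list would need tuning against actual d.o comments.
LOW_VALUE = re.compile(
    r"^\W*(me too|\+1|same here|subscrib\w*)\W*$", re.IGNORECASE
)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_low_value_comment(text: str) -> bool:
    """Flag comments that consist only of a 'me too' style phrase.

    This only nominates candidates; a person still verifies each one.
    """
    return bool(LOW_VALUE.match(text.strip()))


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0  # nothing to score; treated as "needs a look"
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def review_page(title: str, body: bytes) -> list[str]:
    """Return human-readable flags for one page; an empty list means no findings."""
    if not is_valid_utf8(body):
        return [f"{title}: body is not valid UTF-8"]
    flags = []
    score = flesch_reading_ease(body.decode("utf-8"))
    if score < MIN_READING_EASE:
        flags.append(f"{title}: reading ease {score:.1f} is below {MIN_READING_EASE}")
    return flags
```

Running `review_page` over every page in a dump would produce a single list of findings for volunteers to work through, instead of having them hunt for problems page by page. Working from a dump rather than the web interface also keeps the load off d.o itself.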
Now, this is not the first time I have brought up automation as a solution for maintaining and improving the Drupal content. I tried a few times last year, but it always dead-ended at the same spot, which is where I find myself now. I have a well-thought-out plan that I can execute and that will help the community. It comes down to access. I need the correct kind of database access or database dumps to run tests and make corrections. Having a bot do it through the web interface will just slow d.o down.
Furthermore, this should help reduce the number of items in the issue queue.