The estimate of 12 million vulnerable websites (Sophos, then BBC) is actually the result of a reasonable deduction. I checked.
Does anyone have any idea why W3Techs' estimate of total Drupal 7 websites (65% of 1.9% of all 1B websites: 12M) is so much higher than Drupal's own usage stats (930k) or even BuiltWith's estimate ("We know of 783,321 live websites using Drupal in our records")?
It is certainly possible that a number of mass-installs (e.g. Drupal Gardens) do not report to Drupal.org and take extensive measures to hide the fact that they are running Drupal, thus evading BuiltWith's checks. But I doubt that they account for millions of websites. GoDaddy used to provide one-click installs that could reach numbers that high, but I have not heard anything about that service in many years.
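For reference, the deduction behind the 12M figure can be reproduced in a few lines. This is only a sketch of the arithmetic using the numbers quoted in this post; the variable names are mine, and the inputs are the quoted figures, not independently verified ones.

```python
# Figures as quoted in this thread (not independently verified).
TOTAL_WEBSITES = 1_000_000_000  # "1B websites"
DRUPAL_SHARE = 0.019            # 1.9% of all websites run Drupal
DRUPAL7_SHARE = 0.65            # 65% of those run Drupal 7

w3techs_estimate = TOTAL_WEBSITES * DRUPAL_SHARE * DRUPAL7_SHARE

drupal_org_count = 930_000      # Drupal's own usage stats
builtwith_count = 783_321       # BuiltWith's figure


def ratio(high, low):
    """How many times larger the first count is than the second."""
    return high / low


print(f"W3Techs-derived estimate: {w3techs_estimate:,.0f}")
print(f"vs drupal.org usage stats: {ratio(w3techs_estimate, drupal_org_count):.1f}x")
print(f"vs BuiltWith: {ratio(w3techs_estimate, builtwith_count):.1f}x")
```

The gap is more than a factor of ten in both comparisons, which is why the choice of the 1B total matters so much.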
What mass-installs and one-click install services do we know of? Can we get estimates of their numbers and Drupageddon defense systems to try to work out if the "hundreds of thousands" estimates are an order of magnitude too low?
Comments
Looking at the web crawler I run, I would back BuiltWith's estimate.
The d.o. estimate probably includes sites in subdirectories (something.com/blog/) and some that are not reachable from the "public web".
BuiltWith probably counts just "plain" domains (something.com) and subdomains (blog.something.com).
W3Techs has a disclaimer in their FAQ: http://w3techs.com/faq
"For the surveys, we count the top 10 million websites according to Alexa, see our technology overview for more explanations. We do crawl more sites, but we use the Alexa top 10 million to select a representative sample of established sites. We found that including more sites in the sample (e.g. all the sites we know) may easily lead to a bias towards technologies typically used for "throw-away" sites or parked sites or other types of spam domains."
I can only assume that they actually mean the number of 'sites' in the sense of individual pages (something.com/node/42) rather than the number of domains out there. Otherwise there is no way they could cite 1 billion websites in total. Even after running a crawler for years, the highest I've seen is about 150 million domains and subdomains (with a bit of filtering to weed out the spammy wildcard subdomains that get generated).
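To illustrate how much the unit of counting changes the total, here is a minimal sketch that counts the same small URL list once per page and once per host. The function name and the sample URLs are made up for illustration; this is not how W3Techs or my crawler actually counts.

```python
from urllib.parse import urlsplit


def count_pages_and_hosts(urls):
    """Return (distinct pages, distinct hosts) for a list of URLs."""
    pages = set()
    hosts = set()
    for url in urls:
        parts = urlsplit(url)
        if not parts.hostname:
            continue  # skip anything without a host
        host = parts.hostname  # urlsplit already lowercases the hostname
        hosts.add(host)
        pages.add((host, parts.path or "/"))
    return len(pages), len(hosts)


# Hypothetical sample: three pages on one domain plus one subdomain.
sample = [
    "http://something.com/",
    "http://something.com/node/42",
    "http://something.com/blog/",
    "http://blog.something.com/",
]

page_count, host_count = count_pages_and_hosts(sample)
print(page_count, host_count)
```

Counting per page gives 4 here, counting per host gives 2, and the difference only grows on real sites with thousands of node pages.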
Flawed metrics
I think their way of measuring the number of sites is flawed, which works in their favor by making the situation sound worse than it is and playing into sensationalism. With statistics, it's easy to bend numbers so they tell the story you want. It seems to me that the BBC just picked whichever source had the highest numbers, without really checking its accuracy or origin. Kind of sad for a big name like the BBC. I don't really think of the Sophos blog as the most authoritative source for this kind of survey. Here is the comment I left on the Sophos blog post (my comment hasn't been published yet):
Edit: comment improved
Mass-installs
I can confirm that Drupal Gardens doesn't report its installs to drupal.org, and that the number of Drupal Gardens sites wouldn't really help to get from fewer than 1 million to 12 million sites. The number of sites on Drupal Gardens is only in the tens of thousands, which is not significant when we are talking about hundreds of thousands. Note that all the Drupal Gardens sites were protected by the Acquia Cloud shield, so they should not be counted in the estimates of "possibly affected sites".
We don't really hide the fact that we're running Drupal on those sites either (see an example site): we leave the CHANGELOG.txt file and the generator meta element in place, so those sites can be assumed to be accounted for by any decent crawler, like the ones from W3Techs, BuiltWith or rb2k's crawler.
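As a rough illustration of how a crawler could pick up the generator meta element, here is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and the sample markup is my assumption of what a Drupal 7 page typically emits; it is not a description of how W3Techs, BuiltWith or rb2k's crawler actually work.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Records the content of the first <meta name="generator"> element."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attr = dict(attrs)  # attribute names arrive lowercased
        if (attr.get("name") or "").lower() == "generator":
            self.generator = attr.get("content")


def looks_like_drupal(html):
    """True if the page advertises Drupal in its generator meta element."""
    finder = GeneratorFinder()
    finder.feed(html)
    generator = finder.generator or ""
    return generator.lower().startswith("drupal")


# Assumed example markup, for illustration only.
page = '<html><head><meta name="Generator" content="Drupal 7 (http://drupal.org)" /></head></html>'
print(looks_like_drupal(page))
```

A site that strips the meta element would need a second signal, such as fetching CHANGELOG.txt, which this sketch does not attempt.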
Discussing Drupageddon via Podcast
Great collection of content to review in this thread. We will repost it as part of our podcast focused on Drupageddon.
http://www.commercialprogression.com/post/hooked-drupal-podcast-episode-2
Would anyone here be open to contributing to a future podcast about security and how to move forward in light of Drupageddon?