Estimates of 12 million vulnerable websites

Bevan's picture

The estimate of 12 million vulnerable websites (Sophos, then BBC) is actually the result of reasonable deduction. I checked.

Does anyone have any idea why W3Techs' estimate of total Drupal 7 websites (65% of 1.9% of all 1B websites: 12M) is so much higher than Drupal's own usage stats (930k) or even Built With's estimate: "We know of 783,321 live websites using Drupal in our records."?
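For anyone who wants to re-check the deduction, here is a minimal sketch of the arithmetic. It only uses the figures quoted in this post; the constant and function names are made up for illustration, and nothing is measured independently.

```python
# Figures as quoted in the post above, not independently verified.
TOTAL_WEBSITES = 1_000_000_000   # "all 1B websites"
DRUPAL_SHARE = 0.019             # 1.9% of websites run Drupal (W3Techs)
DRUPAL7_SHARE = 0.65             # 65% of Drupal sites run Drupal 7 (W3Techs)

DRUPAL_ORG_D7 = 930_000          # drupal.org usage stats for Drupal 7
BUILT_WITH_DRUPAL = 783_321      # Built With's count (all Drupal, not only 7)


def extrapolated_drupal7_sites(total=TOTAL_WEBSITES,
                               drupal_share=DRUPAL_SHARE,
                               d7_share=DRUPAL7_SHARE):
    """Scale the percentage shares up to an absolute number of sites."""
    return round(total * drupal_share * d7_share)


w3techs_estimate = extrapolated_drupal7_sites()
ratio_to_drupal_org = w3techs_estimate / DRUPAL_ORG_D7
ratio_to_built_with = w3techs_estimate / BUILT_WITH_DRUPAL

print(f"Extrapolated Drupal 7 sites: {w3techs_estimate:,}")
print(f"Times the drupal.org figure: {ratio_to_drupal_org:.1f}")
print(f"Times the Built With figure: {ratio_to_built_with:.1f}")
```

The extrapolation comes out more than an order of magnitude above both direct counts, which is the gap this thread is asking about.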

It is certainly possible that a number of mass-installs (e.g. Drupal Gardens) do not report to Drupal.org and take extensive measures to hide the fact they are running Drupal, thus evading Built With's checks. But I doubt that they make up websites that number in the millions. GoDaddy used to provide one-click installs that could reach numbers that high, but I have not heard anything about that service in many years.

What mass-installs and one-click install services do we know of? Can we get estimates of their numbers and Drupageddon defense systems to try to work out if the "hundreds of thousands" estimates are an order of magnitude too low?

Comments

Looking at the web-crawler I

rb2k's picture

Looking at the web crawler I run, I would back Built With's estimate.
The d.o. estimate probably includes sites in subdirectories (something.com/blog/) and some that are not reachable from the "public web".

Built With probably uses just "plain" domains (something.com) and subdomains (blog.something.com).

W3Techs has a disclaimer in their FAQ: http://w3techs.com/faq

"For the surveys, we count the top 10 million websites according to Alexa, see our technology overview for more explanations. We do crawl more sites, but we use the Alexa top 10 million to select a representative sample of established sites. We found that including more sites in the sample (e.g. all the sites we know) may easily lead to a bias towards technologies typically used for "throw-away" sites or parked sites or other types of spam domains."

I can only assume that they actually mean the number of 'sites' (something.com/node/42) rather than the number of domains out there. Otherwise there is no way they could cite 1 billion websites in total. Even after running a crawler for years, the highest I've seen is about 150 million domains and subdomains (with a bit of filtering to weed out the spammy wildcard subdomains that get generated).
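To illustrate how much the counting unit matters, here is a rough sketch that counts the same handful of URLs two ways: once per hostname (domains and subdomains), and once per hostname plus first path segment (so that something.com/blog/ counts separately). The URLs are the hypothetical examples from this comment, and the function names are my own. This is not how any of the mentioned services actually count.

```python
from urllib.parse import urlsplit

# Hypothetical URLs based on the examples used in this comment.
URLS = [
    "http://something.com/",
    "http://something.com/blog/",
    "http://something.com/node/42",
    "http://blog.something.com/",
]


def count_hosts(urls):
    """Count distinct hostnames: plain domains and subdomains only."""
    return len({urlsplit(url).hostname for url in urls})


def count_hosts_and_subdirectories(urls):
    """Count distinct (hostname, first path segment) pairs."""
    keys = set()
    for url in urls:
        parts = urlsplit(url)
        first_segment = parts.path.strip("/").split("/")[0]
        keys.add((parts.hostname, first_segment))
    return len(keys)


print(count_hosts(URLS))
print(count_hosts_and_subdirectories(URLS))
```

With these four URLs the first count sees two "sites" and the second sees four, so the looser the definition, the faster the total grows.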

Flawed metrics

scor's picture

I think their way of measuring the number of sites is flawed, which works in their favor by making the situation in their article sound worse than it is and playing into sensationalism. With statistics, it's easy to bend numbers to make them tell the story you want. It seems to me that the BBC just picked whichever source had the highest numbers, without really checking its accuracy or origin. Kind of sad for a big name like the BBC. I don't really think of the Sophos blog as the most authoritative source for those kinds of surveys. Here is the comment I left on the Sophos blog post (my comment hasn't been published yet):

The 12M number of affected sites is grossly overestimated. https://www.drupal.org/project/usage/drupal reports fewer than a million Drupal 7 sites. Given that some of those are private intranet sites which would not be affected by widespread attacks, that some percentage of the rest would be hosted on platforms like Pantheon, Acquia, Platform.sh, and CloudFlare which had protections in place (https://www.acquia.com/blog/shields), and that some other fraction would have patched in less than 7 hours, claiming that 12 million sites are affected is absolutely inaccurate.

Edit: comment improved

Mass-installs

scor's picture

It is certainly possible that a number of mass-installs (e.g. Drupal Gardens) do not report to Drupal.org and take extensive measures to hide the fact they are running Drupal, thus evading Built With's checks. But I doubt that they make up websites that number in the millions.

I can confirm that Drupal Gardens doesn't report its installs to drupal.org, and that the number of Drupal Gardens sites wouldn't come close to bridging the gap between fewer than 1 million and 12 million sites. The number of sites on Drupal Gardens is only in the tens of thousands, not really significant when talking about hundreds of thousands. Note that all the Drupal Gardens sites were protected by the Acquia Cloud shield, so they should not be counted in the estimates of "possibly affected sites".

We don't really hide the fact that we're running Drupal on those sites either (see an example site). We leave the CHANGELOG.txt file and the generator meta element in place, so those sites can be assumed to be accounted for by any decent crawler, like the ones from W3Techs, Built With, or rb2k's crawler.
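As a rough idea of how a crawler could pick up the generator meta element, here is a small sketch using Python's standard html.parser. The sample markup is an assumption modelled on what a Drupal 7 page typically emits, the class and function names are invented for this example, and it is not the detection logic of W3Techs, Built With, or rb2k's crawler.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collects the content of every <meta name="generator"> element."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "generator":
            self.generators.append(attributes.get("content") or "")


def looks_like_drupal(html):
    """Return True if any generator meta content starts with 'Drupal'."""
    finder = GeneratorFinder()
    finder.feed(html)
    return any(g.lower().startswith("drupal") for g in finder.generators)


# Hypothetical page heads; the first is assumed to resemble Drupal 7 output.
DRUPAL_HEAD = '<head><meta name="Generator" content="Drupal 7 (http://drupal.org)" /></head>'
OTHER_HEAD = '<head><meta name="generator" content="WordPress 4.0" /></head>'

print(looks_like_drupal(DRUPAL_HEAD))
print(looks_like_drupal(OTHER_HEAD))
```

A real crawler would combine this with other signals (such as the CHANGELOG.txt file mentioned above), since the meta element can be removed by a site owner.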

Discussing Drupageddon via Podcast

shanesevo's picture

Great collection of content to review on this thread. We will repost it as part of our podcast focused on Drupageddon.

http://www.commercialprogression.com/post/hooked-drupal-podcast-episode-2

Would anyone in here be open to contributing on a future podcast concerning security and how to move forward in light of Drupageddon?
