The launch of schema.org got me thinking about how we could help the major search engines extract the bits of information they care about from Drupal 7 pages. They will try to follow the spec of whatever format they are parsing, but there will be bugs, and we should be able to detect those bugs easily so they can be fixed. From a search engine's perspective, it is hard to know how to work with Drupal pages without knowing much about Drupal's internals and the kind of information it outputs. They could look at random Drupal 7 sites online, but they might miss the bigger picture if they don't work with the Drupal community.

What if we built a set of typical Drupal 7 HTML output pages featuring the structured data that search engines should expect? Bing, Google (and also Facebook) could use these pages to test and tune their parsers. We could also use these pages to test the various search engines and ping them if something breaks on their side (this would be manual testing at first, but we could think about automating it in the long term, to provide some form of QA for the Drupal SEO community). I think this would also allow us to learn more about how consumers like search engines use our markup, and potentially help us make decisions if we ever needed to tweak Drupal's output.
We could start with a few common types of pages that search engines are familiar with, such as news article, person, event, and recipe. Each of these types could be documented as a site recipe (maybe using Features, which would include the appropriate schema.org mappings), and we could generate an HTML snapshot from each of them. (I think Lin has already started to work on some of these site recipes.) For each supported type, we would take an HTML snapshot and assign it a version number, incrementing that number as we make fixes or module updates. Each version would be hosted online so that it can be tested against the various search engine tools. We would have, for example:
http://qa.semantic-drupal.com/snippets/person/1
http://qa.semantic-drupal.com/snippets/event/1

If we find a bug in one of our modules, we fix it and release a new set of snapshots:
http://qa.semantic-drupal.com/snippets/person/2
http://qa.semantic-drupal.com/snippets/event/2

and so forth...
The testing can be a manual copy/paste at the beginning but, as I said, it could potentially be automated in the long run if each test comes with its expected results.
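To give an idea of what such an automated check could look like: each snapshot could ship with the list of structured-data properties it is expected to expose, and a small script could flag any that go missing between versions. The sketch below assumes microdata markup for simplicity (Drupal 7 core actually emits RDFa, so a real checker would need an RDFa extractor); the class, function, and sample snapshot are all hypothetical, not part of any existing tool:

```python
from html.parser import HTMLParser

class ItempropParser(HTMLParser):
    """Collects the itemprop attribute values found in microdata markup."""
    def __init__(self):
        super().__init__()
        self.props = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemprop":
                self.props.add(value)

def check_snapshot(html, expected_props):
    """Return the set of expected properties missing from the snapshot."""
    parser = ItempropParser()
    parser.feed(html)
    return expected_props - parser.props

# A hypothetical person snapshot that exposes "name" but forgot "jobTitle":
snapshot = ('<div itemscope itemtype="http://schema.org/Person">'
            '<span itemprop="name">Jane Doe</span></div>')
missing = check_snapshot(snapshot, {"name", "jobTitle"})
# missing == {"jobTitle"}
```

A QA job could run a check like this against every versioned snapshot URL and ping us (or the search engine) whenever the diff is non-empty.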
Thoughts?

Comments
This makes a lot of sense.
This makes a lot of sense. Have you gotten an indication from any of those parties that they would test against such docs? If not, we might want to get them involved early.
Steve Macbeth from the Bing
Steve Macbeth from the Bing team already expressed an interest in collaborating on the main schema.org thread. Hopefully Google will join us soon too.
Sounds Like a Good Idea To Me
+1! This sounds like a good idea to me.
I like it
I think this is a great idea. I hope the other parties agree.
#1: "Bing, Google (and also Facebook) could use these pages to test and tune their parsers."

#2: "We could also use these pages to test the various search engines and ping them if something breaks on their side."

Basically, #1 requires the involvement of other parties, and #2 could still be useful for us and would not necessarily rely upon the involvement of other parties.