Benchmarking Higher Ed Sites ...

matt_paz's picture

I recently revisited an old pet project of mine looking at benchmarking and comparing higher education sites. It is obviously a work in progress, but it is shaping up fairly well at this stage and I'm getting to a point where I'd like to secure some feedback. If 'ya get a chance, please take a look and let me know what you think. It's based on D7, represents around 3,200 US higher ed sites, and has a range of data to sift through.


Pretty cool.

theMusician's picture

I work at the University of Oregon, and it is fun to see the page stats. I know a few of the developers who manage the homepage. I'll pass this along to them. Feedback: the colors you use on the server information graph make it difficult to distinguish between platforms.

Text analysis is just cool. I like the metrics chosen. Anyhow, great project. Keep up the nice work.


matt_paz's picture

RE: Server Info Colors ...

Ya, not a lot of contrast there, eh? This was just a quick and dirty starting point, but I've gone ahead and added it to my list.

RE: Text Analysis ...

Heh, that's an interesting one. Has potential, but perhaps more interesting when I start adding in other "child" pages ... not sure how/when I'm going to tackle that tho'

That's nice

redndahead's picture

If you check sites multiple times, it would be nice to see a history of results. Also curious to see where our university's site would stack up.

Thanks ...

matt_paz's picture

Greatly appreciated. Sorry I missed UC Merced. I got the data from IPEDS ... if I remember correctly, UC Merced is a newer institution, so that's probably why it isn't there. I'll work on getting that added soon.

I've always hoped to have longitudinal data collected. Just have to make sure I can afford to scale it ;) ... and need to invest in the work to schedule the snapshots (easy). My hope is just to use Drupal's node revision system ... then w/ diffs, views, etc., there is potential for all kinds of interesting data. Also thinking about backfilling revisions with data from the Internet Archive ... that'll only get me basic DOM, HTML, usage (none of the performance/header fun), but it could be interesting nonetheless. I'd love to see a chart with adoption of Google Analytics over time.

I'm playing around with a DOM hashing technique too ... hoping I might be able to use that to give me a heads up when pages are redesigned.
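For anyone curious what a DOM hash could look like, here is a minimal Python sketch. It is only a guess at the general idea, not the code behind the site: it assumes that hashing the sequence of tag names (ignoring text and attributes) is a good enough fingerprint of page structure. The names `TagSkeleton` and `dom_hash` are made up for illustration.

```python
import hashlib
from html.parser import HTMLParser


class TagSkeleton(HTMLParser):
    """Records only opening/closing tag names; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tokens.append("</" + tag + ">")


def dom_hash(html):
    """Return a SHA-256 hex digest of the page's tag skeleton (hypothetical helper)."""
    parser = TagSkeleton()
    parser.feed(html)
    parser.close()
    skeleton = "".join(parser.tokens)
    return hashlib.sha256(skeleton.encode("utf-8")).hexdigest()


# Same structure, different copy: the hashes match.
old = "<html><body><div><p>Welcome</p></div></body></html>"
copy_edit = "<html><body><div><p>Hello again</p></div></body></html>"
# Different structure: the hashes differ, hinting at a redesign.
redesign = "<html><body><nav></nav><main><p>Welcome</p></main></body></html>"

print(dom_hash(old) == dom_hash(copy_edit))
print(dom_hash(old) == dom_hash(redesign))
```

An exact hash flags any structural change, however small, so a real redesign detector would probably need something fuzzier (for example, comparing tag counts between snapshots).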

Just added UC Merced

Wow that was quick. Thanks!

redndahead's picture

Wow that was quick. Thanks!

Way cool

teaguese's picture

This is very cool - love it! I started something a little similar for academic libraries about a year ago - I was focused on the IA, the layout of the page, types of navigation, common navigational elements, etc. Mine was pretty manual, but you've inspired me to figure out how to automate it. I agree that a history of results would be even cooler.

Interesting ...

matt_paz's picture

I have an interest in academic libraries as well. Was thinking about things like ...

1) Percentage of sites that mention the library on their home page
2) Creating a semi-structured series of taxonomy terms for "common" child pages (libraries, jobs, portal, department pages, etc).
3) Allowing anyone to add child pages to the index

Just so many angles to consider here ... fun stuff, but it takes time.

See the comment above about longitudinal data. I'm all about going there, just have to find the time :)


teaguese's picture

yep - a lot of different angles/directions you could go in. the text-analysis piece is so interesting. The word count would be telling on library sites since many still tend to be very link dense.

maybe you've already looked at this, but it would be neat to look at what doctype folks are using ... also, are they using a metadata schema such as Dublin Core (doubt anyone is still doing this)
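A rough Python sketch of how those checks might be automated for a single page. This is an assumption-laden illustration, not anything from the benchmarking site: the `PageProfile` class is hypothetical, and it assumes Dublin Core fields show up as `<meta>` tags whose `name` starts with `DC.` or `DCTERMS.`.

```python
from html.parser import HTMLParser


class PageProfile(HTMLParser):
    """Collects doctype, Dublin Core meta names, link count and visible word count."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.dc_fields = []
        self.link_count = 0
        self.word_count = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_decl(self, decl):
        # For "<!DOCTYPE html>" the parser passes "DOCTYPE html".
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.link_count += 1
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name.startswith("dc.") or name.startswith("dcterms."):
                self.dc_fields.append(name)
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Count words only outside script/style blocks.
        if not self._skip_depth:
            self.word_count += len(data.split())


sample = """<!DOCTYPE html>
<html><head>
<meta name="DC.title" content="Example Library">
<title>Example Library</title>
<style>body { color: black; }</style>
</head><body>
<a href="/catalog">Catalog</a> <a href="/hours">Hours</a>
<p>Welcome to the library</p>
</body></html>"""

profile = PageProfile()
profile.feed(sample)
profile.close()

# Links per word is one crude way to express "link dense".
link_density = profile.link_count / max(profile.word_count, 1)
print(profile.doctype, profile.dc_fields, profile.link_count, profile.word_count)
print(round(link_density, 2))
```

The word count here includes the `<title>` text, which may or may not be what you want when comparing library home pages.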

fun stuff :)


Metadata ...

matt_paz's picture

Lots of metadata to sift through ... you might be surprised about DC, but not sure ... I'll be adding a full-text source search soon(ish), so people can delve a bit deeper and more creatively than using the ole canned reports that I have available today.

It would be interesting to

greggles's picture

It would be interesting to pull in some data from about the sites since tries to do analysis of the technologies in use (e.g. CMS, programming language, javascript libraries) - see for one of the sites in your index I'm familiar with.

Great work, Matt. This is really cool stuff!

Indeed. I'll have to take a

matt_paz's picture

Indeed. I'll have to take a deeper look at their API. Unfortunately it appears to be limited to 500 calls a month, which, for my needs, isn't much. Still, I'd rather use an API than try to reproduce parts of it from scratch. I already have parts of the JS library, analytics engines, etc. figured out ... just have to polish it up. Still, it would take a ton of time to do everything they're doing on my own ... hopefully they'll be willing to work with me on the API limit.

Oh, and btw, congrats on the Acquia deal! Nice!


sreynen's picture

This is really cool. I'd love to hear more about how you built it, maybe at a DBUG presentation.

I wish ...

matt_paz's picture

Thanks! I've always hoped to pursue something interesting enough for a DBUG prezo ... unfortunately, it looks like I'm moving outta state. I'll be happy to share some thoughts on building it ... it wasn't too bad ... just need to carve out the time. :)

Nice Job!

capitalfellow's picture

Really enjoyed poking around your work to see how the .edu site I'm responsible for stacked up. The textual analysis was an interesting metric, but the server/developer visitors such as myself could use some explanation of their meaning & impact (maybe a short summary sentence & link to the corresponding Wikipedia article).

So True ...

matt_paz's picture

"could use some explanation of their meaning & impact"

Ya, I was running on a different host at one point and then ran into some threshold issues and lost the info. I'll look into that at some point. It was completely foreign territory for me too ... I'm not even sure how interesting it is at this point ... I think it has more potential when I (hopefully) expand later.

"Really enjoyed poking around your work"


University of Washington campuses

ksymer's picture

Great job on this, Matt!

I've noticed some discrepancies in search results that might be of interest. They relate to the three University of Washington campuses. Probably due to IPEDS data.

Not found on your site:
University of Washington-Bothell Campus

Found, but excluded when searching by region or classification:
University of Washington-Tacoma Campus

Compare to correct listing for:
University of Washington-Seattle Campus


Updated ...

matt_paz's picture

Tacoma has been updated and Bothell has been added

Sorry for the delay ...

Wellesley College - new to Drupal 7

jannabrown51's picture

Really interested in having you add in Wellesley College for benchmarking. We are newly up on Drupal 7 and using a module called Monster Menus to help us with departmental hierarchy/ownership issues.

Had to take it down ...

matt_paz's picture

Heh, well, I wound up taking it down as it was costing me a bit more than I wanted to pay. I'd love to bring it back ... had grand plans for it, but in the end, I just couldn't justify it. Sorry! Glad to hear that someone else found it interesting tho'

Denver / Boulder Colorado (DBUG)
