[GSoC Proposal] Search API Statistics

Project information

Project page on drupal.org: not created yet
Student: Michael Timofeev (mtee)
Mentor: Thomas Seidl (drunken_monkey)
Co-mentor(s): (klausi)
Local mentor(s): Thomas Seidl
Current status: developing

Overview

Search API is a widely used module that provides a framework for creating searches, with faceting support, on any entity known to Drupal, using any search engine. I propose to create a module that integrates with Search API and Facet API, collects and evaluates search statistics, and presents them in a viewer-friendly form. A module with similar functionality, Apache Solr Statistics, already exists for the Apache Solr Search Integration module, and there have been some requests for such functionality for Search API.

Description

The module will be developed as a separate module but will integrate closely with Search API and Facet API. It will log search queries (keywords and facets) and give site maintainers a visual representation of the collected data, shown either as a simple list in a block or as a chart. For this display it will integrate with the Views module.
The following questions are likely to be of most interest to a site maintainer, so I will concentrate on these use cases:

  • How many results did a search facet have yesterday? How many 30 days ago? How did the popularity of a facet evolve over time?
  • Do people actually use the facets I’m exposing on my site? How often has this particular facet been clicked on today?
  • How often is the whole search used anyway in a given time period?
  • Which search terms are trending compared to last week? Which facet has lost popularity or is used less often?
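
To make these use cases more concrete, here is a minimal, language-neutral sketch in Python of how logged searches could be aggregated. It is not Drupal or Search API code: the `SearchEvent` record, its field names, and the three helper functions are hypothetical and only illustrate the kind of counting the questions above call for.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SearchEvent:
    """One logged search. All field names are hypothetical."""
    day: date
    keywords: str
    facets: tuple = ()  # e.g. ("category:jobs",)


def searches_per_day(events):
    """How often the whole search was used on each day."""
    return Counter(e.day for e in events)


def facet_usage(events, start, end):
    """How often each facet was used between start and end (inclusive)."""
    return Counter(
        f for e in events if start <= e.day <= end for f in e.facets
    )


def trending_terms(events, today, top=3):
    """Rank keywords by the change in use between the last 7 days
    and the 7 days before that (largest increase first)."""
    week = timedelta(days=7)
    current = Counter(
        e.keywords for e in events if today - week < e.day <= today
    )
    previous = Counter(
        e.keywords
        for e in events
        if today - 2 * week < e.day <= today - week
    )
    change = {
        k: current[k] - previous[k]
        for k in current.keys() | previous.keys()
    }
    # Sort by descending change, then alphabetically for stable output.
    return sorted(change.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

A negative change in the result of `trending_terms` would correspond to a term that is used less often than in the previous week; the same comparison could be applied to facets instead of keywords.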

The site maintainer will be able to configure how long data is stored. The granularity of reports will also be adjustable.
After evaluation, this data could be published on the site; for instance, a job search portal could list its most searched jobs.
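
As a rough illustration of the two settings mentioned above (retention period and report granularity), the following Python sketch shows one possible approach. The function names, the `(day, keywords)` tuple layout, and the granularity values "day", "week" and "month" are assumptions made for this example, not part of any existing module.

```python
from collections import Counter
from datetime import date, timedelta


def prune(events, today, keep_days):
    """Drop events older than the configured retention period."""
    cutoff = today - timedelta(days=keep_days)
    return [(day, kw) for day, kw in events if day >= cutoff]


def bucket_start(day, granularity):
    """Map a date to the first day of its reporting bucket."""
    if granularity == "day":
        return day
    if granularity == "week":
        # Monday of the week that contains the given day.
        return day - timedelta(days=day.weekday())
    if granularity == "month":
        return day.replace(day=1)
    raise ValueError(f"unknown granularity: {granularity}")


def report(events, granularity):
    """Count searches per bucket, sorted by bucket start date."""
    counts = Counter(bucket_start(day, granularity) for day, _ in events)
    return sorted(counts.items())
```

Storing raw events and bucketing them only when a report is built keeps the granularity adjustable after the fact; pre-aggregating at a fixed granularity would save space but lose that flexibility.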

Project schedule

My university exams take place during the last two weeks of June, so I am planning to work on the project more intensively afterwards.

  • Before May 21:
      • Create the drupal.org project
      • IRC/VoIP meeting with mentors
      • Take a look at the APIs mentioned above

  • May 21 - June 11:
      • Find out what can be reused from other modules

  • June 11 - June 30:
      • Add simpletests

  • July 1 - July 7:
      • Data storage functionality, possibly using the datastore module
      • Make it configurable

  • July 7 - July 15:
      • GUI and the ability to select which data is collected

  • July 15 - July 21:
      • Out-of-the-box functionality for a block with the top searched phrases
      • More simpletests

  • July 21 - July 28:
      • Chart graphics

  • July 28 - August 4:
      • Data export

  • August 4 - August 15:
      • Testing
      • Fixing bugs

Who

My name is Michael and I am currently a graduating bachelor's student of Software Engineering at the Vienna University of Technology. My first industry experience began with Drupal half a year ago, and I have already worked with various APIs (e.g. Migrate API). I also have practical knowledge of Java-based web development tools and techniques.

Comments

klausi's picture

Disclaimer: I helped Michael to craft this proposal.

This is a very good idea and would be a cool benefit for all Search API users. I know Michael personally and although he is relatively new to Drupal he has good programming skills and is learning very fast.

I also talked to Thomas Seidl (Search API maintainer) and he agreed to be a possible mentor for this project.

carlitus's picture

I like it! ;)

dasjo's picture

see a related d.o issue about visualizing project issue metrics
http://drupal.org/node/1556468

it will use sampler api
https://drupal.org/project/sampler

Google Summer of Code 2012
