Post RSS feed to Twitter - Regex and XSS

cleaver's picture

I've developed a module that takes an RSS feed and posts it to Twitter. It uses a regular expression to filter and manipulate the RSS data, i.e., you may want to post only items that match a certain term, and you might want to change the content before posting a tweet.

One of the things I'd like some feedback on is displaying a regular expression in a form field. Obviously, a field holding a regular expression needs to be somewhat permissive about the data it displays. Functions like check_plain() will cause problems. I even need to match XML tags from the feed, so the tag correction that check_markup() provides causes problems. My solution is to use a stripped-down version of check_markup() that retains the JavaScript checking, but not the tag matching.
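
One point that may be worth testing here: escaping a pattern only when it is written into the form field's value attribute should not change what comes back on submit, because the browser decodes the entities again. The Python sketch below is only an analogy, with html.escape standing in for check_plain(); the render_text_input helper and the sample pattern are hypothetical, and whether this applies to the module depends on where its escaping is actually done.

```python
import html


def render_text_input(name: str, value: str) -> str:
    """Build an <input> element with the value HTML-escaped for the attribute."""
    escaped = html.escape(value, quote=True)
    return f'<input type="text" name="{name}" value="{escaped}">'


# Hypothetical pattern containing XML tags, quotes and an entity.
pattern = r'<title>(.*?)</title>.*"(\d+)&deg;C"'
markup = render_text_input("regex", pattern)

# A browser decodes the entities when it parses the attribute, so the
# submitted value matches the original pattern. html.unescape stands in
# for that decoding step here.
start = markup.index('value="') + len('value="')
round_tripped = html.unescape(markup[start:markup.rindex('"')])
assert round_tripped == pattern
```

If that holds, the raw pattern never reaches the page unescaped, and the regex itself is stored and applied unmodified.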

If anyone has ideas, I'd like to make this as secure as possible while still functioning. It's still waiting for a CVS module review (http://drupal.org/node/854324) if you want a look, or on github (http://github.com/cleaver/TweetRSS).

A bit more about the module:

I wrote a hook so that people can develop their own filters, and included a sample filter I wrote for my proof-of-concept site (http://canweather.com). The module uses the Twitter module for authorization and for access to the Twitter API.

If anyone has new ideas, please post! One thing I can think of is to support the Feeds module; right now it uses the core Aggregator module for RSS.

Comments

Feeds and Views

sreynen's picture

Yeah, you should use the Feeds module for aggregation. That'll let you map categories to taxonomy tags so you don't need to handle the parsing on that at all. And once you have the feed items in nodes, you could filter what gets posted to Twitter with Views. If you handle parsing with Feeds and filtering with Views, you shouldn't need regular expressions at all. Beyond the issues you mentioned with exposing regular expressions, they just aren't a tool a lot of people are comfortable using, so you can improve usability a lot by not exposing them at all.

Filtering and rearranging

cleaver's picture

That would definitely work well for filtering which RSS items to post (or any data, not just RSS). I definitely plan to do something like that.

However, the other thing I use regex for is to match certain parts of the feed and use them in my Twitter post. For example, in a weather post, I might pull the temperature, humidity and wind speed out of the entire RSS item. I have to be selective since I only have 140 characters. I can't think of any other way to do that in a flexible manner.
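
As a rough illustration of that kind of extraction, here is a minimal Python sketch (not the module's PHP code). The item description, the field labels, the WEATHER_RE pattern and the build_tweet helper are all made up for the example; a real feed will be laid out differently.

```python
import re
from typing import Optional

# Hypothetical RSS item description; the real feed format may differ.
description = (
    "<b>Temperature:</b> 21.5&deg;C <br/>"
    "<b>Humidity:</b> 64 % <br/>"
    "<b>Wind:</b> NW 15 km/h <br/>"
)

# Named groups pick out only the parts worth tweeting.
WEATHER_RE = re.compile(
    r"Temperature:</b>\s*(?P<temp>-?\d+(?:\.\d+)?).*?"
    r"Humidity:</b>\s*(?P<humidity>\d+).*?"
    r"Wind:</b>\s*(?P<wind>[A-Z]+ \d+ km/h)",
    re.DOTALL,
)


def build_tweet(city: str, text: str, limit: int = 140) -> Optional[str]:
    """Return a short weather tweet, or None if the item does not match."""
    match = WEATHER_RE.search(text)
    if match is None:
        return None
    tweet = "{city}: {temp}C, humidity {humidity}%, wind {wind}".format(
        city=city, **match.groupdict()
    )
    return tweet[:limit]  # hard cut at the 140-character limit


print(build_tweet("Toronto", description))
```

Items that do not match the pattern return None, so the same expression doubles as the filter that decides what gets posted.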

I guess the intent of this module (at least the regex plugin) is more of an admin tool and not for general users. Maybe the default should be a simpler wildcard pattern match?
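
One way a wildcard default could work, sketched in Python rather than the module's PHP: translate the wildcard into a regular expression behind the scenes, so the admin never sees regex syntax. The wildcard_to_regex name and the sample titles are assumptions for illustration only.

```python
import fnmatch
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a shell-style wildcard (* and ?) into a case-insensitive regex."""
    return re.compile(fnmatch.translate(pattern), re.IGNORECASE)


matcher = wildcard_to_regex("*storm warning*")

# The compiled pattern must match the whole title.
assert matcher.match("Severe Storm Warning for Ottawa")
assert not matcher.match("Sunny, 21C")
```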

subscribe

stevenator's picture

subscribe

You can check out the code of

cleaver's picture

You can check out the code of the module (http://drupal.org/project/tweetrss). In comparing other modules' handling of regex, I find that most show the regex value with no filtering. While not ideal, mine should be a bit more careful. In any case, grant admin access with care.

As for usability, I was thinking of creating a simpler filter plugin that would let admins control things with simple searches and use tokens to define the output.
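
A minimal Python sketch of what such a plugin might do, assuming a plain substring search on the item title and $-style placeholders from string.Template as a stand-in for tokens. The simple_filter function, the item keys and the example URL are hypothetical.

```python
from string import Template
from typing import Optional


def simple_filter(item: dict, search: str, template: str,
                  limit: int = 140) -> Optional[str]:
    """Post only items whose title contains `search`; format them with tokens."""
    if search.lower() not in item["title"].lower():
        return None
    # safe_substitute leaves unknown tokens in place instead of raising.
    return Template(template).safe_substitute(item)[:limit]


item = {"title": "Ottawa: Frost advisory", "link": "http://example.com/1"}
print(simple_filter(item, "frost", "$title $link"))
```

The admin would then only enter a search term and an output template, with no regular expression exposed in the form.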
