Considerations for Contest Rating Systems

I recently participated in the design and implementation of an online video contest for a major government client, a successful project which eventually evolved into an online product, the Votridea Contest Platform. One of the key features of the system was a rating system that allowed users to rate videos (photos and text entries were also eventually added).

Designing a rating system presents quite a few interesting problems. If a contest offers significant prizes, such as cash, vacation trips or products, then some users will inevitably try to game the system.

Some forms of self-promotion are allowable and, indeed, encouraged. By all means, share your contest entry on Facebook. Tell your friends about it. Ask them to vote for you. There’s nothing unfair about that. The tools of self-promotion are available to everyone. Plus, frankly, the whole point of a contest is to engage a community of users to respond in some way to the message of the organization sponsoring the contest, e.g., to promote increased brand awareness or raise the profile of a needy charity.

The sponsor wants users to be so engaged that they actively promote the contest and, by extension, the sponsor’s message. The sponsor doesn’t want people to cheat.

So, let’s look at some of the ways that people can cheat with online contests, and what steps can be taken to prevent cheating.

Voting / Liking

Some contests provide only a “Vote for this Entry” button or a “Like” button for a contest entry. Winners are chosen simply by getting the most votes. This is one of the simplest ways to rate contest entries, and it’s easily abused.

The obvious way to cheat on this type of contest is to simply vote as many times as you can. The main way to prevent this is to limit how many times a user can vote for a contest entry, typically either 1) once per contest, or 2) once per day.
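The two limits can be sketched in a few lines of Python. This is only an illustration, not the code we ran: the VoteLimiter class, its in-memory set and the user_id value are all assumptions, and how a user is actually identified (cookie, login) is a separate question.

```python
from datetime import date


class VoteLimiter:
    """Hypothetical helper: accepts one vote per user per entry,
    either once per contest or once per day."""

    def __init__(self, daily=False):
        self.daily = daily
        self._seen = set()  # stand-in for a database table

    def try_vote(self, user_id, entry_id, today):
        """Return True if the vote is accepted, False if it is a repeat."""
        # With daily voting the date is part of the key, so the limit
        # resets each day; otherwise the key lasts for the whole contest.
        key = (user_id, entry_id, today) if self.daily else (user_id, entry_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


limiter = VoteLimiter(daily=True)
print(limiter.try_vote("user-1", "entry-9", date(2024, 1, 1)))  # first vote today
print(limiter.try_vote("user-1", "entry-9", date(2024, 1, 1)))  # repeat, rejected
```

Everything hinges on how trustworthy user_id is, which is the hard part.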

A cookie can be set recording a unique ID for the user. The contest then only accepts one vote per contest entry from that user. However, this can be subverted by voting from multiple computers, or from multiple browsers on the same computer, or by anyone who is willing to clear their browser cookies after each vote (it does cut down on the number of non-technical cheaters, though).

An improvement is to require users to log in to the contest site before they can vote. The login could be a native site login system, or, more often, it could be a remote identity provider such as Facebook, Twitter, LinkedIn, etc. This can be subverted by any user who is willing to create multiple login accounts in order to vote multiple times. The downside is that the requirement for logging in represents a modest barrier to user participation.

There’s not much more in the way of opportunities to secure this form of voting against cheating, which is why I don’t recommend it for contests. This method of rating is great for crowd-sourcing the evaluation of content, but not for contests where prizes of any significance are offered.

Ratings

A contest can provide a rating system so that users can rate a contest entry on a scale, such as 1 to 5 stars. Some ratings systems may even allow for multiple rating categories. For example, SpeakerRate is a site that allows users to rate speakers who present at conferences. The site lets users rate speakers by 1) the Content of their presentation, and 2) the Delivery of their talk.

The online video contest that my team built had four rating categories: Creativity, Originality, Production Quality and Effectiveness. The prize offered was, basically, an all-expense-paid vacation for four people to a foreign country of your choice. These were nice prizes. Needless to say, we learned a lot about cheating, er, gaming the system.

A rating system provides a number of opportunities to increase the difficulty for cheating, which I’ll illustrate using the online video contest as an example.

Caching Ratings

A user’s composite rating for a video entry is calculated based on his rating in each of the four categories. Sum up the ratings and then divide by 4 to get the user’s composite rating for that contest entry.
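As a small illustration of that arithmetic (the category keys and the function name are my own labels for this sketch, not names from the actual system):

```python
# The four rating categories from the video contest (key names are assumed).
CATEGORIES = ("creativity", "originality", "production_quality", "effectiveness")


def composite_rating(ratings):
    """Average one user's four category ratings for a single entry."""
    return sum(ratings[c] for c in CATEGORIES) / len(CATEGORIES)


one_vote = {
    "creativity": 5,
    "originality": 4,
    "production_quality": 3,
    "effectiveness": 4,
}
print(composite_rating(one_vote))  # (5 + 4 + 3 + 4) / 4 = 4.0
```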

Because each user’s composite rating has to be calculated first, determining the overall average rating for an individual entry is a bit of work and takes a fair amount of time: you have to do the same thing for every user who has voted.

Accordingly, these types of calculations are typically done on a defined schedule, e.g., average ratings for each entry may be calculated and stored in the database every 15 minutes. Subsequently, ratings for each entry can be easily retrieved by a single database query.
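Here is a rough sketch of that pattern, with a plain dictionary standing in for the database table and no real scheduler. The function names are hypothetical; in practice something like a cron job or task queue would call refresh_cache on the chosen interval.

```python
from collections import defaultdict


def recompute_entry_averages(votes):
    """votes: iterable of (entry_id, user_composite_rating) pairs.
    Returns {entry_id: average composite rating}."""
    totals = defaultdict(lambda: [0.0, 0])  # entry_id -> [sum, count]
    for entry_id, rating in votes:
        totals[entry_id][0] += rating
        totals[entry_id][1] += 1
    return {entry_id: s / n for entry_id, (s, n) in totals.items()}


# Stand-in for the stored results that page requests read from.
cached_averages = {}


def refresh_cache(votes):
    """The expensive step; meant to run on a schedule, not per request."""
    cached_averages.clear()
    cached_averages.update(recompute_entry_averages(votes))


def get_entry_rating(entry_id):
    # Reads never touch the raw votes, so a new vote stays invisible
    # until the next scheduled refresh.
    return cached_averages.get(entry_id)
```

A vote added between refreshes does not change what get_entry_rating returns until refresh_cache runs again.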

A side effect is that users can’t instantly see the effect of their vote on the rating of a contest entry. This is good. It makes it harder for them to see whether their cheating is helping their own entry’s rating.

Weighted Category Ratings

So far we’ve calculated a composite rating by simply totaling the ratings and dividing by 4. That effectively means that each category-level rating counts for 25% of the total. Instead, the category ratings can be weighted, for example so that Originality and Creativity might each count for 30% and Production Quality and Effectiveness for 20% each.
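Using the 30/30/20/20 split purely as an example, the weighted version might look like this (the names are assumptions for the sketch):

```python
# Example weights only; they must sum to 1.0 for the result to stay
# on the same 1-to-5 scale as the individual ratings.
WEIGHTS = {
    "originality": 0.30,
    "creativity": 0.30,
    "production_quality": 0.20,
    "effectiveness": 0.20,
}


def weighted_composite(ratings, weights=WEIGHTS):
    """Combine one user's category ratings using per-category weights."""
    return sum(ratings[category] * w for category, w in weights.items())


vote = {"originality": 5, "creativity": 4, "production_quality": 3, "effectiveness": 2}
print(round(weighted_composite(vote), 2))  # 1.5 + 1.2 + 0.6 + 0.4 = 3.7
```

With equal weights of 0.25 each, this reduces to the simple average.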

Weighted category ratings can tune a rating system to better reflect the sponsor’s values. However, an analysis of our data showed that all contestants gave their own entry the maximum rating, as did most of their friends. Because contestants and their friends generally give the maximum rating in every category, weighting the categories doesn’t appreciably obscure a rater’s insight into their own impact on the contest.

Weighted Voters

Let’s face it, some voters are better than others. Good voters rate multiple videos and honestly try to give each entry a meaningful rating. Their ratings tend to exhibit at least some similarity to a bell curve, depending on how many entries they’ve voted on. Many contestants vote only once, giving their own entry the max rating (SpeakerRate attempts to prevent this by prohibiting speakers from voting for their own talks). Many of their friends also vote only once, giving their friend a max rating.

Which type of voter do you think is a better voter? Based on each voter’s pattern of voting, it’s possible to weight ratings from high-value voters much higher than those from low-value voters. For example, high-value voters could be weighted so that their votes count three times as much as those of low-value voters.
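To make the idea concrete, here is one possible sketch. The rule for who counts as a high-value voter (at least three entries rated, with some variety in the ratings) is an assumption I made up for the example; only the 3x weight comes from the text above.

```python
def voter_weight(composites, min_entries=3, high=3.0, low=1.0):
    """composites: the composite ratings one voter has given, one per entry.
    Hypothetical rule: several entries rated, and not all rated the same."""
    varied = len(set(composites)) > 1
    return high if len(composites) >= min_entries and varied else low


def weighted_entry_rating(votes):
    """votes: list of (rating_for_this_entry, voter_weight) pairs.
    Returns the weighted average, or None if nobody has voted."""
    total_weight = sum(w for _, w in votes)
    if total_weight == 0:
        return None
    return sum(r * w for r, w in votes) / total_weight


# Two one-time voters give a 5; one engaged voter gives a 3.
votes = [(5.0, voter_weight([5.0])),
         (5.0, voter_weight([5.0])),
         (3.0, voter_weight([2.0, 3.5, 4.0, 3.0]))]
print(round(weighted_entry_rating(votes), 2))  # (5 + 5 + 9) / 5 = 3.8
```

Unweighted, the same three votes would average about 4.33, so the engaged voter pulls the result noticeably toward their rating.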

This doesn’t work for democracies, but it can function pretty well for contests. Frankly, it diminishes the impact of voters who aren’t interested in fairly evaluating contest entries. It also has an impact on cheaters.

The video contest required users to log in in order to rate contest entries. We identified several different types of bad behavior that ensued despite the login requirements.

First, some contestants not only gave their own entry the max rating (which we expected), but they also went out and either 1) gave every other entry the lowest possible rating, or 2) targeted close competitors and gave them low ratings. Second, some contestants mass-produced remote login IDs and used multiple accounts to vote in this fashion.

The first type of behavior is inappropriate but not against the contest rules (although it might not be a bad idea to throw out user votes on their own contest entry). The second is just flat-out cheating. The impact of both types is considerably diminished by voter weighting. I’d even say that this type of rating, where only 5’s and 1’s are assigned, should be further penalized in the weighting.
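One way such a penalty might be sketched is below. Both the detection rule (a voter whose ratings consist of nothing but the minimum and the maximum) and the 0.25 multiplier are assumptions for illustration; we did not implement this.

```python
def is_polarized(ratings, lo=1, hi=5):
    """ratings: every individual category rating a voter has submitted.
    True only when both extremes appear and nothing in between."""
    return set(ratings) == {lo, hi}


def penalized_weight(base_weight, ratings, penalty=0.25):
    """Scale a voter's weight down if their ratings are all 5's and 1's."""
    return base_weight * penalty if is_polarized(ratings) else base_weight


# Max ratings for one entry, minimum ratings for a competitor.
suspect = [5, 5, 5, 5, 1, 1, 1, 1]
print(penalized_weight(1.0, suspect))       # 0.25
print(penalized_weight(3.0, [4, 3, 5, 2]))  # 3.0, unchanged
```

Note that a voter who submits only 5’s is not flagged by this rule; that case is already handled by the low weight given to one-time voters.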

At any rate, voter weighting makes it harder for a user to determine the impact of his own voting on an entry’s overall rating, which is good. It also makes the cheater do a lot more work in order to achieve any impact.

Rating Formula

Strangely enough, these types of problems aren’t new. Google’s PageRank algorithm is used to evaluate the content and authoritativeness of web pages. People try to game that system so often that they’ve come up with a name for it: search engine optimization (SEO).

But a formula seems like a good idea. So far, a formula for rating contest entries can leverage weighted category ratings and weighted voter ratings. What else can we throw into the mix?

How about page views, i.e., the number of times that a contest entry has been viewed? Of course, some users might continually refresh a page to send the page views through the roof. That contestant could be penalized, of course, but what if the culprit was a competitor who was trying to get his competition penalized? Don’t laugh, these things do get tried. So, let’s add unique page views instead, i.e., a user gets one page view counted per session per contest entry.
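Counting one view per session per entry can be sketched like this (the class and the session_id value are hypothetical; a real site would take the session from its web framework and persist the counts):

```python
class UniqueViewCounter:
    """Counts at most one page view per (session, entry) pair."""

    def __init__(self):
        self._seen = set()  # (session_id, entry_id) pairs already counted
        self.counts = {}    # entry_id -> unique view count

    def record_view(self, session_id, entry_id):
        """Record a page load; return the entry's unique view count."""
        key = (session_id, entry_id)
        if key not in self._seen:
            self._seen.add(key)
            self.counts[entry_id] = self.counts.get(entry_id, 0) + 1
        return self.counts[entry_id]


counter = UniqueViewCounter()
for _ in range(100):                 # someone hammering the refresh button
    counter.record_view("session-a", "entry-1")
counter.record_view("session-b", "entry-1")
print(counter.counts["entry-1"])     # 2
```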

Another factor to include in a formula might be the number of times that a contest entry has been shared, such as users who share via Facebook. The number of comments generated by an entry can also provide an indication as to the popularity of an entry.
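Purely as a thought experiment, the factors mentioned so far could be combined along these lines. Every coefficient here is invented, and the use of a logarithm to dampen the raw counts is my own assumption rather than anything we tested; a real formula would need tuning against actual contest data.

```python
import math


def entry_score(weighted_rating, unique_views, shares, comments,
                w_rating=0.7, w_views=0.1, w_shares=0.1, w_comments=0.1):
    """Hypothetical overall score for one contest entry.

    weighted_rating is assumed to already include category and voter
    weighting. log1p dampens large counts so that sheer volume of
    views, shares or comments cannot dominate the rating itself.
    """
    return (w_rating * weighted_rating
            + w_views * math.log1p(unique_views)
            + w_shares * math.log1p(shares)
            + w_comments * math.log1p(comments))


quiet = entry_score(4.5, unique_views=40, shares=2, comments=1)
popular = entry_score(4.0, unique_views=4000, shares=150, comments=60)
print(quiet < popular)  # popularity can outweigh a modest rating gap
```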

Conclusion

As you can see, my own concept of a high-quality rating system for a contest ends up becoming a fairly complicated formula. As with Google, the precise makeup of the formula should be kept secret from users in order to make it harder for them to game the system.

The primary benefits of such a formula are that it 1) reduces cheating because it’s harder for users to analyze the direct effects of their cheating, 2) discourages cheating by making it much harder to cheat, and 3) leads to better results that more accurately reflect the thoughtful consideration of responsible users.