RubyNation Main Room Videos

I’ve solved most of the workflow problems with the RubyNation 2011 video content from the main room, the footage that was shot with the awesome camera that we borrowed from Near Infinity. In the video realm, “workflow” is a fancy word for getting video from the camera into a format that you can use for editing with all elements intact. It’s backed up by Dave’s Video Corollary – “no matter where you get video from, it will never be in the format that you need.”

The camera is a Sony EX1 HDCAM. Sony produces a free software component called the XDCAM Log and Transfer Utility that allows Sony video footage to be imported into Final Cut Pro for editing. If you’ve saved the entirety of the content from the digital card (i.e. – the full directory tree), then importing works just fine.

Since Gray Herter, RubyNation’s Chief Organizer, wanted the Ryan McGeary talk done first, I imported that footage into Final Cut Pro and edited it. So, that talk is just waiting on the single-width animated intro bumper from Don Anderson before it goes live on our Blip.tv channel.

Most of the talks are in fine shape. However, there was a glitch on Friday (the first day of the conference) that corrupted some metadata from one of the digital cards. This has impacted the Scott Chacon, Nick Gauthier and Jerry Cheung talks. I still have the raw video footage (so nothing is actually lost), just not the top-level metadata that will allow it to work with the import utility. I have a Plan B and Plan C for dealing with that content.

Plan B uses Adobe Premiere to handle the raw Sony video files, for which it is supposed to have native support. Plan C uses a commercial utility (about $120 or so) to recover the footage from the raw Sony video files if Adobe Premiere doesn’t work.

It looks like the video footage is clear enough that I can get away with not doing the side-by-side video/slide thing that we did last year. Or, at least, that’s true of Ryan McGeary’s talk, where the slides were distinct, well-designed and clear. I may need to do side-by-side on some of the talks that are more code-focused. I’m currently deciding this on a case-by-case basis.

I think this year that we’re also going to spring for a professional-level membership with Blip.tv, so we can have larger files, higher priority for transcoding activities and ultimately deliver high-resolution videos for viewers.

Anyway, welcome to the world of video production, where the work really begins when the event is over.

Recipe: Page-Specific Content for the HTML Head Section

Rails layouts are a nice way to organize boilerplate HTML content in a DRY (“Don’t Repeat Yourself,” in case you’ve been asleep for a while) fashion. But your standard layout may not be sufficient to meet the needs of all web pages, particularly when it comes to including CSS style sheets or JavaScript files.

The Problem

In a Rails 3.x application, how can you provide an easy method for web pages to include page-specific CSS style sheets and JavaScript files within the <head> section of the HTML?

The Solution

Here’s a straightforward solution to the problem:

   <!DOCTYPE html>
   <html>
   <head>
     <title>Sample</title>
     <%= stylesheet_link_tag :all %>
     <%= javascript_include_tag :defaults %>
     <%= csrf_meta_tag %>
     <%= yield :head %>
   </head>
   <body>

   <%= yield %>

   </body>
   </html>

Note the “yield :head” statement in the <head> section of the document. With this construct in place, any web page can add custom content to the <head> element of the page.

  <% content_for :head do %>
    <%= stylesheet_link_tag 'custom' %>
  <% end %>

In the example above, a custom CSS file is added for one specific web page.

With this method, no other page has to bear the burden of including a CSS file that will not be used on that page. For any page that does not provide content for the <head> element, well, no content will be inserted. This is a simple way to support per-page customization capabilities while only affecting the pages that need their own custom support files.
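If your Rails version provides the content_for? helper, the layout can also check whether a page actually supplied head content before emitting any surrounding markup. A minimal sketch (the wrapping comment is just an example):

```erb
<%# Emit page-specific head content only when a page has supplied it %>
<% if content_for?(:head) %>
  <%= yield :head %>
<% end %>
```

This isn’t strictly necessary for the simple case above, since an empty yield emits nothing, but it’s handy when the extra content needs wrapping markup of its own.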


To the extent possible, it’s beneficial to limit the number of CSS stylesheets your website has, so I’m not advocating tons of little CSS stylesheets. But when one page has a bunch of special-purpose CSS, it’s good to segregate that code in a separate stylesheet and include it only where it’s needed.

The recipe described in this article serves to meet those occasional CSS needs.

Ruminations on the Internet

Last Wednesday, my Verizon FIOS Internet connection went down for the entire day. What was amazing to me was how much this impacted me. High-speed Internet access is ubiquitous nowadays; we don’t even really think about it unless it’s absent. When it’s not available, you begin to realize just how much the Internet has pervaded every aspect of our lives.

Professionally, of course, most of my work has to do with the Internet. I build web applications of all types, small ones for conferences, large ones for government agencies, etc. I access my source code over the Internet (using remote source code management technologies like git and Subversion); stay in contact with teammates using technologies like Skype, IM and email; and sometimes monitor my live web sites online.

But the Internet means far more than work to me. The most important thing about the Internet is that it provides “information on demand.”

Want to know who starred in the 1957 movie, “The Bridge on the River Kwai”? Check out the Internet Movie Database. Need to research information on the Battle of Midway for a Toastmasters speech? Check out Wikipedia; it may have a few flaws, but it’s still the most complete encyclopedia there is. Need to research how a Rails application can interact with Facebook? Use the Google search engine to track down articles, documentation, blog entries and other information sources.

What about entertainment? Want to catch up on that television show you missed last night? Check out Hulu. Or preview the menu of the restaurant you’re thinking of taking your wife to on Friday.

Let’s go even further. Pay your bills online using your bank’s website. Order items from your favorite store online. Keep up with the news.

Stay in contact with friends, both new and old, on Facebook. Chat with friends online using IM. Heck, I even work with a group of people to run two conferences, RubyNation and DevIgnition, and most of our communication is done online via email and IM.

Toss your favorite smart phone into the mix, and you now have Internet access, information on demand, just about anywhere that you go.

What does this all mean?

  1. We live in a connected world now. Geography isn’t a factor anymore. We can stay in contact with friends wherever they may be. We can form friendships online with people we’ve never met in person. We can build communities around common interests that transcend conventional geographic boundaries.
  2. Information on demand is a huge asset, and I think that even now we’re underestimating the impact it’s going to have on world-wide society (think of the current Middle East unrest as being caused by a younger generation that has been exposed to new ideas via the Internet). The tools are becoming available so that anybody can educate themselves on any topic they find interesting, whether to enhance their career or simply to pursue hobby-level interests.
  3. Email, the World Wide Web, IM and social networks are all technologies that have been empowered by the universal accessibility of the Internet. And we’re not done yet. There are technologies like virtual environments and environment tagging that have incredible potential that we’ve only barely tapped into, plus new technologies beyond the horizon.
  4. We’re still in the early days of the Internet. We’ve yet to see the majority of the impact that the Internet will have. Just think about that for a minute…

Welcome to the Internet. It’s going to be a wild ride, if Verizon can keep my connection up.

Reading “Eloquent Ruby”

I’m currently reading Russ Olsen’s new technical book, “Eloquent Ruby.” It’s an excellent book. I particularly like its focus on what it takes to be a truly good Ruby developer, rather than just teaching Ruby syntax. It’s a great “next book” for the developer who has learned Ruby and now wants to learn how to use it properly and effectively.

Introduction to Git

This is the first of a short series of articles introducing git to new users. Git is a source code management (SCM) utility that has come into widespread use within the open source community and is now expanding rapidly into the corporate realm. It offers some unique advantages over other SCM systems, particularly Subversion, which it is quickly supplanting.

The first thing that surprises most people is that git is a command-line tool. No fancy user interface for this tool yet (although I suspect there are individuals in the open source community already working on creating utilities to augment git’s user interface). Git provides an amazingly complete, and often eclectic, set of commands to accomplish all sorts of actions on a source code repository.

The good thing is that these commands ensure that all aspects of your source code repository are accessible and can be modified as needed. The downside is that it can often be difficult for a newcomer to figure out where to start.

Setting Up a Repository

Most people that I know are using git in conjunction with GitHub, an online business that hosts git repositories and which has built all sorts of web-accessible tools to support the software development process. You don’t have to use GitHub; you can set up your own git repository wherever you want, but then you have to worry about hosting it, providing secure access to it, administering access for new users, etc.

GitHub’s business model provides free public repositories for users, which is great for the open source community. They also offer paid services for users who want private repositories, i.e. – repositories to which the owner can restrict access to specified users. This is the option typically chosen by corporate users.

To use GitHub, you must first sign up for an account. Once you’ve created an account, GitHub has plenty of information available online to guide you in creating your first source code repository, so I’m not going to cover that here.

Using Your Git Repository

You’ve just created your first git source code repository at GitHub. Now you want to use it. To make things simple, you’re not sharing it with any other developers, so you don’t have to worry about things like branching, merging, etc.

First, let’s clone the repository so you’ll have a local copy to work on.

      $ git clone git@github.com:your_account/repository.git

Now that you’ve got a copy of the repository, add some new source code files. Once you’ve done this, you’ll naturally want to check your changes in. Let’s determine what’s changed:

      $ git status
      # On branch master
      #
      # Untracked files:
      #   (use "git add ..." to include in what will be committed)
      #
      #      yourfile1.rb
      #      yourfile2.rb
      no changes added to commit 
      (use "git add" and/or "git commit -a")

Your brand new repository started out empty, but you’ve created some new files. Git shows the changes to the repository.

Unlike Subversion, git will only check in files that you have marked for check-in. You can do this by “adding” each file individually:

      $ git add yourfile1.rb
      $ git add yourfile2.rb

Check the status again to see what will be checked in:

      $ git status
      # On branch master
      # Changes to be committed:
      #   (use "git reset HEAD ..." to unstage)
      #
      #      modified:   yourfile1.rb
      #      modified:   yourfile2.rb

To check in the code:

      $ git commit
      [master e732f5a]    Minor change.
      2 files changed, 9 insertions(+)

The commit action automatically brings up an editor, generally vi, so that a message can be associated with the commit (similar to Subversion, CVS and other command-line-based source code management tools).

      Minor change.
      # Please enter the commit message for your changes.
      # Lines starting with '#' will be ignored, and an empty 
      # message aborts the commit.
      # On branch master
      # Changes to be committed:
      #   (use "git reset HEAD ..." to unstage)
      #
      #      modified:   db/yourfile1.rb
      #      modified:   db/yourfile2.rb

The commit message in the example above is “Minor change.” (I would generally enter something with a bit more useful detail.) Within the editor, any line beginning with “#” is a comment and will be ignored.

As an alternative, you could also commit all changes automatically:

      $ git commit -a
      [master e732f5a]    Minor change.
      2 files changed, 9 insertions(+)

This will also commit all the files without the necessity for explicitly adding each one. The downside, of course, is that this might also check in files that you don’t want, e.g. – scratch files, temporary files, etc.
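A .gitignore file at the root of the repository tells git to treat matching files as untracked noise, which keeps both git status and git commit -a from picking them up. The patterns below are examples only; tailor them to your project:

```
# Example .gitignore patterns (illustrative, not exhaustive)
*.log
*.tmp
*~
scratch/
```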

This commit action didn’t actually do what you think it might have done. You see, you have a complete copy of the source code repository locally. You’ve just checked the changes into your local copy of the repository. This is nice, because you can work offline, not attached to the Internet, and still check code into your repository.

Clearly, though, there must be some way to sync your changes with the master repository on GitHub. Here’s how you do it:

      $ git push origin master
      Counting objects: 35, done.
      Delta compression using up to 2 threads.
      Compressing objects: 100% (22/22), done.
      Writing objects: 100% (22/22), 2.12 KiB, done.
      Total 22 (delta 18), reused 0 (delta 0)
      To git@github.com:your_account/repository.git
         731358f..e732f5a  master -> master

This pushes the changes from the master branch of your local repository (the branch you’re on by default) to the origin, i.e. – the repository from which this local one was cloned.

Here’s where some of git’s power is exposed. If GitHub disappeared tomorrow, you’d still have a full copy of the repository, which can function as the origin for other developers if necessary. You could let other developers clone your local repository, push changes to it, etc.

Git decentralizes the source code repository and reduces the chance of a catastrophic failure, such as a repository that gets corrupted or a hardware crash impacting the computer that a repository is on. This is why git is referred to as a distributed source code management system.

You might want to tag a release. This is particularly useful if you’re using hosting services such as EngineYard or Heroku. These services will pull code from a git repository and automatically deploy it if a release has been tagged.

To view the current tags that have been defined:

      $ git tag

You should examine the list of existing tags to make sure that the tag you’re planning on creating doesn’t already exist. Since this is a new repository, there are no tags yet.

To tag a release:

      $ git tag RC_1.15

As with checking in code changes, this tags the current revision of all the files in your local copy of the repository. To get the tag pushed to your GitHub repository:

      $ git push --tags

This pushes your local tags to the origin repository by default.

Note: This type of tag is what is referred to as a “light-weight tag.” Git also supports signed tags with associated messages identifying what the tag represents. I’ve generally found light-weight tags sufficient for most needs, so that’s what I’ve covered.
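For reference, an annotated tag is created with the -a flag and carries its own tagger, date and message. The tag name and message below are just examples, and the sketch runs in a throwaway repository so it’s safe to try:

```shell
# Create a scratch repository so the example is self-contained
cd "$(mktemp -d)"
git init -q
git config user.email "you@example.com"
git config user.name "Your Name"
git commit -q --allow-empty -m "initial commit"

# An annotated tag is a real tag object, not just a pointer to a commit
git tag -a RC_1.15 -m "Release candidate 1.15"
git tag                     # lists RC_1.15
git cat-file -t RC_1.15     # prints "tag" (a tag object, not just a ref)
```

Annotated tags are pushed to the origin the same way, with git push --tags.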

Conclusion

This has been a whirlwind introduction to git, with just barely enough information to get you started using git with your GitHub repository. There’s a lot more power available with git, and we’ll be covering some more advanced features in later articles.

“Brace Your Smile” Contest Launches

This is another online photo contest that we’ve just fielded using the Votridea contest platform, created by my team at Metrostar. This is the software platform for which I’m the technical architect and development team lead.

Home page for the "Brace Your Smile" contest, created using the Votridea contest platform.

This contest platform was developed as a generalization of the ExchangesConnect video contest platform that we did last year for the Department of State.

WordPress and Zombies

This is an excellent WordPress presentation from Brad Williams, the author of “Professional WordPress Design and Development.” The presentation provides detailed examples for creating custom post types and custom taxonomies in WordPress 3+, but with a distinct and humorous zombie flavor.

Best Practices for Processing Credit Cards in Rails

This article defines a real-life scenario for a payment processing flow that allows an organization to process online donations for multiple charities. Each charity will have donation widgets embedded on numerous and diverse web sites, allowing users to select a donation amount and then click a button to initiate a payment.

A set of rules was defined to ensure the security of the payment processing flow, i.e. – the set of web pages involved in accomplishing the transaction. The payment processing flow was implemented in Rails, and some of the technical details are therefore Rails-specific, but the general lessons are applicable to any technology used to manipulate data securely.

  1. Minimize external entry points. The payment processing flow needs to be secure, with strict controls on where and how users can enter the flow. In this case, there should be a defined URL to which information (including the relevant charity and the selected donation amount) can be posted by the donation widgets. This Payment Data Entry page will show users the posted information about the donation, plus allow users to enter additional information, such as credit card data, to complete the transaction. The Rails controller action that displays this page should be the only one in the payment flow that has the Rails authenticity token turned off. In general, the authenticity token, which is on by default, ensures that a Rails-based form cannot be driven remotely by an external site or program. In this case, it has to be turned off so remotely hosted forms (the donation widgets) can post information to the page. The authenticity token can be turned off for a controller action by adding the following code to the controller: skip_before_filter :verify_authenticity_token, :only => [ :create ]
  2. Use the Rails authenticity token whenever possible. All other payment-related pages, other than a clearly defined initiation page (our Payment Data Entry page), should use the Rails authenticity token so they can’t be remotely driven by external sites or programs. The authenticity token allows Rails to verify that data was posted to a web page by another page on the local web site. Note: This means that the Payment Data Entry page will accept only limited information from a remote location (so it can show the relevant charity and payment amount). However, the data entry form on that page does use the authenticity token, so that page must be submitted manually by a user.
  3. Credit card information is carried from page to page via POST actions. No credit card information is stored in the local database. The purpose of the payment processing flow is to facilitate transactions in a customer-friendly way. The payment processing gateway (such as PayPal, ProfitStars or a credit card gateway) with which the payment flow is integrated will be the database of record for detailed transactions, and the security of credit card data will ultimately be their problem.
  4. Use a session variable to store the transaction ID. A transaction is recorded in the local database for a payment, and includes only relatively non-sensitive information such as the name of the customer, the charity to receive the donation, the amount and the date. Once a transaction has been created, the ID is carried from page to page via a session variable. Users have no external way to specify a transaction ID in order to view data that does not belong to them. Note: Why have a local transaction at all? Well, some transactions might fail, or there could be problems with the payment processing gateway. The local transaction provides sufficient information to correlate with the corresponding transaction recorded by the gateway, so it’s really needed for data integrity checking and debugging.
  5. “Parameter Filtering” must be turned on so that fields related to credit cards are not stored in the site’s log file. This little detail is often forgotten by individuals implementing payment processing flows, and can result in credit card information being left in plain text in log files. The following code can be used to filter the specified fields out of the logs: filter_parameter_logging :credit_card, :expiration, :name_on_card, :security_code
  6. SSL is used for communication between pages and for communications with a payment processing gateway. This ensures that the traffic is encrypted and greatly reduces the odds that anybody (other than maybe the NSA) can access sensitive data in transit.
  7. Non-SSL access to any of these web pages will be rejected with an appropriate error message. Users cannot elect, either accidentally or on purpose, to perform a transaction without using SSL.

In combination, these rules allowed me to successfully create a secure payment processing flow for a client. In the event of a security audit, these rules allow me to: 1) clearly define for an auditor the risks associated with using and managing the data, and 2) demonstrate that the risk of any sort of data loss from the payment processing flow has been minimized to acceptable industry standards.

Considerations for Contest Rating Systems

I recently participated in the design and implementation of an online video contest for a major government client, a successful project which eventually evolved into an online product, the Votridea Contest Platform. One of the key features of the system was a rating system that allowed users to rate videos (photos and text entries were also eventually added).

Designing a rating system presents quite a few interesting problems. Basically, if a contest provides significant prizes, such as cash, vacation trips, products or other desirable prizes, then some users will inevitably try to game the system.

Some forms of self-promotion are allowable and, indeed, encouraged. By all means, share your contest entry on Facebook. Tell your friends about it. Ask them to vote for you. There’s nothing unfair about that. The tools of self-promotion are available to everyone. Plus, frankly, the whole point of a contest is to engage a community of users to respond in some way to the message of the organization sponsoring the contest, e.g. – to promote increased brand awareness, raise the profile of a needy charity, etc.

The sponsor wants users to be so engaged that they actively promote the contest and, by extension, the sponsor’s message. The sponsor doesn’t want people to cheat.

So, let’s look at some of the ways that people can cheat with online contests, and what steps can be taken to prevent cheating.

Voting / Liking

Some contests provide only a “Vote for this Entry” button or a “Like” button for a contest entry. Winners are chosen simply by getting the most votes. This is one of the simplest ways to rate contest entries, and it’s easily abused.

The obvious way to cheat on this type of contest is to simply vote as many times as you can. The primary way to prevent this is to regulate how many times a user can vote for a contest entry, with the primary choices being either 1) once per contest, or 2) daily.

A cookie can be set recording a unique ID for the user. The contest then only accepts one vote per contest entry from that user. However, this can be subverted by voting from multiple computers, or from multiple browsers on the same computer, or by anyone who is willing to clear their browser cookies after each vote (it does cut down on the number of non-technical cheaters, though).

An improvement is to require users to log in to the contest site before they can vote. The login could be a native site login system, or, more often, it could be a remote identity provider such as Facebook, Twitter, LinkedIn, etc. This can be subverted by any user who is willing to create multiple login accounts in order to vote multiple times. The downside is that the requirement for logging in represents a modest barrier to user participation.

There’s not much more in the way of opportunities to secure this form of voting against cheating, which is why I don’t recommend it for contests. The method of rating is great for crowd-sourcing the evaluation of content, but not for contests where prizes of any significance are offered.

Ratings

A contest can provide a rating system so that users can rate a contest entry on a scale, such as 1 to 5 stars. Some ratings systems may even allow for multiple rating categories. For example, SpeakerRate is a site that allows users to rate speakers who present at conferences. The site lets users rate speakers by 1) the Content of their presentation, and 2) the Delivery of their talk.

The online video contest that my team built had four rating categories: Creativity, Originality, Production Quality and Effectiveness. The prize offered was, basically, an all-expense-paid vacation for four people to a foreign country of your choice. With a prize that nice, needless to say, we learned a lot about cheating, er, gaming the system.

A rating system provides a number of opportunities to increase the difficulty for cheating, which I’ll illustrate using the online video contest as an example.

Caching Ratings

A user’s composite rating for a video entry is calculated based on his rating in each of the four categories. Sum up the ratings and then divide by 4 to get the user’s composite rating for that contest entry.

Since calculations are being done to determine each user’s composite vote, determining the total average rating for an individual entry is a bit of work and takes a fair amount of time, since you have to do the same calculation for every user who has voted.

Accordingly, these types of calculations are typically done on a defined schedule, e.g. – average ratings for each entry may be calculated and stored in the database every 15 minutes. Subsequently, ratings for each entry can be easily retrieved by a single database query.

A side effect is that users can’t instantly see the effect of their vote on the rating of a contest entry. This is good. It makes it harder for them to see whether their cheating is helping their own entry’s rating.

Weighted Category Ratings

So far we’ve calculated a composite rating by simply totaling the ratings and dividing by 4. That effectively means that each category-level rating counts for 25% of the total. Instead, the category ratings can be weighted, for example so that Originality and Creativity might each count for 30% and Production Quality and Effectiveness for 20%.
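As a sketch of the arithmetic (the category names and weights here are illustrative choices, not the contest’s production values), the weighted composite is just a weight-scaled sum:

```ruby
# Illustrative category weights; they must sum to 1.0
WEIGHTS = {
  originality:        0.30,
  creativity:         0.30,
  production_quality: 0.20,
  effectiveness:      0.20
}.freeze

# ratings: hash of category => a single voter's 1-5 rating in that category
def composite_rating(ratings, weights = WEIGHTS)
  ratings.sum { |category, value| value * weights.fetch(category) }
end

# The unweighted version is a plain average (sum the ratings, divide by 4)
def unweighted_composite(ratings)
  ratings.values.sum / ratings.size.to_f
end
```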

Weighted category ratings can tune a rating system to better reflect the sponsor’s values. However, an analysis of our data showed that contestants gave their own entries the maximum rating in every category, as did most of their friends. When every category gets the maximum rating, the weights change nothing, so weighted category ratings don’t appreciably obscure a rater’s insight into his own impact on the contest.

Weighted Voters

Let’s face it, some voters are better than others. Good voters rate multiple videos and honestly try to give each entry a meaningful rating. Their ratings tend to exhibit at least some similarity to a bell curve, depending on how many entries they’ve rated. Many contestants vote only once, giving their own entry a max rating (SpeakerRate attempts to prevent this by prohibiting speakers from voting for their own talks). Many of their friends also vote only once, giving their friend a max rating.

Which type of voter do you think is a better voter? Using a pattern of voting, it’s possible to weight ratings from high-value voters much higher than those from low-value voters. For example, high-value voters could be weighted so that their votes counted three times as much as low-value voters.

This doesn’t work for democracies, but it can function pretty well for contests. Frankly, it diminishes the impact of voters who aren’t interested in fairly evaluating contest entries. It also has an impact on cheaters.

The video contest required users to log in before they could rate contest entries. We identified several different types of bad behavior that ensued despite the login requirements.

First, some contestants not only gave their own entry the max rating (which we expected), but they also went out and either 1) gave every other entry the lowest possible rating, or 2) they targeted close competitors and gave them low ratings. Second, some contestants mass-produced remote login ID’s and used multiple accounts to vote in this fashion.

The first type of behavior is inappropriate but not against the contest rules (although it might not be a bad idea to throw out users’ votes on their own contest entries). The second is just flat-out cheating. The impact of both types is considerably diminished by voter weighting. I’d even say that this type of rating, where only 5’s and 1’s are assigned, should be further penalized in the weighting.

At any rate, voter weighting makes it harder for a user to determine the impact of his own voting on an entry’s overall rating, which is good. It also makes the cheater do a lot more work in order to achieve any impact.
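To make the idea concrete, here’s a sketch of voter weighting in Ruby. The classification thresholds and the 0.5/3.0 weights are my own illustrative assumptions, not the contest’s actual values:

```ruby
# Weight a voter based on his voting history (an array of 1-5 ratings).
# Thresholds and weights below are illustrative assumptions.
def voter_weight(history)
  return 0.5 if history.size <= 1              # one-shot voters count less
  extremes = history.count { |r| r == 1 || r == 5 }
  return 0.5 if extremes == history.size       # all 1s and 5s smells like gaming
  3.0                                          # engaged, varied voters count more
end

# Weighted average rating for one entry, given pairs of [rating, voter_history]
def weighted_entry_rating(votes)
  total_weight = votes.sum { |_, history| voter_weight(history) }
  votes.sum { |rating, history| rating * voter_weight(history) } / total_weight
end
```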

Rating Formula

Strangely enough, these types of problems aren’t new. Google’s PageRank algorithm is used to evaluate the content and authoritativeness of web pages. People try to game that system so often that they’ve come up with a name for it: search engine optimization (SEO).

But a formula seems like a good idea. So far, a formula for rating contest entries can leverage weighted category ratings and weighted voter ratings. What else can we throw into the mix?

How about page views, i.e. – the number of times that a contest entry has been viewed? Of course, some users might continually refresh a page to send the page views through the roof. That contestant could be penalized, of course, but what if the culprit was a competitor who was trying to get his competition penalized? Don’t laugh, these things do get tried. So, let’s count unique page views instead, i.e. – a user gets one page view counted per session per contest entry.

Another factor to include in a formula might be the number of times that a contest entry has been shared, such as users who share via Facebook. The number of comments generated by an entry can also provide an indication as to the popularity of an entry.
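Pulling these factors together, a hypothetical scoring formula might blend the average rating with log-scaled engagement counts, so that runaway view counts can’t swamp the actual ratings. The coefficients and the log-scaling below are illustrative assumptions of mine, not a production formula:

```ruby
# avg_rating: the entry's (weighted) average rating, on a 1.0-5.0 scale.
# Engagement counts are log-scaled so huge numbers have diminishing returns.
def entry_score(avg_rating, unique_views, shares, comments)
  avg_rating +
    0.5 * Math.log10(unique_views + 1) +
    0.3 * Math.log10(shares + 1) +
    0.2 * Math.log10(comments + 1)
end
```

An entry with no engagement scores exactly its average rating; views, shares and comments then nudge the score upward.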

Conclusion

As you can see, my own concept of a high-quality rating system for a contest ends up becoming a fairly complicated formula. As with Google, the precise makeup of the formula should be kept secret from users in order to make it harder for them to game the system.

The primary benefits of such a formula are that it: 1) reduces cheating, because it’s harder for users to analyze the direct effects of their cheating, 2) discourages cheating by making it much harder to cheat, and 3) leads to better results that more accurately reflect the thoughtful consideration of responsible users.

Washington Business Journal on AirBanking

The Washington Business Journal interviews Main Street Bank exec Jeff Dick about his AirBanking initiative. AirBanking mixes online banking, blogging and social media to attract young professionals. David Keener is the architect and social media expert assisting Main Street Bank with their strategies.