Recipe: Creating an Atom Feed in Rails

Your website is now functionally complete, and it’s got a good selection of quality content to interest potential viewers. You’d like to make it easy for those potential viewers to discover your content. One of the best ways to do this is to add a feed to your site.

Feeds can be read by software referred to as a feed reader. Interested users can subscribe to your feed using their feed reader, allowing them to quickly become aware of new content when it’s added to your web site. Feed readers are widely available, so any user who’s interested in following feeds in this way should be able to find one. Even most web browsers now include a feed-reading capability.

Feeds are generally created to accomplish one of the goals below:

Content Distribution: A feed can contain a complete content item, such as the full text of an article or blog entry. This method allows subscribers to access the full content. Subscribers can easily extract the content and re-use it for their own purposes.
Content Notification: A feed can contain summaries of content items, such as articles, blog entries, videos, etc. Since the feed doesn’t contain the full content, it generally includes links that allow users to navigate to a URL where they can view the content. This method can be used to help drive traffic to a web site.

In this article, we’re going to create an Atom feed for the KeenerTech.com blog using Ruby on Rails, with the goal of driving traffic to the blog web site. All widely used feed readers support both Atom and the various versions of the RSS standard. For our purposes, though, Atom is a better choice because our blog content will contain HTML tags, and Atom lets a feed declare explicitly, through a “type” attribute, whether an element holds plain text, escaped HTML or XHTML.

The Atom Syndication Standard

The Atom Syndication Format, now at version 1.0, is a standard for syndicating content, very similar to RSS. Like RSS, information is packaged into an XML file.

Atom is, in some respects, an enhanced RSS-like standard that provides better support than RSS for diverse content types, internationalization and modularity. Many people, in fact, use RSS as an umbrella term for feed standards (e.g., RSS 1.0, RSS 1.1 and RSS 2.0) and lump Atom in as “another” RSS standard, even though, strictly speaking, that is not correct.

What Does an Atom Feed Look Like?

An Atom feed is just an XML file that adheres to the Atom Syndication Standard. A sample Atom feed is shown in Listing 1.

Listing 1: Sample KT.com Blog Feed

<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en"
      xmlns="http://www.w3.org/2005/Atom">
   <title>KT.com Articles</title>
   <link href="http://www.kt.com/feeds"
         rel="alternate" />
   <link href="http://www.kt.com/arts.atom"
         rel="self"
         type="application/atom+xml" />
   <id>http://www.kt.com/arts.atom</id>
   <updated>2009-08-30T18:56:14Z</updated>
   <generator uri="http://www.kt.com">KT.com</generator>
   <entry>
     <title><![CDATA[Maven Introductory Presentation Available Online]]></title>
     <link
      href="http://www.kt.com/arts/2009/08/23/maven"
      rel="alternate" />
     <id>tag:kt.com,2009-08-23:321</id>
     <author>
       <name>Steve Keener</name>
       <uri>http://www.kt.com/profiles/steve_keener</uri>
     </author>
     <updated>2009-08-24T01:26:16Z</updated>
     <published>2009-08-23T00:00:00Z</published>
     <summary type="html">My presentation,
        &lt;a href="http://www.kt.com/docs/Maven.pdf"&gt;
        Maven: Managing Software Projects for
        Repeatable Results&lt;/a&gt;, is now available
        online. Find out how to leverage this
        sophisticated build tool to automate
        key tasks for your next Java project.
     </summary>
   </entry>
</feed>
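Notice that the HTML markup inside the summary element is entity-escaped (&lt;a ...&gt; rather than <a ...>). With type="html", a feed reader first unescapes the text and then renders the result as HTML. Here is a minimal plain-Ruby sketch of that escaping step. The method name atom_html_escape is hypothetical and is not part of the article’s code.

```ruby
# Entity map for the three characters that matter in XML element text.
ATOM_HTML_ENTITIES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Hypothetical helper: escapes an HTML fragment so it can be placed inside
# an Atom element declared with type="html". A single gsub pass means an
# ampersand produced by one replacement is never escaped a second time.
def atom_html_escape(html)
  html.to_s.gsub(/[&<>]/, ATOM_HTML_ENTITIES)
end

summary = 'My presentation, <a href="http://www.kt.com/docs/Maven.pdf">Maven</a>, is online.'
puts %(<summary type="html">#{atom_html_escape(summary)}</summary>)
# prints: <summary type="html">My presentation, &lt;a href="http://www.kt.com/docs/Maven.pdf"&gt;Maven&lt;/a&gt;, is online.</summary>
```

Replacing all three characters in one pass avoids the classic bug of escaping & after < and >, which turns &lt; into &amp;lt;. When the summary is written as element text through Builder (for example, xml.summary text), Builder performs this escaping itself.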

Feed Summary Elements

The summary section of the Atom feed consists of the elements defined in Table 1. An asterisk indicates elements that are required to be present for a valid Atom feed. Where relevant, each description ends with a note on how the element is handled in the feed we’re building in this article.

Table 1: Feed Summary Elements

Element	Description
author	This element provides information about the author of the feed. Data about the author is contained in one or more child elements. According to the standard, an author element must be provided within the feed, but there are two valid ways to specify this information. If it appears in the summary section, then it represents the author of the entire feed. Alternatively, each entry must have an author element, which would describe the author of each entry. In this article, we’re building an Atom feed for a blog, so each entry will have its own author element. Accordingly, no author element will appear in the summary section of the XML output.
generator	This element defines the software or web site that created the feed. For this feed, the generator will be named “KT.com”, and the element’s “uri” attribute will provide a link to the KT.com website.
id *	This element defines a unique and permanent identifier for the feed. In practice, the id is typically defined as the full URL of the feed. Note that the Atom standard considers this value to be an id; under no circumstances should it be treated as a URL. If a URL for the feed is needed, use the “link” element instead.
link	The link element provides a URL for a resource related to the Atom feed, with the URL given in the “href” attribute. The meaning of the link is defined by the “rel” attribute. A feed can include multiple link elements, but the standard forbids more than one “alternate” link with the same “type” and “hreflang” values. Our Atom feed will include two link elements in the summary section. One sets its “rel” attribute to “self,” indicating that the URL is a self-referential link to the Atom feed itself. The other sets its “rel” attribute to “alternate,” indicating that the URL points to an alternate version of the feed’s content, typically an HTML page.
title *	The title of the feed.
updated *	The date/time at which this instance of the feed was last updated significantly. Date/time values use the RFC 3339 format, for example: 2009-04-24T00:00:00Z. The value shows a four-digit year, a two-digit month and a two-digit day, followed by two-digit hours, minutes and seconds. The “T” character separates the date and time portions of the value. The trailing “Z” character indicates that the time is in GMT (UTC); alternatively, a specific offset from GMT, such as -05:00, can appear in place of the “Z”. The simplest solution, and the one that I recommend, is to specify all dates in GMT. People who use the feed can easily translate them into their own time zone if they need to.
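Here is a minimal sketch of one way to produce these values in Ruby, using only the standard library. A strftime pattern with a literal “Z” matches the format used in this article. Time#xmlschema, from the time library, produces the same RFC 3339 string for a UTC time. The helper name atom_timestamp is hypothetical.

```ruby
require 'time'   # adds Time#xmlschema (RFC 3339 output)

# strftime pattern for an Atom timestamp in UTC; the "Z" is a literal suffix.
ATOM_TIME_FORMAT = '%Y-%m-%dT%H:%M:%SZ'

# Hypothetical helper. getutc returns a UTC copy without modifying the
# caller's Time; formatting a local time directly with this pattern would
# wrongly label it as GMT.
def atom_timestamp(time)
  time.getutc.strftime(ATOM_TIME_FORMAT)
end

local = Time.new(2009, 8, 30, 14, 56, 14, '-04:00')  # 2:56 PM at UTC-4
puts atom_timestamp(local)        # prints 2009-08-30T18:56:14Z
puts local.getutc.xmlschema       # prints 2009-08-30T18:56:14Z
puts local.xmlschema              # prints 2009-08-30T14:56:14-04:00
```

The conversion to UTC has to happen before formatting. The literal “Z” in the pattern only asserts GMT; it doesn’t convert anything.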

Entry Elements

An Atom feed can include zero or more entries; in our feed, each entry provides information about a single blog entry. The Atom Syndication Format provides a set of standard elements for describing content entries.

Table 2: Standard Atom Entry Elements

Element	Description
author	This element provides information about the author of the entry. If the entry has multiple authors, the element appears multiple times. Within the author element, the “name” child element is required, and there are two optional child elements, “email” and “uri”. The “uri” is typically either the URL of the author’s blog site or the URL of a profile page that provides more information about the author. As described in Table 1, the author must be specified either once for the whole feed or in every entry. Our blog’s feed has no feed-level author, so each entry carries its own author element. Since our blog entries can only have a single creator, each entry will have exactly one author element.
content	This element contains the full content of an entry. We’re not going to provide a content element for our feed because we’re not trying to distribute the blog’s content. Instead, we’re trying to notify potential viewers of the content that we have available, and entice them to go to the web site. So, we’ll provide a summary of the content rather than the full content.
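id	This element contains a permanent, universally unique identifier for the entry, expressed as a well-formed URI. Our feed uses a “tag:” URI (RFC 4151) built from the site’s domain, the entry’s publication date and its database id, for example: tag:kt.com,2009-08-16:322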
link	The “href” attribute of the link element specifies the URL for a web page that shows detailed information about the entry.
summary	Provides a short abstract of the entry.
title	A human-readable title describing the entry.
updated	The date/time that this entry was last modified. Date/time values are shown in the same format as the “updated” element of the feed summary section.
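The tag URI in the id row can be built with a one-line helper. This sketch follows the pattern shown in Listing 1: tag:<domain>,<date>:<id>. The method name entry_tag_uri is hypothetical. RFC 4151 requires that the date be one on which you controlled the domain, and the value must never change once it has been published.

```ruby
require 'date'

# Hypothetical helper: builds a "tag:" URI (RFC 4151) for an entry from the
# site's domain, the entry's publication date and its database id.
# Readers use this value to recognize an entry, so changing it later makes
# the entry look new.
def entry_tag_uri(domain, published_on, record_id)
  "tag:#{domain},#{published_on.strftime('%Y-%m-%d')}:#{record_id}"
end

puts entry_tag_uri('kt.com', Date.new(2009, 8, 23), 321)
# prints tag:kt.com,2009-08-23:321
```

A URL would also be a legal id, but a tag URI stays valid even if the entry’s web address later changes.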

Onwards to the Code

Implementing an Atom feed in Rails is a straightforward task. Rails uses the Builder library, a Ruby gem that automates the creation of XML documents. The general technique shown in most technical books is to create a controller that retrieves a list of content items from the database. The controller then makes that list available to a view (typically an .atom.builder template), which contains Builder code to generate the necessary XML.

My technique is a little different. In Listing 2, I pull the Ruby code that produces the feed into the model. This makes it easier to write unit tests for the Atom generation code. It also makes the code easier to share if you need to produce multiple Atom feeds, such as a separate feed for each category.

Listing 2: The Ruby Code Behind the Atom Feed

class Entry < ActiveRecord::Base

  # Builds an Atom 1.0 document for the given entries and returns it as a
  # String. Builder is loaded by Rails; outside Rails, require 'builder' first.
  def self.generate_atom_feed(title, entries, options = {})
    feed_ref_url   = 'http://www.kt.com/feeds'
    feed_url       = 'http://www.kt.com/arts.atom'
    base_entry_url = 'http://www.kt.com/arts/'
    atom_time      = '%Y-%m-%dT%H:%M:%SZ'   # RFC 3339 timestamp, UTC
    buffer = String.new

    # Builder appends its output to buffer; :indent controls pretty-printing
    xml = Builder::XmlMarkup.new(:indent => options[:indent], :target => buffer)
    xml.instruct! :xml, :version => '1.0', :encoding => 'utf-8'

    xml.feed 'xmlns' => 'http://www.w3.org/2005/Atom', 'xml:lang' => 'en' do
      # Feed summary elements (see Table 1)
      xml.title title
      xml.link 'rel' => 'alternate', 'href' => feed_ref_url
      xml.link 'rel' => 'self',
               'href' => feed_url,
               'type' => 'application/atom+xml'
      xml.id feed_url
      xml.updated Time.now.utc.strftime(atom_time)
      xml.generator 'KT.com', 'uri' => 'http://www.kt.com'

      entries.each do |entry|
        next if entry.name.nil?   # skip entries that have no title

        xml.entry do
          # Builder escapes &, < and > in element text, so the title needs
          # no manual CDATA wrapping
          xml.title entry.name
          xml.link 'rel' => 'alternate',
                   'href' => base_entry_url +
                     entry.display_date.strftime('%Y/%m/%d/') + entry.url_name

          # Permanent tag URI: tag:<domain>,<date>:<database id>
          xml.id "tag:kt.com,#{entry.display_date.strftime('%Y-%m-%d')}:#{entry.id}"
          xml.author do
            xml.name "#{entry.user.first_name} #{entry.user.last_name}"
            # Profile page URL, following the pattern shown in Listing 1
            xml.uri 'http://www.kt.com/profiles/' +
              "#{entry.user.first_name}_#{entry.user.last_name}".downcase
          end
          # getutc converts to GMT without modifying the record's attribute
          xml.updated entry.updated_at.getutc.strftime(atom_time)
          xml.published entry.display_date.strftime(atom_time)
          # type="html": Builder entity-escapes the summary's HTML markup
          xml.summary entry.summary.to_s, 'type' => 'html'
        end
      end
    end

    buffer
  end

end

In the code above, I’ve hard-coded the feed paths. If I were to generalize this code to produce multiple Atom feeds, such as one per category, I’d need to refactor it slightly. Each feed should have its own distinct URL, and therefore its own id.
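One way to remove the hard-coding is to compute the feed URLs from an optional category and pass them into generate_atom_feed. The sketch below illustrates this. FeedUrls, feed_urls_for and the /categories/<name>/arts.atom URL scheme are all hypothetical, chosen for illustration only.

```ruby
# Hypothetical value object for the three URLs that Listing 2 hard-codes.
FeedUrls = Struct.new(:alternate, :self_link, :entry_base)

SITE = 'http://www.kt.com'

# Returns the URLs for the site-wide feed, or for one category's feed.
# Assumes category names are already URL-safe slugs.
def feed_urls_for(category = nil)
  self_link = category ? "#{SITE}/categories/#{category}/arts.atom" : "#{SITE}/arts.atom"
  FeedUrls.new("#{SITE}/feeds", self_link, "#{SITE}/arts/")
end

p feed_urls_for.self_link           # prints "http://www.kt.com/arts.atom"
p feed_urls_for('java').self_link   # prints "http://www.kt.com/categories/java/arts.atom"
```

Because Listing 2 uses the self link as the feed id, each category feed would automatically get a distinct id. The entry URLs stay the same, since an entry lives at one address no matter which feed lists it.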

To produce the XML, the caller passes in the title of the feed, the list of entries to be included in the feed and an optional hash of options. The only option used is “:indent”, which is passed to Builder::XmlMarkup and specifies how many spaces each level of XML should be indented. The Builder::XmlMarkup object directs its output into the “buffer” string, which the method returns.

The Builder::XmlMarkup object makes it easy to produce XML. A statement like:

xml.title title

will produce the XML below:

<title>KT.com Articles</title>

Likewise, a statement like this:

xml.generator 'KT.com', 'uri' => 'http://www.kt.com'

will produce an XML element that includes both content and an attribute:

<generator uri="http://www.kt.com">KT.com</generator>

Since I’ve pulled the Atom generation code into the model, the controller will need to call the Entry.generate_atom_feed method in order to produce the output. The controller will then receive the XML as a string, which it will then render without benefit of a view. The controller code is shown in Listing 3.

Listing 3: Controller Code

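  def index
    # Eager-load each entry's user so building the feed doesn't query per entry
    # (Rails 4+ equivalent: Entry.includes(:user).order('entries.display_date desc'))
    @articles = Entry.find(:all,
                           :include => :user,
                           :order => "entries.display_date desc")

    respond_to do |wants|
      wants.html do
        render :layout => 'kt'
      end
      wants.atom do
        xml = Entry.generate_atom_feed('KT.com Articles',
          @articles, :indent => 2)
        # Rails 5.1+ removed render :text; use render :body => xml there instead
        render :text => xml, :content_type => 'application/atom+xml'
      end
    end
  end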

The controller generates the list of articles using the find method. Note that it uses the “:include” option to eager-load the user associated with each entry. The Atom generation code traverses from every entry to its user to build the author element. Because find has already pre-fetched the users, those traversals don’t trigger additional database calls. Without “:include”, there would be one extra query per entry.

The controller uses the respond_to statement to determine what type of content the caller wants to receive, based on the requested format (the URL extension, such as .atom, or the Accept header). If the caller wants HTML, Rails uses a view to generate the appropriate HTML output. If the caller has requested Atom, the code generates the Atom output by calling the model and then renders it directly.

Does this break the MVC paradigm? It probably bends the rules a little bit, but the advantages outweigh the disadvantages.

The primary advantage is that the code for producing the Atom output is centralized in the model, where it can easily be shared and unit-tested. Meanwhile, the model and controller still have clearly defined roles in producing the output.

The model simply produces the output; it doesn’t render it. The controller determines what data elements will appear in the feed, gets the Atom output from the model and then causes the output to be rendered. While the controller causes the content to be rendered, it doesn’t otherwise manipulate it. We could have a view that renders the Atom output string, but it seems kind of pointless to have a one-line view.

Atom Validation

After producing the feed, you should validate it using the Feed Validator. This free service performs a detailed analysis of Atom and RSS feeds. It produces excellent diagnostic output, as well as good recommendations for optional items. Don’t even think about considering your feed “finished” until you’ve successfully validated it.
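The Feed Validator remains the real test. Still, a quick local pre-check can catch gross mistakes, such as an unclosed tag, in a unit test before the feed is deployed. The sketch below assumes REXML, which ships with Ruby (as a bundled gem since Ruby 3.0). The method name atom_precheck is hypothetical. The check covers only well-formedness and the required id, title and updated elements.

```ruby
require 'rexml/document'

ATOM_NS  = 'http://www.w3.org/2005/Atom'
REQUIRED = %w[id title updated].freeze

# Names of the child elements of el (REXML children also include text nodes).
def child_names(el)
  el.children.grep(REXML::Element).map(&:name)
end

# Hypothetical pre-check, NOT a substitute for the Feed Validator: confirms
# the document is well-formed XML with an Atom <feed> root, and that the feed
# and each entry contain the required id, title and updated elements.
def atom_precheck(xml_string)
  root = REXML::Document.new(xml_string).root
  unless root && root.name == 'feed' && root.namespace == ATOM_NS
    return ['root element is not an Atom <feed>']
  end

  problems = []
  missing = REQUIRED - child_names(root)
  problems << "feed is missing #{missing.join(', ')}" unless missing.empty?

  entries = root.children.grep(REXML::Element).select { |e| e.name == 'entry' }
  entries.each_with_index do |entry, i|
    missing = REQUIRED - child_names(entry)
    problems << "entry #{i + 1} is missing #{missing.join(', ')}" unless missing.empty?
  end
  problems
rescue REXML::ParseException => e
  ["not well-formed XML: #{e.message.lines.first.to_s.strip}"]
end
```

In a unit test, for example, the string returned by Entry.generate_atom_feed could be passed to atom_precheck and asserted to produce an empty array.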

References

There are a number of relevant references available that can be useful for those producing Atom feeds:

Atom Syndication Format – Introduction
http://www.atomenabled.org/developers/syndication/
This web page provides a detailed and extremely useful overview of the Atom Syndication format.
Builder Documentation
http://builder.rubyforge.org/
RDoc documentation for the Builder module, including Builder::XmlMarkup.
Feed Validator
http://www.feedvalidator.org/
A free online service for validating RSS, Atom and KML feeds. Highly recommended.