Crack, The Easiest Way to Parse XML and JSON

In which I release a new gem that allows parsing XML or JSON with a simple, consistent API.

An astute reader will remember that a while back, HTTParty divorced both ActiveSupport and the JSON gem in order to make it easier to use all around. With the JSON gem went the last gem dependency, which was kind of cool.

A few days back, it occurred to me that the parsing of XML and JSON that HTTParty used might be handy outside of HTTParty. In the spirit of sharing, I whipped together a new gem, named crack, that contains the XML and JSON parsers that formerly were bundled in HTTParty.

Why Crack?

I figured the name was easy and memorable, which is a requirement for anything I’m going to release. When I thought about parsing XML and JSON, for some reason, cracking the code came to mind and thus crack had a name.

Credits

First, I’d like to make it abundantly obvious that I did not author any of this code. I tweaked it a bit and made sure it had tests, but the XML parsing was extracted from Merb (extlib) and the JSON parsing from Rails (ActiveSupport). I merely packaged them together for all to enjoy. Ok, now that we have that out of the way, let’s move onward.

So I ripped the two parsers out of HTTParty and put them in their own gem and then just set that as a dependency for HTTParty. HTTParty will still work the exact same, but if all you need is a really simple way to parse JSON or XML, crack is his name and parsing is his game.

Details

As always these days, I used shoulda and matchy for testing and jeweler to make the gem maintenance easy. That is pretty much it for details on this project. It is focused and simple so there isn’t much behind the scenes.

Installation

I registered a rubyforge project, but I’m waiting for approval. For now, you can get the gem from Github.

sudo gem install jnunemaker-crack -s http://gems.github.com

Usage

It has always slightly annoyed me that all the different XML and JSON parsing mechanisms available in Ruby (JSON.parse, ActiveSupport::JSON.decode, etc.) have different APIs. I think parse is the easiest to remember, and it is consistent with HappyMapper, another project of mine, so whether you are working with XML or JSON, all you have to remember is parse.

xml = '<posts><post><title>Foobar</title></post><post><title>Another</title></post></posts>'
Crack::XML.parse(xml)
# => {"posts"=>{"post"=>[{"title"=>"Foobar"}, {"title"=>"Another"}]}}

json = '{"posts":[{"title":"Foobar"}, {"title":"Another"}]}'
Crack::JSON.parse(json)
# => {"posts"=>[{"title"=>"Foobar"}, {"title"=>"Another"}]}

That is pretty much all there is to it. Given XML or JSON, you get back a hash. The repository has been up for a couple of days, but I thought I would mention it here as well. The key points here are simplicity and consistency. If you just want to get dirty and you aren’t worried about performance, crack is a perfect fit.

Exporting from a SaaS application

Recently I had a request from a customer of my applicant tracking system to export all of her data. This was one of those features that I had always planned to do but never got around to, and eventually decided I would just handle it when it came up. Well, it finally came up, so it was time to deal with it. Let’s call it just-in-time feature delivery. 🙂

In Catch the Best, there are two sets of user data. The first set is in the database and is made up of notes, contact info, etc., about job applicants. The second set of data is the documents the candidates send in with their job applications, which are stored via the attachment_fu plugin. Exporting the first set of data is fairly simple, but the other set of data took a little more work. Here’s what I did.

First, to get the data out of the database, I decided to put the to_xml method provided by ActiveRecord to good use, redefining it in my various models to meet my needs. For example, I have this in my Submission model:


class Submission < ActiveRecord::Base
  def to_xml(options = {})
    super({ :include => [:source, :attachments] }.merge(options))
  end
end

And then in Attachment I have a similar to_xml method that spits out the relative path to the file, so that it can be matched up with the files that I will be including with the export.
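
A rough sketch of what that could look like; the exact attributes are my guess, and relative_path is the model's own helper mentioned above:

class Attachment < ActiveRecord::Base
  def to_xml(options = {})
    # Include the relative path so the XML can be matched up with the
    # files bundled into the export zip.
    super({ :only => [:id, :filename], :methods => [:relative_path] }.merge(options))
  end
end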

For the resumes, I iterate over all the attachments that belong to the customer, adding them to a zipfile that the customer will download. It looks something like this:


Zip::ZipFile.open(zipfile, Zip::ZipFile::CREATE) do |zip|
  submissions.each do |submission|
    submission.attachments.each do |file|
      zip.add(file.relative_path, file.full_filename)
    end
  end
  zip.add('submissions.xml', xmlfile)
end

That submissions.xml file at the end is actually the output of that to_xml call in the Submission model. All of the attachments and the to_xml dump are added to a single zip file, which the customer can then download. Add a couple of UI bits, and self-serve data exports from my SaaS application are good to go.
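
Wiring that up to a download link is then a small controller action. Here is a minimal sketch; ExportsController, build_export_zip, and current_account are hypothetical names, while send_file is standard Rails:

class ExportsController < ApplicationController
  def show
    # Hypothetical helper that writes the zip described above and
    # returns its path on disk.
    zipfile = build_export_zip(current_account)
    send_file zipfile, :type => 'application/zip', :filename => 'export.zip'
  end
end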

That’s much better than doing it by hand, and I got to wait more than a year after I launched Catch the Best before I spent time on that feature. Everybody wins. 🙂

Identifying ThreadLocal Memory Leaks in JavaEE Web Apps

A few weeks ago wikis.sun.com, powered by Confluence “Enterprise” Wiki, grew beyond yet another invisible line that triggered intermittent instabilities. Oh boy, how I love these moments. This time the issue was that Confluence just kept on running out of memory. Increasing the heap didn’t help; even breaking the 32-bit barrier and using a 64-bit JVM was not good enough to keep the app running for more than 24 hours.

The Xmx size of the heap suggested that something was out of order. It was time to take a heap dump using jmap and check what was consuming so much memory. I tried jhat to analyze the heap dump, but a 3.5GB dump was just too much for it. The next tool I used was IBM’s Heap Analyzer – a decent tool that was able to read the dump, but it consumed a lot of memory in order to do so (~8GB) and was pretty hard to use once the dump was processed.

While looking for more heap analyzing tools, I found SAP Memory Analyzer, now known as Eclipse Memory Analyzer, a.k.a. MAT. I thought “What the heck does SAP know about the JVM?” and reluctantly gave it a try, only to find out how prejudiced I was. MAT is a really wonderful tool, which was able to process the heap really quickly, visualize it in an easy-to-navigate way, and use special algorithms to find suspicious memory regions, all while using only ~2GB of memory. An excellent preso that walks through MAT features and how heaps and memory leaks work can be found here.

Thanks to MAT I was able to create two bug reports for folks at Atlassian (CONF-14988, CONF-14989). The only feature I missed was some kind of PDF or HTML export, but I did quite well with using Skitch to take screenshots and annotate them.

One of the leaks was confirmed right away, while it wasn’t clear what was causing the other one. All we knew was that significant amounts of memory were retained via ThreadLocal variables. More debugging was in order.

I got the idea to create a servlet filter that would inspect the thread-local store for the thread currently processing the request and log any thread-local references that exist before the request is dispatched down the chain and also when it comes back. Such a filter could be packaged as a Confluence Servlet Filter Plugin, so that it is convenient to develop and deploy.

There was only one problem with this idea: the thread-local store is a private field of the Thread class and is in fact implemented as an inner class with package-default access – kinda hard to get your hands on. Thankfully, private stuff is not necessarily private in Java if you get your hands dirty with reflection code:

// Reflection helpers used below: java.lang.reflect.Field and java.lang.reflect.Array
Thread thread = Thread.currentThread();

// Thread.threadLocals is private, so pry it open via reflection
Field threadLocalsField = Thread.class.getDeclaredField("threadLocals");
threadLocalsField.setAccessible(true);

// ThreadLocal$ThreadLocalMap is package-private; look it up by name
// and grab its internal entry table
Class threadLocalMapKlazz = Class.forName("java.lang.ThreadLocal$ThreadLocalMap");
Field tableField = threadLocalMapKlazz.getDeclaredField("table");
tableField.setAccessible(true);

Object table = tableField.get(threadLocalsField.get(thread));

int threadLocalCount = Array.getLength(table);
StringBuilder sb = new StringBuilder();
StringBuilder classSb = new StringBuilder();

int leakCount = 0;

// Walk the entry table and record the class name of every non-null entry value
for (int i = 0; i < threadLocalCount; i++) {
    Object entry = Array.get(table, i);
    if (entry != null) {
        Field valueField = entry.getClass().getDeclaredField("value");
        valueField.setAccessible(true);
        Object value = valueField.get(entry);
        if (value != null) {
            classSb.append(value.getClass().getName()).append(", ");
        } else {
            classSb.append("null, ");
        }
        leakCount++;
    }
}

sb.append("possible ThreadLocal leaks: ")
        .append(leakCount)
        .append(" of ")
        .append(threadLocalCount)
        .append(" = [")
        // guard against an empty list; substring would throw otherwise
        .append(leakCount > 0 ? classSb.substring(0, classSb.length() - 2) : "")
        .append("] ");

logger.warn(sb);

A simple plugin like this was able to confirm that the leaked SAXParser instances were created and stored as thread-local variables somewhere within the code that exports content as PDF. That is good enough info to pinpoint the exact line of code that creates the thread-local instance with BTrace (or a code review), but that's a story for a separate blog post.

The moral of the story: ThreadLocal variables are a very powerful feature which, as is common for powerful stuff, can result in a lot of nasty things when not used properly. Hopefully all the info I provided to Atlassian will be enough to get a speedy fix for the issue and bring stability to wikis.sun.com - at least until we step over the next "invisible line".

Sales, Status, and Talks

An important sales milestone over the weekend: the Lulu revenue nosed past my monetary start-up costs for putting the book and the web site together. So, thanks to those of you who have bought the book so far—I appreciate it. If you’re enjoying it so far, let me know, tell a friend, say nice things online, you know, that kind of thing. If you aren’t enjoying it so far, let me know so that I can work on making the book better over time.

I think the ongoing release schedule will go like this: next Monday, then every other Monday until I feel sufficiently done. Next week’s release will have a long Cucumber section, a shorter autotest section, and I’m not sure what else. However, I’m trying to get on a more agile-like iteration schedule, so I’ll release on the 6th no matter what else is done or not done.

In the meantime, it looks like there’s kind of an arms race in my RailsConf time slot; I’m still in the same Tuesday 2:50 PM slot. I’m now ten percent more scared that I’ll be speaking to an empty room. Actually, I’m hoping to have some promotional stuff of some kind—turns out I have a book to promote. But I don’t know if I’ll be able to get that done or not. I’ll have more to say about RailsConf closer to the date, but the short version is: come to my talk, say hi.

10 Cool Things in Rails 2.3

This was presented to the Ruby Users of Minnesota on March 30, 2009.

Here’s a quick look at 10 new Rails features that I think are cool. Not all of them are huge new features; some just help solve annoying problems. I’ve also created a simple application that demonstrates most of these features. You can get it at BitBucket.

10 Cool Things About Rails 2.3 (slides on SlideShare)

View more presentations from lukefrancl.

1. Rails Boots Faster in Development Mode

This is something all Rails developers can appreciate. In development mode, Rails now lazy loads as much as possible so that the server starts up much faster.

This is so fast that, instead of relying on reloading (which doesn’t pick up changes to gems, the lib directory, etc.), one developer wrote a script (does anyone have the link for this?) that watches for file system changes and restarts your script/server process.

Using an empty Rails app, I got the following (totally non-scientific) real times for time script/server -d:

Rails 2.2: 1.461s
Rails 2.3: 0.869s

Presumably this difference would grow as more libraries were used, because Rails 2.3 will lazy load them. However, I was too lazy to build up equivalent Rails 2.2 and 2.3 applications to try that out.

2. Rails Engines Officially Supported

Inspired by Merb’s slices implementation, Rails added official support for Engines, which are self-contained Rails apps that you can install into another application. Engines can have their own models, controllers, and views, and add their own routes.

Previously this was possible using the Engines plugin, but Engines would often break between Rails versions. Now that they are officially supported, this should be less frequent.

There are still some features from the unofficial Engines plugin that are not part of Rails core. You can read about that at the Rails Engines site.

3. Routing Improvements

RESTful routes now use less memory because formatted_* routes are no longer generated, resulting in a 50% memory savings.

Given this route:

map.resources :users

If you want to access the XML formatted version of a user resource, you would use:

user_path(123, :format => 'xml')

In Rails 2.3, :only and :except options to map.resources are not passed down to nested routes. The previous behavior was rather confusing so I think this is a good change.

map.resources :users, :only => [:index, :new, :create] do |user|
  # now will generate all the routes for hobbies
  user.resources :hobbies
end

4. JSON Improvements

ActiveSupport::JSON has been improved.

to_json will always quote keys now, per the JSON spec.

Before:

{123 => 'abc'}.to_json
=> '{123: "abc"}'

Now:

{123 => 'abc'}.to_json
=> '{"123": "abc"}'

Escaped Unicode characters will now be unescaped.

Before:

ActiveSupport::JSON.decode("{'hello': 'fa\\u00e7ade'}")
=> {"hello"=>"fa\\u00e7ade"}

Now:

ActiveSupport::JSON.decode("{'hello': 'fa\u00e7ade'}")
=> {"hello"=>"façade"}

See ticket 11000 for details.

5. Default scopes

Prior to Rails 2.3, if you executed a find without any options, you’d get the objects back unordered (technically, the database does not guarantee a particular ordering, but it would typically be by primary key, ascending).

Now, you can define the default sort and filtering options for finding models. The default scope works just like a named scope, but is used by default.

class User < ActiveRecord::Base
  default_scope :order => '`users`.name asc'
end

The default options can always be overridden using a custom finder.

User.all # will use default scope
User.all(:order => 'name desc') # will use passed in order option.

Example:

User.create(:name => 'George')
User.create(:name => 'Bob')
User.create(:name => 'Alice')

puts User.all.map { |u| "#{u.id} - #{u.name}" }

3 - Alice
2 - Bob
1 - George

Note how the default order is respected.

6. Nested Transactions

Pass :requires_new => true to ActiveRecord::Base.transaction and a nested transaction will be created.

User.transaction do
  user1 = User.create(:name => "Alice")

  User.transaction(:requires_new => true) do
    user2 = User.create(:name => "Bob")
  end
end

This is actually emulated using savepoints, because most databases do not support true nested transactions. Some databases (like SQLite) support neither savepoints nor nested transactions, in which case this works just like Rails 2.2: the inner transaction(s) have no effect, and if there are any exceptions the entire transaction is rolled back.
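
To illustrate the savepoint behavior, here is a rough sketch, assuming a database with savepoint support (such as MySQL or PostgreSQL):

User.transaction do
  User.create(:name => "Alice")

  User.transaction(:requires_new => true) do
    User.create(:name => "Bob")
    # Rolls back to the savepoint, undoing only the inner work.
    raise ActiveRecord::Rollback
  end
end

# Alice is saved; Bob is not.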

7. Asset Host Objects

Since Rails 2.1, you could configure Rails to use an asset_host that was a Proc with two arguments, source and request.

For example, some browsers complain if an SSL request loads images from a non-secure source. To make sure SSL always loads from the same host, you could write this (from the documentation):

ActionController::Base.asset_host = Proc.new { |source, request|
  if request.ssl?
    "#{request.protocol}#{request.host_with_port}"
  else
    "#{request.protocol}assets.example.com"
  end
}

This works, but it’s kind of messy and it’s difficult to implement complicated logic. Rails 2.3 allows you to implement the logic in an object that responds to call with one or two parameters, just like the Proc.

The above Proc could be implemented like this:

class SslAssetHost
  def call(source, request)
    if request.ssl?
      "#{request.protocol}#{request.host_with_port}"
    else
      "#{request.protocol}assets.example.com"
    end
  end
end

ActionController::Base.asset_host = SslAssetHost.new

David Heinemeier Hansson has already created a better plugin that handles this case: asset-hosting-with-minimum-ssl. It takes into account the peculiarities of the different browsers to use SSL as little as possible, reducing load on your server.

8. Easily update Rails timestamp fields

If you’ve ever wanted to update Rails’ automatic timestamp fields created_at or updated_at, you’ve noticed how painful it can be. Rails REALLY didn’t want you to change those fields.

Not any more!

Now you can easily change created_at and updated_at:

User.create(:name => "Alice", :created_at => 3.weeks.ago, :updated_at => 2.weeks.ago)

=> #<User id: 3, name: "Alice", created_at: "2009-03-08 00:06:58", updated_at: "2009-03-15 00:06:58">

Remember, if you don’t want your users changing these fields, you should make them attr_protected.
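
A minimal sketch of that protection:

class User < ActiveRecord::Base
  # Mass assignment (e.g., from form params) can no longer set these fields;
  # they remain assignable directly, e.g. user.created_at = 3.weeks.ago.
  attr_protected :created_at, :updated_at
end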

9. Nested Attributes and Forms

This greatly simplifies complex forms that deal with multiple objects.

First, nested attributes allow a parent object to delegate assignment to its child objects.

class User < ActiveRecord::Base
  has_many :hobbies, :dependent => :destroy

  accepts_nested_attributes_for :hobbies
end

User.create(:name => 'Stan',
            :hobbies_attributes => [{:name => 'Water skiing'},
                                    {:name => 'Hiking'}])

Nicely, this will save the parent and its associated models together and if there are any errors, none of the objects will be saved.

Forms with complex objects are now straightforward. To use this in your forms, use the FormBuilder instance’s fields_for method.

<% form_for(@user) do |f| %>
  <div>
    <%= f.label :name, "User name:" %>
    <%= f.text_field :name %>
  </div>

  <div>
    <h2>Hobbies</h2>

    <% f.fields_for(:hobbies) do |hf| %>
      <div>
        <%= hf.label :name, "Hobby name:" %>
        <%= hf.text_field :name %>
      </div>
    <% end %>
  </div>

  <%= f.submit 'Create' %>
<% end %>

One catch is that a form is displayed for every associated object. New objects obviously have no associated records yet, so you have to build dummy objects in your controller.

class UsersController < ApplicationController
  def new
    # In this contrived example, I create 3 dummy objects so I'll get
    # 3 blank form fields.
    @user = User.new
    @user.hobbies.build
    @user.hobbies.build
    @user.hobbies.build
  end
end

There are a lot of options for nested forms including deleting associated objects, so be sure to read the documentation. Ryan Daigle also has a great write-up.
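
For example, deleting child records through the form is enabled with the :allow_destroy option. A quick sketch; note that in the initial 2.3 release the removal flag is _delete:

class User < ActiveRecord::Base
  has_many :hobbies, :dependent => :destroy
  accepts_nested_attributes_for :hobbies, :allow_destroy => true
end

# A nested hash whose attributes include _delete will destroy that record on save.
user.update_attributes(:hobbies_attributes => [{ :id => 1, :_delete => '1' }])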

10. Rails Metal \m/

You can now write very simple Rack endpoints for highly trafficked routes, like an API. These are slotted in before Rails picks up the route.

A Metal endpoint is any class that conforms to the Rack spec (i.e., it has a call method that takes an environment and returns an array of status code, headers, and content).

Put your class in app/metal (not generated by default). Return a 404 response code for any requests you don’t want to handle. These will get passed on to Rails.

There’s a generator you can use to create an example Metal end point:

script/generate metal classname

In my sample app, I have what I would consider the “minimally useful” Rails Metal endpoint. It responds to /users.js and returns the list of users as JSON.

class UsersApi
  def self.call(env)
    # if this path was /users.js, reply with the list of users
    if env['PATH_INFO'] =~ /^\/users.js/
      [200, {'Content-Type' => 'application/json'}, User.all.to_json]
    else
      # otherwise, bail out with a 404 and let Rails handle the request
      [404, {'Content-Type' => 'text/html'}, 'not found']
    end
  end
end

If you want a little bit more help, you can use any other Rack-based framework, for example Sinatra.

For more details on how Rails Metal works, check out Jesse Newland’s article about it.

Thanks for reading! For more details about new features in Rails 2.3, read the excellent release notes.

#155 Beginning with Cucumber

Cucumber is a high-level testing framework. In this episode we will create a new Rails application from scratch using behavior driven development.

OAuth Explained and What It Is Good For

In which I attempt to explain OAuth in simple terms and what it is good for.

Twitter recently announced OAuth support, and that eventually they will be deprecating HTTP Basic Authentication in favor of OAuth. Knowing this, I figured it was about time to get familiar with OAuth and update the Twitter gem to use it.

Let me start by explaining my history with OAuth. I have none. There, that was fast. I didn’t read the specifications or any articles on OAuth. I simply dove in code first and tried to figure out what was going on and how to make things work. I promise that what I’m about to explain will be simple and that I will not spend any time on HMAC-SHA1 signatures.

For the code examples below, I’ll be using the OAuth gem. Install is typical and I’ll provide it below for copy and paste.

sudo gem install oauth

Tokens and Secrets

At first, the hardest thing to figure out was all the tokens and secrets. Basically, there are three sets, each made up of a token and a secret, and each set builds upon the last. The three sets are consumer, request, and access.

The consumer token and secret are provided for you by the OAuth provider, when you register an application with them. These basically define what application is attempting to do the deed. You can create a new consumer like this:

consumer = OAuth::Consumer.new(
  'consumer token', 
  'consumer secret', 
  {:site => 'http://twitter.com'}
)

Before I go on, let me explain the end goal. The end goal with OAuth is to get an access token and secret. Once you have these, requesting a user’s information is much like it would be with HTTP Basic Authentication, from a Ruby API point of view.

Request Token

In order to get the access token, you have to create a request token, keep track of it, and then redirect the user to the provider to authorize your application. You can create a new request token, using the consumer you just created, like so:

request_token = consumer.get_request_token
puts request_token.token, request_token.secret

The other thing that the request token provides is an authorization URL. The authorization URL is where your application sends the user so they can grant or deny you access to their data. You can get the authorization URL in quite a predictable manner:

request_token.authorize_url

If you were using Rails, you would simply redirect to this URL. Before you redirect, be sure to store the request token and secret, as you’ll need those to create the access token when the user returns to your application.
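
In a Rails controller that might look roughly like this sketch; `consumer` is assumed to be a helper returning the OAuth::Consumer built earlier, and stashing the pair in the session is just one option:

def connect
  request_token = consumer.get_request_token
  # Keep the pair around so the callback action can rebuild the request token.
  session[:request_token]  = request_token.token
  session[:request_secret] = request_token.secret
  redirect_to request_token.authorize_url
end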

Whether your app is on the web or something else, the aforementioned steps are the same. The difference is that with a web app you provide a redirect URL that the provider sends the user back to upon granting access, whereas with a desktop app, or the like, the page will just inform the user to head back to the app.

Access Token

Either way, once the user is back at your app, you use the request token and secret to generate the access token. In the code below, you would use the request token and secret that you stored in the session before redirecting to the authorization URL.

request_token = OAuth::RequestToken.new(consumer, 'request token', 'request secret')
access_token = request_token.get_access_token

Once you have the access token, you can make authenticated requests like so:

puts access_token.get('/statuses/friends_timeline.json')

I haven’t had to go this far yet, but I’m assuming you would store the access token and secret in the database or something for later use. Someone correct me if I’m wrong about this. Next time you need to access the user’s data, you can simply create a new access token from the token and secret, and once again make authenticated requests.

access_token = OAuth::AccessToken.new(consumer, 'access token', 'access secret')
puts access_token.get('/statuses/friends_timeline.json')
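
Persisting the pair in the first place could be as simple as this hypothetical snippet (the column names are made up):

# Hypothetical: access_token and access_secret columns on the users table.
current_user.update_attributes(
  :access_token  => access_token.token,
  :access_secret => access_token.secret
)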

The Twitter Gem

So how does all this affect the Twitter gem? Currently, setting up the authentication for the Twitter gem is as easy as this:

Twitter::Base.new('email', 'password')

In the OAuth version of the Twitter gem, setting up the authentication will be something like this:

Twitter::Base.new('access token', 'access secret')

So, pretty much the same. The only difference is that you will have to register your application with Twitter and then go through the process of getting the access token and secret. The good news is I already have a Twitter::OAuth class that will make this process as easy as possible.

What It Is Good For

I see three common methods for authenticating web services: HTTP Basic Authentication, API keys, and OAuth. HTTP Auth is simple, but then you end up with usernames and passwords all over the interwebs. OAuth is better in that users don’t give out passwords and can revoke access at any time for any app they have granted permission to.

The next question might be, “Wouldn’t an API key system be more simple than OAuth?” The answer is yes—for developers. For developers, it is easier to just tack an API key on every request than to go through the process of all the aforementioned tokens and secrets. So if an API key system is easier for developers, why use OAuth?

I see OAuth as the easiest solution for users. When determining how something will work for users, I often use the mom test. I guarantee you if my mom wanted to print a picture from Flickr, she would not be able to figure out how to create an API key on Flickr and then use it with the printing site. Think about the opposite, though: I definitely think that when redirected to Flickr from the printing site, she could click yes to grant the printing site access.

OAuth is easier for users, and API keys are easier for developers. When you go to build your API, take your primary audience into consideration and pick the solution that suits them best.

Hopefully this primer helps those (like me a day ago) who haven’t taken the time to get to know OAuth, and also those who have enjoyed the Twitter gem and are curious about what implications OAuth brings to the table. Formerly, I thought OAuth was over complicated, but now that I’ve spent a few hours with it, I’m pretty comfortable. Oh, and the other cool thing about this is now I have a base to work from for adding OAuth support to HTTParty.

Any OAuth junkies out there, feel free to correct me or elaborate in the comments.

Update: For more on OAuth, you can read how I updated the Twitter gem to use it.

Highgroove and Scout – Ruby on Rails Podcast

The members of Highgroove Studios talk about the technical details behind an update to their Rails-based server monitoring application.

Radiant CMS in 5 Minutes Or Less

Radiant logo

Radiant is an excellent Rails-based Content Management System (CMS). It was created by John W. Long and Sean Cribbs, and has been around for a couple of years, growing steadily in popularity. With the recent addition of taps and gem manifests, it’s super-easy to get this lightweight CMS up and running on Heroku.

Start by installing the latest radiant gem on your local box:

$ sudo gem install radiant

Now use the radiant command-line tool to set up your Radiant CMS locally. We’ll use SQLite as the local database:

$ radiant --database sqlite mycms
$ cd mycms
$ rake db:bootstrap

Before we can push to Heroku, we’ll need to initialize a git repo in our project directory:

$ git init

By default, Radiant caches CMS pages in RAILS_ROOT/cache. This won’t work with Heroku’s read-only file system, so before deploying we’ll change it to make sure cached files are written in the tmp directory. Open up your config/environment.rb, and change the cache config line so it reads:

config.action_controller.page_cache_directory = "#{RAILS_ROOT}/tmp/cache"

We’ll also add a gem manifest to make sure the radiant gem is installed on Heroku when we push. Radiant depends on RSpec 1.2.2, so our .gems file should look like this:

rspec --version 1.2.2
radiant --version 0.7.1

Then commit your changes:

$ git add .
$ git commit -m "changed cache dir and added gem manifest"

Now it’s time to create an app on Heroku and deploy this baby to it.

$ heroku create
Created http://vivid-fog-54.heroku.com/ | git@heroku.com:vivid-fog-54.git
Git remote heroku added
$ git push heroku master
....
-----> Heroku receiving push
-----> Rails app detected
       Compiled slug size is 5.4MB
-----> Launching.......... done
       App deployed to Heroku

Finally, we’ll transfer over our local database using taps:

$ heroku db:push
Auto-detected local database: sqlite://db/development.sqlite3
Sending schema
Sending data
9 tables, 57 records
schema_migrat: 100% |==================| Time: 00:00:00
config:        100% |==================| Time: 00:00:00
page_parts:    100% |==================| Time: 00:00:00
extension_met: 100% |==================| Time: 00:00:00
sessions:      100% |==================| Time: 00:00:00
pages:         100% |==================| Time: 00:00:00
snippets:      100% |==================| Time: 00:00:00
layouts:       100% |==================| Time: 00:00:00
users:         100% |==================| Time: 00:00:00
Sending indexes

And now our Radiant instance is live and ready to go!

Radiant on Heroku

In 1492, Columbus Discovered…A Feed

In which I release a new gem named Columbus that auto-discovers feed urls.

Sorry for the rush of posts the past few days. I’ve been feeling a bit inspired of late, and with the discovery of Jeweler, it is now easier to make a gem than it is to just let the code sit in ~/dev/ruby/ on my computer.

A few weeks ago, I showed how to follow redirects using net/http, but I provided no context really as to why I wanted to follow redirects. Basically, that code was one piece of some code I wrote to auto-discover feed urls for a given url.

I was a little more creative with the name this time than last time, calling it Columbus. Get it? Auto-discovery and Columbus was a discoverer. Yeah, you get it. I’m sure I don’t need to explain it. Right? Yeah.

Usage

There isn’t much code in the gem and using it is even easier.

# get the primary feed
primary = Columbus.new('http://railstips.org').primary
puts primary.url, primary.title, primary.body

# get all the feeds
Columbus.new('http://railstips.org').all

The first returns a single feed if one is found, or nil otherwise. The second returns an array of all the feeds found. That probably doesn’t feel like much, but there is a lot more going on behind the scenes.

Behind the Scenes

  1. Gets the response for the passed in url.
  2. If the URL is a redirect, it follows the redirect up to 5 times to find the endpoint.
  3. Once it has the endpoint, it uses Hpricot to get all the link tags in the response body that appear to be RSS or Atom feeds.
  4. For each link tag found, it gets the response for the URL and once again follows redirects up to 5 times until it finds an endpoint.
  5. Once the endpoint for each feed is found, it returns the URL, the title and the response body for you to fart around with.
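
Step 2, the redirect following, boils down to something like this minimal net/http sketch (the gem's actual code differs):

require 'net/http'
require 'uri'

# Follow up to `limit` redirects and return the final response.
def fetch(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit == 0
  response = Net::HTTP.get_response(URI.parse(url))
  case response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else response
  end
end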

Some Details

Once again, I used shoulda, matchy and fakeweb to do the testing. I didn’t need HTTParty, but I did break out an old friend Hpricot, which I haven’t used since XML parsing in the Twitter gem. Kind of funny that this is the first time I used Hpricot for its original intent, parsing HTML.

Installation

For now the gem is just up on GitHub, so the usual routine will get you going.

sudo gem install jnunemaker-columbus --source http://gems.github.com

Hopefully someone finds it useful someday. 🙂 I’ve already got my mileage out of it.

Rails Envy Podcast – Episode #072: 03/25/2009

Episode 72. This week we’re back with improved audio quality on both sides of the microphone, and all of the great Ruby and Rails news from this week.

Subscribe via iTunes – iTunes only link.
Download the podcast ~12:30 mins MP3.
Subscribe to feed via RSS by copying the link to your RSS Reader


Sponsored by New Relic
The Rails Envy podcast is also brought to you this week by New Relic. New Relic provides RPM, a plugin for Rails that allows you to monitor and quickly diagnose problems with your Rails application in real time. They also recently produced Rails Lab, which gives you expert advice on tuning and optimizing your Rails app.

Sponsored by Hashrocket
The Rails Envy podcast is brought to you this week by Hashrocket. Hashrocket is an expert consultancy group that uses best-of-breed technologies like Ruby on Rails to deliver the highest quality software in the least amount of time.

Building API Wrapping Gems Could Not Get Much Easier

In which I show how easy it is now to create ruby gems that wrap APIs, using Google Weather as an example.

Google has a weather api that is dead simple to use. Just discovered that tonight so I whipped together a wrapper using HTTParty. I decided to try out Jeweler, a project by Josh Nichols, that makes creating gems a snap and it delivered. I used shoulda and fakeweb for the tests. Holy crap has making a gem that wraps a web service become really easy.

The New Way

  1. jeweler google-weather --shoulda --create-repo
  2. %w(matchy fakeweb).each { |x| require x } (in your test_helper)
  3. require 'httparty'
  4. Add some code and tests
  5. rake version:bump:minor
  6. rake gemspec
  7. git push origin master
  8. blog

I did all of these in about an hour or two tonight.
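
For flavor, the heart of such a wrapper is only a few lines. Here is a rough sketch in the spirit of the gem; the actual gem's interface and Google's response structure may differ:

require 'httparty'

class GoogleWeather
  include HTTParty
  base_uri 'google.com'

  def initialize(location)
    # Assumes Google's ig/api endpoint, which returns XML that
    # HTTParty parses into a hash.
    @data = self.class.get('/ig/api', :query => { :weather => location })
  end

  def current_conditions
    @data['xml_api_reply']['weather']['current_conditions']
  end
end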

The Old Way

  1. Create a bunch of files and directories and make a bunch of decisions
  2. mock and stub all net/http stuff
  3. net/http and rexml (or hpricot once that came along)
  4. Add some code and maybe some tests
  5. Add a version
  6. Figure out how to build a gemspec
  7. svn commit your files
  8. Request project to be created on rubyforge
  9. Wait a few days
  10. Project approved, release files, blog

And it would take a few days from first code scratched to gem released. My how times are a changing.

Stuff You Can Learn From This Gem

At any rate, the GoogleWeather gem I just created is a really simple example of how to use:

  • jeweler to create and manage a gem
  • httparty to pwn an API
  • shoulda to test the gem
  • fakeweb to make sure your tests aren’t making real web requests
  • matchy for some syntactical sugar

If you want to learn any of those things, poke around in the code a bit and you should be good to go. Also, if you want a really easy way to get weather information, this gem makes that possible.

Sorry I didn’t give it some fancy name like HTTParty or HappyMapper. Maybe I need to make another gem that spits out fancy names. After all, naming the project is the only thing left that is hard. 😉

Buckets: Preview

So, yeah. With Capistrano and friends off my plate, I’ve actually found time to work on a project that has been in the works for years (and that’s no exaggeration, I first mentioned it in a blog post in October 2004). I’ve named it and renamed it (“Penny Pincher”, “Chump Change”, “Make Me Rich”, and “BudgetWise”) but its current incarnation is “Buckets”.

Buckets is a simple web-based personal finance application that I’ve been working on, written specifically for my wife and me. Its focus is on simple budgeting and reducing debt, and it is intentionally “feature-poor”. It is loosely based on an envelope budgeting strategy, and while it definitely isn’t the only web-based finance app using such a strategy, it just may be the simplest.

I recorded a screencast demonstrating the budgeting aspect of Buckets; it’s a 5MB QuickTime movie, 2:42 in length. Click here to view it, if you care to.

Buckets is still private: it has been deployed and my wife and I are using it, but that’s it. The source code is in a private repository on GitHub, and the production instance of the app is currently only accessible to me. That will change eventually (maybe a couple of weeks, depending on how initial testing goes), but I want to make sure it’s actually going to be useful before I open it up.

Time tracking on Rails 2.3.2

time.onrails.org was first deployed in 2005, and the last time I deployed it was in July 2007. This application is in use by several hundred people daily, and several thousand signed up over the years. It’s written in old-style Rails (pre-resources), and I ported it to Rails 1.2 a while back. So today I decided to run the test suite and got a list of deprecation warnings for Rails 2.0. I fixed them all, then decided to run against Rails 2.3.2. A couple more issues were identified (tests with fixtures should use ActionController::TestCase and ActiveRecord::TestCase), and then all tests were passing. I used to have a timezone bug related to the old Rails support of time zones, so I decided to bite the bullet, try out the Rails 2.0 time zone support, and found a good description here. The change was straightforward, et voila, time.onrails.org running on 2.3... not so quick. I did a cap deploy and then realized that I didn’t have the latest version of Rails on my deployment system, nor did I have the latest version of the gems. So after a ‘gem update --system’ I encountered a gem-related issue, but with this solution I was back in business... Et voila, time.onrails.org running on 2.3.2! Note it’s still old-style Rails and needs a good rewrite, but if you need a free time tracking application, just go try it out.
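
For reference, the Rails 2.x time zone support boils down to something like this generic sketch (not the actual time.onrails.org code):

# config/environment.rb: store everything in UTC
config.time_zone = 'UTC'

# app/controllers/application_controller.rb: display in each user's zone
class ApplicationController < ActionController::Base
  before_filter :set_time_zone

  private

  def set_time_zone
    # Assumes a time_zone attribute on the user model.
    Time.zone = current_user.time_zone if current_user
  end
end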

#154 Polymorphic Association

Polymorphic associations can be perplexing. In this episode I show you how to set it up in Active Record and then move to the controller and view layer.
