Duck it.

Ruby is a dynamically typed language, so we sometimes need to check types explicitly. In a duck-typed world this can be less obvious than you’d think. In particular, testing code that checks its input types explicitly can be harder than it needs to be. This morning I came to work with a bunch of spec failures […]
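Here is a minimal sketch of how an explicit class check can get in the way of specs. The post itself is truncated here, so this is not the author’s code. A spec naturally passes in a lightweight stand-in, and an is_a? check rejects it, while a respond_to? check accepts anything that behaves like a String. The method names and the Name class are hypothetical.

```ruby
# Explicit class check: only real Strings (or subclasses) get through.
def shout_strict(text)
  raise TypeError, "expected a String" unless text.is_a?(String)
  text.upcase
end

# Duck-typed check: anything that supports implicit conversion via #to_str is fine.
def shout_duck(text)
  raise TypeError, "expected something string-like" unless text.respond_to?(:to_str)
  text.to_str.upcase
end

# Hypothetical string-like stand-in, e.g. what a spec might pass in.
class Name
  def initialize(value)
    @value = value
  end

  def to_str
    @value
  end
end

p shout_duck("matz")            # => "MATZ"
p shout_duck(Name.new("matz"))  # => "MATZ"

begin
  shout_strict(Name.new("matz"))
rescue TypeError => e
  p e.message                   # => "expected a String"
end
```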

ZOMG assembly metaprogramming shitz!

Joe Damato over at timetobleed.com just let their new memory profiler out of the box.

Go read more about it here… and anyone interested in Ruby internals and low-level x86 architecture should take the time to read through the previous posts about the road to non-intrusive instrumentation of the VM. Some pretty crazy ZOMG assembly metaprogramming shitz.

[SOLVED] 64bit MySQL and 32bit Ruby: FAIL

For the people out there using RVM to rock multiple Ruby versions, here’s a gotcha with REE and Snow Leopard that took me hours to solve: if you happened to boot your Mac in 32-bit mode when building REE with the oh so awesome:

rvm install ree

and then installed 64-bit MySQL and tried to install the gem, Ruby will positively hate you. It will also do its best not to help you understand the issue, whining about missing libraries and whatnot.

No, it’s not a library issue. Your Ruby is built for 32-bit and the MySQL libs are 64-bit. I know for sure nokogiri has the same issue, and I suspect many other native gems do too.

Which kernel am I running, 32- or 64-bit? uname -m prints the machine architecture. On a 64-bit kernel:

~# uname -m
x86_64

On a 32-bit kernel:

~# uname -m
i386

To force 64bit mode, reboot holding down the “6” and “4” keys on the keyboard.

Recompiling your rubies is fun for sure, but if you get tired of it, here’s how to make the 64bit switch permanent:

~# sudo vim /Library/Preferences/SystemConfiguration/com.apple.Boot.plist

Set the string under the last key, Kernel Flags, to arch=x86_64. You’ll end up with:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>Kernel</key>
        <string>mach_kernel</string>
        <key>Kernel Flags</key>
        <string>arch=x86_64</string>
</dict>
</plist>

To check if a lib is 64bit, run:

~# lipo -info /usr/local/mysql/lib/libmysqld.a 
input file /usr/local/mysql/lib/libmysqld.a is not a fat file
Non-fat file: /usr/local/mysql/lib/libmysqld.a is architecture: x86_64

What about my ruby version?

~# lipo -info ~/.rvm/ree-1.8.7-2009.10/lib/libruby-static.a 
input file /Users/david/.rvm/ree-1.8.7-2009.10/lib/libruby-static.a is not a fat file
Non-fat file: /Users/david/.rvm/ree-1.8.7-2009.10/lib/libruby-static.a is architecture: x86_64

~# lipo -info ~/.rvm/ruby-1.9.1-p243/lib/libruby-static.a 
input file /Users/david/.rvm/ruby-1.9.1-p243/lib/libruby-static.a is not a fat file
Non-fat file: /Users/david/.rvm/ruby-1.9.1-p243/lib/libruby-static.a is architecture: i386

Looks like I’ll have to nuke my Ruby 1.9 install too. Sigh.
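lipo inspects the static library, but you can also ask a running interpreter directly. This is a small sketch that uses only the standard library. Array#pack('p') packs a pointer, so the packed string is as many bytes long as the interpreter’s pointer width: 8 means a 64-bit build, 4 means a 32-bit one.

```ruby
require 'rbconfig'

# 'p' packs a pointer, so the result is pointer-width bytes long.
pointer_bytes = ['x'].pack('p').bytesize
bits = pointer_bytes * 8

puts "This Ruby is #{bits}-bit (host_cpu: #{RbConfig::CONFIG['host_cpu']})"
```

Run it under each of your Rubies in turn (e.g. after rvm use) to spot the odd one out.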

Give us the fix already!

Ok ok. Here it goes:

# One
echo "rvm_archflags='-arch x86_64'" >> ~/.rvmrc

Now rvm knows what we want.

That’s not enough, though: we also need to make sure REE doesn’t build with tcmalloc (at least until the good people at Google solve the non-64bit-ness of it). The only catch is that rvm, as of v0.64, has a little bug that stops these parameters from being passed along. The soon-to-be-released v0.75 fixes this, but for now you need head:

# Two
rvm update --head

Now we’re ready to reinstall REE:

# Three
rvm --force install ree --ree-options --no-tcmalloc

Many thanks to wayneeseguin for helping with this issue! Rvm is the best thing since sliced bread.

Good Ruby Times

In the last few weeks I’ve had the chance to glance into several different codebases, some written years ago by devs in their Rails-infancy (rainfancy?).

Doing the travelling codes-man like that is a golden opportunity to see what people’s stumbling blocks are and how a few easy tricks can improve the code substantially.

What follows is not your new cutting-edge, coffee-making, kitchen-sink script-fu, but a rather dull list of everyday Ruby that, as I’ve learned, people still need to learn.

One-stop requires

This is a short one: requires go into the environment.rb.

Really, don’t put them anywhere else. I’ve seen people reason along the lines of “I keep the ‘require hpricot’ in my xe.com currency rates scraping class to keep things local and closely knit”. There is some truth to that argument, and if a gem is only used by a rake task, it may even save us a second during app startup and some RAM. Still, load order issues and poor readability and maintainability make it a real pain to deal with down the road.

Trust me: don’t. Put it in environment.rb with a comment describing its purpose in your project.

Whenever possible, use config.gem. Advantages:

  • allows you to specify the lib name even if the gem name is different (e.g. for github gems)
  • allows you to require sub libs

    Not used very often perhaps but put to good use by e.g. the right_aws gem where you can choose to load just the SQS part with:

    config.gem 'right_aws', :lib => 'sqs/right_sqs'
    

    or maybe you just wanted the SDB part?

    config.gem 'right_aws', :lib => 'sdb/right_sdb_interface'
    

    I personally believe this particular feature is more confusing than helpful, but it’s good to know it’s there.

  • allows you to specify the library version

    Towards the end of the development cycle, before rolling out to production, gems should generally be frozen. An alternative is to define exact versions in the config. I generally advise doing both, as having environment.rb as the one-stop source for gem requirements is a boon for everybody involved, be it capistrano scripts or new devs on your team.

  • all in one place, easy to find out requirements
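Here is a sketch of what that one-stop environment.rb can look like in a Rails 2.x app. It combines :lib and :version with a comment per gem. The version numbers are placeholders, not recommendations.

```ruby
# config/environment.rb (Rails 2.x); gem versions are placeholders.
Rails::Initializer.run do |config|
  # HTML scraping for the xe.com currency rates class
  config.gem 'hpricot', :version => '0.8.1'

  # Amazon SQS only; the lib name differs from the gem name
  config.gem 'right_aws', :lib => 'sqs/right_sqs', :version => '1.10.0'
end
```

With the requirements declared in one place, rake gems:install installs them and rake gems:unpack freezes them into vendor/gems.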

Singleton methods (creation)

Ruby provides three (common) ways to define a singleton method:

  1. Explicitly name the class

    class RapidInterventionGroupTeam < ActiveRecord::Base
      def RapidInterventionGroupTeam.load_group_allocations_and_teams(project_id, curr_user)
        # Code
      end
    end
            
    

    PROs: hmm, dunno. Makes it easy to know what class you’re looking at if the source code file is very long?

    CONs: long, ugly

  2. Use self

    class RapidInterventionGroupTeam < ActiveRecord::Base
      def self.load_group_allocations_and_teams(project_id, curr_user)
        # Code
      end
    end
          
    

    PROs: easy to spot here-comes-a-singleton-marker. Some people like to use the ‘self’ keyword. I call them self-ish people. Sorta short.

    CONs: I think it’s ugly. For more than two singletons that repeating ‘self’ annoys me.

  3. Use the metaclass

          
    class RapidInterventionGroupTeam < ActiveRecord::Base
      # ==========================
      # = Class methods go here! =
      # ==========================
      class << self
        def load_group_allocations_and_teams(project_id, cur_user)
          # Code
        end
      
        def team_allocations
          # Code
        end
      
        def group_allocations
          # Code
        end
      end
    
      # ============================
      # = Instance methods go here =
      # ============================
      def beef_cow_and_fowl
        # Code
      end
    end
          
    

    PROs: shortest. Encourages devs to gather singletons in one spot, and perhaps they will notice when it gets out of hand and start creating mixins. Maybe. Also allows devs to show off their Ruby metaclass knowledge at bars. Maybe.

    CONs: that class << self is funky, and if funky doesn’t sit well with you (or your boss) then I guess it’s no good.

I personally favor option 3 whenever the number of class methods goes above two or three; option 2 is fine when there are just a few of them.
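Here is a small runnable sketch you can use to check that the three styles really are equivalent. The Team class and its method names are made up. All three definitions end up as singleton methods on the class. On Ruby 1.9 and later singleton_methods returns symbols; 1.8 returns strings.

```ruby
class Team
  # 1. Explicitly name the class
  def Team.by_name
    :by_name
  end

  # 2. Use self
  def self.by_self
    :by_self
  end

  # 3. Open the singleton class (metaclass)
  class << self
    def by_metaclass
      :by_metaclass
    end
  end
end

p Team.singleton_methods.sort  # => [:by_metaclass, :by_name, :by_self]
p Team.by_metaclass            # => :by_metaclass
```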

Invoking singleton methods

In Ruby there are many ways to invoke a class method. A few common ones:

  1. Name the class explicitly:

    user = User.find(123)
    Predators.feed_to_wolves(1, user)
          
    

  2. When used from an instance of the class, use the context: self.class

    class Predators < ActiveRecord::Base
      def self.feed_to_wolves(pack_id, meat)
        # code
      end
      
      # we know how we were born...
      def feed_us!(meat)
        self.class.feed_to_wolves(id, meat)
      end
    end
        
    

  3. Use a helper instance method

        
    class Steak
      def self.logger
        @logger ||= Logger.new('log/steak.log')
      end
    
      def logger
        self.class.logger
      end
    
      def beef
        logger.debug "MONDAY BEEF!"
        beef_for!(:monday)
      end
    end
          
    

    Using an instance method this way to invoke a class level method is of course useful only if you invoke it very often and readability becomes very important. In the above example we’re also memoizing (caching) the class level Logger object for fast access.
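Here is a runnable version of the Steak helper pattern. It assumes an in-memory StringIO as the log target instead of 'log/steak.log', and uses a plain symbol as a stand-in for the undefined beef_for!. It shows that every instance shares the one memoized class-level Logger.

```ruby
require 'logger'
require 'stringio'

class Steak
  # In-memory log target (assumption, so the sketch runs anywhere).
  LOG_IO = StringIO.new

  # Class-level logger, memoized in a class-level instance variable.
  def self.logger
    @logger ||= Logger.new(LOG_IO)
  end

  # Instance-level helper delegating to the class-level logger.
  def logger
    self.class.logger
  end

  def beef
    logger.debug "MONDAY BEEF!"
    :monday # hypothetical stand-in for beef_for!(:monday)
  end
end

2.times { Steak.new.beef }

p Steak.new.logger.equal?(Steak.logger)           # => true
p Steak::LOG_IO.string.scan("MONDAY BEEF!").size  # => 2
```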

I personally try to avoid singletons altogether, use option 2 (self.class) whenever possible, and start asking myself hard and awkward questions when I find myself naming the class explicitly (option 1) more than three times a day. Too many singletons might be a code smell.

Multiple return values

First an example of how people often fake multiple value returns in Ruby:

class Something
  def many
    [calc_monday_beef, calc_total_beef]
  end
end

s = Something.new
retval = s.many
  

DON’T: when you’re coding a method and realize you need to return more than one value from it, stop and think. You’re probably doing something wrong.
Note that Ruby has no true multiple return values anyway: even return a, b simply wraps the values in an Array, just like the example above.

If you’re sure you know what you’re doing and want to return more than one value, here are some tips:

  1. do NOT do what I did in the example.

  2. use descriptive variable names. “retval” above sucks. Use Ruby multiple value assignments like so:

    >> cows, sheep, fish = [1, 5, 3]
    => [1, 5, 3]
    >> cows
    => 1
    >> sheep
    => 5
    >> fish
    => 3
          
    

    Use this to assign multiple return values to local variables with helpful names:

          
    s = Something.new
    monday_beef, total_beef = s.many
          
    

  3. if you’re returning many values and you don’t know beforehand how many are coming back, use the splat operator (*):

    >> me, *others = [:abe, :bob, :caesar, :donald]
    => [:abe, :bob, :caesar, :donald]
    >> me
    => :abe
    >> others
    => [:bob, :caesar, :donald]
          
    

  4. avoid array indexes; prefer Array#first and Array#last:

    • BAD:

        
      s = Something.new
      retval = s.many
      Beef.find(retval[0], :limit => retval[1])
                
      

    • STILL BAD, BUT SOMEWHAT BETTER:

      s = Something.new
      retval = s.many
      Beef.find(retval.first, :limit => retval.last)
                
      

      Why better? Because you tell the reader that the ‘retval’ array holds only two values. (Yeah, it still sucks, I know.)

    • ALSO BAD BUT SOMEWHAT-ER BETTER STILL:

      s = Something.new
      monday_beef, beef_count = s.many
      Beef.find(monday_beef, :limit => beef_count)
                
      

    • GOOD:

      s = Something.new
      Beef.find(s.monday_beef, :limit => s.total_beef_count)
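Here is a runnable sketch of the whole progression. It uses a hypothetical Something class with fixed numbers in place of the real beef calculations. It shows that the “multiple” return is just an Array, that destructuring gives it readable names, and that asking for each value by name needs no array at all.

```ruby
# Hypothetical stand-in for the article's Something class.
class Something
  def monday_beef
    3
  end

  def total_beef_count
    10
  end

  # "Multiple" return values: Ruby simply wraps them in an Array.
  def many
    return monday_beef, total_beef_count
  end
end

s = Something.new
p s.many        # => [3, 10]
p s.many.class  # => Array

# Destructure into descriptive local names...
monday_beef, beef_count = s.many
p beef_count    # => 10

# ...or, better, ask for each value by name.
p s.monday_beef # => 3
```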
                
      

Drop the return

Use the fact that Ruby methods return the value of the last expression. The only place where I think an explicit return is legit is when we want to return early. That’s a performance optimization, and as we know, premature optimization is the root of all evil, so unless you have benchmarks at hand proving the need for speed, just don’t use ‘return’.

  • Instead of:

    def course_description
      retval = course_shortname 
      retval = course_shortname + "(#{project.shortname})" if project
      return retval
    end
          
    

  • Do:

          
    def course_description
      course_shortname + (project ? "(#{project.shortname})" : '')
    end
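Here is a quick sketch to confirm that the two versions agree. It uses hypothetical Course and Project stand-ins built from plain Ruby objects rather than the app’s models.

```ruby
Project = Struct.new(:shortname)

# Hypothetical Course with just the attributes the two methods need.
class Course
  attr_reader :course_shortname, :project

  def initialize(course_shortname, project = nil)
    @course_shortname = course_shortname
    @project = project
  end

  # Original style: temporary variable plus explicit return.
  def course_description_with_return
    retval = course_shortname
    retval = course_shortname + "(#{project.shortname})" if project
    return retval
  end

  # Idiomatic style: the value of the last expression is returned.
  def course_description
    course_shortname + (project ? "(#{project.shortname})" : '')
  end
end

with_project = Course.new("Ruby101", Project.new("ELC"))
without      = Course.new("Ruby101")

p with_project.course_description  # => "Ruby101(ELC)"
p without.course_description       # => "Ruby101"
```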
          
    

Enumerable

Learn to use and love Ruby’s Enumerable module. It’s really, really useful.

  • Instead of:

          
    def Vacation.holidaytype_selectbox
      types = Array.new
      for vt in VacationTypes
        types << vt[:name]
      end
      @holidaytype_selectbox_model ||= types
    end
        
    

  • Use:

        
    def self.holidaytype_selectbox
      @holidaytype_selectbox_model ||= VacationTypes.map { |vt| vt[:name] }
    end
        
    

  • Instead of:

    for result in @candidate.results do
      result.destroy
    end
        
    

  • Use:

    @candidate.results.each(&:destroy)
    

  • Instead of:

    @approved = 0
    unless @do_it.nil?
      @events = Event.hourreporting_entries(@person.user_id, params[:projectid], @week.first.date, @week.last.date)
    
      for e in @events
        if e.approved == 1
          @approved += 1
        end
      end
    end
        
    

  • Do:

        
    @approved = 0
    if @do_it
      @events = Event.hourreporting_entries(@person.user_id, params[:projectid], @week.first.date, @week.last.date)
      @approved = @events.sum { |e| e.approved || 0 }
    end
        
    

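    The Array#sum method is a Rails addition (plain Ruby only gained Array#sum in 2.4). Summing gives the same result as the original count only because ‘approved’ holds 1, 0 or NULL.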

In the app where I spotted the above, the default value for the ‘approved’ field is set to NULL. Had the default value been 0 instead, we could have written the above in an even shorter way:

@events.sum(&:approved)
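Here is a plain-Ruby sketch of the same ideas that runs without Rails. It uses hypothetical event records whose ‘approved’ is 1, 0 or nil. Enumerable#count with a block (Ruby 1.8.7+) expresses “how many are approved” directly, and inject stands in for Rails’ Array#sum.

```ruby
# Hypothetical event records; 'approved' is 1, 0 or nil (NULL in the database).
Event = Struct.new(:name, :approved)
events = [Event.new("mon", 1), Event.new("tue", nil), Event.new("wed", 1), Event.new("thu", 0)]

# for-loop style
approved = 0
for event in events
  approved += 1 if event.approved == 1
end

# Enumerable style: count with a block.
approved_count = events.count { |e| e.approved == 1 }

# inject as a plain-Ruby alternative to Rails' Array#sum.
approved_sum = events.inject(0) { |total, e| total + (e.approved || 0) }

names = events.map(&:name)

p approved        # => 2
p approved_count  # => 2
p approved_sum    # => 2
p names           # => ["mon", "tue", "wed", "thu"]
```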

Truth

In Ruby, everything except false and nil evaluates to true. For realz.

  • Instead of:

      
    if options[:add] == true && check_id.to_i != 0
      meanie = MeanGuy.find(check_id)
      if meanie.reason != '' && !meanie.reason.nil? && meanie.type == MeanGuy.allowed_types[0]
        # Code
      end
    end
        
    

  • Do:

      
    if options[:add] && check_id.to_i.nonzero?
      meanie = MeanGuy.find(check_id)
      if !meanie.reason.blank? && meanie.type == MeanGuy.allowed_types.first
        # Code
      end
    end
        
    

When chaining conditional checks like above with “&&”, remember that Ruby will stop the checks at the first failure, so always put the ‘cheapest’ checks first (e.g. put any checks that require a database query last).

In the above snippet we see the very common “!something.blank?” idiom. I personally really don’t like the prepended “!” and always try to avoid it. Hurts my eyes and makes me stumble while reading. Ideally the above should be:

if meanie.reason && meanie.type == MeanGuy.allowed_types.first

but if the meanie’s reason is allowed to be the empty string (which Ruby considers true), then we need the explicit blank? check.
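Here is a short sketch of the truthiness rules above. The blank_reason? helper is a hypothetical plain-Ruby approximation of Rails’ blank? for strings and nil, not the Rails implementation. Integer#nonzero? is handy because it returns the number itself or nil, so it reads as a predicate without a leading “!”.

```ruby
# Only nil and false are falsy; 0, "" and [] are all truthy.
values = [nil, false, true, 0, "", [], "abc"]
truthy = values.select { |v| v }
p truthy  # => [true, 0, "", [], "abc"]

# Hypothetical stand-in for Rails' blank? on nil and strings.
def blank_reason?(reason)
  reason.to_s.strip.empty?
end

p blank_reason?(nil)     # => true
p blank_reason?("")      # => true
p blank_reason?("mean")  # => false

# Integer#nonzero? returns self or nil.
p "0".to_i.nonzero?   # => nil
p "42".to_i.nonzero?  # => 42
```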

Unrelated neat trick using &&:

  • Instead of:

    def safe_death(dude)
      if dude
        if dude.destroy
          if DudeMailer.deliver_destruction_notification
            Call.his_mum
          end
        end
      end
    end
    safe_death(Dude.first)
          
    

  • Do:

    def safe_death(dude)
      dude && dude.destroy && DudeMailer.deliver_destruction_notification && Call.his_mum
    end
    safe_death(Dude.first)
          
    

  • This works well only if you don’t care about handling any error conditions
    and success depends on *all* method calls being successful.

    In general, to improve readability, create as many predicate methods as possible. They are short,
    cheap and really help.

    • Instead of:

        
      if @person && @person.active == true && @person.projects.include?(:beef_on_monday) && (@person.type == :human || @person.type == :dog)
        # Code
      end        
          
      

    • Do:

      class Person
        def active_monday_beefer?
          active? && member_of?(:beef_on_monday) && (human? || dog?)
        end
      end
      
      if @person.active_monday_beefer?
        # Code
      end
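Here is a runnable sketch of the Person predicates. The attribute layout (type, active flag, project list) is an assumption chosen to mirror the “Instead of” condition. Each small predicate answers one question, and the composite one reads almost like the requirement.

```ruby
# Hypothetical Person; attributes mirror the article's example condition.
class Person
  attr_reader :type, :projects

  def initialize(type, active, projects)
    @type, @active, @projects = type, active, projects
  end

  def active?
    @active == true
  end

  def member_of?(project)
    projects.include?(project)
  end

  def human?
    type == :human
  end

  def dog?
    type == :dog
  end

  def active_monday_beefer?
    active? && member_of?(:beef_on_monday) && (human? || dog?)
  end
end

rex = Person.new(:dog, true, [:beef_on_monday])
tom = Person.new(:cat, true, [:beef_on_monday])

p rex.active_monday_beefer?  # => true
p tom.active_monday_beefer?  # => false
```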
          
      

Method naming

    When naming your methods, avoid prefixing them with “get_” or “set_”. The Ruby convention is that getters are just the value name and setters have the “=” suffix. Avoid the quasi-get_’s too: load_, retrieve_, store_, put_. Sometimes they’re ok, but just stop a second and think about whether:

    store_incoming_beef()
    

    is really really better than

    more_beef_in_the_holds()
    

    or

    beefers_keepers()
    

    or

    not_pork_not_sheep_not_fowl_but?()
    

    As always rules are made to be broken, judiciously. 😉

    DON’T:

    #get_premium, #get_unix_time_stamp, #load_dates, #retrieve_mums etc
    

    Do:

    #premium, #unix_timestamp, #dates, #mums, #beefs
    

    DON’T:

    Server.set_clock(Time.now)
    @doc.store_signer(Dude.new)
    @tellus.put_fire(:wild)
    Time.set_back(1.year.ago)
    

    Do:

    Server.clock = Time.now
    @doc.signer = Dude.new
    @tellus.fire!(:wild)
    Time.now = 1.year.ago
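Here is a sketch of how the “Do” style comes for free with attr_accessor. The Server and Doc classes are hypothetical stand-ins for the examples above. Opening the singleton class gives class-level accessors (Server.clock=), while a plain attr_accessor gives instance-level ones (doc.signer=).

```ruby
class Server
  # Class-level reader/writer: Server.clock and Server.clock=
  class << self
    attr_accessor :clock
  end
end

class Doc
  # Instance-level reader/writer: doc.signer and doc.signer=
  attr_accessor :signer
end

Server.clock = Time.at(0)
doc = Doc.new
doc.signer = "Dude"

p Server.clock.to_i                     # => 0
p doc.signer                            # => "Dude"
p Doc.public_method_defined?(:signer=)  # => true
```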
    

    Keep method names short: e.g. #shorten_timespaces_because_of_special_holiday is too long. Long method names are often a sign of complex methods. Complex methods are often a sign we should refactor the code into smaller pieces.

    Sometimes the work done by a method is simply too complicated to be expressed properly
    by a short, self-explanatory method name. If so, don’t even try. Leave the method name
    short and cryptic. It becomes a signal to the reader: “hey, wanna understand this one?
    Sorry, but you really, really have to go read the code!”.

    DON’T:

    #shorten_timespaces_because_of_special_holiday
    

    DO:

    #holiday_adjustment
    

    Use the “?” suffix for predicates (they should *always* return true/false, or at least something that evaluates to true or false, such as “abc”/nil)

    class Session
      attr_accessor :finished_at

      def finished?
        finished_at
      end
    end
    
    sess = Session.new
    sess.finished_at = Time.now
    sess.finished?
    

    GOOD:

    if sess.finished? 
      coffee!
    end
    

    BAD:

    if sess.finished_at
      coffee!
    end
    

Use rdoc!

    Use rdoc for your Rails apps. It’s actually nice. When your mum asks what it is you actually do, you can show her the rdocs with links and text and colors and lists, and it will make her happy too. It’s also very useful for people trying to read your code (ok, they can run the freakin’ rake task themselves, but you should do it too!).

      rake doc:app
      open doc/app/index.html
    

Ruby Daemons and Angels

Unix is pretty good at managing processes. Fork is a simple yet powerful means of achieving parallelism for workloads that are reasonably well self-contained.

The web application development racket has a higher tier where the load can be so insane that the standard “receive request, process it, format a reply and send it back” just doesn’t cut it. I/O contention, traffic spikes and subsystem failures become the real hurdles for a successful website. As human-computer interaction research keeps showing, latency is one of the defining characteristics of successful software. Quick and accurate feedback to users is paramount to communicate solidity and high quality.

At ELC, we’re increasingly asked to help clients and partners not only build applications with all the needed features, but also cope with some crazy workloads.

Like many others, we often look to queues and asynchronous workers for help. Instead of processing a request right off the HAProxy plate, we throw it onto a queue where a battery of worker processes takes care of the real work. This allows the app server(s) to respond quickly and give the user that stern look in the eye that says “Trust us, your request is in good hands.”
🙂

Queues and workers

Queues are a big topic. Very interesting; go google it. Today I want to talk about the other, often neglected part: the workers.

When decoupling the web tier from the processing tier you often end up with code that all in all is pretty simple and straightforward. Throwing together a worker script is pretty easy, especially if you already have a working version of it to steal bits and pieces from.

The pesky thing about workers is that they run headless and must somehow be monitored. Process monitoring is also a big topic I’m not going to spend time on. What I do want to talk about is how important it is not to underestimate the difficulty of writing a decent worker, and to realize that your focus should probably be on making it debuggable, introspectable and as good a Unix citizen as possible.

Daemon Kit

Cron and script/runner are a useful combo for some kinds of asynchronous workers, but the issues that can arise are many. Each time your job runs it has to instantiate a full Rails stack and open up all the I/O channels it needs. Lots of fat there.

Plain Ruby scripts and the Daemon class are somewhat better, and with enough knowledge it’s possible to write awesome daemons where your workers can live and thrive. It does take a lot of knowledge and fiddling to get it right, though.

Enter Daemon Kit by Kenneth Kalmer. Tired of reinventing the wheel, he wrote a collection of libraries and generators to make life easier for daemon summoners like yourselves.

Go get it!

gem install kennethkalmer-daemon-kit -s http://gems.github.com

What follows is an example daemon built with daemon-kit, but with an additional twist inspired by the unicorn project: spawn a configurable number of workers to attack your workload in parallel. We’re going to start with a standard daemon-kit daemon and see how easy it is to extend to fit your own needs.

Step one: Generate a daemon

Easy as pie. Run:

$ daemon-kit foff

You’ll see a bunch of files being generated. Open the foff/ directory in your IDE and have a look around. You’ll find a bunch of useful readme files in the various subdirs, pointing you in the right direction.

Step two: command line options

We need to be able to pass in the number of worker processes to start. Open config/arguments.rb. Add the following:

  @options[:worker_count] = 1 # Default
  opts.on('-w', '--workers WORKER_COUNT', 'Number of worker processes to spawn') do |worker_count|
    @options[:worker_count] = worker_count.to_i
  end
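The opts.on call above has the same shape as OptionParser#on from Ruby’s standard library. I’m assuming daemon-kit hands you an OptionParser-style object here, but that is not confirmed. This standalone sketch uses OptionParser directly to show what the -w flag does in isolation.

```ruby
require 'optparse'

options = { :worker_count => 1 } # Default
parser = OptionParser.new do |opts|
  opts.on('-w', '--workers WORKER_COUNT', 'Number of worker processes to spawn') do |worker_count|
    options[:worker_count] = worker_count.to_i
  end
end

# parse! consumes the options it recognizes and leaves the rest in argv.
argv = %w[-w 12 extra]
parser.parse!(argv)

p options[:worker_count]  # => 12
p argv                    # => ["extra"]
```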

Try your daemon!

  $ bin/foff

See that? Already up and running.

Step three: Signals&traps

Let’s move on. Open libexec/foff-daemon.rb. You’ll find a configuration block that is executed once the daemon is initialized but before the workers arrive at the plant. Here’s where we’ll set up our traps to catch the signals.

DaemonKit::Application.running! do |config|
end

Signals&traps. Another big topic I’m in no way competent to speak in depth about. For now, suffice to say that you want to make sure that when you stop your daemon, the workers have time to finish up what they were doing. If you don’t do this carefully, one Ctrl + C later you might just have interrupted a recurring billing worker just after it sent off a request to the inventory dudes but before billing the client. Don’t go there.

For now, just trust that Mr. Kalmer knows what he’s doing and add:

# encoding: utf-8
  
DaemonKit::Application.running! do |config|
  config.trap( 'INT' ) do
    DaemonKit.logger.info "\nINT  #{Process.ppid} ? #{Process.pid} GOING DOWN. WORKERS: #{WORKERS.keys.inspect}"
    WORKERS.each_pair do |pid, foff|
      DaemonKit.logger.info "\nNotifying child process #{pid} it's time to go."
      kill_worker(:QUIT, pid)
    end
    Process.waitall
  end
end

The above code intercepts the INT signal, logs a message and then tells each of your running workers to stop gracefully (using the QUIT signal). The last line, Process.waitall, makes the master daemon process wait for all its children (your workers) to finish.

This is example code, and the above is not enough for production use. If an error occurs inside your trap block, the INT signal will go unheeded, or worse, the master will die while the worker processes steam on uncontrolled. You need to add all kinds of checks. For instance, what happens if evil Eve issues a kill -9 to one of your worker processes, leaving your master daemon keeping tabs on a worker that no longer exists? Tread carefully.

Step four: GO KILL THAT DAEMON

Next up is the kill_worker method. Plain and simple (and stolen from the awesome Unicorn sources):

  # Delivers a signal to a worker and fails gracefully if the worker
  # is no longer running.
  def kill_worker(signal, wpid)
    begin
      Process.kill(signal, wpid)
    rescue Errno::ESRCH
      worker = WORKERS.delete(wpid) rescue nil
    end
  end

“Errno::ESRCH”?!? I bet that’s obvious to Matz, but to me it’s hard even to read… Luckily it’s less scary than it looks: ESRCH is the POSIX “no such process” error, which Process.kill raises when no process with that PID exists. So, kill_worker signals the worker with PID wpid, and if Eve already killed it (or it died by itself) we drop it from WORKERS and move on.
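To see the error kill_worker rescues, here is a tiny sketch for a platform with fork (Linux, macOS). It starts a child that exits immediately, reaps it so its PID no longer refers to a process, and then tries to signal it.

```ruby
# Start a child that exits right away, then reap it so its PID is gone.
pid = fork { exit!(0) }
Process.wait(pid)

# Signalling a PID with no process behind it raises Errno::ESRCH.
outcome = begin
  Process.kill('QUIT', pid)
  :delivered
rescue Errno::ESRCH
  :no_such_process
end

p outcome  # => :no_such_process
```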

Step five: Trapping workers on a fork

Now let’s set up the workers. I chose to put that in its own method, fork_and_trap:

  # Fork, listen for QUIT signals and run()
  def fork_and_trap(wrkr)
    fork do
      trap(:QUIT) do 
        wrkr.stopit!
        wrkr.log "aawwwright, time for a break. Hang on, lemme finish up..."
      end

      wrkr.run
    end
  end

Given a worker instance (“wrkr”), we fork() and, in the child process, set up the “listener” for the QUIT signal. That’s the signal the master daemon’s INT trap from step three sends to all known workers; each worker in turn runs its own cleanup code (the #stopit! method in our case) and exits gracefully once it’s done with its current unit of work.
We also call the #run method on the worker and off he goes! Back in the master, fork returns the child’s PID, and that’s what fork_and_trap returns.

Step six: do it already!

The final piece is the actual worker spawning and bookkeeping code. Again, there are probably a lot of pieces missing, but it boils down to:

# ===========================
# = Set up and fork workers =
# ===========================
WORKERS = {}

(0...DaemonKit.arguments.options[:worker_count]).each do
  foff = Foff.new
  WORKERS[ fork_and_trap(foff) ] = foff
end

DaemonKit.logger.info "PIDs: #{WORKERS.keys.inspect}"
Process.waitall

Remember the first step, when we set up the arguments processing? We can access the value passed from the command line in the DaemonKit.arguments.options collection. If we launched with:

  $ bin/foff -w 12

we’ll spawn 12 worker processes, and the master daemon can keep tabs on them in the WORKERS constant, where the PIDs are the keys and the worker objects are the values.

We finish up with a call to Process.waitall so that the master daemon doesn’t exit.
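Steps three to six boil down to a pattern you can try without DaemonKit. This is a simplified standalone sketch, not the article’s daemon. It uses plain Ruby, a hypothetical Worker class and three workers, and the master sends QUIT directly instead of trapping INT. It also adds a pipe handshake, which the steps above don’t have, so a QUIT can never arrive before the child has installed its trap.

```ruby
# Minimal worker: finish the current unit of work, then leave if asked to.
class Worker
  def initialize
    @shutting_down = false
  end

  def stopit!
    @shutting_down = true
  end

  def run
    loop do
      sleep 0.05                  # stand-in for one unit of real work
      exit!(0) if @shutting_down  # finish the unit, then exit cleanly
    end
  end
end

# Fork a worker whose QUIT trap flips its shutdown flag. The pipe lets the
# child tell the master its trap is installed, so an early QUIT can't hit
# the default handler (which would kill the child outright).
def fork_and_trap(worker)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    trap('QUIT') { worker.stopit! }
    writer.write('.')
    writer.close
    worker.run
  end
  writer.close
  reader.read(1)   # blocks until the child is ready
  reader.close
  pid              # fork returns the child's PID in the parent
end

WORKERS = {}
3.times do
  worker = Worker.new
  WORKERS[fork_and_trap(worker)] = worker
end

sleep 0.1                                        # let the workers do some work
WORKERS.each_key { |pid| Process.kill('QUIT', pid) }
statuses = Process.waitall                       # [[pid, Process::Status], ...]

p statuses.size                             # => 3
p statuses.all? { |_pid, st| st.success? }  # => true
```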

Step seven: the Working Class

The worker class is of course where all the meat goes. For this article I just cooked up a barebones worker that executes 6 steps and sleeps for 0 or 1 seconds at each step. The “steps” are application specific: perhaps “Pull from inventory”, “Call mum”, “Tell FedEx to pick up the stuff”, “Bill client”. At the completion of each work unit we check whether we have received a QUIT in the meantime. If so, we exit the loop.

Put the following in lib/foff.rb

  # encoding: utf-8

  class Foff
    def log(msg)
      (@logger ||= DaemonKit.logger).info "#{Process.ppid} ? #{Process.pid} #{msg}"
    end

    def stopit!
      @shutting_down = true
    end

    def shutting_down?
      @shutting_down
    end

    def run
      loop do
        log "START @ #{Time.now}"
        (0..5).each do |step|
          log "workee workee at step #{step}"
          sleep rand(2)
        end
        log "END @ #{Time.now}"

        if shutting_down?
          log "TERMINATING @ #{Time.now}"
          exit
        end
      end
    end
  end

Not much to say here. Note that the only reference to DaemonKit is in the #log method and that is easy to remove. We’re also pleasantly free of process management code in here. All that stuff is taken care of in the master daemon. Loose coupling ftw!
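As noted, the DaemonKit dependency is easy to remove. Here is a sketch of that, assuming you inject any standard Logger instead of DaemonKit.logger. It also adds a hypothetical max_pause parameter (seconds) and swaps sleep rand(2) for a fractional pause, so the demo can run without actually sleeping.

```ruby
require 'logger'
require 'stringio'

class Foff
  # Any Logger will do instead of DaemonKit.logger; max_pause is hypothetical.
  def initialize(logger = Logger.new($stdout), max_pause = 2)
    @logger = logger
    @max_pause = max_pause
  end

  def log(msg)
    @logger.info "#{Process.ppid} -> #{Process.pid} #{msg}"
  end

  def stopit!
    @shutting_down = true
  end

  def shutting_down?
    @shutting_down
  end

  def run
    loop do
      log "START @ #{Time.now}"
      (0..5).each do |step|
        log "workee workee at step #{step}"
        sleep(@max_pause * rand)
      end
      log "END @ #{Time.now}"

      if shutting_down?
        log "TERMINATING @ #{Time.now}"
        exit
      end
    end
  end
end

io = StringIO.new
worker = Foff.new(Logger.new(io), 0)
worker.stopit!  # pretend a QUIT arrived before the first unit finished
begin
  worker.run
rescue SystemExit
  # run calls exit after completing the current unit of work
end

p io.string.scan("workee workee").size  # => 6
p io.string.include?("TERMINATING")     # => true
```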

Step eight: profit off of the work of others

Time for a test run:

  $ bin/foff -w 3



You can see your three workers start up and go about their business. When the signal comes (Ctrl + C), the master daemon notifies the workers, who in turn finish off whatever they were doing, and then, finally, our daemon can rest in peace.

You might notice how some log messages are printed weirdly on screen. I’m not sure why this happens, but I’d put my money on the Ruby standard logging library not coping with parallel access very well. It’s a good thing the logging backend of DaemonKit is pluggable…

As said above, writing daemons is harder than you’d think but with tools like DaemonKit to help it’s not that bad.

Creating PDFs with MacRuby

Macs are cool. Cocoa is cool. Ruby is cool. Matz is nice so we are nice. Sansonetti&Co at Apple are like the master drinkmixers, joining together the best of the best in one neat package. I’m talking about MacRuby.


The latest MacRuby dev branch, 0.5, is using LLVM and packs speed and features enough to make any geek teary eyed. This morning I wanted to take it for a test run on a problem domain that is sort of close to what my current project is dealing with: PDFs.

We all know Mac OS X is crazy good at reading, writing and printing PostScript and PDFs. It’s right in there in the OS core graphics libraries. So, I thought, wouldn’t it be cool to see how to use MacRuby and Quartz to read and parse PDFs directly, rather than that pesky old beast that is Ghostscript? (GS is an awesome piece of work, very capable and I’m not meaning to sound negative here. It’s just kinda… old style?)

So, here’s some example code. You never knew you needed it, but here it is: you know all of those PDFs that you have lying around on your Desktop? Wouldn’t it be fantastic to have one big PDF with the first page of all of those?

No? Really? C’mon!

So, listen, it’s like the internet. You didn’t know you needed that either back in the day. This is exactly the same. Keep reading.

Get the goods

Go read this: http://redartisan.com/2009/9/1/macruby-intro. Awesome intro, instructions that work. Takes hours, so it’s a good thing you now know you really really need your first-page-of-all PDF.

Got quartz?

Done? Sweet. So here goes:

framework 'Quartz'
mother_pdf = PDFDocument.new
Dir.glob(File.expand_path('~/Desktop/*.pdf')).each do |file_path|
  url = NSURL.fileURLWithPath(file_path)
  pdf = PDFDocument.alloc.initWithURL url
  puts "Loaded PDF from: \"#{file_path}\""
  mother_pdf.insertPage(pdf.pageAtIndex(0), atIndex: mother_pdf.pageCount)
end
puts "Done"
mother_pdf.writeToFile('./mother.pdf')

See how short that was? I was just blown away by the ease of it. The only gotcha was that it took me a while to a) find the API docs for PDFKit and b) learn which framework I needed (quartz, not PDFKit).

Mixing and matching standard Ruby (File, Dir.glob etc) and Cocoa is just… seamless! Very very cool. The PDFKit API is straightforward to use. Let’s add an annotation to each page. There are a ton of different kinds of annotations you can add, but let’s keep it simple: PDFAnnotationText. Add this method on top of the code above.

def annotation(text)
  rect  = NSRect.new(NSPoint.new(100,100), CGSize.new(150,100))
  annot = PDFAnnotationText.alloc.initWithBounds(rect)
  annot.shouldDisplay = true
  annot.contents = text
  annot.color = NSColor.whiteColor
  annot
end

Notice how we can use annot.color= instead of the “real” message name, annot.setColor(). That’s a MacRuby extra.
Next, modify the PDF generating code like so to get this:

framework 'Quartz'
def annotation(text)
  rect  = NSRect.new(NSPoint.new(100,100), CGSize.new(150,100))
  annot = PDFAnnotationText.alloc.initWithBounds(rect)
  annot.shouldDisplay = true
  annot.contents = text
  annot.color = NSColor.whiteColor
  annot
end

mother_pdf = PDFDocument.new
Dir.glob(File.expand_path('~/Desktop/*.pdf')).each do |file_path|
  url = NSURL.fileURLWithPath(file_path)
  pdf = PDFDocument.alloc.initWithURL url
  puts "Loaded PDF from: \"#{file_path}\""
  page = pdf.pageAtIndex(0)
  page.addAnnotation(annotation("Origin: \"#{file_path}\""))
  mother_pdf.insertPage(page, atIndex: mother_pdf.pageCount)
end
puts "Done"
mother_pdf.writeToFile('./mother.pdf')

See? Pretty cool right? 🙂

Also cool: [image: Brooklyn Bridge]

Installing ruby-filemagic on MacOS X and Ubuntu

FileMagic is a Ruby extension that provides an interface to libmagic, i.e. the library behind the *nix ‘file’ command. It gives easy access to the system MIME database along with some nifty heuristics for determining file types.

Checking that what-you-got-is-what-you-want is often important, especially when dealing with file uploads and/or third-party integration APIs providing file downloads, e.g. links to download a PDF.

Installing ruby-filemagic is easy enough if you have all the dependencies installed. If not, here are a few pointers.

On MacOS X (Leopard), this worked for me:

$ sudo port install file
$ sudo gem install ricardochimal-ruby-filemagic -- --with-opt-include=/opt/local/include

On Ubuntu, I had to:

$ apt-get install libmagic1 libmagic-dev
$ gem install ricardochimal-ruby-filemagic

Before I got the right libs installed there were moments of confusion, as there was already a ‘magic.h’ installed (from ImageMagick, Ruby’s most troublesome library?) and extconf produced some weird and unhelpful error messages.

Usage:

$ irb -r filemagic
>> fm = FileMagic.new(FileMagic::MAGIC_NONE)
=> #<FileMagic:0x1a62740>
>> fm.file('here.pdf')
=> "PDF document, version 1.3"
>> fm.file('tmp/dump.sql.bz2')
=> "bzip2 compressed data, block size = 900k"
>> fm.file('tmp/sweetjpg')
=> "JPEG image data, JFIF standard 1.01"
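To give a feel for the simplest part of what libmagic does, here is a crude, stdlib-only sketch that compares a file’s leading “magic” bytes against a few well-known signatures (PDF, bzip2, JPEG). The SIGNATURES table and the sniff helper are hypothetical and are not part of ruby-filemagic; libmagic’s heuristics go far beyond a prefix check, so treat this as an illustration, not a replacement.

```ruby
require 'tmpdir'

# Hypothetical signature table: MIME type => leading bytes of the file.
SIGNATURES = {
  'application/pdf'     => '%PDF-'.b,
  'application/x-bzip2' => 'BZh'.b,
  'image/jpeg'          => "\xFF\xD8\xFF".b
}.freeze

# Returns the MIME type of the first matching signature, or nil.
def sniff(path)
  # read(8) returns nil for an empty file, hence the fallback
  head = File.open(path, 'rb') { |f| f.read(8) } || ''.b
  SIGNATURES.each { |mime, sig| return mime if head.start_with?(sig) }
  nil
end

Dir.mktmpdir do |dir|
  pdf = File.join(dir, 'here.pdf')
  File.binwrite(pdf, "%PDF-1.3\n")
  puts sniff(pdf) # prints application/pdf
end
```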

Hope I saved somebody some time!

EBS striping: worthwhile?

A while back I discussed the performance of Amazon EC2 EBS volumes with some of my colleagues. The discussion went kinda like “yeah, I/O performance sure isn’t stellar, but striping disks really doesn’t make much sense as EBS volumes are already striped by Amazon, and in any case they’re SAN drives, so the bottleneck is bound to be in the NIC”.

While the above makes sense, nobody had any hard facts and we were mostly just guessing (well, we did know for sure EBS I/O performance is an issue for our clients).

The Russians over at Percona did some benchmarking, and it turns out that while RAID 5 is pretty pointless, pure striping (RAID 0) is way faster, especially for multithreaded I/O.

Good to know there’s an option for when that DB server goes all slow on you.

Testing invocation of external executables, %x{}, `

If you ever found yourself trying to write tests for code that invokes an external executable, such as:

  class Beef < ActiveRecord::Base
    def diskspace(flags = 'sh')
      `du -#{flags} .`.split.first
    end
  end

…and wondered how to write a spec that ensures the du command is called with the right options, you might have tried something like:

  before do
    @beef = Beef.new
  end

  it "has diskspace with humanized multipliers" do
    Kernel.should_receive(:`).with("du -h .")
    @beef.diskspace('h')
  end

That doesn’t work.

After some head-scratching and calls for help on the ruby-talk mailing list, I learned that Kernel#` is not called on Kernel directly. The Ruby object hierarchy has Object at the top (BasicObject on 1.9+), so all your objects eventually descend from it. As Object mixes in Kernel, all your objects have all the methods defined in that module, so it’s not Kernel that receives your call to “`”, it’s the current ‘self’. In the example above, that’s the @beef instance.

Consider:

  class Sheep
    def mytick(arg)
      puts "tickety-tick: #{arg.inspect}"
    end
    alias :"`" :mytick

    def tick_it!
      `cheese`
    end
  end

  Sheep.new.tick_it!

  $ ruby sheep.rb
  tickety-tick: "cheese"

So, to test the original code, here’s the right way:

  it "has diskspace with humanized multipliers" do
    @beef.should_receive(:`).with("du -h .")
    @beef.diskspace('h')
  end

All of the above also applies to the “%x[]” syntax, which is just another way of calling the ` method.
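The dispatch-to-self behaviour can be checked without RSpec at all. In this minimal sketch Beef is a plain class standing in for the ActiveRecord model, and diskspace_x is a hypothetical twin that uses %x{}. Defining ` as a singleton method on one instance intercepts both forms, and Kernel#` (and therefore du) is never reached; this is the same hook that @beef.should_receive(:`) relies on.

```ruby
# Plain-Ruby stand-in for the Beef model (no ActiveRecord needed).
class Beef
  def diskspace(flags = 'sh')
    `du -#{flags} .`.split.first
  end

  # Hypothetical twin using %x{}; it dispatches to the same ` method.
  def diskspace_x(flags = 'sh')
    %x{du -#{flags} .}.split.first
  end
end

beef = Beef.new
commands = []

# Define ` on this one object only; Kernel#` is never called.
beef.define_singleton_method(:"`") do |cmd|
  commands << cmd
  "4.0K\t."
end

puts beef.diskspace('h')   # prints 4.0K
puts beef.diskspace_x('h') # prints 4.0K
p commands                 # prints ["du -h .", "du -h ."]
```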

Kudos to Brian Chandler for helping out on this, and happy hacking!

Stubbing a method for all tests

If you ever need to stub out a method for all tests in your test suite, for example a before filter in ApplicationController that takes a while to run, here’s a neat trick.

Stick the following in your spec_helper:

Spec::Runner.configure do |config|
  config.before(:each, :type => :controller) do 
    controller.stub!(:blog_feed).and_return([])
  end 
end

If you need to override the above for some specs, just put a similar block on top of the spec file. (Question: how can I remove the stub completely for just a few tests?)

It might not be the most elegant solution known to man, and it sure isn’t very clean and could come back and bite you one day, but if you know what you’re doing… 😉

Update: Something like this might work for your unstubbing needs, but if the above is an ugly hack, this is a really ugly hack and YMMV:

class Object
  def unstub!(method_name, *args, &blk)
    self.send("proxied_by_rspec__#{method_name}", *args, &blk) if self.respond_to?("proxied_by_rspec__#{method_name}")
  end
end

If you need to poke even deeper (and you really shouldn’t), this might be useful to you:

rspec_guts = my_instance_of_something.send(:__mock_proxy) # rspec works its magic on this object, which is a Spec::Mocks::Proxy
puts rspec_guts.instance_variable_get("@proxied_methods").inspect
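For the “remove the stub completely” question, the underlying Ruby idea can be sketched without touching RSpec’s internals: a singleton method shadows the instance method for one object, and removing that singleton method exposes the original again. BlogController is a hypothetical stand-in, and this is not a claim about how RSpec implements stub! or what it does on teardown.

```ruby
# Hypothetical stand-in for a controller with a slow helper.
class BlogController
  def blog_feed
    # imagine an expensive feed fetch here
    [:post_1, :post_2]
  end
end

controller = BlogController.new

# "Stub": a singleton method shadows the instance method for this object only.
controller.define_singleton_method(:blog_feed) { [] }
p controller.blog_feed # prints []

# "Unstub": removing the singleton method exposes the original again.
controller.singleton_class.send(:remove_method, :blog_feed)
p controller.blog_feed # prints [:post_1, :post_2]
```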

pdf2swf on Mac OS X and ASVM mismatch

The current version of swftools in MacPorts is 0.8.1, and in many cases it’s not good enough. In particular, for pdf2swf to generate SWF files that run on AVM2 (the ActionScript 3 virtual machine, used from Flash Player 9 onwards) you need swftools 0.9.0 or greater. Loading an external SWF movie from within another is considerably more cumbersome if the two files target different virtual machine versions.

You know you ran into the above VM mismatch error if your Flash logs are telling you something like this:

  TypeError: Error #1034: Type Coercion failed: cannot convert flash.display::AVM1Movie@1e3d9e01 to flash.display.MovieClip. 
    at com.elctech::PdfViewer/doneLoading()

The above error shows up only if (and only if) you are running a debugger-enabled Flash Player with logging turned on.

But as I said, the current MacPorts version isn’t recent enough, so what to do? One option is to install everything from source, which for a package such as swftools entails a lot of dependencies. Another option is to install the current MacPorts version anyway to get all the dependencies installed, and then compile just the swftools package from source (http://www.swftools.org/download.html). Not ideal perhaps, but IMHO better.

If you get complaints about missing libraries towards the end of the ./configure phase, like this:

  ***************************************************
  * The following headers/libraries are missing:  jpeglib ungif jpeglib.h gif_lib.h
  * Disabling pdf2swf tool...
  * Disabling jpeg2swf tool...
  * Disabling gif2swf tool...
  ***************************************************

… try this:

  LDFLAGS="-L/opt/local/lib" CPPFLAGS="-I/opt/local/include -I/opt/local/include/lame" ./configure
  

…to make sure your macports installed libs/headers are picked up.

If/when make fails with something like:

  ld: in ../libgfxswf.a, archive has no table of contents 
  collect2: ld returned 1 exit status 
  make[1]: *** [pdf2swf] Error 1 
  make: *** [all] Error 2

…run:

    ranlib lib/*.a
    make
  

…and the compile will continue. ‘sudo make install’ to finish.

I’d love to write a more thorough explanation for the ranlib magic here, but I honestly don’t have a clue. It’s key not to run ‘make clean’ or ‘./configure’ again here, just the ranlib stuff and it will pick up where it got confused. Somehow. If anyone knows of a good resource for ranlib I’d love to dig further into this.

Happiness

Ruby 1.9 isn’t always faster: rake tab-completion revisited

Rake is awesome in many ways and we’re all using it for a plethora of tasks. It’s easy to use, fast to code and reliable.

A while back I blogged about tab completion for rake tasks and the other day I set out to speed it up, as the project I was working on had a huge amount of tasks and retrieving the list of tasks took a loooong while.

As it turns out, speeding up the task enumeration is not trivial. My first thought was to use ParseTree, but it was horrendously slow and quite complex. Then I thought I’d use a fast language, such as OCaml or perhaps even C, to scan the files and become known worldwide for my wicked coding skillz.

Didn’t happen. As I’m sure you’ve already figured out, the rake library allows you to define rake tasks in many different ways, and people out there all seem to have different opinions on how to code up their tasks. Several important libraries define their rake tasks dynamically, ruling out any pattern-matching approach.

In other words I failed. But I also learned a lot and thanks to ruby-talk I also found a trick to make things spiffier: Ruby 1.9 isn’t always faster, Rake::Application is a singleton and String#hash isn’t what it used to be.

Read on for details.

Rake tab completion

The first step to set up rake tab completion is to tell bash to invoke your script whenever ‘rake’ is the command. Drop this in your ~/.bashrc:

  complete -C ~/bin/rake_completion.rb -o default rake

The next step is the ruby executable script that actually looks up the rake tasks. The script:

  #!/usr/bin/env ruby

  # Returns the rake task names for the current directory, caching them
  # in a per-directory dotfile in $HOME.
  def rake_silent_tasks
    dotcache = File.join(File.expand_path('~'), ".raketabs-#{Dir.pwd.hash}")
    if File.exist?(dotcache)
      # Cache hit: unmarshal the previously saved Array of task names
      File.open(dotcache, 'rb') { |f| Marshal.load(f) }
    else
      # Cache miss: load the Rakefile in-process and ask the Rake singleton
      require 'rubygems'
      require 'rake'
      load 'Rakefile'
      tasks = Rake.application.tasks.map(&:name)
      File.open(dotcache, 'wb') { |f| f.write(Marshal.dump(tasks)) }
      tasks
    end
  end

  # Only complete inside a directory with a Rakefile, and only for 'rake ...'
  exit 0 unless File.file?(File.join(Dir.pwd, 'Rakefile'))
  exit 0 unless matches = ENV.fetch("COMP_LINE", "").match(/^rake\s+(.*)/)

  after_match = matches[1]
  task_match = after_match.strip.empty? ? nil : after_match.strip.split.last

  if task_match
    tasks = rake_silent_tasks.grep(/^#{Regexp.escape(task_match)}/)

    # handle namespaces: bash treats ':' as a word break by default, so
    # print only the part after the last colon the user already typed
    if matches = task_match.match(/^([-\w:]+:)(.*)/)
      upto_last_colon = matches[1]
      tasks = tasks.map { |t| (t =~ /^#{Regexp.escape(upto_last_colon)}([-\w:]+)$/) ? $1 : t }
    end
    puts tasks
  end

  exit 0

We need to make this executable with:

  chmod +x ~/bin/rake_completion.rb

Let’s go through the code. First comes the ‘rake_silent_tasks’ top-level method. More on that later.
Next a couple of checks to ensure we’re somewhere where it makes sense to run ‘rake’ (i.e. a directory with a ‘Rakefile’ present). Then we check the $COMP_LINE environment variable. This is set by bash and we just need to match it against the string ‘rake’ at the beginning of the command line. Easy.

The remaining code is just your average regexp juggling to match the string you enter, e.g. ‘rake db:mi[TAB]’.
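The regexp juggling is easier to follow when pulled out into a plain method that can be exercised without bash or a Rakefile. The helper name complete and the task list are hypothetical; the logic mirrors the script’s, and it assumes bash’s default COMP_WORDBREAKS (which includes ‘:’), so only the part after the last typed colon should be printed.

```ruby
# Hypothetical helper mirroring the completion script's matching logic.
def complete(task_names, comp_line)
  line = comp_line.match(/^rake\s+(.*)/)
  return [] unless line

  word = line[1].strip.split.last # the word being completed, if any
  return [] if word.nil?

  candidates = task_names.grep(/^#{Regexp.escape(word)}/)

  # Greedy match up to the last colon, e.g. "db:" for "db:mi"
  if (ns = word.match(/^([-\w:]+:)/))
    prefix = ns[1]
    candidates = candidates.map { |t| t =~ /^#{Regexp.escape(prefix)}([-\w:]+)$/ ? $1 : t }
  end
  candidates
end

tasks = %w[db:migrate db:migrate:redo db:seed default]
p complete(tasks, 'rake db:mi') # prints ["migrate", "migrate:redo"]
p complete(tasks, 'rake de')    # prints ["default"]
```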

What’s interesting here is all in the ‘rake_silent_tasks’ method. For lookups to be fast we really want to avoid invoking rake and parsing the Rakefile (and all the .rake files that are commonly present).

The code first checks whether we’ve run rake+TAB in this directory before and, if so, loads and unmarshals the cached tasks into a Ruby Array. Using Marshal.load/.dump is a big speedup in itself.

If we need to lookup all the rake tasks from scratch, a full parse of all task defining ruby files is necessary. Previous versions of this code used a backtick invocation of rake like so:

  `rake --silent --tasks`

That outputs the full list of tasks along with their descriptions, and it is quite slow.

To get rid of the backticks I read through the rake source code and asked ruby-talk for help, and sure enough, there is a way to avoid spinning off a whole new Ruby process. The following returns the list of tasks as an Array:

  require 'rake'
  load 'Rakefile'
  Rake.application.tasks
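Here is the same in-process approach in a self-contained form, with a throwaway Rakefile written to a temp directory. The Rakefile contents are hypothetical, and the sketch assumes a reasonably recent rake (where `require 'rake'` makes task and namespace available at the top level) and Ruby 2.3+ for the squiggly heredoc. It also shows that Rake.application hands back the same singleton object every time.

```ruby
require 'rake'
require 'tmpdir'

Dir.mktmpdir do |dir|
  # A throwaway Rakefile with hypothetical tasks
  rakefile = File.join(dir, 'Rakefile')
  File.write(rakefile, <<~RUBY)
    namespace :db do
      desc 'Run migrations'
      task :migrate
    end
    task :default
  RUBY
  load rakefile # registers the tasks on the Rake.application singleton
end

# Task names defined above, without spawning a second Ruby process
p Rake.application.tasks.map(&:name)
# Always the same object, so there is no fresh Application to build
p Rake.application.equal?(Rake.application)
```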

Pretty straightforward, uh? Sure, after the fact it’s obvious, but I spent a long while trying to instantiate a new Rake::Application object until told that rake is implemented as a singleton. This snippet will crash so badly that IRB dies and throws you back to the shell:

  require 'rake'
  a = Rake::Application.new
  a.init

(maybe it’s not even a crash, just plain wrong usage but it’s really hard to tell and can be quite confusing…)

Anyways, retrieving the rake tasks this way is a lot faster. How much?

  100 rake task enumerations for "/Users/david/projects/big_rails_app"
                    user     system      total        real
  backticks     0.160000   0.330000  59.680000 ( 61.323611)
  rake direct   0.590000   0.070000   0.660000 (  0.676324)

Two orders of magnitude! Yay!

Interestingly, Ruby 1.9 is quite a bit slower than good ol’ 1.8:

    100 rake task enumerations for "/Users/david/projects/big_rails_app"
                      user     system      total        real
    backticks     0.230000   0.390000  59.940000 ( 61.596335)
    rake direct   0.960000   0.110000   1.070000 (  1.089892)
  

While investigating the various options to speed up tab completion, the first thing I reached for was Ruby 1.9. After a while I discovered that under 1.9 the implementation above never hits the cache, and the reason is a subtle change in the way the little-used String#hash method works.

In 1.8 the hash for a given String is always the same, while in 1.9 the implementation changed to use MurmurHash and now the hash is identical only within the same ruby process. This change is not documented anywhere to my knowledge and is a deal breaker for ruby scripts used as command line executables (and maybe elsewhere as well).

I tried replacing the ‘Dir.pwd.hash’ snippet above with ‘Digest::MD5.hexdigest(Dir.pwd)’ so it would work under 1.9, but it’s a lot slower than String#hash, and we’re aiming for speed here, so…
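The sketch below first shows the String#hash problem by computing the hash in two fresh Ruby processes, then swaps in Zlib.crc32 as one process-independent cache key. CRC32 is my substitution here, not the post’s; whether it beats MD5 on speed is not measured, and collisions are possible though unlikely across a handful of project directories. The helpers in_new_process, cache_path, write_cache and read_cache are hypothetical.

```ruby
require 'zlib'
require 'rbconfig'
require 'tmpdir'

# Evaluate a Ruby expression in a fresh process and return what it prints.
def in_new_process(expr)
  `"#{RbConfig.ruby}" -rzlib -e 'print(#{expr})'`
end

dir = '/Users/david/projects/big_rails_app'

# String#hash is seeded per process on 1.9+, so these almost never match
p in_new_process("#{dir.inspect}.hash") == in_new_process("#{dir.inspect}.hash")
# CRC32 depends only on the string's bytes, so it is stable across processes
p in_new_process("Zlib.crc32(#{dir.inspect})") == Zlib.crc32(dir).to_s

# Hypothetical cache helpers keyed on the process-independent checksum.
def cache_path(dir, home = File.expand_path('~'))
  File.join(home, ".raketabs-#{Zlib.crc32(dir)}")
end

def write_cache(path, tasks)
  File.open(path, 'wb') { |f| f.write(Marshal.dump(tasks)) }
end

def read_cache(path)
  File.open(path, 'rb') { |f| Marshal.load(f) }
end

Dir.mktmpdir do |home|
  path = cache_path(dir, home)
  write_cache(path, %w[db:migrate default])
  p read_cache(path) # prints ["db:migrate", "default"]
end
```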

All in all I’m happy with my investigation. As I said, I learned a lot, and in the end I got a fast-ish rake task tabber script. Many thanks go to Sebastian Hungerecker, Robert Klemme, Ryan Davis and of course to the original implementors of the rake tab completion script.

If you see any other means of speeding this up, please leave a comment! 🙂

Update: fixed escapes in code, as per a commenter’s request (1 Jun 2009)