Follow 10+ Rubyists using Sinatra on Twitter (Reprint)

Note: This first appeared on 24th June 2009 and is being reprinted as the original is not accessible.

What’s Twitter?

The New York Times says:

Twitter is a simple messaging service that you’ve either heard about a lot or not at all. Either way, it’s a fun and useful tool, well worth trying if you want to reach potential and existing customers, employees or employers.

List of Rubyists Using Sinatra

This list of more than 10 Rubyists using Sinatra is in alphabetical order, with a link to each person’s Twitter profile. It is not intended to be all-inclusive, but it should give you a great start on following some talented Rubyists using Sinatra.

  1. Aaron Quint – aq
  2. Adeel Ahmad – _adeel
  3. Andre Lewis – alewis
  4. Andrew Neil – nelstrom
  5. Arjun Ram – arjunram
  6. August Lilleaas – augustl
  7. Barry Hess – bjhess
  8. Bill Siggelkow – bsiggelkow

Continue reading “Follow 10+ Rubyists using Sinatra on Twitter (Reprint)”

Karel Minarik: How do I learn and master Sinatra? (Reprint)

Note: This is a reprint of the blog post that appeared on 13th July 2009, as the original is not accessible.

Welcome to the fourth installment on the RL blog, of a mini series – “How do I learn and master Sinatra?” – by top Rubyists using Sinatra. The interview series will provide insight and commentary from these notable Sinatra developers, with the goal of facilitating and providing answers to the questions Ruby beginners face on how to learn and master Sinatra.

Satish>> Karel Minarik, could you tell us something about yourself – your background, where you are based?

Karel Minarik>> I’m Karel Minarik, a web designer and developer living in Prague, Czech Republic. I graduated in Philosophy, not Computer Science, which may explain why I love Ruby a lot, and why I prefer solving “naming things” over “cache invalidation” problems. I earn my bread by designing interfaces, writing Ruby, JavaScript and HTML/CSS, and giving people advice or teaching them new tricks. I blog at undecipherable intervals on Restafari.org and publish code regularly at Github.

Satish>> Are there any pre-requisites for a person to start learning Sinatra?

Karel>> Very few: you just need to know Ruby a little bit. The rest you can and will learn along the way. In fact, Sinatra is a wonderful teaching tool to deepen your knowledge of Ruby as a general programming language, web application architectures, HTTP and REST principles, the concept of middleware, and so on.

Continue reading “Karel Minarik: How do I learn and master Sinatra? (Reprint)”

Corey Donohoe: How do I learn and master Sinatra?

Note: We are re-printing this blog post that appeared on 6th July 2009, as the original post is not accessible.

Welcome to the first installment on the RL blog, of a mini series – “How do I learn and master Sinatra?” – by top Rubyists using Sinatra. The interview series will provide insight and commentary from these notable Sinatra developers, with the goal of facilitating and providing answers to the questions Ruby beginners face on how to learn and master Sinatra.

Satish>> Corey Donohoe, could you tell us something about yourself – your background, where you are based?

Corey Donohoe>> I’m Corey Donohoe. I’m based out of Boulder, Colorado – USA. My background is in computer science and system administration, though I prefer hacking to either of those labels. I’m a pretty normal dude: I enjoy cycling, music, coffee, micro brews, and all the other awesomeness that my home state has to offer. I’ve been working for Engine Yard since March of ’07 doing everything from app support to internal development. I’m currently 1/2 of our internal integrations team.

Sinatra’s greatest strength is its flexibility

Satish>> Are there any pre-requisites for a person to start learning Sinatra?

Corey>> There aren’t any hardcore prerequisites per se; Ruby and experience with a Ruby web framework are a plus. HTTP verbs play a huge role in Sinatra, as do things like query and post params.

Continue reading “Corey Donohoe: How do I learn and master Sinatra?”

How do I benchmark Ruby code?

This guest post is by Jesse Storimer. He’s the author of Working With Unix Processes, a gentle introduction to Unix system programming for Ruby programmers. Jesse has been programming Ruby since joining Shopify in 2008 and is still going strong, always looking for a chance to dig lower down into the stack. He lives way in the backwoods of southern Ontario, Canada with his wife and two daughters. Jesse blogs at jstorimer.com and has authored a few other books for Ruby developers.

So you’ve got some Ruby code and you want to make it faster. Maybe you’ve already got a new implementation in mind, or maybe you’re still cooking that up. But how do you make certain that your new implementation is faster?

Science, of course! Ruby’s standard library comes with a benchmarking library fit for measuring the execution time of your Ruby code. The Benchmark module offers several different ways for you to benchmark your code. I’ll take you through the different options and their use cases.

Getting started

The Benchmark module is in the standard library, so you don’t need to install any gems to get it. Here’s the documentation from the standard library.

The simplest way to measure your Ruby code is with Benchmark.measure.

require 'benchmark'
require 'bigdecimal/math'

# calculate pi to 10k digits
puts Benchmark.measure { BigMath.PI(10_000) }

This will return something that looks like this:

  0.310000   0.040000   0.350000 (  0.339958)

With no context, these might look like magic numbers. Here’s what they mean:

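From left to right, the numbers are the user CPU time, the system CPU time, the total CPU time (user plus system) and, in parentheses, the real elapsed (wall clock) time, all in seconds.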

Generally, the number farthest to the right is the most important one. It tells how long it actually took to perform the operation. If you’re curious about why the clock time is so high, the other numbers can help you drill down to see if you’re spending time in system functions or your own code.
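If you would rather read those numbers individually than parse the printed line, the object that Benchmark.measure returns exposes them as methods. Here is a minimal sketch; the workload is a throwaway one that I picked only for illustration:

```ruby
require 'benchmark'

# Hypothetical workload: any block of Ruby code will do here.
tms = Benchmark.measure { 100_000.times { |i| i.to_s } }

# The same four numbers that `puts` prints, accessed one by one.
puts "user CPU time:   #{tms.utime}"
puts "system CPU time: #{tms.stime}"
puts "total CPU time:  #{tms.total}"
puts "real time:       #{tms.real}"
```

This can be handy when you want to log or compare a single number, usually the real time, rather than eyeball the whole line.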

Now that you know what those magic numbers mean, we can move on to the core Benchmark API. The truth is that I rarely use the measure method on its own. It only prints the benchmark for a single block of code. The most common way to use Benchmark is to compare the execution time of different approaches to the same problem.

Benchmark has some built-in methods for this exact purpose.

Benchmark#bm

This method lets you define several blocks of code to benchmark, then prints the results side-by-side in the same format you saw earlier.

require 'benchmark'

iterations = 100_000

Benchmark.bm do |bm|
  # joining an array of strings
  bm.report do
    iterations.times do
      ["The", "current", "time", "is", Time.now].join(" ")
    end
  end

  # using string interpolation
  bm.report do
    iterations.times do
      "The current time is #{Time.now}"
    end
  end
end

This will print the following result:

       user     system      total        real
   0.540000   0.010000   0.550000 (  0.556572)
   0.410000   0.010000   0.420000 (  0.413467)

Notice that this is the same format I outlined earlier, but now you have little hints about each of the numbers.

The core API here is this:

Benchmark.bm do |bm|
  bm.report { first_approach }
  bm.report { alternative_approach }
end

You call the Benchmark#bm method passing a block. The block variable is a special object provided by Benchmark. It gives you a report method that you call with the block of code that you want to measure. Benchmark then runs both blocks of code and prints their execution times side-by-side.

A note about iterations: Often, when doing benchmarks that test code that executes very quickly, you need to do many iterations to see a meaningful number. In this case, I did 100,000 iterations of each variant just to get the execution time up to half a second so I could grasp the difference.

Labels

In that last benchmark, I buried some comments in the source that said what each block of code was doing. That’s not so helpful when looking at the results! Benchmark allows you to pass in a label to the report method that will be printed along with the results.

require 'benchmark'

iterations = 100_000

Benchmark.bm(27) do |bm|
  bm.report('joining an array of strings') do
    iterations.times do
      ["The", "current", "time", "is", Time.now].join(" ")
    end
  end

  bm.report('string interpolation') do
    iterations.times do
      "The current time is #{Time.now}"
    end
  end
end

I’ve now removed the comments describing the blocks and passed them in to the report method as an argument. Now the output describes itself:

                                  user     system      total        real
joining an array of strings   0.550000   0.010000   0.560000 (  0.565089)
string interpolation          0.410000   0.010000   0.420000 (  0.416324)

There’s one more important change I made in that last example that may have gone unnoticed. I passed 27 as an argument to the Benchmark.bm method. This signifies how much padding the header labels should have in the result output. If you pass labels to report, but don’t set this value high enough, your output won’t line up properly.

Let’s see an example with no argument passed to Benchmark.bm.

       user     system      total        real
joining an array of strings  0.520000   0.010000   0.530000 (  0.541942)
string interpolation  0.390000   0.010000   0.400000 (  0.394111)

That’s certainly not right. Make sure you pass a value that’s at least as large as the length of your longest label (27 characters in this example). That’s the happy path.
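One way to avoid counting characters by hand is to compute the width from the labels themselves. This is only a small sketch: keeping the labels in an array (the labels variable) is my own convention here, not something Benchmark asks for.

```ruby
require 'benchmark'

iterations = 10_000
labels = ['joining an array of strings', 'string interpolation']

# Use the length of the longest label as the column width.
width = labels.map(&:length).max

Benchmark.bm(width) do |bm|
  bm.report(labels[0]) do
    iterations.times { ["The", "current", "time", "is", Time.now].join(" ") }
  end

  bm.report(labels[1]) do
    iterations.times { "The current time is #{Time.now}" }
  end
end
```

If you add or rename a label later, the columns keep lining up without touching the argument to Benchmark.bm.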

Benchmark#bmbm

The Benchmark#bm you just saw is really the core of Benchmark, but there’s one more method I should mention: Benchmark#bmbm. That’s right, it’s the same method name, repeated twice.

Sometimes, with a benchmark that creates a lot of objects, the results start to get skewed because of interactions with Ruby’s memory allocation or garbage collection routines. When creating a lot of objects, one block may need to run the garbage collector while the other doesn’t, or just one block may get stuck with the cost of allocating more memory for Ruby to use.

In this case, the benchmark can produce unbalanced results. This is when you want to use Benchmark#bmbm.

The method name is suitable because it actually benchmarks your blocks of code twice. First, it runs the code as a ‘rehearsal’ to force any initialization that needs to happen, then it forces the GC to run, then it runs the benchmark again ‘for real’. This ensures that the system is fully initialized and the benchmark is fair.

This last example benchmark allocates a lot of objects. When this runs at the rehearsal stage, Ruby has to allocate more memory to make room for all the objects. Then when the ‘real’ benchmark happens, the memory is already available and just the actual implementation is tested.

require 'benchmark'

array = Array(1..10_000_000)

Benchmark.bmbm(7) do |bm|
  bm.report('reverse') do
    array.dup.reverse
  end

  bm.report('reverse!') do
    array.dup.reverse!
  end
end

And here’s the result:

Rehearsal --------------------------------------------
reverse    0.020000   0.020000   0.040000 (  0.050908)
reverse!   0.030000   0.020000   0.050000 (  0.048042)
----------------------------------- total: 0.090000sec

               user     system      total        real
reverse    0.010000   0.000000   0.010000 (  0.015385)
reverse!   0.030000   0.000000   0.030000 (  0.023973)

Notice the discrepancy between the rehearsal and the benchmark! Thanks bmbm!
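To make the rehearsal idea more concrete, here is a rough hand-rolled version of the same sequence: run once as a warm-up, force a garbage collection, then measure again. It is a sketch of the idea rather than a replacement for bmbm, and the workload lambda is a made-up example.

```ruby
require 'benchmark'

# Hypothetical workload that allocates a lot of short-lived objects.
workload = -> { Array.new(200_000) { |i| i.to_s } }

rehearsal = Benchmark.measure(&workload)  # warm-up run, pays any one-off costs
GC.start                                  # start the real run from a clean slate
real_run = Benchmark.measure(&workload)   # the measurement we actually care about

puts "rehearsal: #{rehearsal.real.round(4)}s"
puts "real run:  #{real_run.real.round(4)}s"
```

In practice bmbm does this bookkeeping for every report block and prints both sets of results, so reach for it first.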

Conclusion

When you want to try your hand at speeding up some of your Ruby code, make sure that you measure, measure, measure to prove that your new implementation is faster than the old one. This great little benchmarking library ships with Ruby right in the standard library, so there are no excuses!

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Thanks!

Ruby Matrix, the Forgotten Library

This guest post is contributed by Matthew Kirk, who is a partner at Modulus 7, specializing in software development and strategy. The basis of his career has been around utilizing science to improve businesses. He has spoken at technology conferences around the world and in his spare time, he enjoys traveling and adding to his 2000+ vinyl record collection.

Remember matrices from math class? No, not the movie, but the rectangular array of numbers. While you might not see it often, Ruby has a matrix implementation that is well tested and allows you to accomplish tough calculations quickly.

While I won’t be able to teach you everything there is to be known about matrices, we will cover how to use matrices within Ruby as well as some quirks and their major selling points. By the end of this I hope that you delve deeper into learning about matrices and use them in your next project.

What are matrices?

A matrix, according to Wikipedia, is a rectangular array of numbers. Used heavily in math, matrices are all over languages like R and Matlab. They can be a great way to store numerical data and simplify many difficult and tedious problems. Instead of solving a system of several equations, you can use matrices to reduce it to a single equation.

In terms of how Ruby implements matrices, Ruby stores a matrix as one big array of row arrays. The only requirement is that every row added to a matrix has the same number of elements.

Just like arrays, matrices are zero-indexed, meaning that the first row is index 0 and the first column is index 0. Unlike arrays, though, you need two indexes to get to an element:
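A short sketch of two-index access, using a small matrix whose values are made up for illustration:

```ruby
require 'matrix'

m = Matrix[[1, 2, 3],
           [4, 5, 6]]

first  = m[0, 0]   # first row, first column
last   = m[1, 2]   # second row, third column
beyond = m[2, 0]   # there is no third row, so this is nil

puts first
puts last
puts beyond.inspect
```

Note that the row index comes first and the column index second.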

Making matrices:
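Here is a sketch of a few of the constructors the Matrix class offers. The values are arbitrary and the variable names are my own:

```ruby
require 'matrix'

literal   = Matrix[[1, 2], [3, 4]]             # rows listed directly
from_rows = Matrix.rows([[1, 2], [3, 4]])      # built from an array of rows
from_cols = Matrix.columns([[1, 2], [3, 4]])   # built from an array of columns
identity  = Matrix.identity(2)                 # ones on the diagonal, zeros elsewhere
computed  = Matrix.build(2, 2) { |row, col| row + col }  # element from its indexes

[literal, from_rows, from_cols, identity, computed].each { |m| puts m.inspect }
```

Matrix.rows and Matrix.columns are useful when your data already lives in nested arrays, while Matrix.build is handy when each element can be computed from its position.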

Some quirks:

The Matrix library has some quirks. The Matrix class allows non-numerical data to go into itself. This could be useful for storing things like symbols in an x, y format, but it renders most of the matrix functions useless.

For instance:
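A small sketch of what that looks like, using a made-up board of symbols. Storing and reading the symbols works, but anything arithmetic falls over:

```ruby
require 'matrix'

# Hypothetical tic-tac-toe style board stored as symbols.
board = Matrix[[:x, :o],
               [:o, :x]]

puts board[0, 1].inspect   # reading elements works fine

arithmetic_failed = false
begin
  board.determinant        # needs multiplication, which symbols don't support
rescue StandardError => e
  arithmetic_failed = true
  puts "determinant failed: #{e.class}"
end
```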

Another quirk to be aware of: Matrix[*rows] does not copy the row objects but instead points to them. To avoid this, use Matrix.rows(rows) or Matrix.columns(columns). Implementation-wise, Matrix[*rows] calls Matrix.rows(rows, copy = false).
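The difference shows up when you change one of the original arrays after building the matrix. A sketch of that, assuming the copy behaviour described above (the variable names are mine):

```ruby
require 'matrix'

row_a = [1, 2]
row_b = [3, 4]

shared = Matrix[row_a, row_b]         # keeps pointing at row_a and row_b
copied = Matrix.rows([row_a, row_b])  # duplicates the row arrays

row_a[0] = 99  # mutate the original array afterwards

puts shared[0, 0]  # follows the change made to row_a
puts copied[0, 0]  # still holds the original value
```

If the arrays you pass in might be modified later, the copying constructors are the safer choice.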

Iterating over matrices:

How do you iterate over a matrix? Most likely you would think that matrices read left to right, top to bottom. And that’s true. But there are other cases as well.

In total there are 7 ways to iterate over a Matrix in Ruby:

  • :all: This reads left to right, top to bottom. This is the default case when you type matrix.each
  • :diagonal: This only reads the diagonal elements, where row index == column index
  • :off_diagonal: This reads everything not on the diagonal, where row index != column index
  • :lower: This reads the lower triangle of the matrix including the diagonal, where row index >= column index
  • :strict_lower: This is more strict and reads only the elements below the diagonal, where row index > column index
  • :strict_upper: This is a strict upper triangle and reads only the elements above the diagonal, where row index < column index
  • :upper: This reads the upper triangle including the diagonal, where row index <= column index

An example:
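A sketch that walks the same 3x3 matrix (values chosen so the positions are easy to spot) with each of the seven options:

```ruby
require 'matrix'

m = Matrix[[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]

# Without a block, each returns an enumerator, so to_a collects the elements.
results = {}
%i[all diagonal off_diagonal lower strict_lower strict_upper upper].each do |which|
  results[which] = m.each(which).to_a
  puts "#{which}: #{results[which].inspect}"
end
```

For this matrix, :diagonal yields 1, 5, 9, :strict_lower yields 4, 7, 8 and :strict_upper yields 2, 3, 6.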

Example: Parabola with matrix:

Imagine you want to fit a curve through three points. If you remember math class, you might remember that you can do this by fitting a quadratic. For instance, let’s say we want a curve that goes through (1, 2), (3, 5.5) and (6, 6). To solve this we would write the equations:

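\[
\begin{aligned}
a \cdot 1^2 + b \cdot 1 + c &= 2 \\
a \cdot 3^2 + b \cdot 3 + c &= 5.5 \\
a \cdot 6^2 + b \cdot 6 + c &= 6
\end{aligned}
\]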

The way to solve this would usually involve lots of algebra and substitutions. While it’s easy to solve this in the case where we already know the numbers, it is difficult to come up with a general solution (try it, I dare you).

Instead of worrying about solving this using non-matrix algebra we can solve it using matrix algebra. The first step is to rewrite the above system into this form:

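\[
\begin{bmatrix} 1 & 1 & 1 \\ 9 & 3 & 1 \\ 36 & 6 & 1 \end{bmatrix}
\begin{bmatrix} a \\ b \\ c \end{bmatrix}
=
\begin{bmatrix} 2 \\ 5.5 \\ 6 \end{bmatrix}
\]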

To make it even easier to solve, we would rewrite this as Ax = b. To solve it, we would take the inverse of A and multiply both sides by it, which would yield:

\[
x = A^{-1} b
\]

Now that we know this, we can easily solve this using Ruby with the following formula.
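A minimal sketch of that calculation, assuming the quadratic is written as y = ax^2 + bx + c so that each row of A is [x^2, x, 1] for one of the three points. The variable names are my own, and floats are used on purpose here:

```ruby
require 'matrix'

# One row per point (1, 2), (3, 5.5), (6, 6): [x**2, x, 1]
a_matrix = Matrix[[1.0,  1.0, 1.0],
                  [9.0,  3.0, 1.0],
                  [36.0, 6.0, 1.0]]
b_vector = Vector[2.0, 5.5, 6.0]

# x = A^-1 * b gives the coefficients a, b and c.
a, b, c = (a_matrix.inverse * b_vector).to_a

puts "y = #{a}x^2 + #{b}x + #{c}"
```

The coefficients come back as floats, so expect values that are very close to the true answer rather than exactly equal to it.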

While the result is close, it won’t be exact unless you use Rational. Ruby’s matrix library graciously utilizes functions that preserve precision. You would expect most libraries to convert to floats, but Ruby does not.

For instance you can change the above function call to:
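Here is a sketch of the same fit done with Rational values, again assuming the y = ax^2 + bx + c form and my own variable names:

```ruby
require 'matrix'

# Same system as before, but every entry is an exact Rational.
a_matrix = Matrix[[1,  1, 1],
                  [9,  3, 1],
                  [36, 6, 1]].map { |value| Rational(value) }
b_vector = Vector[Rational(2), Rational(11, 2), Rational(6)]

coefficients = a_matrix.inverse * b_vector

puts coefficients.to_a.inspect
```

With exact inputs the coefficients come out as exact fractions (a = -19/60, b = 181/60, c = -7/10), with no rounding error to worry about.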

Whenever possible try to preserve the precision!

The general case, fitting a polynomial of degree n − 1 to n points:

Up above we only fit this curve to 3 points. But what about 4 or 15 points? This would be quite simple to do and would only require a little bit of modification:
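One possible way to generalize it is sketched below. The fit_polynomial helper is hypothetical (it is not part of the Matrix library), and it assumes every point has a distinct x value so that the matrix can be inverted:

```ruby
require 'matrix'

# Hypothetical helper: returns the coefficients of the polynomial through
# the given [x, y] points, highest power first, as exact Rationals.
def fit_polynomial(points)
  n = points.size

  # Row i holds the powers of x_i: [x**(n-1), ..., x**1, x**0]
  a_matrix = Matrix.build(n, n) do |row, col|
    Rational(points[row][0])**(n - 1 - col)
  end
  b_vector = Vector[*points.map { |_x, y| Rational(y) }]

  (a_matrix.inverse * b_vector).to_a
end

puts fit_polynomial([[1, 2], [3, 5.5], [6, 6]]).inspect
puts fit_polynomial([[0, 0], [1, 1], [2, 8], [3, 27]]).inspect
```

The second call uses four points that lie on y = x^3, so the leading coefficient comes back as 1 and the rest as 0.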

Conclusion:

While you might not use matrices every day, they can be useful to solve problems involving systems of equations. Ruby has a robust matrix library that can be useful in finding solutions to these types of problems. Next time you want to fit a curve keep in mind matrices might be the best way to go!

For more information about matrices I recommend reading Wikipedia articles. There are lots of math professors who spend hours updating them tediously. If they are too confusing, think about picking up a book on matrix algebra like Matrix Computations.

Feel free to ask questions and give feedback in the comments section of this post. Thanks!

Questions on Ruby? Ask An Expert at RubyLearning

RubyLearning is happy to announce the start of a series of blog posts titled “Ask An Expert”. We are assembling experts in various areas of Ruby programming who will answer your questions as a blog post here.

To begin with, some of the experts who have agreed to answer your questions are:

  • Gautam Rege on Ruby with MongoDB, Mongo, MongoMapper and Mongoid
  • Gonçalo Silva on Ruby/Rails performance
  • Sau Sheong Chang on Sinatra
  • Sethupathi Asokan on the Ruby devise gem
  • Staffan Nöteberg on Ruby regex

I am sure there are more Ruby experts to follow.

Please feel free to ask your questions on the topics mentioned above, as comments to this blog post. We will put up these questions to the experts who will answer them here as a separate blog post, provided there are at least 7-8 questions per topic. While asking please do mention your name and any one of the following – your Twitter id or your GitHub URL or your Google+ id.

Performance Testing Rails Applications — How To?

This guest post is by Gonçalo Silva, who is a full-time Ruby on Rails developer at escolinhas.pt and has participated in the Ruby Summer of Code 2010. He loves and contributes to many open-source projects, being a fan of Linux, Ruby and Android. He likes to call himself a hacker, but that’s just an excuse for being in front of the computer all the time. Oh, and he tweets at @goncalossilva.

Rails 3.1 is just around the corner, and it brings enhanced performance testing tools. Let’s have a look at this often overlooked feature of our web application framework of choice.

This isn’t new

Rails has had built-in performance testing tools since version 2.2. Originally developed by Jeremy Kemper, these allowed developers to test the performance of their applications by writing integration tests which could be benchmarked and profiled under MRI. He later introduced two scripts – benchmarker and profiler – which were great to quickly benchmark or profile small snippets of code.

Actually, this is kind of new

I came across these tools during last year’s Ruby Summer of Code. I remember feeling astonished and a bit ashamed about not having played with them before. I couldn’t use them to their full potential because of the lack of full support for YARV (or MRI 1.9), so I set off fixing that. While working on it, I made a list of other things these tools lacked that I wanted to implement after RSoC, namely:

  • Rubinius support
  • JRuby support
  • Test configurability
  • Decoupling benchmarker and profiler from RubyProf

Everything listed above is now implemented. Rails 3.1 will ship with these improvements and we’ll no longer have excuses for not using these great tools Rails provides for all of us.

Why you should care

The web should be fast. Response times are a key factor in user experience and there is very limited patience for slow websites. Ruby interpreters aren’t famous for being performant and our beloved framework isn’t known for getting faster with new releases. Nevertheless, we want our websites to be fast and responsive, and buying tons of hardware isn’t always an available choice – we need our code to be fast. We should care.

How does this work?

Rails’ performance testing tools allow you to quickly detect performance bottlenecks. As a rule of thumb, use benchmarking to detect the problem and then use profiling to understand it. Profiling provides in-depth information about your code and what it’s doing, but it lacks the speed and simplicity of benchmarking.

Patching your Ruby interpreter

You can skip this section if you’re a Rubinius/JRuby/REE user.

If you’re an MRI/YARV user, you’ll need a patched interpreter to access all available metrics. Before you run off, let me tell you that it’s very simple to install a patched Ruby interpreter nowadays. Thanks to Wayne, the author of RVM, all you need to do is to specify an additional flag when installing your interpreter, like this:

rvm install 1.9.2 --patch gcdata

Or, if you’re still using 1.8 (really?):

rvm install 1.8.7 --patch ruby187gc

That’s all, folks. You now have a patched Ruby interpreter. If you want, you can have your patched interpreter side by side with your regular one, by simply assigning a name to it:

rvm install 1.9.2 --patch gcdata --name perf
rvm 1.9.2-perf  # my patched interpreter
rvm 1.9.2       # my regular interpreter

And that’s it.

Editing your Gemfile

You can skip this section if you’re using Rubinius/JRuby.

If you’re not, you’ll need to add RubyProf to your Gemfile:

gem 'ruby-prof', :git => 'git://github.com/wycats/ruby-prof.git'

Don’t forget to remove this from your Gemfile and re-run bundle install if you intend to switch to Rubinius or JRuby.

Performance tests

In order to use these tools, you’ll need to write performance tests. These tests are just like integration tests, except that the point is not to assert anything. They’ll just run the code that you want to see benchmarked/profiled.

Generating

As expected, Rails does this stuff for you. Just run:

script/rails generate performance_test example

And a new file will be placed in test/performance/example_test.rb containing the default test:

require 'test_helper'
require 'rails/performance_test_help'

class ExampleTest < ActionDispatch::PerformanceTest
  # Refer to the documentation for all available options
  # self.profile_options = { :runs => 5, :metrics => [:wall_time, :memory],
  #                          :output => 'tmp/performance', :formats => [:flat] }

  def test_homepage
    get '/'
  end
end

Editing

Since ActionDispatch::PerformanceTest inherits from ActionDispatch::IntegrationTest, you can use all available helpers for integration tests in your performance tests. For instance, if you wanted a test for your login action you could use:

class LoginTest < ActionDispatch::PerformanceTest
  fixtures :users
  self.profile_options = { :metrics => [:wall_time, :memory] }

  def test_login
    post_via_redirect "/login", :username => users(:youruser).username, :password => users(:youruser).password
  end
end

Tweaking

Starting with Rails 3.1, performance tests can be configured. As you’ve probably figured out from the LoginTest shown above, all you need to do is to specify an optional hash of options to use when benchmarking/profiling. You can use one set of options for each class. Not all options are available to all interpreters, especially the ones related to profiling. Metric/output availability for each interpreter will be shown below. You can skip this section and come back later, after grasping the whole concept. You’ll also be able to check it out on Rails’ performance testing guide once 3.1 comes out.

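Metric availability

[Table: metric availability when benchmarking, per interpreter]

[Table: metric availability when profiling, per interpreter]

Output availability

[Table: output format availability, per interpreter]

Running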

Finally, it’s time to run your tests. Let’s start with benchmarking:

rake test:benchmark

And the output should be similar to this:

ExampleTest:
ExampleTest#test_homepage (16 ms warmup)
           wall_time: 0 ms
              memory: 17 KB
             objects: 195
             gc_runs: 0
             gc_time: 0 ms
 homepage (0.75s)

LoginTest:
LoginTest#test_login (92 ms warmup)
           wall_time: 10 ms
              memory: 180 KB
 login (0.44s)

Finished in 1.193759 seconds.

If any result disappoints you, profile it:

rake test:profile TEST=test/performance/login_test.rb

And you should get a similar output:

LoginTest:
LoginTest#test_login (105 ms warmup)
           wall_time: 69 ms
              memory: 2.4 KB
 login (5.02s)

Profiling will give you much more information than what’s printed on your terminal.

Reviewing results

By default, performance tests store their results in tmp/performance (although it can be changed by specifying a value for :output in the profile_options hash). For benchmarks, this is pretty straightforward: it stores one CSV per metric (LoginTest#test_login_memory.csv, for instance) with the results as time goes by.

measurement,created_at,app,rails,ruby,platform
183222,2011-08-10T18:15:09Z,,3.1.0.rc5,ruby-1.9.2.290,i686-linux
216344,2011-08-11T14:37:59Z,,3.1.0.rc5,ruby-1.9.2.290,i686-linux
(...)
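Since these are plain CSV files, they are easy to post-process. Below is a rough sketch that reads the measurement column and reports the change between the first and last run. The CSV text is inlined from the sample above to keep the example self-contained; in a real application you would read the file from tmp/performance instead.

```ruby
require 'csv'

# Sample data copied from the benchmark output shown above.
csv_text = <<~CSV
  measurement,created_at,app,rails,ruby,platform
  183222,2011-08-10T18:15:09Z,,3.1.0.rc5,ruby-1.9.2.290,i686-linux
  216344,2011-08-11T14:37:59Z,,3.1.0.rc5,ruby-1.9.2.290,i686-linux
CSV

rows = CSV.parse(csv_text, headers: true)
measurements = rows.map { |row| row['measurement'].to_i }

# Percentage change between the oldest and the newest measurement.
change = (measurements.last - measurements.first) * 100.0 / measurements.first

puts "first: #{measurements.first}, last: #{measurements.last}"
puts "change: #{change.round(1)}%"
```

Tracking that number over time is a cheap way to notice when a change made a test noticeably slower or hungrier for memory.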

When profiling, however, the result files are extremely important. They contain the juicy details of your test runs. Similarly to benchmarking results, there will be one file per metric. There are, however, multiple formats available, especially if you’re using RubyProf (and consequently MRI/REE/YARV). These formats can range from messy flat text files to awesome HTML stack traces, and they will provide valuable input when spotting bottlenecks.

The scope of this article is not to explore RubyProf’s available output formats, but you should have a look at the available printers. However, keep in mind that RubyProf supports more metrics and output formats than the Rubinius and JRuby profilers. Those can only measure wall time when profiling, and will only print their results in Flat/Graph text formats.

RubyProf’s HTML stack printer

Quick tests

Performance tests are great, but they can be inconvenient when all you want is to quickly test a small snippet of code. For this, Rails provides two command line tools: benchmarker and profiler.

Open your terminal and run:

rails benchmarker 'User.all'

And it will work as if you had created a performance test and put that code in it. Very simple, right? Another example:

rails profiler 'User.all' 'User.find_by_login("goncalossilva")' --runs 3 --metrics cpu_time,memory # profiling memory won't work under Rubinius/JRuby (benchmarking memory will!)

Two things pop up from this code snippet: you can run multiple tests in a single command and you can specify options as you would with normal performance tests.

To get a glimpse at all available options, run:

rails benchmarker --help
rails profiler --help

What can be done with this?

A lot of things can be accomplished with these tools. First and foremost, you can assess the performance of your application by benchmarking certain parts, either through tests or simple snippets of code. After finding potential bottlenecks, you can use profiling to gain a greater insight into what’s happening and how it can be improved.

There are other useful tasks that can be done with these tools. You could, for instance, compare the performance of different interpreters on your application:

    rvm 1.9.2
    rails benchmarker 'MyModel.slow_method' 'get "/"' --metrics wall_time,memory
    rvm ree
    rails benchmarker 'MyModel.slow_method' 'get "/"' --metrics wall_time,memory
    rvm rubinius
    rails benchmarker 'MyModel.slow_method' 'get "/"' --metrics wall_time,memory
    rvm jruby
    rails benchmarker 'MyModel.slow_method' 'get "/"' --metrics wall_time,memory

Now you’ll know which interpreter takes less/more time/memory when it’s opening your homepage/running MyModel.slow_method.

Giving it a try

If you’ve come this far, now you know how to use these powerful tools. Try playing with them: I’m sure you’ll find valuable information about your applications’ performance, and potentially spot some easily fixable bottlenecks. With little effort, your application will be faster, you will be prouder and your users will be happier!

Feel free to ask questions and give feedback in the comments section of this post. Gonçalo has also written a guest blog post for RubyLearning before, titled “Ruby gems — what, why and how”. Fellow Rubyists, if you would like to write a guest blog post for RubyLearning, write to satish [at] rubylearning.org.

How do I test my code with Minitest?

This guest post is by Steve Klabnik, who is a software craftsman, writer, and former startup CTO. Steve tries to keep his Ruby consulting hours down so that he can focus on maintaining Hackety Hack and being a core member of Team Shoes, as well as writing regularly for multiple blogs.

Programming is an interesting activity. Everyone has their favorite metaphor that really explains what programming means to them. Well, I have a few, but here’s one: Programming is all about automation. You’re really just getting the computer to automatically do work that you know how to do, but don’t want to do over and over again.

When I realized this, it made me look for other things that I do that could be automated. I don’t like repeating myself over and over and over again. That’s boring! Well, there’s one particular task that’s related to programming that’s easily made automatic, and that’s testing that your software works!

Does this story sound familiar? You run your program, try a few different inputs, check the outputs, and see that they’re right. Then, you make some changes in your code, and you’d like to see if they work or not, so you fire up Ruby and try those inputs again. That repetition should stick out. There has to be a better way.

Luckily, there is! Ruby has fantastic tools that let you set up tests for your code that you can run automatically. You can save yourself tons of time and effort by letting the computer run thousands of tests every time you make a change to your code. And it’ll never get tired and accidentally type in a 2 when you mean to type 3…

Many people take this one step further. They find testing so important and so helpful that they actually write the tests before they write the code! I won’t expound on the virtues of “test driven development” in this post, but it’s actually easier to write the tests first, once you get some practice at it. So, let’s pick a tiny bit of code to work on, and I’ll show you how to test it using Ruby’s built-in testing library, minitest.

For this exercise, let’s do something simple, so we can focus on the tests. We’ll make a Ruby class called CashRegister. It’ll have a bunch of features, but here’s what we’ll need first:

  • The register will have a scan method that takes in a price, and records it.
  • The register will have a total method that shows the current total of all the prices that have been scanned so far.
  • If no prices have been scanned, the total should be zero.
  • The register will have a clear method that clears the register of all scanned items. The total should go back to zero again.

Seems simple, right? You might even know how to code this already. Sometimes, intermediate programmers practice coding problems that are easy, just to focus on how to write good tests, or to work on getting the perfect design. We call these kinds of problems ‘kata.’ It’s a martial arts thing.

Anyway, enough about all of this! Let’s dig in to minitest. It already comes with Ruby 1.9, but if you’re still using 1.8, you can install it with ‘gem install minitest.’ After doing so, open up a new file, register.rb, and put this in it:

require 'minitest/autorun'

class TestRegister < MiniTest::Unit::TestCase
  def setup
    @register = CashRegister.new
  end
  def test_default_is_zero
    assert_equal 0, @register.total
  end
end

Okay! There’s a lot going on here. Let’s take it line by line. On the first line, we have a ‘require.’ The autorun part of minitest includes everything you need to run your tests, automatically. All we need to do to run our tests is to type ruby register.rb, and they’ll run and check our code. But let’s look at the rest of the file before we do that. The next thing we do is set up a class that inherits from one of minitest’s base classes. That’s how minitest works, by running a series of TestCases. It also lets you group similar tests together, and split different ones up into multiple files.

Anyway, enough organizational stuff. In this class, we have two methods: the first is the setup method. This runs before each test, and allows us to prepare for the test we want to run. In this case, we want a new CashRegister each time, and we’ll store it in a variable. Now we don’t have to repeat our setup over and over again… it’s just automatic!

Finally, we get down to business, with the test_default_is_zero method. Minitest will run any method that starts with test_ as a test. In that method, we use the assert_equal method with two arguments. assert_equal is where it all happens, by comparing 0 to our register’s total, and it will complain if they’re not equal.

Okay, so we have our first test. Rock! You might be tempted to start implementing our CashRegister class, but wait! Let’s try running the tests first. We know they’ll fail, because we don’t even have a CashRegister yet! But if we run the tests before writing code, the error messages will tell us what we need to do next. The tests will guide us through the implementation of our class. So, as I mentioned earlier, we can run the tests by doing this:

$ ruby register.rb

We get this as output:

Loaded suite register
Started
E
Finished in 0.000853 seconds.

1) Error:
test_default_is_zero(TestRegister):
NameError: uninitialized constant TestRegister::CashRegister
register.rb:5:in `setup'

1 tests, 0 assertions, 0 failures, 1 errors, 0 skips

Test run options: --seed 36463

Whoah! Okay, so you can see that we had one test, one error. Since we know classes are constants in Ruby, we know that the uninitialized constant error means we haven’t defined our class yet! So let’s do that. Go ahead and stick in an empty class at the bottom:

class CashRegister
end

And run the tests again. You should see this:

1) Error:
test_default_is_zero(TestRegister):
NoMethodError: undefined method `total' for #<CashRegister:0x00000101032a80>
register.rb:9:in `test_default_is_zero'

Progress! Now it says we don’t have a total method. So let’s define an empty one. Modify the class like this:

class CashRegister
  def total
  end
end

And run the tests again. Another failure:

1) Failure:
test_default_is_zero(TestRegister) [register.rb:9]:
Expected 0, not nil.

Okay! No more errors, just a failure: the method exists now, but it returns the wrong result. Let’s keep it as simple as possible, and fill out a nice and easy total method:

def total
  0
end

Now, you may be saying, “Steve, that doesn’t calculate a total!” Well, you’re right. It doesn’t. But our tests aren’t yet asking to calculate a total, they’re just asking for a default. If we want a total, we should write a test that actually demonstrates adding it up. But we have fulfilled objective #3, so we’re doing good! Now, let’s work on objective #2, since we sorta feel like the total method is lying about what it’s supposed to do. In order to add up the items that were scanned, we need to scan them in the first place! Objective #1 says that this method should be called scan, so let’s write a test. Put it in your test class with the test_default_is_zero method:

def test_total_calculation
  @register.scan 1
  @register.scan 2
  assert_equal 3, @register.total
end

Make sense? We want to scan two things in, and then check that the total is correct. Let’s run our tests!

Loaded suite register
Started
.E
Finished in 0.000921 seconds.

1) Error:
test_total_calculation(TestRegister):
NoMethodError: undefined method `scan' for #<CashRegister:0x00000101031838>
register.rb:13:in `test_total_calculation'

2 tests, 1 assertions, 0 failures, 1 errors, 0 skips

Test run options: --seed 54501

Okay! See that ‘.E’ up there? That graphically shows that we had one test passing, and one test with an error. Our first test still works, but our second is failing because we don’t have a scan method. Add an empty one to our CashRegister class, and run again:

1) Error:
test_total_calculation(TestRegister):
ArgumentError: wrong number of arguments (1 for 0)
register.rb:24:in `scan'
register.rb:13:in `test_total_calculation'

Whoops! It takes an argument. Let’s add that: def scan(price). Run the tests!

1) Failure:
test_total_calculation(TestRegister) [register.rb:15]:
Expected 3, not 0.

Okay! This sounds more like what we expected. Our total method just returns zero all the time! Let’s think about this for a minute. We need to have scan add the price to a list of scanned prices. So we’d better have it do that:

def scan(item)
  @items << item
end

But if you run the tests, you’ll see this:

1) Error:
test_total_calculation(TestRegister):
NoMethodError: undefined method `<<' for nil:NilClass
register.rb:25:in `scan'
register.rb:13:in `test_total_calculation'

Oops! @items is undefined. Let’s make it be an empty array, when we create our register:

def initialize
  @items = []
end

And run the tests:

1) Failure:
test_total_calculation(TestRegister) [register.rb:15]:
Expected 3, not 0.

Okay! We’re back to our original failure. But we’ve made some progress: now that we have an actual list of items, we’re in a position to make our total method work. Also, at each step, even though one test was failing, the other was still passing, so we know that we didn’t break our default functionality while we were working on getting a real total going.

Now, we’re in a better place to calculate the total:

def total
  @items.inject(0) {|sum, item| sum + item }
end

Or, if you want to make it even shorter:

def total
  @items.inject(0, &:+)
end

If you’re not familiar with Enumerable#inject, it takes a list of somethings and turns it into a single something by means of a function, in a block. So in this case, we can keep a running sum of all items, and then add the price of each one to the sum. Done! Run your tests:
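To see inject on its own, outside the register, here is a small sketch. The prices array is made up for illustration:

```ruby
prices = [1, 2, 3]

# Block form: sum starts at 0, and the block's return value becomes the next sum.
block_total = prices.inject(0) { |sum, price| sum + price }

# Symbol form: the same thing, naming the method to apply between elements.
symbol_total = prices.inject(0, :+)

# The starting value matters for an empty list: with 0 we get 0 back, not nil.
empty_total = [].inject(0) { |sum, price| sum + price }

puts block_total   # 6
puts symbol_total  # 6
puts empty_total   # 0
```

That last case is why the starting value of 0 is worth keeping: it is what lets the “default is zero” test keep passing.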

Started
..
Finished in 0.000762 seconds.

2 tests, 2 assertions, 0 failures, 0 errors, 0 skips

Woo hoo! We’re done! Our total can now be calculated. Great job!
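For reference, the pieces we’ve built up over the last few steps fit together roughly like this. It is shown on its own here, without the test class, and the few lines at the bottom are just a quick manual check:

```ruby
class CashRegister
  def initialize
    @items = []   # every scanned price ends up in this list
  end

  # Record one price.
  def scan(price)
    @items << price
  end

  # Add up everything scanned so far; 0 when nothing has been scanned.
  def total
    @items.inject(0) { |sum, price| sum + price }
  end
end

register = CashRegister.new
puts register.total   # 0
register.scan 1
register.scan 2
puts register.total   # 3
```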

Now, here’s a challenge, to see if you’ve really learned this stuff: write a test for a new method, clear, that clears the total. That’s objective #4 we talked about above.

Other parts of minitest

This has been a mini intro to minitest and using it to test your code. There are other methods in the assert family, too, like assert_match, which takes a regular expression and tries to match it against something. There’s the refute family of tests, which are the opposite of assert:

assert true #=> pass
refute true #=> fail

There are also other tools that make minitest useful, like mocks, benchmark tests, and the RSpec-style ‘spec’ syntax. Those will have to wait until later! If you’d like to learn about them now, check out the source code on GitHub.

Happy testing!

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Also, do check out Steve’s other article: “How do I keep multiple Ruby projects separate?” on RubyLearning. Thanks!

How Can We Develop For Tomorrow’s Needs?

This guest post is by James M. Schorr, who has been in IT for over 14 years and been developing software for over 11 years. He is the owner of an IT consulting company, Tech Rescue, LLC, which he started along with his lovely wife, Tara, in 2002. They live in Concord, NC with their three children Jacob, Theresa and Elizabeth. James spends a lot of time writing code in many languages and has a passion for Ruby on Rails in particular. He loves to play chess online at FICS (his handle is kerudzo) and to take his family on nature hikes. His professional profile is on LinkedIn and you can read more of his writings on his blog.

The average developer is often forced to get code out the door as quickly as possible, primarily due to unrealistic deadlines and budgets. As a result, the quality and future expandability of software is greatly harmed. Software is now used in medical machinery, our vehicles, power plants, stock markets, aircraft, weapons, etc… As software becomes more and more critical in our lives, the need to think long-term is becoming increasingly critical.

Obviously, the quickest way is almost always not the best way. I hope to give some practical steps to those involved in software development that will help in the development of stable, long-lasting software. A proper strategy session involving the below steps can help save a lot of wasted time and money.

Quality, future-resilient software is tough to define, but reveals itself when it does what it’s supposed to without unpleasant surprises, handles unpredictable user input and system issues in gracious, non-devastating ways, and, in general, makes the user’s life easier. The tough part is that users’ needs and systems change. How do we engineer for tomorrow’s needs?

The keys to successfully developing long-term software are:

Establishing the Purpose: What is the point of the software? Do the needs that it is expected to meet look as though they will be the same core needs in the foreseeable future? In other words, will the main needs be met by this software and can we easily build out from there? If not, we need to keep the anticipated future needs in mind as we “scope” out the architecture of the project and provide “space” for them.

Choosing the “Stack”: (what technologies, languages, etc… will be used). The stack should be chosen carefully, based upon:

  • proven stability. For example, it may be “cool” but unwise to write the software in the latest-and-greatest language. I’ve seen instances where a language/framework is chosen strictly due to its current popularity. This is typically a recipe for disaster, as those who go (and enjoy) that route typically move on to the next greatest thing, leaving code behind for non-fad-following developers to handle.
  • current in-house knowledge. For instance, maybe our developers love and know Ruby, should we really force them to write an app in VB? Or perhaps it is a Microsoft-shop, are time and funds available to facilitate the learning-on-the-fly of non-MS technologies? I don’t believe that it is ever appropriate to write mission critical software using a language/framework that is unfamiliar to the developers. There are times, however, where the software is so mission-critical and matches a language’s abilities to the point where it makes sense to pull in new talent. It can be argued that software can be written in almost any language, that the language itself doesn’t matter much. But sometimes it really does, both in terms of expressiveness and developer satisfaction (note: I still contend that a happy developer is a good developer, or at least becoming one).
  • infrastructure requirements. Do we have the hardware and network necessary to decently support the software and its anticipated usage? Disk space, memory requirements, OS, network speed, etc… All of these matter, a lot. It’s best to always plan for 2-3x the anticipated usage. For instance, for a web app, if we anticipate 1k users, let’s build for 2-3k users, with built-in monitoring of the resources being used and a plan of how to scale up quickly when we hit a “soft” threshold.

Planning:

  • Architectural Drawing: I’m a big fan of having at least the “skeleton” of the project drawn out, particularly on a white-board (I’m a bit old-school, I know). It doesn’t have to be a fancy diagram or complicated UML diagram, just a simple drawing; the more understandable, the better. This high-level overview provides guidance when we’re deep into code, as we can look up and see if we’re on track (as it’s all too easy to go down a code “rabbit trail” if we’re not careful). It is counterproductive, however, to draw out every little detail, as this will stifle creativity and overwhelm us while we’re writing code (we just won’t look at the diagram then).
  • Establish Deadlines: we do need to know the deadlines. It’s best, in my opinion, to have several small deadlines with a semi-flexible final deadline. This helps us keep on track and measure our progress little by little. As we hit the small deadlines, our confidence builds, which then improves our productivity and, in general, our code quality.
  • Using Available Expertise Wisely: does it make sense to assign Bob, the awesome Python programmer, to doing CSS and Bill, the great designer, to slinging code? Obviously not (I have seen some managers try this, though), we may lose both team members or end up with Google copy-and-paste code and animated GIFs in our Project. Cross-training is a nice and potentially valuable concept, but it should be done outside of a software project with its accompanying deadlines. Future minor features might provide a better opportunity ground for cross-training. If Bob’s swamped, maybe we need to find him some decent help. :)
  • Determine the Deployment Strategy (for both during and after the Project):
    • code should be checked into our version control system prior to any deployments.
    • maybe we should only deploy code after business hours after alerting such-and-such a group. If our project has any possibility of negatively impacting others, notification is not only kind, but often necessary, especially for large changes.
    • a rollback strategy must always be in place. This strategy must be easily understandable with simple steps, so that little, if any, interpretation by support staff needs to be done during “heat of the moment” support calls. Even if our developers are top-notch, until code gets into Production, we cannot be 100% sure that it will not need to be rolled back. This is why major companies often have to release an update quickly after a major release. Some things just can’t be easily discovered until they’re released into “the wild”.

Building with Expansion in Mind:

  • One of the wonderful aspects of developing software is also its most dangerous aspect: flexibility. A feature or component can often be written in different ways. There typically are only one or two best ways, though. It can be very difficult to determine, unless one steps back from the project and thinks it through. Well-known software principles help a great deal with this, but come up short if they are not “placed up against” the anticipated needs of the future (in other words, if we don’t understand where we’re going, our code will still be awful even if we follow DRY, OOP, GOF, etc…). As much as possible, this needs to be done not by the developer but by someone outside the code, so to speak, perhaps a technical team lead, etc…
  • When adding core features, we need to at least take a few moments to think through possible future implications of what we’re doing. For example, our Component A is currently parsing JSON from website B using C credentials. Component D depends upon Component A’s data. Wouldn’t it make sense to keep the website address and credentials in an encrypted settings field somewhere to make them easy to change in the future? If Component A’s data was slightly different, would Component D “blow up”? Maybe we can abstract all of this a bit?
  • Avoiding Spaghetti-Code: proper design and a commitment to sticking to the design in the future helps to prevent our code from such entanglement. In other words, we need to commit to never, ever quickly throwing code in to the project, as this leads to “spaghetti-code”. Of course, there may be the exceedingly rare occasions, where we may need to do such a stop-gap measure due to an emergency, but we must then learn from our mistake and commit to re-engineering that portion of the code properly.

Data Safety:

  • As we depend more and more upon data, it’s becoming increasingly important that we do our best to have automated backups, which are then checked frequently by a person. This cannot be emphasized heavily enough. All too often, properly designed backups can stop working without anyone noticing until it is too late.
  • If encryption is used:
    • the encryption keys need to be stored off-site in at least 2 secure places. Imagine if we lost our server(s), our office burned down, our VPS provider goes offline, etc…- even if we had backups, could we get to the raw data if needed? No one wants to start over from scratch.
    • Does the encryption depend upon a certain cipher? If so, what is the game plan for when that cipher is cracked someday? How easy will it be for us to move to a new cipher?
  • Does our data depend upon a specific version? For instance, maybe database X version Y can open the data but no other versions can. Do we have a backup of that version to access the data if needed? Better yet, this reveals a key flaw in our design. Our data should not be heavily dependent upon any software version.
  • Would our data be understandable if a new developer 10 years from now is assigned to work with it? For instance, if a column for a user’s API Key is called usrscr_ak12, we may understand it, but it’s not future-proof (a better term may be “future-resilient”, since nothing is truly future-proof). Such obfuscation attempts provide little security, as if someone can get that far (to the data), we’ve lost the security “battle” anyhow. Data should be clearly understood by those who can access it.
  • Can our data be exported easily when the software that we’re lovingly developing now someday gets decommissioned? All software will eventually get replaced by something better. How easily can our data be decoupled from our application?

Pin-pointing Possible “Dominoes” in our project and code-base (e.g. if A happens, does it affect B, which then affects C, etc…, these can be like dominoes). Amazon’s recent AWS issues in 2011 revealed the criticality of this step. The more time that we spend anticipating what can go wrong, the more we can establish quick steps to both prevent such issues and to mitigate possible damage. At the bare minimum, the possible “dominoes” and recommended quick steps need to be written down somewhere. This can greatly help to expedite future troubleshooting.

  • Our Software: We must try to anticipate, as much as possible, what the interdependencies are in our project and its surrounding infrastructure. These dependencies should be in written form and re-reviewed as further functionality is added to the software in the future (e.g. ITIL Change Management).
  • Dependent Software: What software or systems will depend upon our software? When our system goes down, will other software be slamming our system asking for a response?
  • Dependent Systems: if we saturate our network, is our software designed to “back-off” and retry after an appropriate, randomized delay?
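The “back off and retry after a randomized delay” idea in the last point can be sketched in a few lines of Ruby. This is only an illustration under assumptions: with_backoff, its parameters and the default values are hypothetical names chosen for this example, not part of any library.

```ruby
# Hypothetical helper: retries the block, waiting longer (plus a random
# amount) after each failure. All names and defaults are illustrative.
def with_backoff(max_attempts: 5, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    raise if attempt >= max_attempts   # give up and re-raise the last error
    # Exponential growth (0.5s, 1s, 2s, ...) plus random jitter, so that many
    # clients that failed together don't all retry at the same instant.
    delay = base_delay * (2**(attempt - 1))
    sleeper.call(delay + rand * delay)
    retry
  end
end

# Usage: a fake flaky call that fails twice, then succeeds.
calls = 0
result = with_backoff(base_delay: 0.01) do
  calls += 1
  raise IOError, "network busy" if calls < 3
  "response after #{calls} tries"
end
puts result
```

The random part matters as much as the growing delay: without it, every client that was knocked over at the same moment comes back at the same moment too.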

Obviously, none of the above can be done overnight. If even some of the above is done, however, the chance of our software having a longer-lasting, positive impact will be greater. I recommend that the start of each project have at least 3-5 days dedicated to going through these steps. Gathering input from the teams of people who are responsible for various components (e.g. clients/end users, network, sysadmins, developers of other dependent software, etc…) will be invaluable. The payoff will be great.

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Also, do check out James’ other article: “Do You Enjoy Your Code Quality?” on RubyLearning. Thanks!

Cryptography Or: How I Learned to Stop Worrying, and Love AES

This guest post is by Phillip Gawlowski, who is living in the German wilderness of Oberberg near Cologne. Phillip spends his time writing Ruby as a hobby just for fun. He tries to make life a little easier for himself and for others when he is crazy enough to release his code as open source. He’s neither famous nor rich, but likes it that way (most of the time). He blogs his musings at his blog.

A friend gave you the plans for Dr. Blofeld’s newest Doomsday Device. Over the engine noise of his Aston-Martin, he tells you: “Send this to offers@universal-exports.co.uk, and make sure it arrives there intact!”

All you have is a laptop, wonky Internet access, and Ruby. What to do?

AES For Safety, SHA2 For Integrity

You now have two goals:

  1. Make the Doomsday Device plans unreadable, and
  2. Ensure that the data has arrived at its destination without error.

Fortunately, Ruby provides an API to OpenSSL, a well-tested, widely used library and set of tools used for encryption of all kinds, and includes its own implementations of several cryptographic hashes.

In this article we will use AES for de- and encryption, and SHA2 to hash data.

Using SHA2

Like many things, Ruby makes creating crypto-hashes easy:

require 'digest/sha2'
sha256 = Digest::SHA2.new(256)
sha256.digest("Bond, James Bond")

We pass SHA2.new the bit length we want our hash to have. Ruby’s Digest::SHA2 supports three lengths: 256 (also called SHA256), 384 (SHA384), and 512 (SHA512). A longer hash takes longer to calculate, but it makes collisions even less likely and is much more difficult to attack with a rainbow table or other cryptanalysis.

Once we have our SHA object, we pass a String of data into the #digest to have the hash of this data returned as a String.
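The String that #digest returns is raw bytes, which is what we want for a key but not for printing. A small sketch of the difference, using #hexdigest for a readable form (the input strings are just examples):

```ruby
require 'digest/sha2'

sha256 = Digest::SHA2.new(256)

raw = sha256.digest("Bond, James Bond")      # 32 raw bytes, not printable text
hex = sha256.hexdigest("Bond, James Bond")   # the same hash as 64 hex characters

puts raw.bytesize   # 32
puts hex.length     # 64
puts hex

# The hash is deterministic, and a tiny change in the input changes the result.
puts hex == sha256.hexdigest("Bond, James Bond")   # true
puts hex == sha256.hexdigest("bond, James Bond")   # false
```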

You can call the #digest method directly when you are working with MD5 or SHA1:

require 'digest/md5'
Digest::MD5.digest "Bond, James Bond"

The Advanced Encryption Standard

Theory

As AES is a so-called symmetric-key block cipher, it operates on chunks of data, called blocks, and applies the provided key to each block to create encrypted or decrypted output. The use of the same key for encryption and decryption is what makes the cipher symmetric. Conversely, asymmetric ciphers use different keys for decryption and encryption, usually a private key known only to the recipient to decrypt, and a public key known to anyone to encrypt. SSH, SSL/TLS and PGP are examples of systems that rely on this kind of cipher.

The AES family comes in three key sizes: 128 bit, 192 bit, and 256 bit. Just as with SHA2, you’ll find AES-128 or AES-256 being used to describe the particular key size in use. The block size is always 128 bits.

The downside of a plain block cipher is that the same key is applied to each block of data in the same way, which weakens the encryption (the same data is encrypted in the same way!). The solution is to use a so-called “mode of operation”, which mixes each block with additional data, such as the previous block’s output, so that the ciphertext becomes indistinguishable from noise.

A full discussion of modes of operation and their strengths and weaknesses would go well beyond the scope of this article, however.

…And Practice

Now let’s take a look at Ruby’s encryption API:

require 'openssl'
require 'digest/sha2'

payload = "Plans for Blofeld's newest Doomsday Device. This is top secret!"
sha256 = Digest::SHA2.new(256)
aes = OpenSSL::Cipher.new("AES-256-CFB")
iv = OpenSSL::Random.random_bytes(16) # AES needs a 16-byte IV from a secure random source
key = sha256.digest("Bond, James Bond")

aes.encrypt
aes.key = key
aes.iv = iv
encrypted_data = aes.update(payload) + aes.final

puts encrypted_data

Since Ruby’s OpenSSL API is pretty straight forward (and so is the OpenSSL API, if you would like to use OpenSSL in C code), we will only discuss what’s really important.

OpenSSL::Cipher.new("AES-256-CFB") sets up an AES object with a key size of 256 bits and the CFB mode of operation. To find out which ciphers are supported, call OpenSSL::Cipher.ciphers, which returns the names of all the ciphers that are understood.

The iv variable stores our random Initialization Vector, random data to seed the mode of operation to ensure that each block is encrypted uniquely, and thus (hopefully) indistinguishable from noise. For AES the IV is 16 bytes long, the same as a block.

We also take advantage of SHA2’s 256 bit variant to generate a 256 bit key from a simpler password. AES-256 expects the encryption key to be exactly 256 bits long, and since creating a 256 bit key by hand is pretty difficult, we let the computer do the job. When used in production, you most likely want to add a salt to the hash, or use a user’s already hashed password.
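One common way to “add a salt” is to run the password through a key-derivation function such as PBKDF2, which Ruby’s OpenSSL bindings expose as OpenSSL::PKCS5.pbkdf2_hmac. This is a sketch of that alternative, not what the code in this article does; the iteration count and salt length are assumed values for illustration only.

```ruby
require 'openssl'

password   = "Bond, James Bond"
salt       = OpenSSL::Random.random_bytes(16)   # random; stored alongside the ciphertext
iterations = 100_000                            # assumed value; tune for your hardware
key_length = 32                                 # 32 bytes = 256 bits, what AES-256 expects

# Derive the key from password + salt instead of hashing the password once.
key = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, key_length,
                                 OpenSSL::Digest.new('SHA256'))

puts key.bytesize   # 32
```

The salt is not secret, but the recipient needs it (and the iteration count) to derive the same key from the same password.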

With the #decrypt and #encrypt methods, we put our AES object into the proper state. Behind the scenes, this initializes OpenSSL’s encryption engine. These two method calls are required before any other method call!

Last but definitely not least, the #update and #final methods are where the encryption actually happens. The more data you have, the longer the chunks, and the more complex the cipher, the longer this will take. The #final method does the same as #update, but adds padding to a chunk to bring it up to the required block size.

In case you make a mistake, or want to do another round of encryption or decryption, the #reset method can reset a Cipher object.

Decryption works pretty much the same as encryption, except that we pass the encrypted data to the #update-method:

aes.decrypt
aes.key = key
aes.iv = iv
puts aes.update(encrypted_data) + aes.final

Note, however, that both the key and the IV must be the same, and thus have to be stored or transmitted to the recipient of the encrypted data!
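Putting both directions together, a complete round trip might look like the sketch below. It uses two separate Cipher objects to stand in for sender and recipient, and lets OpenSSL generate the IV with #random_iv; the variable names are only for illustration.

```ruby
require 'openssl'
require 'digest/sha2'

payload = "Plans for Blofeld's newest Doomsday Device. This is top secret!"
key     = Digest::SHA2.new(256).digest("Bond, James Bond")   # 32 bytes

# Sender side
encryptor = OpenSSL::Cipher.new("AES-256-CFB")
encryptor.encrypt
encryptor.key = key
iv = encryptor.random_iv   # generates a fresh IV and sets it on the cipher
encrypted_data = encryptor.update(payload) + encryptor.final

# Recipient side: same key, same IV
decryptor = OpenSSL::Cipher.new("AES-256-CFB")
decryptor.decrypt
decryptor.key = key
decryptor.iv = iv
decrypted = decryptor.update(encrypted_data) + decryptor.final

puts decrypted == payload   # true
```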

Verifying Integrity

As we’ve already seen, a hashing algorithm can turn data of arbitrary length into a fixed-length stream of bytes that is, for practical purposes, unique to that data. This can be used for password storage, to generate more secure keys for encryption, or, since the output of a hash algorithm is deterministic (it’s always the same for the same input), as an integrity check.

If you’ve downloaded a Linux distribution or other software, you have already seen this, in the form of MD5 digests, with which you can verify that a download is complete and error free, like on Ruby’s homepage.

We will do the same with our encrypted data, as a poor man’s message authentication code–a technique in cryptography to ensure that a message has not been tampered with:

poor_mans_mac = sha256.digest(encrypted_data)
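On the receiving end, the check is simply “hash what arrived and compare”. A minimal sketch, with random bytes standing in for the real ciphertext and a hypothetical intact? helper:

```ruby
require 'digest/sha2'
require 'openssl'

# Stand-in for the ciphertext produced earlier (hypothetical bytes).
encrypted_data = OpenSSL::Random.random_bytes(64)

# Sender: compute the hash and send it along with the data.
sent_digest = Digest::SHA2.new(256).hexdigest(encrypted_data)

# Recipient: recompute the hash over what arrived and compare.
def intact?(data, expected_hexdigest)
  Digest::SHA2.new(256).hexdigest(data) == expected_hexdigest
end

puts intact?(encrypted_data, sent_digest)            # true

corrupted = encrypted_data.dup
corrupted.setbyte(0, corrupted.getbyte(0) ^ 0x01)    # flip one bit "in transit"
puts intact?(corrupted, sent_digest)                 # false
```

A plain hash like this catches accidental corruption. Someone who deliberately alters the message can recompute the hash as well, which is why a keyed MAC (Ruby offers OpenSSL::HMAC) is the usual choice when tampering is the concern.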

Now all that’s left is to send an email to James’ employer with the Doomsday Device plans, and to give them a call to give them the IV and key.

Closing Remarks

Think of the Future

Security is not a state, it is a process. You should write your security-aware code in such a way that you don’t depend on a particular cryptographic algorithm. Ruby’s API (and OpenSSL’s own API) wrap encryption abstractly, so that you can swap out the algorithm you use at any time. This is also necessary for hashing algorithms: While there are no feasible attacks against SHA2 yet, the cryptanalysis only gets better over time, as the histories of MD5 and DES show.

Schneier’s Law

Schneier’s Law states that “any person can invent a security system so clever that she or he can’t think of how to break it.” This is why Ruby’s developers use OpenSSL to do encryption, a widely tested and certified (in some variants!) cryptographic library, instead of writing their own library.

A mistake in your implementation can compromise your data and your customers’ data, since so-called “side channel attacks” are used as a matter of course to attack cryptography.

Encryption Does Not Mean You Are Safe

It is important, and I cannot stress this enough, that you do not store encrypted data and the keys to access it on the same machine (ideally, you don’t store these things on the same network!), or do your encryption and decryption on the same machine that you store your encrypted data on. Whole libraries have been filled with books on how to design a secure system, from hardware to software. Above all, security is a mindset, and you have to be properly paranoid to secure your data and access to this data. If you deploy, or are about to deploy, security-relevant code, have your code tested by outsiders sooner rather than later. Penetration testing is worth your while.

Asymmetric encryption was invented to solve one problem with symmetric encryption: with an asymmetric cipher, it is not necessary to transmit a secret key. However, asymmetric ciphers have their own set of trade-offs (key trust and computational efficiency, among others).

The Safest Data is No Data

Like the fastest code is no code at all, if you don’t store data you don’t absolutely, positively have to store, don’t even bother with it. What you don’t have can’t be compromised.

Conclusion

This article is nothing but a superficial introduction to encryption in Ruby. There are dozens of standards and regulations that govern this vast topic. However, I have tried my best to give you, fellow Rubyists, enough knowledge about this topic for you to know which questions you should ask, which is, in the end, much more important than the code itself. Now go forth, and hash and encrypt and decrypt, and, above all, have fun doing it!

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Thanks!

Throw, Catch, Raise, Rescue… I’m so confused!

This guest post is by Avdi Grimm, who is the author of “Exceptional Ruby“, an in-depth guide to exceptions and failure handling in Ruby. RubyLearning readers can get a $3 discount on the book by using code RUBYLEARN. Avdi has been hacking Ruby code for 10 years, and is still loving it. He is chief aeronaut at ShipRise, a consultancy specializing in sustainable software development and in helping geographically dispersed teams work more effectively. He lives in southern Pennsylvania with his wife and four children, and in his copious spare time blogs and podcasts at Virtuous Code and WideTeams.com.

Old keywords, new meanings

One of the aspects of Ruby that often confuses newbies coming from other languages is the fact that it has both throw and catch and raise and rescue statements. In this article I’ll try to clear up that confusion.

If you’re familiar with Java, C#, PHP, or C++, you are probably used to using try, catch, and throw for exception handling. You use try to delineate the block in which you expect an exception may occur. You use catch to specify what to do when an exception is raised. And you use throw to raise an exception yourself.

You’ve probably noticed that Ruby has throw and catch… but they don’t seem to be used the way you’re used to in other languages! And there are also these “begin”, “raise” and “rescue” statements that seem to do the same thing. What’s going on here?

Getting out fast

If you’ve done much programming in another language like Java, you may have noticed that exceptions are sometimes used for non-error situations. “Exceptions for control flow” is a technique developers sometimes turn to when they want an “early escape” from a particular path of execution.

For instance, imagine some code that scrapes a series of web pages, looking for one that contains a particular text string.

def show_rank_for(target, query)
  rank = nil
  each_google_result_page(query, 6) do |page, page_index|
    each_google_result(page) do |result, result_index|
      if result.text.include?(target)
        rank = (page_index * 10) + result_index
      end
    end
  end
  puts "#{target} is ranked #{rank} for search '#{query}'"
end

show_rank_for("avdi.org", "nonesuch")

(For brevity, I’ve excluded the definitions of the #each_google_result_page and #each_google_result methods. You can view the full source at https://gist.github.com/1075364.)

Fetching pages and parsing them is time-consuming. What if the target text is found on page 2? This code will keep right on going until it hits the max number of result pages (here specified as 6).

It would be nice if we could end the search as soon as we find a matching result. We might think to use the break keyword, which “breaks out” of a loop’s execution. But break only breaks out of the immediately surrounding loop, and here we have a loop inside another loop.

This is a situation where we might come up with the idea of using an exception to break out of the two levels of looping. But exceptions are supposed to be for unexpected failures, and finding the results we were looking for is neither unexpected, nor a failure! What to do?

Throwing Ruby a fast ball

Ruby has given us a tool for just this situation. Unlike in other languages, Ruby’s throw and catch are not used for exceptions. Instead, they provide a way to terminate execution early when no further work is needed. Their behavior is very similar to that of exceptions, but they are intended for very different situations.

Let’s look at how we can use catch and throw to end the web search as soon as we find a result:

def show_rank_for(target, query)
  rank = catch(:rank) {
    each_google_result_page(query, 6) do |page, page_index|
      each_google_result(page) do |result, result_index|
        if result.text.include?(target)
          throw :rank, (page_index * 10) + result_index
        end
      end
    end
    "<not found>"
  }
  puts "#{target} is ranked #{rank} for search '#{query}'"
end

This time we’ve wrapped the whole search in a catch{...} block. We tell the catch block what symbol to catch, in this case :rank. When the result we are looking for is found, instead of setting a variable we throw the symbol :rank. We also give throw a second parameter, the computed rank of the search result. This second parameter is the throw’s “payload”.

The throw “throws” execution up to the catch block, breaking out of all the intervening blocks and method calls. Because we gave the throw and catch the same symbol (:rank), the catch block is matched to the throw and the thrown symbol is “caught”.

The rank value that we gave as a payload to throw now becomes the return value of the catch block. We assign the result value to a variable, and proceed normally.

What if the search string is never found, and throw is never called? In that case, the loops will finish, and the return value of the catch block will be the value of the last statement in the block. We provide a default value (“<not found>”) for just this possibility.
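
To try this pattern without touching the network, here is a minimal sketch that swaps the scraper for an in-memory list of pages. The PAGES and PER_PAGE constants and the rank_for method are hypothetical stand-ins and are not part of the original gist.

```ruby
# Hypothetical stand-in for search results: each inner array is one result page.
PER_PAGE = 2
PAGES = [
  ["example.com", "ruby-lang.org"],
  ["rubylearning.com", "avdi.org"],
  ["never-reached.example"]
].freeze

def rank_for(target)
  catch(:rank) do
    PAGES.each_with_index do |page, page_index|
      page.each_with_index do |result, result_index|
        # throw unwinds both loops at once and hands its payload to catch
        throw :rank, (page_index * PER_PAGE) + result_index if result.include?(target)
      end
    end
    "<not found>" # value of the catch block when nothing was thrown
  end
end

puts rank_for("avdi.org")
puts rank_for("nonesuch")
```

The first call stops on the second page and never inspects the third; the second call runs through every page and falls back to the default value.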

catch and throw in the real world

The Rack and Sinatra projects provide a great example of how throw and catch can be used to terminate execution early. Sinatra’s #last_modified method looks at the HTTP headers supplied by the client and, if they indicate the client already has the most recent version of the page, immediately ends the action and returns a “Not modified” code. Any expensive processing that would have been incurred by executing the full action is avoided.

get '/foo' do
  last_modified some_timestamp
  # ...expensive GET logic...
end

Here’s a simplified version of the #last_modified implementation. Note that it throws the :halt symbol. Sinatra, which sits on top of Rack, catches this symbol in its request dispatch and uses the supplied response to reply to the HTTP client immediately. This works no matter how many levels deep in method calls the throw was invoked.

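require 'time'  # for Time.httpdate and Time#httpdate

def last_modified(time)
  response['Last-Modified'] = time.httpdate
  since = request.env['HTTP_IF_MODIFIED_SINCE']  # HTTP-date string, or nil if absent
  # The client's copy is current, so skip the rest of the action
  if since && Time.httpdate(since) >= time
    throw :halt, response
  end
end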

The way Sinatra uses catch/throw on top of Rack illustrates an important point: the throw call does not have to be in the same method as the catch block.
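
The cross-method behaviour can be tried in isolation. The sketch below is a toy dispatcher, not the actual Sinatra or Rack code; the method names dispatch, expensive_action and check_cache are made up for illustration.

```ruby
# Hypothetical helper, two method calls away from the catch block.
def check_cache(fresh)
  throw :halt, "304 Not Modified" if fresh
end

def expensive_action(fresh)
  check_cache(fresh)
  "200 OK (full page rendered)" # only reached when nothing was thrown
end

def dispatch(fresh)
  # catch lives here; the matching throw happens inside check_cache
  catch(:halt) { expensive_action(fresh) }
end

puts dispatch(true)
puts dispatch(false)
```

When the cache is fresh, the rest of expensive_action is skipped and the payload of the throw becomes the result of dispatch.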

Conclusion

Ruby is a language that tries to anticipate your needs as a programmer. One common need is a way to terminate execution early when we find there is no further work to be done. Unlike in some languages, where we would have to either abuse the exception mechanism or use multiple loop breaks and method returns to achieve the same effect, Ruby provides us with the catch and throw mechanism to quickly and cleanly make an early escape. This leaves begin/raise/rescue free to be used for errors, and nothing else.

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Thanks!

How do I smell Ruby code?

Understanding the worst of code

This guest post is by Timon Vonk, who is a self-employed Ruby enthusiast and standard nerd with an edge. He has worked with Ruby for several years, but is well acquainted with many other (programming) languages. He also likes martial arts, loud music, varying quantities of booze and a good scotch.

Introduction

Writing bad code isn’t a bad thing. Not understanding the problem you’re trying to solve any better after having written that piece of code is. Fortunately, that happens far less often. In this article I hope to give a better understanding of Ruby code by going into Ruby-specific code smells. We’ll start with some simple examples that are common in all programming languages (they just need to be covered) and then dive into some Ruby-specific smells.

So what is that smell?

The term was coined in the 90s by Kent Beck on WardsWiki (one of the first wikis around) and has been popularized ever since. A code smell is a hunch, not necessarily measurable, that the code you’re looking at can be improved in some way. This process of improvement is called refactoring, as you might know. And as far as refactoring is concerned, there is no time like now. Don’t leave loose ends; it’s a bad habit.

The Basics

Let us give a quick rundown on the more basic code smells:

  • Duplicated code, if you see any, is almost always a bad thing. We’ll get into this part a little later.
  • Multiple responsibilities in one method or class are always a bad thing. Try splitting your solution into multiple methods. It will make your code more readable and much more maintainable. Large methods and classes are a dead giveaway for this as well.
  • A class should never use more methods from other classes than from itself. Why is it even there?
  • A child class should always honor the contract of the parent class, i.e., be a kind of that class. Check out the Liskov substitution principle for more information.
  • If a class hardly does anything, why is it there?
  • Is there a simpler approach to your solution? Complexity can be a source of pride for some (if not most) coders, but too much of it makes code terrible to understand, especially later on.
  • Non-descriptive or too long identifiers or names are a good sign that either you can’t define the responsibility of the code, or you have a hard time with naming conventions.

Simple, easy smells should give a good start on fine tuning your code. However, every language has its own specific quirks. Let us take a look at Ruby.

Calling eval on user input or unchecked code

input = "'rm -rf /'"
klass =  eval input

Of course this is bad. I hope it doesn’t need any explanation. Try to avoid eval; if you need to evaluate code in the context of an object, prefer instance_eval. Whichever you use, make sure the code you evaluate is trusted, and never eval user input directly.
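
When the goal is only to turn a user-supplied name into a class, a lookup table avoids eval entirely. This is a minimal sketch under that assumption; the ALLOWED_CLASSES table and the class_for method are hypothetical names chosen for illustration.

```ruby
# Hypothetical allowlist: only these names may be turned into classes.
ALLOWED_CLASSES = {
  "String" => String,
  "Array"  => Array
}.freeze

def class_for(input)
  # fetch with a block runs the block for any name that is not in the table
  ALLOWED_CLASSES.fetch(input) do
    raise ArgumentError, "unsupported class name: #{input.inspect}"
  end
end

p class_for("Array").new

begin
  class_for("system('rm -rf /')") # treated as an unknown key, never executed
rescue ArgumentError => e
  puts e.message
end
```

The hostile string is only ever used as a hash key, so there is nothing for it to execute.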

Nested blocks without added value

array = [["banana", "apple"],["pineapple","beer"]]
# And I want to call reverse on each element, I could do
array.collect { |sub_array| sub_array.collect { |subsub_array| subsub_array.reverse }}
# But this is much nicer
array.collect { |sa| sa.collect(&:reverse) }
# => [["ananab", "elppa"],["elppaenip","reeb"]]

The reason is simple enough. Your code is more readable, and that’s what we all want. So what about nested multi-line blocks? Take a good look at them: chances are that the design of your solution is the root of this particular evil.

Code similarity

def post_to_site(data)
  url = build_url(data)
  response = RestClient.post(url)
end

def get_from_site(data)
  url = build_url(data)
  response = RestClient.get(url)
end

def delete_from_site(data)
  url = build_url(data)
  response = RestClient.delete(url)
end

You can easily solve this lump of code by introducing some meta-programming:

def response_from_site(data, method = :get)
  url = build_url(data)
  response = RestClient.public_send(method, url)
end

This gives you a clean, nicer method. And it’s readable too! Isn’t that nice?
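
Dispatching on a method name works best when the set of accepted names is explicit. The following sketch runs offline by using FakeClient, a hypothetical stand-in for an HTTP library, and HTTP_VERBS, an assumed allowlist; neither appears in the original code.

```ruby
# FakeClient is a hypothetical stand-in for an HTTP library, so the sketch runs offline.
module FakeClient
  def self.get(url);    "GET #{url}";    end
  def self.post(url);   "POST #{url}";   end
  def self.delete(url); "DELETE #{url}"; end
end

HTTP_VERBS = [:get, :post, :delete].freeze

def response_from_site(url, verb = :get)
  # Reject anything outside the known verbs before dispatching dynamically
  raise ArgumentError, "unsupported verb: #{verb.inspect}" unless HTTP_VERBS.include?(verb)
  FakeClient.public_send(verb, url)
end

puts response_from_site("http://example.com/posts")
puts response_from_site("http://example.com/posts/1", :delete)
```

The guard keeps the convenience of a single method while preventing callers from invoking arbitrary methods on the client.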

Long, repetitive and cluttering statements

Often enough you have similar parameters that call similar methods. For instance, you might need to check on some parameter and call the associated method. Or even simpler, you might need to check if a certain user input matches your criteria. I prefer a simple rule of thumb: if you’re working with any sort of collection or set, the functional approach is usually cleaner and simpler, and often faster. The point is not to dictate when and whether you should prefer the functional approach, just that you understand that long lines of repetitive clutter screw up your code.

Take the following example:

input = "english"
case input
when "english"
  puts "English, ***, do you speak it?"
when "french"
  puts "Baguette!"
when "dutch"
  puts "I only smoke *** when it's free."
else
  puts "Dunno"
end

You can imagine in complex applications that this goes on and on. I actually see it happen a lot and it’s not necessarily bad. However, it’s hard to maintain and in more complex situations it can get really hard to read through. I’ve seen loads of Rails applications where they use just this to check on a particular parameter. Really ugly!

Since it seems like you’re whitelisting, one way to solve it would be to use a hash that maps each input to its result.

whitelist = {
  "english" => "English, ***, do you speak it?",
  "french" => "Baguette!",
  "dutch" => "I only smoke *** when it's free.",
  "other" => "Dunno."
}

if whitelist.has_key?(input)
  puts whitelist[input]
else
  puts whitelist["other"]
end
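
The lookup and the fallback can also be folded into one call with Hash#fetch, which takes a default as its second argument. This is a small sketch of that variation; the RESPONSES table uses placeholder greetings rather than the strings above, and response_for is a made-up method name.

```ruby
# Placeholder greetings; the keys play the same role as in the whitelist above.
RESPONSES = {
  "english" => "Hello!",
  "french"  => "Bonjour!",
  "dutch"   => "Hallo!"
}.freeze

def response_for(input)
  # The second argument is returned when the key is missing
  RESPONSES.fetch(input, "Dunno.")
end

puts response_for("french")
puts response_for("klingon")
```

With the default handled by fetch, the table no longer needs a special "other" entry.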

It’s always important to be proud of the code you write. It really helps if it doesn’t smell. And I hope this article helped you do that.

Feel free to ask questions and give feedback in the comments section of this post. Thanks!

Interview: Michael Hartl, author of the Ruby on Rails Tutorial (railstutorial.org)

RubyLearning participants talk to Michael Hartl the author of the Ruby on Rails Tutorial (railstutorial.org).

Satish Talim>> Welcome Michael and thanks for taking out time for RubyLearning. For the benefit of the readers of this blog could you please introduce yourself and tell us what you do for a living?

Michael>> Happy to be here. I’m a programmer, educator, and entrepreneur. Recently, I’ve been focused on making educational products and selling them online. I’ve been doing web development since around 2001 and Rails development since 2005. I also have a background in academic teaching and research, principally in theoretical and computational physics.

My current products are the Ruby on Rails Tutorial book and the Ruby on Rails Tutorial screencasts. The book is available for free online, for purchase as a PDF, and as a print edition. The screencasts are available for purchase from the Rails Tutorial website or (if you have a subscription) from Safari Books Online. I especially recommend the Rails Tutorial PDF/screencast bundle.

Ricardo Astorquia, Spain>> How do you get the right balance for teaching in a book for those folks that may have different backgrounds, where more details are necessary while another reader may need just a little more guidance than just a reference book?

Michael>> It’s important to realize that advanced readers rarely mind a little basic material, especially if including it is a core part of your style, and that basic material helps bring the newbies up to speed. One inspiration is The Economist magazine’s house style, which usually includes some information about a company or person, no matter how famous; for example, they might write “General Electric, an American conglomerate” or “Steve Jobs, boss of Apple”. I try always to include enough detail that even a beginner has a place to start if they need further information.

Vince Vincent, USA>> Do you intend to create a sequel as new Rails versions are released? If not, what is the speediest way for a Rails developer to progress from here (aside from reading the API which many suggest)?

Michael>> I plan to keep the Ruby on Rails Tutorial up-to-date. The book is easy to edit, but the screencasts are trickier, so for a while I might only supplement the screencasts. Eventually, though, I anticipate having to re-cut the entire series once Rails has changed enough to justify the effort.

Imhotep Albasiel, USA>> Would you be writing about Rails development on Windows in the future?

Michael>> I am hoping to cover Rails development on Windows in future editions. Part of the issue has been the lack of a standard Windows installation method, but the new Rails Installer aims to change that, so I’m optimistic that Rails development will start to take off on Windows.

Samnang Chhun, Cambodia>> Is it important to understand Rack when learning Rails?

Michael>> Rack is a Ruby library that provides a standard interface between web frameworks and web servers. Most Ruby web frameworks, including Rails and Sinatra, use Rack, and it is certainly important in some contexts, but I think that Rack can be skipped when first learning Rails. It’s really more of an intermediate-to-advanced topic.

Samnang Chhun, Cambodia>> What should I do next to become a Rails guru?

Michael>> There are lots of great Rails resources out there, and I particularly recommend Railscasts by Ryan Bates. Of course, there’s no substitute for writing your own application, so I suggest picking a problem that interests you and plunging ahead.

Victor Goff, USA>> On January 25th, you were notified that your RailsTutorial was banned by a certain country. Have you cashed in on the notoriety yet!?

Michael>> I’m not sure being blocked by the Great Firewall of China is a big enough story to earn me much notoriety. It is weird, though, and disappointing. I guess it means I’ve made the big time?

Victor Goff, USA>> How do you manage having a publication that can be broken by updates in the gems that you use, either in the production directly, or in testing?

Michael>> This is a big lesson I learned from my first Rails book, called RailsSpace. In that book, my coauthor and I made the mistake of not using specific version numbers for the gems, but the Ruby on Rails 3 Tutorial book avoids this error. Every one of the gems in the book is tied to a particular version number, so the tutorial is (virtually) guaranteed to work as advertised. Of course, I do occasionally update the book with new gem versions, but I always test the new gems to make sure they work. (The sample application’s test suite proves invaluable in this context.)

Robin Gowin, USA>> Where do you see Rails going, and what do you think of the Rails – Merb merger?

Michael>> I think Rails is off to the races now, especially with the release of Rails 3. The Rails core team and the Merb developers deserve immense credit for setting aside their differences and joining forces to make Rails 3 happen. Given how modular the core of Rails is now, I expect all kinds of great innovation in the next few years.

Mohnish Jadwani, India>> If developers want to migrate from an application built on Rails 2 to an application built on Rails 3, what challenges would they face in this migration (I understand this would be an app-specific question; I only want to know in generic terms)? How can these best be dealt with?

Michael>> Since the Rails Tutorial is aimed mainly at beginners, I didn’t feel that covering the upgrade from Rails 2 to Rails 3 fit with the core philosophy of the book. Moreover, there are already lots of resources to help make the Rails 2.x to 3.x upgrade, including an e-book dedicated to this subject (Jeremy McAnally’s “Rails 3 Upgrade Handbook“).

Zachary S. Scott, USA>> Do you have plans for any other Ruby (non-Rails related) project?

Michael>> I am contemplating making a Ruby tutorial at some point, but no promises! I’m also planning to open-source PolyTeXnic, the Ruby program I use to make the HTML and PDF versions of the book.

Zachary S. Scott, USA>> What do you think of Sinatra?

Michael>> I’ve only dabbled with Sinatra, but I’d like to know it better. It seems very clean and elegant.

Satish Talim>> Anything else you would like to add?

Michael>> Web development is hard, so don’t get discouraged if you run into difficulties. All web developers run into difficulties all the time. With practice, you’ll get better at powering through the problems, and you’ll also learn that sometimes you have to give up and hack around them. :-)

Thank you Michael. In case you have any queries and/or questions, please post your questions here (as comments to this blog post) and Michael would be glad to answer.

How do I make a command-line tool in Ruby?

This guest post is by Allen Wei, who works as Senior Ruby On Rails Engineer for Seravia, in Beijing. He is very enthusiastic about Ruby. He started using Ruby after several years of using Java, .NET and never came back to them. When he has some spare time, he develops Ruby gems, holds tech sessions, and shares his experience in his blog. He is also a fan of BDD and TDD, using them in all his open source projects. He gains a lot from the Ruby community and hopes to give back.

Introduction

Ruby, as a dynamic language, is often used to write quick command-line tools because of its simplicity and productivity.

This article talks about three ways to write a command-line tool.

Before we start, there are a few things you need to know:

  1. Put #!/usr/bin/env ruby on the first line of your command-line file. This tells the shell to execute the file using Ruby. (#!/usr/bin/env ruby is similar to simply calling ruby from the command line, so the same rules apply: the individual entries in the $PATH environment variable are checked in order, and the first ruby found is used.)
  2. Make sure your file is executable: run chmod u+x FILE_PATH.
  3. Print help text and return the right exit code (0 means success, any other number means failure) if the user invokes the tool incorrectly.

Remember that other people will not necessarily know how to run your command-line tool, which is why the help text matters.
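
The third point can be sketched as follows. The run method, the USAGE text and the COMMANDS list are hypothetical names for this illustration; the method returns the exit status rather than exiting, which keeps it easy to test.

```ruby
#!/usr/bin/env ruby
require 'stringio' # only needed when capturing output, as in a test

USAGE    = "Usage: server start|stop|restart"
COMMANDS = %w[start stop restart].freeze

# Returns the exit status instead of calling exit directly.
def run(argv, out: $stdout, err: $stderr)
  command = argv.first
  if COMMANDS.include?(command)
    out.puts "called #{command}"
    0 # success
  else
    err.puts USAGE # help text goes to standard error
    1 # any non-zero status signals failure
  end
end

# In a real script the last line would be:
#   exit run(ARGV)
```

Calling exit with the returned number makes the status visible to the shell, for example through $? in bash.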

Conventions

I’ll use three definitions:

  1. Command-line file name
  2. Command
  3. Option

For example there is a command: ‘server start -e development’

  1. Command-line file name is ‘server’
  2. Command is the first argument ‘start’
  3. Option is the rest of the arguments, here the pair ‘-e development’

Let’s go

We shall start from a simple example: write a command-line tool to start, stop and restart the server.

Without any lib

# server.rb
case ARGV[0]
when "start"
  STDOUT.puts "called start"
when "stop"
  STDOUT.puts "called stop"
when "restart"
  STDOUT.puts "called restart"
else
  STDOUT.puts <<-EOF
Please provide command name

Usage:
  server start
  server stop
  server restart
EOF
end

All command-line arguments are stored as an array in the ARGV constant.

What if you need to pass some options?

# server.rb
def parse_options
  options = {}
  case ARGV[1]
  when "-e"
    options[:e] = ARGV[2]
  when "-d"
    options[:d] = ARGV[2]
  end
  options
end

case ARGV[0]
when "start"
  STDOUT.puts "start on #{parse_options.inspect}"
when "stop"
  STDOUT.puts "stop on #{parse_options.inspect}"
when "restart"
  STDOUT.puts "restart on #{parse_options.inspect}"
else
  STDOUT.puts <<-EOF
Please provide command name

Usage:
  server start
  server stop
  server restart

  options:
    -e ENVIRONMENT. Default: development
    -d DAEMON, true or false. Default: true
EOF
end

This code is simple but it has some disadvantages:

  • The option parser and the help text live in different places, which causes trouble as soon as they stop matching.
  • Options are read from ARGV by array index. These magic numbers create a maintenance problem.

Using OptionParser

OptionParser is part of Ruby’s standard library and helps you parse arguments.

We can refactor our code like this:

require 'optparse'

options = {}

opt_parser = OptionParser.new do |opt|
  opt.banner = "Usage: opt_parser COMMAND [OPTIONS]"
  opt.separator  ""
  opt.separator  "Commands"
  opt.separator  "     start: start server"
  opt.separator  "     stop: stop server"
  opt.separator  "     restart: restart server"
  opt.separator  ""
  opt.separator  "Options"

  opt.on("-e","--environment ENVIRONMENT","which environment you want server run") do |environment|
    options[:environment] = environment
  end

  opt.on("-d","--daemon","run in daemon mode?") do
    options[:daemon] = true
  end

  opt.on("-h","--help","help") do
    puts opt  # print the generated help text
    exit      # stop here so the help is not printed a second time below
  end
end

opt_parser.parse!

case ARGV[0]
when "start"
  puts "call start on options #{options.inspect}"
when "stop"
  puts "call stop on options #{options.inspect}"
when "restart"
  puts "call restart on options #{options.inspect}"
else
  puts opt_parser
end

Try to execute this file without arguments; you’ll find it prints a very nice help text.

opt_parser.parse! is the method that extracts options from ARGV; every option it recognizes, together with its value, is removed from ARGV.
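
The destructive behaviour of parse! is easiest to see on an explicit array. This is a minimal sketch in which the args array stands in for ARGV; the variable names are chosen for illustration.

```ruby
require 'optparse'

options = {}
parser = OptionParser.new do |opt|
  opt.on("-e", "--environment ENVIRONMENT") { |env| options[:environment] = env }
end

# An explicit array stands in for ARGV so the effect is easy to inspect.
args = ["start", "-e", "production"]
parser.parse!(args)

p args    # => ["start"]
p options # the parsed value is now stored under :environment
```

Only the command is left in the array, which is why the case statement above can keep reading ARGV[0] after parsing.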

OptionParser can do more than that.

You can declare the type of an option’s value, and OptionParser will convert the value to that type, like this:

opt.on("-e", "--environment ENVIRONMENT", String,
       "which environment the server should run in") do |environment|
  options[:environment] = environment
end
opt.on("--delay N", Float, "Delay N seconds before executing") do |n|
  options[:delay] = n
end
opt.on("-j x,y,z", "--jurisdictions x,y,z", Array,
       "which jurisdictions to start") do |jurisdictions|
  options[:jurisdictions] = jurisdictions
end
server_list = %w[a b c]
opt.on("-s SERVER", "--servers SERVER", server_list,
       "which server to start, one of #{server_list.join(',')}") do |servers|
  options[:servers] = servers
end
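
The conversions can be checked with a short standalone script. The sketch below parses a hand-written argument list; the option names mirror the ones above, and the sample values are made up.

```ruby
require 'optparse'

options = {}
parser = OptionParser.new do |opt|
  opt.on("--delay N", Float) { |n| options[:delay] = n }
  opt.on("-j", "--jurisdictions x,y,z", Array) { |list| options[:jurisdictions] = list }
  opt.on("-s", "--servers SERVER", %w[a b c]) { |server| options[:server] = server }
end

parser.parse!(["--delay", "1.5", "-j", "us,uk", "-s", "b"])
p options[:delay]         # a Float, not the string "1.5"
p options[:jurisdictions] # an Array split on the commas

begin
  parser.parse!(["-s", "z"]) # "z" is not in the list of allowed servers
rescue OptionParser::ParseError => e
  puts e.message
end
```

Values outside the allowed list are rejected by the parser itself, so the block never sees them.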

You can mark whether the value of an option is mandatory or optional; square brackets around the placeholder make it optional.

# Mandatory argument.
opt.on("-r", "--require LIBRARY",
       "Require the LIBRARY before executing your script") do |lib|
  (options[:library] ||= []) << lib
end

# Optional argument; multi-line description.
opt.on("-i", "--inplace [EXTENSION]",
       "Edit ARGV files in place",
       "  (make backup if EXTENSION supplied)") do |ext|
  options[:inplace] = true
  # Ensure a non-empty extension begins with a dot.
  options[:extension] = (ext || '').sub(/\A\.?(?=.)/, ".")
end

For more details you can see this article and refer to the Ruby rdoc.

The benefit of OptionParser is that we don’t need array indexes to retrieve options, and we can write the help text right next to the option definitions.

The disadvantage of OptionParser, as used here, is that all commands share one option parser, so each command cannot easily have its own set of options. To solve this problem, you can turn to Thor.

Using Thor

Thor is a toolkit for building command-line interfaces and is often described as a replacement for Rake. Let’s see how we can use Thor to refactor our command-line tool.

require 'rubygems'
require 'thor'

class ThorExample < Thor
  desc "start", "start server"
  method_option :environment, :default => "development", :aliases => "-e",
                :desc => "which environment you want server run."
  method_option :daemon, :type => :boolean, :default => false, :aliases => "-d",
                :desc => "running on daemon mode?"
  def start
    puts "start #{options.inspect}"
  end

  desc "stop", "stop server"
  method_option :delay, :default => 0, :aliases => "-w",
                :desc => "wait for the server to finish its job"
  def stop
    puts "stop"
  end
end

ThorExample.start

  • desc defines the usage string of the command and its description.
  • method_option defines an option for the command that follows it.
  • ThorExample.start starts parsing the arguments and runs the matching command.

Execute it without arguments, and the output is:

Tasks:
  thor_example help [TASK]  # Describe available tasks or one specific task
  thor_example start        # start server
  thor_example stop         # stop server

Execute it with the arguments help start, and you’ll get the help text for the start command:

Usage:
  thor_example start

Options:
  -e, [--environment=ENVIRONMENT]  # which environment you want server run.
                                   # Default: development
  -d, [--daemon]                   # running on daemon mode?

start server

As you can see, it’s very clean and easy to write.

For more detailed usage, you can visit the Thor GitHub page and its rdoc.

Summary

Of course there are more ways to write a command-line tool. Choose the one that best fits your needs, not the most powerful or the newest one.

All the sample code is on github https://github.com/allenwei/ruby_command_line_sample.

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Thanks!

Being Awesome with the MongoDB Ruby Driver

This guest post is by Ethan Gunderson, who is a software developer living in Chicago. By day he is a developer at Obtiva, where he helps clients deliver projects and be more awesome. By night, he is part of the gathers.us team, a co-organizer of ChicagoDB, and contributes when he can to the MongoDB community. You can find him at ethangunderson.com, or on Twitter as @ethangunderson.

MongoDB is fast becoming one of the more popular and widely used NoSQL databases, and rightfully so. Its flexible key/value store, powerful query language and sexy scaling options are enough to pique any developer’s interest. While most Ruby developers may jump right into the warm embrace of the Active Record replacements Mongoid and MongoMapper, that often robs developers of a valuable learning experience.

The MongoDB Ruby driver is not only simple to use, but it will get you familiar with how queries look and how they operate. Armed with this knowledge, moving into an ORM becomes much easier. You’ll not only be able to understand what is abstracted away, but you’ll be able to spot bad and inefficient generated queries, making performance troubleshooting a snap. To help you hit the ground running, we’ll be building up some of the common queries you would find in a typical blog. Let’s get started!

Installation

Since the driver is just a gem, installation is simple:

(sudo) gem install mongo

There is one more piece to install, bson_ext. Essentially, this is a collection of C extensions used to increase the speed of serialization. While this is optional, I recommend that you install it.

(sudo) gem install bson_ext

Now that we have everything we need installed, let’s hop into some code and see what we can do.

Getting Started

First things first, we need a database connection. For the sake of simplicity, we’ll be using localhost with the default port.

require 'mongo'
include Mongo

db = Connection.new.db('ruby-learning')

The next thing we’ll need is a place to store all of our posts. Let’s go ahead and get a post collection started.

posts = db.collection('posts')

That’s it! Notice that there are two things we *didn’t* do that are kind of cool: we didn’t create the database or the collection. In fact, neither exists yet at the database level, and neither will until we insert some data.

Inserting & Updating Documents

Let’s get our blog rolling with a high quality post.

new_post = { :title => "RubyLearning.com, it's awesome", :content => "This is a pretty sweet way to learn Ruby", :created_on => Time.now }
post_id = posts.insert(new_post)

So, what did we just do? MongoDB stores its data as key/value pairs, which maps nicely to Ruby’s Hash. After creating a hash with our data, we inserted it into the posts collection, and in return, we received the ObjectId for the post from MongoDB. Pretty simple, right? It’s just as simple to update that document as well.

post = posts.find( :_id => post_id ).first
post[:author] = "Ethan Gunderson"
posts.update( { :_id => post_id }, post )

Using the ObjectId we got back from our insert query, we find that same document again. After changing the data as we see fit, we issue an update query. An update query takes two arguments, the first one is conditions used to find the document (just the ObjectId in our case), and the second is the data.

While this works, it’s kind of silly if you think about it. We query the database for our document, change a small amount of information, and send the entire document back again. There’s gotta be a better way! Luckily, there is. MongoDB has the concept of update operators (also called modifiers). One of these operators is $set, which allows you to, as I’m sure you can guess, set the value of an attribute.

posts.update( { :_id => post_id }, '$set' => { :author => 'Ethan Gunderson' } )

Here, we supply our find conditions, similarly to our previous update, but instead of supplying the entire document, we just set the values we wish to change. Now, instead of having to issue two queries against the database, we can accomplish the same task in one query, and less code to boot.

Next, let’s take care of the post index page.

all_posts = posts.find

If you run this, you’ll probably notice a problem. Most of the time, blogs list their posts in descending order. Let’s change our query to account for this.

posts.find.sort( [['_id', -1]] )

This query has a couple of interesting points. Firstly, note that we are sorting on _id. Since MongoDB’s ObjectIds begin with a timestamp (with one-second resolution), we can sort on them chronologically. This effectively removes the need for a created_at timestamp as well! Secondly, the sort parameter must always take an array of arrays, even if there is only one field you are sorting on.
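
The timestamp can be recovered by hand, because the first four bytes of an ObjectId hold the creation time in seconds since the Unix epoch. The sketch below works on the 24-character hex form of an id; object_id_time is a hypothetical helper, and the sample id is constructed for illustration rather than taken from a real database.

```ruby
# The first 4 bytes (8 hex characters) of an ObjectId are seconds since the Unix epoch.
def object_id_time(hex)
  Time.at(hex[0, 8].to_i(16)).utc
end

# Hypothetical id, built by hand: a timestamp prefix followed by zero padding.
sample_id = "5f5e1000" + "0" * 16

puts object_id_time(sample_id)
```

Two ids created within the same second share this prefix, so ordering inside one second depends on the remaining bytes of the id.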

So, that wasn’t so hard, but pretty naïve. What happens when we have 1,000 posts? We don’t want to torture our visitors with a ridiculous page load time, so let’s trim that back.

posts.find.sort( [['_id', -1]] ).limit(5)

Again, this was pretty simple, and by now you should be noticing a pattern. Building up relatively complex queries is just a matter of chaining methods together. To further this example, here’s a query showing a theoretical pagination query.

posts.find.sort( [['_id', -1]] ).limit(5).skip(5)
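
The numbers passed to skip and limit follow directly from the page number. Here is a small sketch of that arithmetic; page_window is a hypothetical helper and is not part of the MongoDB driver.

```ruby
# Translate a 1-based page number into the skip/limit pair used in the query above.
def page_window(page, per_page = 5)
  raise ArgumentError, "page must be >= 1" if page < 1
  { :skip => (page - 1) * per_page, :limit => per_page }
end

window = page_window(2)
puts "skip #{window[:skip]}, limit #{window[:limit]}"

# With the collection from this post, the query would then read:
#   posts.find.sort( [['_id', -1]] ).limit(window[:limit]).skip(window[:skip])
```

Page 2 with five posts per page skips the first five documents, which matches the query shown above.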

Tags

Another common element to blogs is the concept of tags. To accomplish this in our example, we’ll be adding an array of tags to our blog post.

post = posts.find( :_id => post_id ).first
post[:tags] = ['mongo', 'nosql', 'awesome']
posts.update( { :_id => post_id }, post )

Now let’s find a post based on a specific tag.

posts.find( :tags => 'mongo' )

It really doesn’t get simpler than that, folks. Now that the basic implementation is out of the way, how do we find posts that match more than one tag? To accomplish this, we’ll be using a query operator called $all. As you can imagine, the $all operator specifies that selected documents contain all the elements in the supplied array.

posts.find( :tags => { '$all' => ['mongo', 'awesome'] } )

To round out our tags feature, let’s build a query that will list all the unique tags in our system. There are a couple of ways to skin this particular cat. Since we don’t need to do any aggregation, and the query needs to be performant, we’ll be using distinct. If we also needed to produce a count of tag occurrences, though, Map/Reduce might be a better option.

posts.distinct('tags')

Indexing

Now that our blog is starting to grow in complexity, we’ll need to start thinking about adding proper indexes. If you notice in our tags implementation, we’re now querying on an attribute that is not indexed. Let’s fix that.

posts.create_index('tags')

And there we have it. In a relatively short amount of time, we’ve built up a lot of the common queries you would see in a standard blog. While I’ve only touched the surface of what you can accomplish with the MongoDB Ruby driver, I hope that I’ve shown you its power. I’ve included some more learning material and references below to continue your learning. Of course, if you have any questions, feel free to ask them and give feedback in the comments section of this post. Thanks!

References

How do I keep multiple Ruby projects separate?

This guest post is by Steve Klabnik, who is a software craftsman, writer, and former startup CTO. Steve tries to keep his Ruby consulting hours down so that he can focus on maintaining Hackety Hack and being a core member of Team Shoes, as well as writing regularly for multiple blogs.

If you’re anything like me, you’re already starting a new project immediately after wrapping up the last one. There just aren’t enough hours in the day to code up all the crazy ideas I have floating around in my head. Often, these ideas are the result of checking out some fun new gem, GitHub project, or even a different Ruby. Real quickly, a problem develops: what happens when these projects interfere with one another? What if I want to use Ruby 1.8.7 for an older project, Ruby 1.8.5 for a legacy application, Ruby 1.9.2 for the latest and greatest, and JRuby to use an interesting Ruby library? Luckily, there are a few things that you can do to isolate your different projects from one another, and some settings that will make them quite painless to use.

There are three main things that can go wrong when you try to use different sets of tools on a per-project basis: conflicts between Ruby versions, conflicts between gems, and forgetting which tools you use on which project.

Ruby Version Conflicts

This is the biggest and most painful kind of problem. If you want to use Ruby 1.8 for one project and Ruby 1.9 for another, you have a problem. If you’re using Linux, for example, your package manager may see that both ruby18 and ruby19 fulfill a ‘ruby’ dependency, and so it won’t let you have them both installed side by side. The solution isn’t pretty: install different Rubies from source. This gets ugly really quickly, because it’s easy to forget where you’ve compiled different Rubies, and having software outside of your package manager isn’t a great answer. If you’re on OS X or Windows, you skip right past the package manager problem and straight to the source ‘solution.’ This is no good!

Luckily, there’s an awesome project by Wayne E. Seguin named rvm. rvm is sort of like a package manager for Ruby. If you’d like to install both Ruby 1.8.7 and 1.9.2, just type this in:

$ rvm install 1.8.7
$ rvm install 1.9.2

It’ll go fetch the Ruby source code, compile it, and get you all set up. To use a specific Ruby, you can type ‘use’:

$ rvm use 1.8.7
$ ruby -v
ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin10.4.0]
$ rvm use 1.9.2
$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]

Neat! You can even get other Ruby implementations:

$ rvm install jruby
$ rvm install rbx
$ rvm install macruby

You can see a full list of these with ‘rvm list known’. For a full list of everything that rvm can do, as well as installation instructions, visit the rvm website.

Gem Conflicts

Once you’ve gotten your Rubies straight, you can still have conflicts between different gems that your project needs. One project uses Rails 2.3.8, another uses Rails 3… It gets worse when you have certain gems installed only as a dependency, and you don’t know exactly which one is correct:

$ gem list | grep net-ssh
net-ssh (2.0.23, 2.0.4, 1.1.4)

rvm has a neat feature called ‘gemsets.’ They let you create separate sets of gems per Ruby you have installed. This allows you to isolate each application, giving it its own set of gems. Check it out:

$ gem list

*** LOCAL GEMS ***

aasm (2.1.5)
abstract (1.0.0)
acl9 (0.12.0)
*snip*

$ rvm gemset create new-gemset
$ rvm use 1.9.2@new-gemset
$ gem list

*** LOCAL GEMS ***

$

Cool stuff! As you can see, you use an ‘@’ symbol to tell rvm which gemset you’d like to use. Now we’ve isolated each project’s gems from each other. There is, however, a much more complicated kind of conflict that can occur between gems. This happens when two gems have interlocking dependencies.

Here’s an example of this from the past: ActionPack 2.3.5 requires Rack ~>1.0.0, which matches the newest version at the time. Unicorn requires Rack >1.0.0. Rack releases a new version, 1.1.0. Now, when starting up a Rails application, the unicorn gem is loaded first, so it loads the newest version of the gem that works, which is rack-1.1.0. Then rails loads, and it loads actionpack, which tries to load rack. It needs ~>1.0.0, but sees that 1.1.0 has already been loaded, and throws this ugly, ugly error:

Gem::LoadError: can't activate rack (~> 1.0.0, runtime)
for ["actionpack-2.3.5"], already activated rack-1.1.0
for ["unicorn"]

There’s a set of versions here that works, but the way that the gems are loaded means that it doesn’t. The problem is that at the time that unicorn loads, it can’t possibly know that you’re planning on loading a different version of rack somewhere down the line. What we really need is a tool that knows about all of our dependencies, and can calculate the graph of all of our requirements, and figure out which versions of everything we need, and then only place those versions on the $LOAD_PATH. Luckily, such a project exists: bundler.
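
The difference between load-order picking and resolving everything up front can be sketched with the version classes that ship with RubyGems. The constraints are the ‘~> 1.0.0’ from the error message and unicorn’s ‘> 1.0.0’; the list of available rack releases is an assumption for illustration, and this is not how Bundler’s real resolver is implemented.

```ruby
require "rubygems"

# Constraints from the example: actionpack's from the error message, unicorn's from the text.
actionpack_needs = Gem::Requirement.new("~> 1.0.0")
unicorn_needs    = Gem::Requirement.new("> 1.0.0")

# Assumed list of rack releases available to install.
available = %w[1.0.0 1.0.1 1.1.0].map { |v| Gem::Version.new(v) }

# Load-order behaviour: unicorn alone picks the newest version it accepts.
unicorn_pick = available.select { |v| unicorn_needs.satisfied_by?(v) }.max

# Resolver behaviour: look at every constraint before picking anything.
shared_pick = available.select { |v|
  actionpack_needs.satisfied_by?(v) && unicorn_needs.satisfied_by?(v)
}.max

puts "unicorn alone picks #{unicorn_pick}"
puts "actionpack accepts that? #{actionpack_needs.satisfied_by?(unicorn_pick)}"
puts "resolving together picks #{shared_pick}"
```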

To use bundler, you first need to make a file named ‘Gemfile’ in the root of your project directory. This file looks something like this:

source "http://rubygems.org"

gem "rails", "~>3.0.0"

group :development do
  gem 'sqlite3-ruby', :require => 'sqlite3'
end

group :production do
  gem "pg"
end

The first line tells Bundler where to look for gems. The second line says that we want to use the ‘rails’ gem, and we want any version that’s at least 3.0.0 but less than 3.1.0. Finally, the other lines show ‘groups’ of gems: in development, we want to use sqlite3-ruby, and we need to require it via the name ‘sqlite3’, but we want to use Postgres in production. To install these gems, just run:

$ bundle install

Bundler gets all the information that it needs on all the gems, figures out what versions of everything work together, and then installs the right versions. It then creates a Gemfile.lock file that holds all of this information. It’s just a plain text file, so you can open it up and see the specifics. You’ll want to add the Gemfile and Gemfile.lock into your version control, so that anyone else that’s developing with you can also get the same gem versions.
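
The ‘~>’ constraint used for rails in the Gemfile can be checked directly with RubyGems’ own classes. This small sketch compares it with the explicit range described above; the candidate version numbers are arbitrary examples.

```ruby
require "rubygems"

pessimistic = Gem::Requirement.new("~> 3.0.0")
explicit    = Gem::Requirement.new(">= 3.0.0", "< 3.1.0")

# Arbitrary candidate versions to test against both requirements.
candidates = %w[2.3.8 3.0.0 3.0.9 3.1.0]

accepted_pessimistic = candidates.select { |v| pessimistic.satisfied_by?(Gem::Version.new(v)) }
accepted_explicit    = candidates.select { |v| explicit.satisfied_by?(Gem::Version.new(v)) }

puts accepted_pessimistic.inspect
puts accepted_explicit.inspect
```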

To use the gems in your bundle, just use these two lines:

require "rubygems"
require "bundler/setup"

From there, whenever you require a gem, it’ll be the version from the bundle. If you want Bundler to automatically require all of your gems for you, just call ‘Bundler.require’ and it’ll require the default group of gems.

Rails 3 automatically comes with a Gemfile and bundler support right out of the box. If you want to use Bundler with Rails 2.3, check out the Bundler site for setup instructions.

The combination of gemsets and Bundler will make sure that you don’t have any nasty gem conflicts. Gemsets keep your projects isolated from each other, and Bundler keeps your gems’ versions from interfering with each other. The two work really well together.

I can’t remember which tool I used!

All of these rubies and gemsets can get confusing. Luckily, rvm has an awesome feature to take care of this, too: .rvmrc files. If you put a file named ‘.rvmrc’ in your project’s root directory, when you enter the project, it’ll switch your Ruby version (and gemset) automatically. It’s really easy to use, too. Just put the command you’d use to switch in the file. For example, in the Hackety Hack website project, I have the following .rvmrc:

rvm 1.8.7@hackety-hack.com

Astute readers will notice that I left off the ‘use’: rvm defaults to ‘use’ if you don’t give it a different command. Check it out:

$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.4.0]
$ cd hackety-hack.com
$ ruby -v
ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-darwin10.4.0]

Super cool. Now you’ll never forget which Ruby you were using, and you don’t even need to switch manually. This is one of the first things that I do when I start a new project in Ruby: Pick a Ruby version, make a gemset with the same name as the project, and set up an .rvmrc. It’s saved me hours of time and headaches.

Multiple projects: super simple

rvm is a fantastic tool to help solve your multiple-ruby woes. It really does make using multiple kinds of Ruby really, really easy. And Bundler makes sure that your gems play nice together. It’s a great time to be a Rubyist.

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Thanks!

Getting started with Heroku

This guest post is by Ben Scofield, who is Heroku’s developer advocate, responsible for listening to the tens of thousands of developers deploying their Ruby applications to the cloud. He’s spoken at many conferences around the world, and in 2010 became the co-chair for RailsConf.

Introduction

Heroku has been in the news a lot lately, and it’s been a popular choice for Ruby application developers for a few years. If you haven’t worked with it before, here’s your chance: it’s designed to be as painless as possible to get going, and to give you a powerful, stable, and scalable platform for your code.

Setting up

If this is your first time working with Heroku, you’ll need to start by setting up an account. Visit https://api.heroku.com/signup and enter your email address. You’ll soon get an email to confirm your account; click on the confirmation link and choose a password, and you’re registered!

Next, you’ll want to create an app (or find an existing one you want to push). Heroku supports any Rack-based Ruby web framework — so you can use Rails, Sinatra, Camping, Ramaze, or pretty much anything you want. Let’s say you’re going to build a new Rails application:

$ rails new myapp
$ cd myapp

After you’ve chosen (or created) your app, you’ll need to make sure it’s tracked in git:

$ git init
$ git add .
$ git commit -m "initial commit"

Once you’ve got your app ready to go, you’ll want to install the heroku gem. As you’ll see, it’s a powerful tool for managing your apps from the command line.

$ gem install heroku

And now, from your application root, run:

$ heroku create

If this is your first time using the heroku CLI, it’ll prompt you for your username and password — on subsequent uses, it’ll pull your username and API key for accessing Heroku from ~/.heroku/credentials, but that doesn’t exist until you’ve logged in through the CLI. It will also upload your public SSH key, and finally it’ll create your new application on Heroku and add a git remote.

If you want to specify the name of your app (and thus the subdomain on Heroku), you can pass an argument:

$ heroku create myapp # created myapp.heroku.com

Finally, to push your code to heroku, push it as you would to any git remote:

$ git push heroku master
$ heroku rake db:migrate # you'll need to do this for any schema change

You’ll see feedback on the process, but by the end your code should be up and running on Heroku’s platform!

Of course, there’s a lot more to working with Heroku than just that, so here’s a little more information.

CLI

The heroku gem gives you a lot more than just ‘heroku create’, though. It provides a full CLI for working with your application. Here’s an incomplete list of what you can do with it:

rake

You can run any rake task you like by prefacing it with ‘heroku rake’:

$ heroku rake db:migrate
$ heroku rake routes

(Note that heroku doesn’t run your migrations by default; when you change your schema, you’ll need to run ‘heroku rake db:migrate’ to update your production database.)

Resources

You can change your resource allocation from the command line, too. ‘heroku dynos 5’ sets your application to 5 dynos; you can do the same with workers. As you’ll see below, this extends to add-ons, as well.

config

Many capistrano-deployed projects have sensitive configuration information (database.yml, etc.) in a shared folder on the server. When a new version of the code is deployed, those files get symlinked into the app. On Heroku, that’s not possible. Instead, the best practice is to use config variables.

$ heroku config # lists all configuration variables
$ heroku config:add NAME=VALUE # set a new variable
$ heroku config:remove NAME # remove an existing variable

There’s also a ‘heroku config:clear’ command, but it’s dangerous: it clears out all your environment variables, which includes those set by Heroku itself. If you do that, then there’s a very good chance you’ll lose information that you might not know (e.g., your DATABASE_URL).
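
Inside the application, config variables show up as ordinary environment variables, so reading them needs nothing beyond ENV. A minimal sketch follows; the helper name setting and the S3_BUCKET variable are made up for illustration.

```ruby
# Hypothetical helper: read a required setting and fail loudly if it is missing.
def setting(name)
  ENV.fetch(name) { raise KeyError, "missing config variable #{name}" }
end

# Simulate locally what `heroku config:add S3_BUCKET=my-uploads` would provide.
# S3_BUCKET and its value are invented names for this sketch.
ENV["S3_BUCKET"] = "my-uploads"

bucket = setting("S3_BUCKET")
puts "Uploads go to #{bucket}"
```

Failing fast on a missing variable tends to be easier to debug than a nil that surfaces later in a request.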

add-ons

Heroku allows third-party developers to create add-ons for your application, providing for both infrastructure features (exception tracking through Exceptional and Hoptoad) and business features (email delivery through SendGrid, subscription billing through Recurly, etc.). As the owner of an application, you can manage your add-ons from the command line:

$ heroku addons # lists addons
$ heroku addons:add newrelic:bronze # app monitoring for free? count me in!
$ heroku addons:remove piggyback_ssl

plugins

Add-ons extend your application’s functionality; plugins extend the heroku gem itself. You can see available plugins at Herocutter, but some of our favorites are:

And more

The heroku CLI provides even more functionality; take a look through the documentation or its own help (‘heroku’) to see more.

Common problems

Heroku imposes some constraints on your application; some of these stem from architectural decisions and are pretty much unavoidable, while others come from less fundamental decisions and can be worked around.

Filesystem access

Heroku’s architecture means that you can never be certain that your application will be running in the same space for two separate requests — two different dynos might serve later requests, and a single dyno might be moved from one EC2 instance to another for a variety of reasons. Because of that, Heroku doesn’t allow you access to the filesystem; it just doesn’t make sense.

To solve this, you should use an external service to host content that you might want to serve up. Filesystem page caching, for instance, can be replaced by properly using Heroku’s HTTP caching layer. Uploaded assets should be saved to S3 or a similar service.

Process timeouts

Heroku has some opinions about acceptable HTTP behavior, and timeouts are a result of that. If you have a request that runs for more than 30 seconds, the platform will automatically kill it. For many apps, this might include hitting remote services (like the Twitter API) or doing file processing (with Paperclip or a similar tool).

The solution is to move those long-running processes into a background worker. You can read more about this in the documentation.
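
The shape of that solution is “enqueue the slow work, respond immediately, let something else do the work.” The sketch below imitates it inside a single Ruby process with a thread and a queue from the standard library. This is a stand-in for illustration only: on Heroku the work would run in a separate worker process (the dj entry in the process list is one example), and every name here is hypothetical.

```ruby
require "thread"

jobs    = Queue.new
results = Queue.new

# Worker: runs outside the request path and processes jobs one at a time.
worker = Thread.new do
  while (job = jobs.pop) != :stop
    results << job.call
  end
end

# "Request handler": hand off the slow work and return right away.
def handle_request(jobs)
  jobs << lambda { sleep 0.1; "thumbnail generated" }
  "202 Accepted"
end

response = handle_request(jobs)
puts response

# Shut the worker down once the queued job has been processed.
jobs << :stop
worker.join
puts results.pop
```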

Idling

Heroku is fantastic for experimentation, which leads to a predictable conclusion: a lot of abandoned applications on the platform. In order to keep them from chewing up an inordinate amount of resources, the platform treats single-dyno applications a little differently (with the assumption that an experiment is likely to be running on the free plan): if the app hasn’t been hit in a certain amount of time, it gets idled (or spun down). Then, on the next attempt to access it, the single dyno is unidled.

The effect of this is very similar to what you might do on your local development box. When you’re working on an app, you fire up a local app server (with rails server or something like that); when you stop working on it, you shut down the server. Then, when you next want to hit the app, you have to spend a few seconds starting the server again.

Postgres migration

Every application on Heroku gets its own database. By default, it’s a 5MB shared Postgres db, though you can pay to get larger (or dedicated) instances. This can cause problems, since the majority of Rubyists seem to use MySQL in development, and Postgres and MySQL don’t always agree in how they treat SQL and display messages. You can see some of the most common issues (and their solutions) in the Heroku documentation.

Troubleshooting

Every app runs into problems in production, and sometimes an exception tracker (like Exceptional or Hoptoad) doesn’t give you all the information you need to fix them. On a VPS or dedicated server, you might be accustomed to SSHing in and popping into an interactive console, digging through logs, or something similar. With Heroku, that isn’t an option, but we do have some alternatives, provided in the heroku gem.

heroku console

You might not be able to run rails server or irb on the server yourself, but ‘heroku console’ gives you an interactive shell for your application. Once in the shell, you’re interacting directly with your production instance, so be as careful as you’d normally be when futzing with production data.

There are a few things to be aware of with this console. First, it runs over HTTP — every command you enter is pushed up to Heroku as an HTTP request, so it’s subject to the same restrictions as your web app. For instance, any process you start that runs longer than 30 seconds will be killed. Also, requests from your console session tie up a dyno, so if you’re running on a single dyno then your web app isn’t available to serve regular requests while you’re updating your database.

The other important thing to note is that each line you send is a separate HTTP request. This means that you can’t write multi-line code in the heroku console. Say you’re trying to do this:

User.all.each do |user|
  puts user.email
end

When you hit enter after the first line, the console sends ‘User.all.each do |user|’ to the server, which isn’t a complete expression. Before you can start typing the next line, then, the system sends back an error. You can still run this code, but you have to rewrite it to be on a single line:

User.all.each {|user| puts user.email}

heroku ps

Sysadmins live and die by process lists, so Heroku provides a tool to see what processes you have available and what state they’re in. If you run heroku ps for an active application, you’ll see something like the following:

    UPID     Slug          Command                     State       Since
    -------  ------------  --------------------------  ----------  ---------
    xxxxxxx  xxxxxxxxxxxx  dj                          up          16h ago
    xxxxxxx  xxxxxxxxxxxx  cron                        idle        43m ago
    xxxxxxx  xxxxxxxxxxxx  dyno                        up          16h ago
    xxxxxxx  xxxxxxxxxxxx  dyno                        up          16h ago

This is especially useful when combined with the Unix watch command (if you’re on OS X, you may have to install it manually), which reruns the command periodically so you can see how things are changing in real-time.

heroku logs

And finally, the logs. Anyone who’s built a web app knows just how important logs are, so Heroku provides a set of tools to help review (and in some cases analyze) them.

To use Heroku’s logging, you have to install both a plugin and an add-on:

$ heroku plugins:install http://github.com/heroku/heroku-logging.git
$ heroku addons:add logging

Once that’s done, anything your app pushes to STDOUT or STDERR is captured in your logs — if you’re using Rails, you should make sure to redirect your logger to STDOUT by adding this line to your config (application.rb or environment.rb, depending on what version of Rails you’re running):

config.action_controller.logger = Logger.new(STDOUT)
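
For a Rack app that isn’t Rails (Sinatra, say), the same idea can be sketched with Ruby’s standard Logger. The helper build_logger and the message format are assumptions for illustration.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: build a logger that writes to the given stream.
# Defaults to STDOUT, which is where the platform picks log lines up from.
def build_logger(io = $stdout)
  logger = Logger.new(io)
  logger.formatter = proc { |severity, _time, _progname, msg| "#{severity}: #{msg}\n" }
  logger
end

build_logger.info("written to STDOUT, so it ends up in the app's logs")

# The same helper can write to an in-memory stream, which is handy in tests.
buffer = StringIO.new
build_logger(buffer).warn("disk quota nearly reached")
```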

The ‘heroku logs’ command by itself will show you the last 20 lines of your log, looking something like this:

    2010-12-10T15:13:46-07:00 app[web.1]: Completed in 74ms (View: 31, DB: 40) | 200 OK [http://myapp.heroku.com/]
    2010-12-10T15:13:46-07:00 heroku[router]: GET myapp.heroku.com/posts queue=0 wait=0ms service=1ms bytes=975
    2010-12-10T15:13:47-07:00 app[worker.1]

You can filter the logs by source (-s) and process (-p), you can tail them in real-time with -t, and you can ask for a specific number of lines with -n. Perhaps most powerfully, you can also add syslog drains for your logs, pushing syslog packets to another server for long-term storage or analysis:

$ heroku logs:drains add syslog://your.syslog.host

Where to go for help

This is just the tip of the iceberg, really — there’s a lot you can do with Heroku, and spending time digging into the platform is very worthwhile. Take a look at our documentation, and talk to other developers in our Google group and on IRC.

Feel free to ask questions and give feedback in the comments section of this post. Thanks and Good Luck!

Ruby gems — what, why and how

This guest post is by Gonçalo Silva, who is a full-time Ruby on Rails developer at escolinhas.pt and has participated in the Ruby Summer of Code 2010. He loves and contributes to many open-source projects, being a fan of Linux, Ruby and Android. He likes to call himself a hacker, but that’s just an excuse for being in front of the computer all the time. Oh, and he tweets at @goncalossilva.

What is a gem

At its most basic form, a Ruby gem is a package. It has the necessary files and information for being installed on the system. Quoting RubyGems: «A gem is a packaged Ruby application or library. It has a name (e.g. rake) and a version (e.g. 0.4.16)».

Being very powerful, gems are of great importance in Rubyland. They can easily be used to extend or change functionality within Ruby applications.

Structure

Every gem is different, but most follow a basic structure:

gem/
|-- lib/
|   |-- gem.rb
|-- test/
|-- README
|-- Rakefile
|-- gem.gemspec

Your gem’s code is located under lib/ which typically holds a Ruby file with the name of the gem. You can choose to have all the magic happening in this file, but you can also use it to load some other Ruby files also located under lib/, typically inside a folder with the gem’s name. Confused? Have a look:

your_gem/
|-- lib/
|   |-- your_gem.rb
|   |-- your_gem/
|   |   |-- source1.rb
|   |   |-- source2.rb
|-- ...

The test folder is not necessarily named test/. When you’re working with RSpec, for instance, its name is usually spec/. As you’ve probably guessed, this folder holds tests for your gem.

After the README file, which hopefully doesn’t need any introduction, comes the Rakefile. In a gem’s context, the Rakefile is extremely useful. It can hold various tasks to help with building, testing and debugging your gem, among other things that you might find useful.

The gemspec—as the name implies—contains your gem’s specification by defining several attributes. An example gemspec file could be:

Gem::Specification.new do |s|
  s.name              = "gem"
  s.version           = "0.0.1"
  s.platform          = Gem::Platform::RUBY
  s.authors           = ["Gonçalo Silva"]
  s.email             = ["goncalossilva@gmail.com"]
  s.homepage          = "http://github.com/goncalossilva/gem_template"
  s.summary           = "Sample gem"
  s.description       = "A gem template"
  s.rubyforge_project = s.name

  s.required_rubygems_version = ">= 1.3.6"

  # If you have runtime dependencies, add them here
  # s.add_runtime_dependency "other", "~> 1.2"

  # If you have development dependencies, add them here
  # s.add_development_dependency "another", "= 0.9"

  # The list of files to be contained in the gem
  s.files         = `git ls-files`.split("\n")
  # s.executables   = `git ls-files`.split("\n").map{|f| f =~ /^bin\/(.*)/ ? $1 : nil}.compact
  # s.extensions    = `git ls-files ext/extconf.rb`.split("\n")

  s.require_path = 'lib'

  # For C extensions
  # s.extensions = "ext/extconf.rb"
end

Some attributes like the name, version, platform and summary are required; others are optional. If you use git with your project, you can use the nifty trick shown above to list the project’s files, executables and extensions. If you don’t, you can simply fall back to using pure Ruby code like:

s.files = Dir["{lib}/**/*.rb", "{lib}/**/*.rake", "{lib}/**/*.yml", "LICENSE", "*.md"]

The dummy gem is very simple. Because of this, it perfectly illustrates some of the ideas explained above. Some interesting bits are shown below:

dummy/
|-- lib/
|   |-- dummy/
|   |   |-- core_ext/
|   |   |   |-- array.rb
|   |   |   |-- string.rb
|   |   |-- address.rb
|   |   |-- company.rb
|   |   |-- ...
|   |-- dummy.rb

This gem is organized into several source files inside lib/. The dummy.rb implements the top-level module and loads all functionality from the Ruby files inside lib/dummy/. It also includes some core extensions, namely to the Array and String classes (which are part of Ruby’s core).
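
As a rough idea of how a top-level module and a core extension can fit together, here is a single-file sketch. The method capitalize_words is hypothetical and is not taken from dummy’s source; in a real gem the extension would live in its own file under lib/dummy/core_ext/ and be loaded from dummy.rb.

```ruby
# Top-level module, as lib/dummy.rb would define it.
module Dummy
  module CoreExt
    # Extension methods kept in their own namespace until they are mixed in.
    module String
      # Hypothetical extension: "hello world" -> "Hello World"
      def capitalize_words
        split(" ").map { |word| word.capitalize }.join(" ")
      end
    end
  end
end

# Mix the extension into Ruby's own String class.
::String.send(:include, Dummy::CoreExt::String)

puts "hello world".capitalize_words
```

Keeping the methods in a named module makes it easy to see, via String.ancestors, where an added method came from.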

RubyGems

Finally, RubyGems. It is a package manager which became part of the standard library in Ruby 1.9. It allows developers to search, install and build gems, among other features. All of this is done by using the gem command-line utility. You can find its website at rubygems.org.

Why is this useful

Gems are very useful for not reinventing the wheel and avoiding duplication. That’s basically it. Many Ruby developers create and publish awesome gems which address specific requirements, solve specific problems or add specific functionality. Anyone who comes across similar requirements or problems can use them and eventually improve them. That’s the joint awesomeness of Ruby’s strong open-source foundation and extreme flexibility. Anyway, you’re reading this article… so you’ve probably understood the concept and grasped its usefulness long before reading this paragraph.

How to make your own

Making your own gem is nothing more than packaging your library or application according to the structure stated above. Put all your code under lib/, all your tests under test/ or spec/, your gem specification under your_gem.gemspec and you’re good to go. Of course, a few other files might come in handy, namely a Rakefile, a README and a LICENSE. A CHANGELOG, sometimes, might be useful as well.

Ruby Idioms

When developing a gem, you are probably creating, extending or overriding functionality. You might want people to include your module in their classes, or perhaps you just want to extend a given class with your module; it’s your choice. What you shouldn’t really do, however, is reinvent Ruby’s module system. There is an excellent blog post on this which can help if you, like many gem authors I’ve seen, start overriding include to behave like extend. It’s very important to understand the difference between the two and, fortunately, there are great resources about this out there.
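
For readers who haven’t met the distinction yet, a minimal sketch: include adds a module’s methods to instances of a class, while extend adds them to the receiver itself (here, the class). The Greeter, Person and Robot names are invented for illustration.

```ruby
module Greeter
  def greet
    "Hello from #{name}"
  end
end

class Person
  include Greeter          # greet becomes an instance method
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class Robot
  extend Greeter           # greet becomes a method on the class itself
end

puts Person.new("Ada").greet     # uses the instance's name
puts Robot.greet                 # uses the class's name, "Robot"
puts Person.respond_to?(:greet)  # include did not touch the class itself
```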

Developing with Bundler

Using Bundler to manage your gem’s dependencies is also pretty easy. Just create a Gemfile and add:

gemspec

After this, fire Bundler:

bundle install

And yes, you got it right. After adding gemspec to your Gemfile, Bundler can scan your gemspec, find your runtime and development dependencies and install them for you.

While it’s not mandatory, I strongly recommend that you consider using Bundler to manage your gem’s dependencies. Used correctly, it can be a real time saver.

Testing

When it comes to testing, you’ve got plenty of good options. Some people rely on test-unit (or minitest in 1.9), others prefer RSpec. It’s really up to you. The only bad choice you can possibly make is opting not to test your gems at all.

Once again, I’m going to use dummy’s simplicity to explain this a bit further. All tests were built on test-unit and are organized as follows:

dummy/
|-- test/
|   |-- address_test.rb
|   |-- company_test.rb
|   |-- ...
|   |-- test_helper.rb

As you’ve seen, tests are structured similarly to dummy itself. The test_helper is in charge of loading the necessary libraries and setting up any variables or methods used across most (if not all) tests. All tests are organized into files which target specific functionality in dummy. The tests contained in address_test.rb run against address.rb and so on.

Publishing

After everything is coded and tested, all you have left to do is package and publish. The previously mentioned gem utility makes it all very simple. Just run gem build your_gem.gemspec and you should see something along these lines:

Successfully built RubyGem
Name: your_gem
Version: 0.0.1
File: your_gem-0.0.1.gem

Pushing your gem to RubyGems is as easy as it is to build it. Just run gem push your_gem-0.0.1.gem and soon it’ll be published. Be aware that the first time you issue this command you’ll be prompted to log in with a RubyGems.org account.
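
The file name in the build output is derived from the name and version in the specification, which can be seen by building a specification object in plain Ruby. The attribute values below are placeholders for illustration.

```ruby
require "rubygems"

# Placeholder specification; substitute your own gem's details.
spec = Gem::Specification.new do |s|
  s.name    = "your_gem"
  s.version = "0.0.1"
  s.summary = "Sample gem"
  s.authors = ["Your Name"]
  s.files   = []
end

puts spec.full_name   # name and version joined with a dash
puts spec.file_name   # the same, with the .gem extension
```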

Concerning this, I like keeping these simple tasks in my Rakefile:

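require "fileutils"

# Assumption: the .gemspec sits in the project root; load it once and reuse it.
def gemspec
  @gemspec ||= Gem::Specification.load(Dir["*.gemspec"].first)
end

desc "Validate the gemspec"
task :gemspec do
  gemspec.validate
end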

desc "Build gem locally"
task :build => :gemspec do
  system "gem build #{gemspec.name}.gemspec"
  FileUtils.mkdir_p "pkg"
  FileUtils.mv "#{gemspec.name}-#{gemspec.version}.gem", "pkg"
end

desc "Install gem locally"
task :install => :build do
  system "gem install pkg/#{gemspec.name}-#{gemspec.version}"
end

These help me build and install my gems. They also keep all packages in the pkg/ folder, which keeps the root directory clean and tidy.

Gems for building gems

There are a few gems which were specifically created to help developers build their own gems. Among them are the renowned jeweler, hoe and echoe. I can’t go into detail on any of these since I’ve never really used them; I started building my gem skeleton from scratch right at the beginning. However, some of these tools are very helpful, so you should really take a look and see if any fits your needs.

Gem template

As I mentioned, I’ve been using a gem skeleton for some time now, which you can find at GitHub. Every gem I’ve built started with that template, which I kept trying to improve over time.

You can start your gems from scratch, but that’s just nonsense. You should create your own skeleton, use one made by someone else or use a third-party gem to help create your gem.

Legen—wait for it—dary

Ruby gems are filled with awesomeness. Hop in and start making your own!

Feel free to ask questions and give feedback in the comments section of this post. Thanks and Good Luck!

My Ruby Regrets

This guest post is by Jeff Langr, who has developed software for thirty years, mastering many other languages (including Smalltalk, C++, Java, and currently C#), but just not Ruby and Python… yet. (Ever?) He owns the consulting and training company Langr Software Solutions, and codes full-time as an employee of GeoLearning. Jeff is the author of close to a hundred articles on software development and three books, including Agile Java and the very-soon-to-be-published Pragmatic Programmers “book,” Agile in a Flash.

Ruby guru? Hardly. Even though I first “learned” Ruby about nine years ago, perpetual Ruby newbie is a far more correct term. In those nine years, I’ve coded here and there on a number of throwaway scripts. For each separate effort, I would get past novice struggles to the point where I felt reasonably comfortable with the language. But just as I started to enjoy high levels of productivity, the job was done and it was back to “enterprisey” Java coding. Months later, sometimes more than a dozen, I’d work another Ruby script. The cycle would start again at a slightly higher proficiency level than the last time I started. My feeble brain would struggle to recall any remnants of my Ruby memory. It’s impossible to become a guru this way!

I’m once again working on a side effort in Ruby, pairing with a few other developers to build a testing framework. It’s intended for inbetweeners (my newly coined term for QA people who are willing to get a little technical) to easily put together Watir-based tests. As usual, it’s a small effort, a few days at most. As usual, we (me and my pair partners) decided to just “hack at it.” Never mind adhering to good OO design concepts, and never mind test-driving the code. Why?

  • it’s a small effort that should stay small over the long haul.
  • we’re (re)learning Ruby, which means we’re experimenting in irb and pasting over code that appears to work.
  • it’s a test tool itself. Do we really need to write tests for a test tool?

I’d also paired to develop a couple of comparably scoped (that is, small) Python scripts in the previous few months. For similar reasons, we eschewed test-driven development (TDD) and good OO design on those efforts too.

Fortunately, our constant companion Humility is one of the best teachers. It doesn’t take long to generate legacy code (code without tests) to the point of regret. In the case of my three most recent scripting efforts, the point of regret was somewhere after a half day of code cranking. What happened? Looking more closely at our three arguments for hacking it out, we can easily find flaws.

We quickly slammed out a couple hundred lines of Ruby code. Small, yes, but we quickly found ourselves often unsure about what code was where, and we knew that we had a good amount of unnecessary duplication. But our only verification mechanism was to manually run the test framework against the handful of scripts that it drove–a cycle that took about 90 seconds. We couldn’t make the rapid, ten-second changes that we wanted to–no one wants to wait more than a minute to verify every two-line code change.

As far as the learning Ruby part, we know that there’s a Ruby way to express code, but our novice Ruby brains tend to code it a little more familiarly first. Without unit tests, we’ve been a lot slower to transform the ugly procedural-isms into tight Ruby-way constructs. Unit tests are great for letting you slap together a method’s implementation, and then safely play with improving its expressiveness.

From the design standpoint, it’s a little more work to create proper classes, but inevitably we found that the lack of good design just made for crappier code and more duplication. We also found that mixins, while very beneficial, can create some interesting challenges and confusion if you’re not careful with them. Moving to a more OO solution simplified our codebase and gave us more flexibility.

We quickly wished we had built more tests. As we got more frustrated with dumb mistakes that took a while to pin down, we started building some TAD (test-after development) tests. At least these tests let us put a stake in the ground, but it’s unlikely that we’ll have the time to go back and completely cover the code. Had we started with TDD, we would have avoided getting bogged down in the defects. We also would have been able to keep the code base small, with minimal duplication and confusion.

I didn’t do things the right way the past three times I’ve built small scripts. But next time I code in Ruby–no doubt enough months away that I’ll have forgotten much of what I re-learned–I hope my feeble brain won’t have forgotten, once again, this important lesson. Don’t wait to test-drive, and don’t hack just because you can.

Feel free to ask questions and give feedback in the comments section of this post. Thanks and Good Luck!


How do I build DSLs with yield and instance_eval?

This guest post is by Michael Bleigh, a Rubyist developing web applications and more for Intridea from his hometown of Kansas City. He is a prolific member of the open-source and Ruby communities, releasing such projects as OmniAuth and Hashie. In addition, he has presented at many Ruby events including RubyConf 2010, RailsConf 2009/2010 and more. While he spends much of his time writing Ruby code, he also enjoys graphic design and user experience work.

Ruby provides some fantastic built-in features for creating Domain Specific Languages (DSLs). A Domain Specific Language is, for our purposes today, like a miniature specialized programming language within a programming language. It is a way to expose functionality in a simple, readable format for other programmers (or yourself) to use. One of the most commonly used DSLs in the Ruby world is Sinatra:

require 'rubygems'
require 'sinatra'

get '/hello' do
  "Hello world."
end

Sinatra is a Domain Specific Language for building web applications. Its syntax is built around HTTP verbs such as GET, POST, and PUT. Exposing functionality this way makes the code much more readable than a more complex, programmatic API like this one:

app = NoDSL::Application.new

app.on_request(:get, :path_info => '/hello') do |response|
  response.body = "Hello world."
end

This is far less readable than Sinatra’s code, but in many programming languages this would be a perfectly acceptable design for a library. However, because Ruby has powerful facilities for metaprogramming and first-class functions, it is not only common practice but essentially expected for libraries to provide clean, readable APIs and leverage DSLs when necessary to do so.

Yield to Oncoming Code

The yield statement is a very important concept to understand when building a Ruby DSL. The functionality provided by yield allows a developer to pass off control temporarily to allow for configuration or advanced functionality. Yielding is a pattern that completely pervades the Ruby language, including the Ruby standard library (the functionality included with the language itself). If you’ve ever used the Array#map (or Array#collect) functionality, that’s one example of a yield pattern. An example use to increment all the items in an array would look like this:

[1, 2, 3].map{|i| i + 1} # => [2, 3, 4]

So how would we re-implement the map functionality if it weren’t provided for us? It’s actually quite simple using the yield statement:

class Array
  def my_map
    result = []
    self.each do |item|
      result << yield(item)
    end
    result
  end
end

[1, 2, 3].my_map{|i| i + 1} # => [2, 3, 4]

The yield statement essentially pauses the method, evaluates the block passed into the method with any arguments supplied to yield, and then resumes the method with the block’s return value. So if I had a method that simply yielded its argument, it would look like this:

def parrot(argument)
  yield argument
end

parrot("Polly want a cracker.") do |argument|
  puts argument
end

# Output: "Polly want a cracker."
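
One detail that is easy to miss is that yield evaluates to whatever the block returns, so a method can act on the block's answer. The sketch below illustrates this with a hand-rolled filter; my_select is a hypothetical helper name chosen for this example, not part of Ruby's standard library:

```ruby
# Hypothetical helper: a hand-rolled take on Array#select, written as a
# free-standing method so it does not reopen the Array class.
def my_select(items)
  result = []
  items.each do |item|
    # yield returns the block's value; keep the item when it is truthy.
    result << item if yield(item)
  end
  result
end

evens = my_select([1, 2, 3, 4]) { |i| i.even? }
p evens # prints [2, 4]
```

The block decides, the method collects: that division of labour is the same one the DSL methods later in this post rely on.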

Using yield for DSLs

Now, using yield, we have the facilities to build a simple DSL. Let’s say we want to create a Domain Specific Language for describing kitchen recipes. We want to be able to add ingredients as well as steps, then print out the result. Our basic class would look something like this:

class Recipe
  attr_accessor :name, :ingredients, :instructions

  def initialize(name)
    self.name = name
    self.ingredients = []
    self.instructions = []
  end

  def to_s
    output = name.dup # copy, so the appends below do not modify name itself
    output << "\n#{'=' * name.size}\n\n"
    output << "Ingredients: #{ingredients.join(', ')}\n\n"

    instructions.each_with_index do |instruction, index|
      output << "#{index + 1}) #{instruction}\n"
    end

    output
  end
end

Now we can build a recipe:

mac_and_cheese = Recipe.new("Mac and Cheese")

mac_and_cheese.ingredients << "Noodles"
mac_and_cheese.ingredients << "Water"
mac_and_cheese.ingredients << "Cheese"

mac_and_cheese.instructions << "Boil water."
mac_and_cheese.instructions << "Add noodles, boil for six minutes."
mac_and_cheese.instructions << "Drain water."
mac_and_cheese.instructions << "Mix in cheese with noodles."

The output of ‘puts mac_and_cheese’ will look like this:

Mac and Cheese
==============

Ingredients: Noodles, Water, Cheese

1) Boil water.
2) Add noodles, boil for six minutes.
3) Drain water.
4) Mix in cheese with noodles.

While this works, the code doesn’t seem to be very elegant at all! We need a way to make it look more like you would see on a recipe card. Let’s add some functionality using yield. First, we’ll rewrite the initializer to use yield:

def initialize(name)
  self.name = name
  self.ingredients = []
  self.instructions = []

  yield self if block_given? # Recipe.new("...") without a block still works
end

Upon initialization, the Recipe class will now yield itself, meaning that the caller can modify it within a block context. Next, we need to add some friendly methods for adding ingredients and instructions to the class:

def ingredient(name, options = {})
  # dup so that we never append to the caller's own string
  ingredient = name.dup
  ingredient << " (#{options[:amount]})" if options[:amount]

  ingredients << ingredient
end

def step(text, options = {})
  # dup for the same reason as above
  instruction = text.dup
  instruction << " (#{options[:for]})" if options[:for]

  instructions << instruction
end

This lets us create a recipe in a much more natural way:

mac_and_cheese = Recipe.new("Mac and Cheese") do |r|
  r.ingredient "Water", :amount => "2 cups"
  r.ingredient "Noodles", :amount => "1 cup"
  r.ingredient "Cheese", :amount => "1/2 cup"

  r.step "Heat water to boiling.", :for => "5 minutes"
  r.step "Add noodles to boiling water.", :for => "6 minutes"
  r.step "Drain water."
  r.step "Mix cheese in with noodles."
end

Once again, if we run ‘puts mac_and_cheese’ we can see the results of our handiwork:

Mac and Cheese
==============

Ingredients: Water (2 cups), Noodles (1 cup), Cheese (1/2 cup)

1) Heat water to boiling. (5 minutes)
2) Add noodles to boiling water. (6 minutes)
3) Drain water.
4) Mix cheese in with noodles.

Great! Not only do we have more functionality (allowing the user to specify amounts of ingredients and durations for instructions), but this looks a lot closer to something you might see on a recipe card.
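
For reference, the pieces above can be assembled into a single runnable file. This is a sketch rather than a canonical version: it copies incoming strings with dup so the caller's strings are left alone, only yields when a block is given, and uses a made-up "Toast" recipe purely as sample data:

```ruby
class Recipe
  attr_accessor :name, :ingredients, :instructions

  def initialize(name)
    self.name = name
    self.ingredients = []
    self.instructions = []

    # Hand the new recipe to the caller's block, if there is one.
    yield self if block_given?
  end

  def ingredient(name, options = {})
    ingredient = name.dup # copy, so the caller's string is untouched
    ingredient << " (#{options[:amount]})" if options[:amount]
    ingredients << ingredient
  end

  def step(text, options = {})
    instruction = text.dup
    instruction << " (#{options[:for]})" if options[:for]
    instructions << instruction
  end

  def to_s
    output = name.dup # copy, so to_s can be called more than once
    output << "\n#{'=' * name.size}\n\n"
    output << "Ingredients: #{ingredients.join(', ')}\n\n"
    instructions.each_with_index do |instruction, index|
      output << "#{index + 1}) #{instruction}\n"
    end
    output
  end
end

# Sample data invented for this sketch.
toast = Recipe.new("Toast") do |r|
  r.ingredient "Bread", :amount => "2 slices"
  r.step "Toast the bread.", :for => "2 minutes"
  r.step "Serve."
end

puts toast
```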

Using yield is a great way to provide a simple configuration DSL and it takes almost no extra effort. However, to really take a DSL to the next level, you may be interested in utilizing another piece of the Ruby language called instance_eval.

Kicking It Up A Notch With instance_eval

While almost all programming languages give an eval function for evaluating a provided string as though it were source code, Ruby’s powerful blocks allow you to do this in a much cleaner and more readable fashion in some specific cases. For our purposes today, we’ll be using instance_eval. The instance_eval method takes either a string or a block and evaluates it with self set to the object on which instance_eval was called. You can do this with any object in Ruby, even a String:

"Hello.".instance_eval{ size } # => 6

This provides a distinct advantage, in some ways, over yield by actually changing the evaluation context so that there’s no need to specify the object in question for each statement (e.g. r.ingredient). You can see an instance_eval based DSL in action if you’ve used the Rails 3 Router. However, the Rails 2.3 router was based on yield (thus map.resources instead of just resources).
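
To make the difference in evaluation context concrete, here is a small sketch. Counter and configure are hypothetical names invented for this example; the point is only to show what self refers to in each style of block:

```ruby
class Counter
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

# Hypothetical yield-style helper: hands the object to the block.
def configure(object)
  yield object
end

counter = Counter.new

# yield style: the block reaches the object only through its parameter.
configure(counter) { |c| c.increment }

# instance_eval style: self is the counter inside the block, so bare
# method calls and instance variables resolve against it.
counter.instance_eval do
  increment
  increment
end

p counter.count                    # prints 3
p counter.instance_eval { @count } # prints 3
```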

Caveat Eval

While instance_eval may be a good option (and even the correct one) for a specific DSL you are working on, it is not a universally useful tool. Because instance_eval changes the evaluation context, you will lose access to methods on the calling context (because self changes) as well as expose private methods of the evaluating object that you may not have intended to be accessible. Remember that whenever you use instance_eval, the code passed in is treated as though it were being written into a method body of the object. A simple example of this:

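class YieldDSL
  attr_accessor :name
  def initialize
    yield self
  end
end

class EvalDSL
  attr_accessor :name
  def initialize(&block)
    instance_eval(&block)
  end
end

class Author
  # An ordinary method on the calling object. (A method defined at the
  # top level lands on Object, so it would still be reachable from
  # inside instance_eval and would not show the problem.)
  def me
    "Michael Bleigh"
  end

  def with_yield
    YieldDSL.new do |d|
      d.name = me # self is still the Author, so me resolves
    end
  end

  def with_eval
    EvalDSL.new do
      self.name = me # self is now the EvalDSL instance
    end
  end
end

Author.new.with_yield
# => #<YieldDSL:0x101771bc0 @name="Michael Bleigh">

Author.new.with_eval
# EXCEPTION: NameError (me is not defined on EvalDSL)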

So it is wise to be careful when providing an instance_eval based DSL, as it may not always be more beneficial for the user. A simpler syntax comes at the cost of changing evaluation context.
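
The other half of the caveat, exposing private methods, can be sketched the same way. Vault and secret_code are hypothetical names made up for this illustration:

```ruby
class Vault
  def initialize(&block)
    instance_eval(&block) if block_given?
  end

  private

  # Not meant to be callable from outside the class.
  def secret_code
    "1234"
  end
end

# Inside the instance_eval block self is the Vault, so the private
# method can be called with an implicit receiver.
leaked = nil
Vault.new { leaked = secret_code }
p leaked # prints "1234"

# From the outside, the same call is rejected.
begin
  Vault.new.secret_code
rescue NoMethodError
  puts "secret_code is private when called from outside"
end
```

Anything you would not want DSL users to call by accident is worth keeping off the object that evaluates their blocks.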

Building Recipes with instance_eval

In our case for building Recipes, however, there isn’t danger in switching context. We’re mostly passing in strings and it’s unlikely that any complex context is going to be associated. So let’s upgrade it! All we need to do is redefine the initializer once more:

def initialize(name, &block)
  self.name = name
  self.ingredients = []
  self.instructions = []

  instance_eval(&block) if block_given?
end

In Ruby, a block passed to a method can be captured explicitly by declaring a final parameter prefixed with an ampersand (&); the block then arrives in the method as a Proc object under that name. In this way, we have direct access to the block (whereas before with yield we were making an implicit call to the block). You can also use the built-in block_given? method to check whether or not a block was passed into the method you’re currently evaluating. This should be done instead of checking for block.nil? or similar.
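
A short sketch of block capture may help here. describe_block is a hypothetical method written for this example; it shows that the captured block is an ordinary Proc and that block_given? reports whether one was supplied:

```ruby
# Hypothetical method: reports on the block it was (or was not) given.
def describe_block(&block)
  return "no block given" unless block_given?

  # The captured block is a Proc, so it can be inspected or stored.
  "got a #{block.class} taking #{block.arity} argument(s)"
end

puts describe_block               # prints: no block given
puts describe_block { |x| x * 2 } # prints: got a Proc taking 1 argument(s)
puts describe_block { 42 }        # prints: got a Proc taking 0 argument(s)
```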

So what can we do with our fancy new instance_eval DSL? We can define a recipe with an even prettier syntax!

mac_and_cheese = Recipe.new("Mac and Cheese") do
  ingredient "Water", :amount => "2 cups"
  ingredient "Noodles", :amount => "1 cup"
  ingredient "Cheese", :amount => "1/2 cup"

  step "Heat water to boiling.", :for => "5 minutes"
  step "Add noodles to boiling water.", :for => "6 minutes"
  step "Drain water."
  step "Mix cheese in with noodles."
end

And if we run ‘puts mac_and_cheese’, we get the same results as before.

Finishing Up

So now you should have some basic idea of how to build DSLs in Ruby using yield and instance_eval. The ability to expose functionality in a concise, easily-readable way is a very useful weapon for your programming arsenal. Before we wrap up, let’s take a look at a couple more things:

Having AND Eating Cake

There’s no reason that yield and instance_eval DSLs need to be mutually exclusive. Far from it! In Ruby we encourage options, and it’s actually quite easy to provide a way to yield OR instance_eval based on the block passed in:

def initialize(&block)
  if block_given?
    if block.arity == 1
      yield self
    else
      instance_eval(&block)
    end
  end
end

What this snippet does is check the arity (number of arguments) of the block that’s passed in. If it’s one (meaning that they’re asking for something to be passed to the block) then we use the yield DSL strategy. Otherwise, we use the instance_eval strategy. That wasn’t so hard, was it?
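
Here is the arity switch dropped into a complete, runnable class so both calling styles can be tried side by side. Greeting and its attributes are hypothetical, invented only for this sketch:

```ruby
class Greeting
  attr_accessor :salutation, :target

  def initialize(&block)
    return unless block_given?

    if block.arity == 1
      yield self            # caller asked for the object: |g| style
    else
      instance_eval(&block) # no parameter: evaluate with self switched
    end
  end

  def to_s
    "#{salutation}, #{target}!"
  end
end

# yield style: the block names the object explicitly.
explicit = Greeting.new do |g|
  g.salutation = "Hello"
  g.target = "world"
end

# instance_eval style: setters still need self., otherwise Ruby
# would create local variables instead of calling the writers.
implicit = Greeting.new do
  self.salutation = "Hello"
  self.target = "world"
end

puts explicit # prints: Hello, world!
puts implicit # prints: Hello, world!
```

The need for self. on setters is one reason DSLs built on instance_eval tend to favour plain methods such as ingredient and step over attribute writers.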

Advanced DSLs with Treetop

The DSLs covered in this article so far have been internal DSLs, that is, DSLs that are executed inside the context of Ruby code. However, it is also possible to build external DSLs that do not have to contain any Ruby code at all! For example, Cucumber, the integration testing framework, is an external Natural Language DSL. Rather than being wrapped in Ruby idioms, it actually defines its own language that is executed entirely outside the context of the Ruby programming language.

The most popular library for building Natural Language DSLs in Ruby is Treetop, which lets you create grammars upon which a new Domain Specific Language can be crafted. It’s a highly interesting library with some amazing facilities, so be sure to check it out!

Conclusion

I hope that this introduction to yield and instance_eval has shown you just how easy it can be to build Domain Specific Languages in Ruby. The next time you find yourself building the same kind of objects over and over, you might consider making a DSL to streamline the process and improve readability.

Feel free to ask questions and give feedback in the comments section of this post. Thanks and Good Luck!
