Rails Envy Podcast – Episode #071: 03/18/2009

Episode 71. TMI Edition.

Subscribe via iTunes – iTunes only link.
Download the podcast ~20 mins MP3.
Subscribe to feed via RSS by copying the link to your RSS Reader


Sponsored by Hashrocket
The Rails Envy podcast is brought to you this week by Hashrocket. Hashrocket is an expert consultancy group that uses best-of-breed technologies like Ruby on Rails to deliver the highest quality software in the least amount of time.

Sponsored by New Relic
The Rails Envy podcast is also brought to you this week by New Relic. New Relic provides RPM, a plugin for Rails that lets you monitor and quickly diagnose problems with your Rails application in real time. They also recently produced Rails Lab, which gives you expert advice on tuning and optimizing your Rails app.

RailsConf Speaker Interview: Michael Bleigh

Next up in my series of RailsConf speaker interviews is Michael Bleigh, who is Creative Director and Open-Source Activist at Intridea in Washington D.C.

Michael will be presenting Twitter on Rails at RailsConf this year, highlighting a small but growing trend of applications using Twitter as a communication platform.

You work on Present.ly and you’re
giving a talk on Twitter apps at RailsConf. Micro-update services is
obviously a topic that has you excited. Why is that?

I wasn’t a super-early adopter of Twitter; I honestly think it’s
something that takes some time to “get.” Then one day I woke up and
realized that without even trying I was getting all kinds of news and
information that I might not even have heard otherwise. The beauty of
Twitter (and Present.ly for teams and organizations) is that it’s a
fast, passive medium: you don’t have to make an effort to keep up to
date, it just sort of happens. It works with two-way communication,
too; on our team if I have a quick syntax or sanity check question
about my code, I post it up on Present.ly and get five responses
within three minutes. Micro-updates excite me because I feel like I’m
getting smarter just by glancing at growl notifications for a few
seconds every couple of minutes while I’m working. It’s effortless.

What are the coolest Twitter applications you’ve seen?

It’s hard to pick out just a few; from a client perspective I’ve been
using EventBox lately. Its integrated
Twitter search feeds are really useful for keeping an eye on all of
Intridea’s brands (as well as my plugins and open-source projects).
While I don’t think anyone has nailed it perfectly yet, aggregators
like TweetMeme are interesting in their
attempts to bubble up content from the noise. Honestly, I don’t think
there are just a couple cool apps; I think that Twitter’s true
strength is this really energetic ecosystem around creating cool stuff
with the API. That’s what my talk is all about: lowering the barriers
to people making cool stuff with Twitter. With the hundreds of apps
currently available I still think we’ve only scratched the surface of
the utility that Twitter-based applications can provide.

What does it mean to be an “Open Source Activist” for a company?
Should other companies fill this role?

As the “Open Source Activist” I’m basically just trying to push people
to package up what they’re doing and share it with the community,
whether it’s through a Ruby gem, a plugin, or just a blog post. My
colleagues are making cool stuff all of the time, and sharing that
cool stuff with the community is absolutely beneficial for the
company, the individual, and everyone who finds it useful. We’ve had
clients come to us and when they’re vetting our work it’ll be “Oh
yeah, I’ve used that plugin! You guys wrote that?”

I think that every company needs someone who is pushing for that
community involvement. Pushing for us to blog more often and release
more open source doesn’t just yield intangible reputation benefits:
I’ve learned really cool techniques from other people at Intridea that
they used for a project on which I’m not currently working. I’m always
excited to foster that kind of sharing environment because the rewards
are just great. I could go on for hours and I strongly encourage that
every company working with Ruby or Rails try to release at least a few
open-source libraries and write some blog posts to share your
knowledge and expertise with the world at large.

We really believe in putting our money where our mouth is on the
open-source front. When we wrote mobile applications for Present.ly on
five different platforms it was a no-brainer to me to release them
all as open-source. Open source, open
APIs and open communication foster innovation in amazing ways, and
anything I can do to make that happen more often I will.

Why is Twitter Search special and/or interesting?

Twitter Search is special because it’s intrinsically a different beast
than Google and other search engines. It’s not indexing information,
it’s indexing conversation, and it does it in real time. If I’m
trying to find information about something that happened in the last
24 hours (like who was that old woman on Lost?), I don’t use Google
anymore; I turn to Twitter because it will have up-to-the-second
information that just isn’t available anywhere else. Obviously it
doesn’t replace the usefulness of other search engines, but it opens
up whole new channels of information discovery. Honestly search and
Twitter are such a natural and amazing fit that it’s surprising that
it took a third party (Summize) to come along and realize that
potential. I really look forward to where they (and third parties) are
going to take the technology in the next year or two.

As I understand it, Intridea started as a consulting group but has
quickly developed a set of products that have gotten some positive
press. How do you balance consulting and product development? Do they
support and feed off of each other?

The balance actually works really well. We have a few people full time
on products and then we rotate in services people either part-time or
full-time depending on our client engagements at the time. I think
it’s been great for both sides of the company to have both services
and products: we can try really bleeding edge and experimental things
on the products and then bring that knowledge to the client work we
do. Likewise, everyone at Intridea on the services side has an almost
inhuman work ethic and an ability to juggle several projects at once
so when they come on to the products they can pick it up fast and get
things done. Consulting has also allowed us to try a number of
products without ever having to accept outside funding.

A number of our products also grew out of needs that clients would
have again and again. For instance Scalr is a great
piece of software (don’t ask me about how it works, it’s over my
head!) that gives us the ability to provide amazing scalable cloud
hosting to our clients. MediaPlug
similarly provides an easy infrastructure that saves our clients time
and money. We scratch our own itches with our products and that
usually means that there are benefits for the consulting side of our
business with every product we make. I’m honored and privileged to
work with so many brilliant people and be allowed to pursue so many
amazing ideas.

Push and Pull Databases To and From Heroku

A frequent question people ask us is “how do I transfer my database between my local workstation and my Heroku app?”

This is an important question for several reasons. First, you always own your data on Heroku, and we want you to be able to get to it quickly and easily at any time. Also – as you may have noticed from previous posts – we’re obsessive about workflow. Whether you’re debugging an issue with production data or setting up a staging environment, being able to quickly pull/push data between environments is key to a smooth experience.

Previously, we offered yaml_db as a solution. We liked that it was simple and database agnostic, but parsing large YAML files is just too slow. We also wanted something that works with any framework compatible with our Rack-based platform. Ricardo, Blake and Adam came up with Taps, which was released last month as a standalone project. Having collected some quality feedback from the community, we’re now pleased to announce that Taps is officially baked into Heroku, allowing seamless and easy database transfer between Heroku apps and any external environment.

To try it out, install the latest Heroku gem. Then use the “db:pull” command to pull your database down from your Heroku app to your local workstation:

$ heroku db:pull
Receiving schema
Receiving data
8 tables, 591 records
users:         100% |================================| Time: 00:00:00
pages:         100% |================================| Time: 00:00:00
comments:      100% |================================| Time: 00:00:00
tags:          100% |================================| Time: 00:00:00
Receiving indexes
Resetting sequences

This loads the schema, data, indexes and sequences of the remote Heroku database down into the local database specified in config/database.yml. You can also specify the destination database using standard URI-syntax:

$ heroku db:pull mysql://root:mypass@localhost/mydb
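To make the URI syntax concrete, here is a minimal Ruby sketch that splits such a database URL into its parts using the standard library. The helper name `database_url_parts` is made up for illustration, and this is not a claim about how Taps or the Heroku gem parse the URL internally.

```ruby
require 'uri'

# Hypothetical helper: break a database URL into the pieces a connection needs.
# Illustration only; Taps and the Heroku gem may parse the URL differently.
def database_url_parts(url)
  uri = URI.parse(url)
  {
    :adapter  => uri.scheme,                     # e.g. "mysql"
    :user     => uri.user,
    :password => uri.password,
    :host     => uri.host,
    :database => uri.path.to_s.sub(%r{\A/}, '')  # strip the leading slash
  }
end

parts = database_url_parts('mysql://root:mypass@localhost/mydb')
puts parts[:adapter]
puts parts[:database]
```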

Because Taps uses ActiveRecord (for schema) and Sequel (for data), it seamlessly transfers between different database vendors. In fact, if you don’t feel like running a local database server, just use SQLite:

$ heroku db:pull sqlite://path/to/my.db

Of course, the syntax for pushing your local database up to Heroku is equally simple:

$ heroku db:push
Sending schema
Sending data
users:         100% |================================| Time: 00:00:00
pages:         100% |================================| Time: 00:00:00
comments:      100% |================================| Time: 00:00:00
tags:          100% |================================| Time: 00:00:00
Sending indexes
Resetting sequences

That’s Taps in a nutshell. It’s live right now, so check it out and let us know how you like it. Full docs are available here.

Confreaks: MountainWest RubyConf 2009 – videos available

The Confreaks folks have processed many of the videos, and they are starting to appear on the MWRC section of their site:
http://mwrc2009.confreaks.com/

Check it out; there is a lot of good stuff. Thanks again to the MWRC organizers.

RailsConf Speaker Interview: Obie Fernandez

Continuing my series of RailsConf speaker interviews, next up is Obie Fernandez.

Obie started and runs the Rails consultancy Hashrocket and is the author of the best selling The Rails Way.

At RailsConf, Obie will be presenting Blood, Sweat, and Rails, which, if it’s anything like his talk at last year’s Rails Summit Latin America, will no doubt be educational, thought-provoking and entertaining. It might even include some Def Leppard music if we’re lucky.

I’ve had dreams of starting my own business and doing my own thing. I’ve learned over time that fear is the biggest thing getting in my way. Is that something you encountered too? How do you overcome it?

I started at least half-a-dozen real businesses between the ages of 15
and 21. Those experiences taught me that I didn’t know enough, either
about business or just in terms of plain ole life experience to
successfully run a business. So I waited, and waited, almost 15 years
until trying again in earnest. All the while wishing I was my own
boss, but not feeling ready to make the leap. Fear of failure was a
big part of that, as well as many other fears, like not being able to
pay my child support obligations.

As for overcoming the fear, I’m guessing that everyone is going to
have different tipping points. One commonality though, is probably to
give yourself distance from the culture of fear that permeates our
world. I did away with my “normal” television watching habits a long
time ago (nowadays our television goes weeks without being turned on);
most commercials prey on the pervasive fear culture of our society.
Fear of death, fear of getting sick, fear of accidents, of not being
successful enough. Get rid of it!

If you really want to succeed, you have to distance yourself from
whiners and low-achievers too. We all have toxic people in our lives,
you gotta put space between you and them if you want to overcome the
fear and negativity that they breed.

My tipping point to success was building a solid reputation online via
my blog and getting so good at what I was doing that I didn’t have to
particularly worry about going back to fulltime employment if I
happened to fall flat on my face as an independent.

Is there a place for fixed bid projects in the world of Rails consulting? What would you say are the pros and cons? I’ve had very bad experience with fixed bid projects in large companies. I’m wondering if there’s a way to do it right.

I think fixed-bid contracting for custom web applications is a
horrible idea, overall. I slept on this question for a few days and
dug back in my memory over the last dozen years of consulting: I’ve
never heard of anyone being happy with the outcome of a fixed-bid
project.

The fundamental problem with fixed-bid is the fluid and living nature
of all but the tiniest single-purpose webapps. Once you start
development, there are going to be changes necessary. A lot of those
will feel “easy” and/or “logical” as if they should have been included
in the requirements to begin with. Pressure will be on to include
those changes in the original budget, especially if you’re focused on
customer service and keeping that client happy. Doubly so if the
client is on a tight budget and simply doesn’t have more money to pay.
Your hands are tied and odds are you’re going to suffer the
consequences.

I don’t think there’s a “right” way to do fixed-bid. You’d have to
have a really awesome and understanding client and implement a very
rigorous change-control policy that I suspect would encumber the whole
development process to a large and unenjoyable degree.

If possible, invest the time in helping your client to understand the
nature of variable scope and why it’s in their best interest.

(Note: Obie recently covered this topic on his own weblog here)

Hashrocket currently focuses on Rails projects. Do you see any technologies coming up that you might focus some or all of the team on?

No, not right now. I’m very focused on being the best of the best in
one particular niche: large-scale, custom web application development.
And I don’t think that there’s any technology that even comes close to
what you get with Rails for that. There’s always lots of cool and
exciting new things going on, the latest craze seems to be iPhone
development. We have a number of people interested in that, but I will
not take the company in that direction. It would be detrimental to
lose focus from what we do best.

What if I’m a developer who has no dreams of becoming an entrepreneur…what does it buy me to understand sales, marketing, managing client relationships and what-not?

Unless you’re independently wealthy, when it comes to landing a good
job and keeping it, you’re always going to have to hustle to promote
yourself. Lessons in sales and marketing apply whether you’re selling
a product, a company’s services or your own services. What are you
doing when you prepare a resume? When you go on an interview? Selling!
Get good at it, or suffer the crappy jobs and work environments that
you will get otherwise.

As for client relationships, if someone is paying you, they’re your
client. Doesn’t matter if you answer to another company or to a
manager. The skills of maintaining a healthy relationship and knowing
how to properly set expectations apply to everyone.

Agile processes are obviously a cornerstone of your company. How do you draw the line between passion and dogma?

That’s an interesting distinction. Passion is undeniably necessary for
success. Without passion, without the ability to inspire others, via
your words and/or actions, you’re not going to get very far in life. I
believe that Agile software development “just is” and I’m very
passionate about that. It just is the way that you do quality
software. Not doing quality software? That’s fine, but I won’t work
with you. Nowadays Agile is far enough along in mindshare that I don’t
feel I have to sell it very much or get anywhere near dogmatic about
it. As in everything, I try to keep an open mind and if a better
philosophy evolves, maybe it’s “Lean Software” or maybe it’s something
else, I’ll go with the flow. Until then, I stick to my principles.
Passionately.

What’s got you excited these days outside of your programming and entrepreneurial life?

I guess I’d have to say travel, more than anything else. Last year I
was blessed to be able to take my kids on a few good vacations,
including a fancy three-week trip around the world with stops in South
Africa and the Far East. This year it’s looking like I’m going to be
spending a lot of time in Europe and South America, so I’m definitely
excited about that. It’s a big world out there. Get out and see it
while you still can!

Rails 2.3

The Rails 2.3.2 gem is now installed and available for use on Heroku. To learn more about what’s new and improved, check the official Rails blog post.

Enjoy!

Flex on Rails position at Rosetta Stone in Boulder.

At the RMAUG I met Kadri, the Director of Speech Technologies from Rosetta Stone, and he mentioned they were looking for a Flex/Rails developer.

You can see the job posting here.

Unfortunately I didn’t have more time to speak with Kadri, so I cannot tell you much about the position or the company, but if you are in Boulder and fit the profile, why not just contact them?

Enjoy!

Daniel.

HTTParty goes foreign

Just a quick post to share something I was tinkering with this evening.

I came across this post by Gerald Bauer, which shows you how to use the Google Translation API with Ruby via Net::HTTP. I thought I’d play with the service with HTTParty.

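require 'httparty'

# Thin wrapper around the translate endpoint of the Google AJAX Language API.
class GoogleApi
  include HTTParty
  base_uri 'ajax.googleapis.com'

  # Translate +string+ into the +to+ language; +from+ defaults to English.
  def self.translate(string="", to="", from="en")
    get("/ajax/services/language/translate", :query => {:langpair => "#{from}|#{to}", :q => string, :v => 1.0})
  end
end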

A few examples from playing with it.

>> GoogleApi.translate('bonjour', 'en', 'fr')
=> "{\"responseData\": {\"translatedText\":\"hello\"}, \"responseDetails\": null, \"responseStatus\": 200}"

>> GoogleApi.translate('Red wine', 'fr')
=> "{\"responseData\": {\"translatedText\":\"Vin rouge\",\"detectedSourceLanguage\":\"en\"}, \"responseDetails\": null, \"responseStatus\": 200}"

>> GoogleApi.translate('Where is the bathroom?', 'es')
=> "{\"responseData\": {\"translatedText\":\"\302\277D\303\263nde est\303\241 el ba\303\261o?\",\"detectedSourceLanguage\":\"en\"}, \"responseDetails\": null, \"responseStatus\": 200}"

>> GoogleApi.translate('Good morning', 'it')
=> "{\"responseData\": {\"translatedText\":\"Buon giorno\",\"detectedSourceLanguage\":\"en\"}, \"responseDetails\": null, \"responseStatus\": 200}"

What a party!

>> GoogleApi.translate('party', 'it')
=> "{\"responseData\": {\"translatedText\":\"festa\",\"detectedSourceLanguage\":\"en\"}, \"responseDetails\": null, \"responseStatus\": 200}"
>> GoogleApi.translate('party', 'es')
=> "{\"responseData\": {\"translatedText\":\"fiesta\",\"detectedSourceLanguage\":\"en\"}, \"responseDetails\": null, \"responseStatus\": 200}"

Look how easy that was. 🙂
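The responses above come back as raw JSON strings, so the translation itself still has to be extracted. Here is a minimal sketch using Ruby's standard `json` library, assuming the response body has the shape shown in the examples; the helper name `translated_text` is made up for illustration.

```ruby
require 'json'

# Hypothetical helper: pull the translated text out of a raw JSON response.
# Assumes the response shape shown in the examples above.
def translated_text(raw_response)
  parsed = JSON.parse(raw_response.to_s)
  # Anything other than a 200 status is treated as "no translation".
  return nil unless parsed['responseStatus'] == 200
  parsed['responseData']['translatedText']
end

raw = '{"responseData": {"translatedText":"hello"}, "responseDetails": null, "responseStatus": 200}'
puts translated_text(raw)
```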

For a previous post on using this gem, read The HTTParty has just begun.

55 minute video: Flex on Rails presentation at RMAUG on March 10th

This is the edited version of the talk Tony Hillerson and Daniel Wanja gave at the Rocky Mountain Adobe User Group on March 10th. There is an echo during the first two minutes, then the sound stabilizes. We also had a software issue on Tony’s notebook and a display port issue on Daniel’s, parts which I edited out. The demo gods were not with us that night :-), but we had fun and I hope we answered many of the questions the attendees had. We also wanted to focus the talk on Ruby on Rails rather than on Flex and keep it an open format, but we had more questions than anticipated and presented only 10% of the material we had.


Flex on Rails presentation at RMAUG on March 10th from daniel wanja on Vimeo.
Note: during the first 2 minutes there is an echo. Sound skips from time to time throughout the video.

The talk was recorded by RMAUG and posted on their blog, and can be viewed using Connect here.

Thanks again to everyone who attended!

Daniel.

#153 PDFs with Prawn

Prawn is an excellent Ruby library for generating PDF documents. Learn how to use it along with the Prawnto plugin in this episode.

Storing Your Files

This is the second article in my series on file management. The third article will cover the challenges of handling uploads; then we should be able to move on to some more advanced topics.


The second problem you’ll face when building an application to handle files is where and how to store them.  Thankfully there are lots of well-supported options, each with their own pros and cons.


<h2>The local file system</h2>


If your application only runs on a single server, the simplest option is to store them on the local disk of your web/application server.  This leaves you with very few moving parts, and you know that both your rails application and your webserver can see the same files, at the same location.  But even though this is a simple option there are a few things that you need to be careful of.


A common mistake I see is to use a single directory to handle all of the users’ uploaded files.  So your directory structure ends up looking something like this:


<pre><code>/home/railsway/uploads/koz_avatar.png
/home/railsway/uploads/dhh_avatar.png
/home/railsway/uploads/other_avatar.png
</code></pre>

The first, and most obvious, problem with this structure is that unless you’re careful you could end up with users overwriting each other’s files.  The second, and more painful problem is that you end up with <a href="http://www.google.com/search?hl=en&amp;q=directory+too+many+files&amp;btnG=Search">too many files in a single directory</a> which will cause you some pain when you try to do things like list the directory or start removing old files.


A better approach is to store the uploads in a directory which corresponds to the ID of the object which owns those files.  But a flat layout like the following will still leave you with one huge directory:


<pre><code>/home/railsway/uploads/1/koz_avatar.png
/home/railsway/uploads/2/dhh_avatar.png
/home/railsway/uploads/3/other_avatar.png
</code></pre>

The best bet is to partition that directory into a number of sub directories like this:


<pre><code>/home/railsway/uploads/000/000/001/koz_avatar.png
/home/railsway/uploads/000/000/002/dhh_avatar.png
/home/railsway/uploads/000/000/003/other_avatar.png
</code></pre>

Thankfully both of the popular file management plugins have built-in support for partitioned storage: <a href="http://giantrobots.thoughtbot.com/2008/3/18/for-attaching-files-use-paperclip#comment--614066248">:id_partition in Paperclip</a> and <a href="http://github.com/technoweenie/attachment_fu/blob/ab1e4f7b0b9de85e0c9decf061d2ef5c1dc0feaa/README#L47">:partition in attachment_fu</a>.
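If you are not using one of those plugins, the partitioned layout above can be derived from the owner's ID with a few lines of Ruby. This is only a sketch: `UPLOAD_ROOT` and `partitioned_path` are names invented for this example, and it assumes IDs below one billion so that they fit in nine digits.

```ruby
# Hypothetical upload root, taken from the example paths above.
UPLOAD_ROOT = '/home/railsway/uploads'

# Build a path like <root>/000/000/001/<filename> from a numeric ID.
# Assumes id < 1_000_000_000, i.e. it fits in nine zero-padded digits.
def partitioned_path(id, filename, root = UPLOAD_ROOT)
  partition = ('%09d' % id).scan(/\d{3}/).join('/')
  File.join(root, partition, filename)
end

puts partitioned_path(1, 'koz_avatar.png')
```

Each directory level then holds at most a thousand entries, which keeps listings and cleanup jobs manageable.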


<h2><span class="caps">NFS</span>, GFS and friends</h2>


Once you’ve grown beyond a single app / web server, using the file-system gets a little more complicated.  In order to ensure that all your app and web servers can see the same files you have to use a shared file system of some sort.  Setting up and running a shared file system is beyond the scope of this site, but here are a few words of caution.


It’s deceptively easy to set up a simple <span class="caps">NFS</span> server for your network and just run your application as you did when it was on a single disk, but some things which are cheap on local disk are slow and expensive over <span class="caps">NFS</span> and friends.  Make sure you stress test your file server and pay an expert to help you tune the system.  The bigger problem I’ve had with <span class="caps">NFS</span> and <span class="caps">GFS</span> is the impact of downtime or difficulties on your application.  Your <span class="caps">NFS</span> server becomes a single point of failure for your whole site, and a minor network glitch can render your application completely useless as all the processes get tied up waiting on a blocking read from an <span class="caps">NFS</span> mount that’s gone away.


You can solve all those kinds of problems by hiring a good sysadmin and / or spending a large amount of money on <a href="http://en.wikipedia.org/wiki/NetApp_Filer">serious</a> <a href="http://www.bluearc.com/html/products/titan-1100.shtml">storage hardware</a>.  It’s not a path that I personally choose, but it’s definitely an option you should consider.


<h2> Amazon S3</h2>


It’s not really possible to write about storage without touching on <a href="http://aws.amazon.com/s3/">Amazon S3</a>.  In case you’ve been living under a rock for a few years S3 is a <a href="http://aws.amazon.com/s3/#principles">hugely scalable</a>, <a href="http://aws.amazon.com/s3/#pricing">incredibly cheap</a> storage service.  There are <a href="http://amazon.rubyforge.org/">several</a> <a href="http://rightscale.rubyforge.org/right_aws_gem_doc/">good gems</a> to use with your applications and the major file management plugins provide semi-transparent S3 support.


S3 isn’t a file system, so there are several things you have to do differently; however, there are alternatives for most of those operations.  For instance, instead of using <a href="http://www.therailsway.com/2009/2/22/file-downloads-done-right">X-Sendfile</a> to stream the files to your user, you redirect them to a signed URL on Amazon’s own service.  By way of example, our download action from the earlier article would look like this if using S3 and <a href="http://amazon.rubyforge.org/">marcel’s s3 library</a>:


<pre><code>def download
  # Redirect to a signed, time-limited S3 URL instead of streaming the file ourselves
  redirect_to S3Object.url_for('download.zip',
                               'railswayexample',
                               :expires_in => 3.hours)
end
</code></pre>

But there are a few things you have to be careful with when using S3.  The first is that uploading to S3 is <strong>much</strong> slower than simply writing your file to local disk. Unless you want your Rails processes to be tied up for ages, you’ll probably want to have a background job running which transfers the files from your server up to Amazon’s.  Another factor is that when S3 errors occur your users will be greeted by a very ugly error page:


<pre><code></code></pre>


Finally there’s always the risk of amazon having another <a href="http://www.readwriteweb.com/archives/more_amazon_s3_downtime.php">bad day</a> which takes your application down for a few hours.  Amazon’s engineers are pretty amazing, but nothing’s perfect.
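To illustrate the background-transfer idea mentioned above, here is a minimal sketch of handing uploads off to a worker so the request can return immediately. It uses an in-process thread and queue purely for illustration; in a real deployment you would more likely run a separate job process. `UploadQueue` and the uploader block are hypothetical names, and the block stands in for whatever S3 client call you actually use.

```ruby
require 'thread'

# Hypothetical in-process upload queue. The block passed to new is a
# placeholder for the real (slow) transfer to S3.
class UploadQueue
  def initialize(&uploader)
    @queue    = Queue.new
    @uploader = uploader
    # One worker thread drains the queue until it sees the :stop marker.
    @worker = Thread.new do
      while (path = @queue.pop) != :stop
        @uploader.call(path)
      end
    end
  end

  # Called from the request: returns immediately.
  def enqueue(path)
    @queue << path
  end

  # Finish the remaining work and stop the worker.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

uploaded = []
queue = UploadQueue.new { |path| uploaded << path }
queue.enqueue('/home/railsway/uploads/000/000/001/koz_avatar.png')
queue.shutdown
puts uploaded.size
```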


<h2>Other options</h2>


There are a few options I’ve not used before, but you could investigate:


<h3>BLOBs in your database</h3>


I’ve never been a fan of using BLOBs to store large files, however some people swear by them.  If you’re aware of great tutorial resources for BLOBs and rails, let me know and I’ll link to them from here.


<h3>Rackspace’s Cloud Files</h3>


When it was first announced <a href="http://www.mosso.com/cloudfiles.jsp">Cloud Files</a> from rackspace seemed like it was going to be a great competitor to S3.  However there’s currently no equivalent to S3’s signed-url authentication option which means downloads become <strong>much</strong> harder.  To use Cloud Files would require you to build a streaming proxy in your application, and use it to stream files from rackspace back out to the user.  You’d also have to pay for the bandwidth twice, once from rackspace, and once from your hosting provider.


This makes it <strong>much</strong> more complicated than S3 but hopefully this will be addressed in a future release.


<h3>MogileFS</h3>


<a href="http://www.danga.com/mogilefs/">MogileFS</a> is a really interesting option.  It has some similarities to S3 in that it’s a write-once file storage system which operates over <span class="caps">HTTP</span>.  But unlike S3 it’s open source software you can run on your own servers.  Unfortunately MogileFS is really thinly documented and quite difficult to get up and running.  If you know of a really good getting-started tutorial for MogileFS, let me know and I’ll link to it from here.


It would also require you to use Perlbal as your load balancer or find an Apache module that supports X-Reproxy-Url.


<h2> Conclusion</h2>


There are a bunch of different options you should consider when picking the storage for your file uploads.  Generally my advice would be to start with simple on-disk partitioned storage and grow from there.  Don’t rush straight to S3 because all the blogs tell you to; stay as simple as possible for as long as you can.

Storing Your Files

This is the second article in my series on file management, the third article will cover the challenges of handling uploads then we should be able to move on to some more advanced topics.

The second problem you’ll face when building an application to handle files is where and how to store them. Thankfully there are lots of well-supported options, each with their own pros and cons.

The local file system

If your application only runs on a single server, the simplest option is to store them on the local disk of your web/application server. This leaves you with very few moving parts, and you know that both your rails application and your webserver can see the same files, at the same location. But even though this is a simple option there are a few things that you need to be careful of.

A common mistake I see is to use a single directory to handle all of the users’ uploaded files. So your directory structure ends up looking something like this:

/home/railsway/uploads/koz_avatar.png
/home/railsway/uploads/dhh_avatar.png
/home/railsway/uploads/other_avatar.png

The first, and most obvious, problem with this structure is that unless you’re careful you could end up with users overwriting each other’s files. The second, and more painful problem is that you end up with too many files in a single directory which will cause you some pain when you try to do things like list the directory or start removing old files.

The best bet is to store the uploads in a directory which corresponds to the ID of the object which owns those files. But something like the following will also leave you with a huge directory:

/home/railsway/uploads/1/koz_avatar.png
/home/railsway/uploads/2/dhh_avatar.png
/home/railsway/uploads/3/other_avatar.png

The best bet is to partition that directory into a number of sub directories like this:

/home/railsway/uploads/000/000/001/koz_avatar.png
/home/railsway/uploads/000/000/002/dhh_avatar.png
/home/railsway/uploads/000/000/003/other_avatar.png

Thankfully both of the popular file management plugins have built-in support for partitioned storage: :id_partition in Paperclip and :partition in attachment_fu.
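If you want to see what that partitioning amounts to, here is a minimal sketch in plain Ruby. The partitioned_path helper is made up for illustration and is not how either plugin implements it; it also assumes IDs fit in nine digits, matching the three-level layout shown above.

```ruby
# Hypothetical helper: maps a numeric ID to a partitioned upload path.
# Assumes IDs of at most nine digits (three levels of three digits each).
def partitioned_path(root, id, filename)
  raise ArgumentError, "id must be between 0 and 999_999_999" unless (0..999_999_999).cover?(id)

  padded = format("%09d", id)            # 1 becomes "000000001"
  segments = padded.scan(/\d{3}/)        # ["000", "000", "001"]
  File.join(root, *segments, filename)
end

puts partitioned_path("/home/railsway/uploads", 1, "koz_avatar.png")
```

With this layout no directory ever holds more than a thousand entries at each level, which keeps listing and cleanup cheap.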

NFS, GFS and friends

Once you’ve grown beyond a single app / web server, using the file system gets a little more complicated. In order to ensure that all your app and web servers can see the same files, you have to use a shared file system of some sort. Setting up and running a shared file system is beyond the scope of this site, but a few words of caution are in order.

It’s deceptively easy to set up a simple NFS server for your network and just run your application as you did when it was on a single disk, but some things which are cheap on local disk are slow and expensive over NFS and friends. Make sure you stress test your file server and pay an expert to help you tune the system. The bigger problem I’ve had with NFS and GFS is the impact of downtime or difficulties on your application. Your NFS server becomes a single point of failure for your whole site, and a minor network glitch can render your application completely useless as all the processes get tied up waiting on a blocking read from an NFS mount that’s gone away.

You can solve all those kinds of problems by hiring a good sysadmin and / or spending a large amount of money on serious storage hardware. It’s not a path that I personally choose, but it’s definitely an option you should consider.

Amazon S3

It’s not really possible to write about storage without touching on Amazon S3. In case you’ve been living under a rock for a few years, S3 is a hugely scalable, incredibly cheap storage service. There are several good gems to use with your applications, and the major file management plugins provide semi-transparent S3 support.

S3 isn’t a file system, so there are several things which you have to do differently. However, there are alternatives for most of those operations. For instance, instead of using X-Sendfile to stream the files to your user, you redirect them to a signed URL on Amazon’s own service. By way of example, our download action from the earlier article would look like this if using S3 and Marcel’s S3 library:

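def download
  # Redirect the browser to a signed URL for 'download.zip' in the
  # 'railswayexample' bucket; the URL stops working after three hours.
  redirect_to S3Object.url_for('download.zip',
                               'railswayexample',
                               :expires_in => 3.hours)
end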
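To make the idea of a signed, expiring URL more concrete, here is a minimal sketch of how such a scheme can work in general. This is not Amazon’s actual signing algorithm; the SECRET constant, the sign_url and valid_url? helpers, and the expires / signature parameter names are all invented for illustration.

```ruby
require "openssl"
require "uri"

SECRET = "not-a-real-secret" # hypothetical key known only to the storage service

# Computes an HMAC over the path and expiry time, so neither can be altered.
def signature_for(path, expires_at, secret = SECRET)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, "#{path}\n#{expires_at}")
end

# Builds a URL that carries its own expiry (a Unix timestamp) and signature.
def sign_url(path, expires_at, secret = SECRET)
  "#{path}?expires=#{expires_at}&signature=#{signature_for(path, expires_at, secret)}"
end

# The storage side recomputes the signature and rejects expired or altered URLs.
# A real service should compare signatures in constant time; == keeps the sketch short.
def valid_url?(url, now = Time.now.to_i, secret = SECRET)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s).to_h
  expires_at = params["expires"].to_i
  return false if now > expires_at

  signature_for(uri.path, expires_at, secret) == params["signature"]
end

url = sign_url("/railswayexample/download.zip", Time.now.to_i + 3 * 60 * 60)
puts valid_url?(url)
```

The point is that your application never has to proxy the download: it hands out a short-lived URL and the storage service does the rest.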

But there are a few things you have to be careful with when using S3. The first is that uploading to S3 is much slower than simply writing your file to local disk. Unless you want your Rails processes to be tied up for ages, you’ll probably want to have a background job running which transfers the files from your server up to Amazon’s. Another factor is that when S3 errors occur, your users will be greeted by a very ugly error page.

Finally, there’s always the risk of Amazon having another bad day which takes your application down for a few hours. Amazon’s engineers are pretty amazing, but nothing’s perfect.
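The background transfer mentioned above can be sketched with nothing but Ruby’s standard library. The UploadQueue class and the block it takes are hypothetical; in a real application you would more likely use a persistent job queue in a separate process, so that pending transfers survive a restart, and the block would call your S3 library of choice.

```ruby
# Hypothetical in-process queue: the request saves the file locally and
# enqueues its path; a worker thread performs the slow transfer later.
class UploadQueue
  def initialize(&uploader)
    @uploader = uploader          # block that does the actual transfer
    @jobs = Queue.new
    @worker = Thread.new { run }
  end

  # Called from the request cycle; returns immediately.
  def enqueue(path)
    @jobs << path
  end

  # Processes everything already queued, then stops the worker.
  def shutdown
    @jobs << :stop
    @worker.join
  end

  private

  def run
    while (path = @jobs.pop) != :stop
      begin
        @uploader.call(path)
      rescue StandardError => e
        # A failed transfer must not kill the worker; log it and carry on.
        warn "upload of #{path} failed: #{e.message}"
      end
    end
  end
end

queue = UploadQueue.new { |path| puts "transferring #{path}" }
queue.enqueue("/home/railsway/uploads/000/000/001/koz_avatar.png")
queue.shutdown
```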

Other options

There are a few options I’ve not used before, but you could investigate:

BLOBs in your database

I’ve never been a fan of using BLOBs to store large files, but some people swear by them. If you’re aware of great tutorial resources for BLOBs and Rails, let me know and I’ll link to them from here.

Rackspace’s Cloud Files

When it was first announced, Cloud Files from Rackspace seemed like it was going to be a great competitor to S3. However, there’s currently no equivalent to S3’s signed-URL authentication option, which means downloads become much harder. Using Cloud Files would require you to build a streaming proxy in your application, and use it to stream files from Rackspace back out to the user. You’d also have to pay for the bandwidth twice: once from Rackspace, and once from your hosting provider.

This makes it much more complicated than S3, but hopefully this will be addressed in a future release.

MogileFS

MogileFS is a really interesting option. It has some similarities to S3 in that it’s a write-once file storage system which operates over HTTP. But unlike S3 it’s open source software you can run on your own servers. Unfortunately MogileFS is really thinly documented and quite difficult to get up and running. If you know of a really good getting-started tutorial for MogileFS, let me know and I’ll link to it from here.

It would also require you to use Perlbal as your load balancer, or to find an Apache module that supports X-Reproxy-Url.
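As a rough sketch of the reproxying idea: instead of sending the file body itself, the application answers with an X-Reproxy-Url header, and the load balancer fetches the file from that URL on the client’s behalf. The Rack-style lambda and the storage host name below are hypothetical, and the exact behaviour depends on how your load balancer is configured.

```ruby
# Hypothetical internal URL of the file on a storage node.
STORAGE_URL = "http://storage.internal.example/files/download.zip"

# A bare Rack-style app: it returns no body of its own, only the header
# telling the load balancer where to fetch the real content.
DOWNLOAD_APP = lambda do |_env|
  [200, { "X-Reproxy-Url" => STORAGE_URL }, []]
end

status, headers, body = DOWNLOAD_APP.call({})
puts status
puts headers["X-Reproxy-Url"]
```

This plays the same role that X-Sendfile does for local files: the application decides who may download what, and something faster does the actual streaming.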

Conclusion

There are a bunch of different options you should consider when picking the storage for your file uploads. Generally my advice would be to start with simple on-disk partitioned storage and grow from there. Don’t rush straight to S3 because all the blogs tell you to; stay as simple as possible for as long as you can.

Rails Envy Podcast – Episode #070: 03/13/2009

Episode 70. I was going to try and copy Gregg from last week, but my dog can’t talk yet to do the introduction. Darn obedience classes. Obie Fernandez from Hashrocket co-hosts this week and we have a blast. I hope you all enjoy the show, and I’m sorry for the delay. But it’s a good one, I promise!

Subscribe via iTunes – iTunes only link.
Download the podcast ~35 mins MP3.
Subscribe to feed via RSS by copying the link to your RSS Reader



MWRC – Thank you, that was awesome!

We are at the airport on the way back to Denver from the MountainWest RubyConf. This was the best conference I’ve been to in years: fast-paced talks with just incredible material, and every presenter was excellent. So to the organizers and presenters: thank you! I’ll be back!

This was the Official Meme:

Check out some of the comments from the last few hours or check all tweets regarding #mwrc.

On my first tattoo


Some people have asked why I’ve got a Hashrocket tattoo on my calf. The reasons are pretty biographical; ‘ware ye the history contained herein. Credit for the photo goes to Travis Schmeisser.


Each Wednesday Hashrocket has a midweek get-together called Hashrocket Hot Hackers Hump Day Happy Hour (or 6H). It was a chilly January evening and there were almost a dozen rocketeers milling about at the local martini bar when Sal casually asked if I wanted to go get a Hashrocket tattoo with him.

Of course, inebriated as I was, there wasn’t much chance I was going to turn down the idea of getting a Hashrocket tattoo.

Yet, there was a time when I considered tattoos silly things that a person gets to show how edgy he or she is, or to indicate an extreme level of I-will-kick-your-ass. Now I’ve got one. So why choose to get a Hashrocket tattoo?

I’ve become quite enamored of Hashrocket since I arrived in Atlantic Beach at the end of March in 2008. I was still a recovering burn-out when I came down here, and somehow Hashrocket refilled my spiritual-coding cup. That sounds extreme, but it’s just the way things are.

I spent five years at a county-level government IT shop as a web programmer. I serviced sixteen department websites and also wrote an intranet from the ground up that served 1700 employees while I was there. My boss was lost on anything past FrontPage and for help I had only a string of limited-engagement, part-time assistants of varying levels of skill.

I burned out. Totally and completely. I threw away all of my computer gear and went back to auditing hotels overnight and contemplatively staring at the moon. After about a year, I had the realization that software is what I do, and no matter how burned I felt or how much I wished otherwise, it seemed that was the value I would provide to society.

I began the slow road back to development by working as a part-time webmaster at a non-profit. Then Tiger turned me on to Ruby on Rails, and things started to happen inside of me. It was a strange sensation that, after experiencing it a few times, I was able to place: happiness. I was happy writing Ruby code. I was happy using the Rails framework. Just typing out each line of code somehow made me feel good.

That’s how I came to be a Rails consultant in Madison, Wisconsin. My first paid site was completed in November, 2007, and I’ve never looked back.

When Tiger invited me to come down and see how Hashrocket does things in March of 2008, I was excited to see the magic sauce that had both him and Lark raving about the company. I didn’t expect to be offered a job, but I was, and it’s been the best thing that’s happened to me.

In many of the same ways that Ruby and Rails took away the pain of coding for me, Hashrocket has taken away the pain of work and replaced it with happiness. Pair programming has made me a more effective and efficient programmer. Communicating all the time has taken away all the bad conversations, because nothing has time to fester. Test-driven development gives me a level of confidence in my code that makes me unafraid to change even systems I haven’t looked at in ages. I feel encouraged to excel as an individual rockstar within the community, even on the Hashrocket clock. As Les Hill (in the photo, on the right) is wont to remind us in his blog posts, working at Hashrocket is like attending an ongoing seminar.

All of these reasons, from my discovery of Ruby on Rails through joining Hashrocket and even becoming a presenter at local user groups, are why I have a Hashrocket tattoo on my calf. If I never have another experience that I want to commemorate with a tattoo, I’m glad I’ve had this one.