
Happy Thanksgiving 2013: 50% off ebooks, All Week.

Why Google’s Go Programming Language is the Future

This guest post is by Adam Prattler, who runs post-me.org along with several other such sites around the USA. These days he is consulting for people affected by the typhoon in the Philippines, and has gathered plenty of photos from that work.

Photo by Claire Thompson / Flickr
By now, any programmer worth their code would’ve already stumbled onto Google’s rather new Go programming language. Go, sometimes referred to as “golang” in search engines, is the brainchild of a bunch of brave visionaries over at Google. Google defines Go as a strongly typed, compiled language built to showcase concurrency and thread-safety support, allowing programmers to build concurrent applications quickly and effortlessly (most programmers who have tried Go report brilliant results – they say it makes them more productive and speedy).
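
To give a flavor of what that concurrency support looks like, here is a minimal sketch (not taken from any Go documentation, just an illustration): each unit of work runs in its own goroutine and reports its result back over a channel.

  package main

  import "fmt"

  // square stands in for any unit of work we want to run concurrently.
  func square(n int, out chan<- int) {
    out <- n * n
  }

  func main() {
    out := make(chan int)

    // Start one goroutine per input value; they all run concurrently.
    for i := 1; i <= 5; i++ {
      go square(i, out)
    }

    // Collect exactly five results from the shared channel.
    for i := 0; i < 5; i++ {
      fmt.Println(<-out)
    }
  }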

Why Go is fast becoming a crowd favorite

It is said that to be a truly proficient programmer, you need to immerse yourself in all the languages that exist. Java, check. C++, duh! Python, right-o. So why not Google Go?

As long as you are proficient in C++ or Java, chances are that you’ll be able to pick up Go fairly quickly. The same can’t be said for those who lack a background in C or Java and want to shift from Python to Go.

The transition to Go isn’t hard, either. Even the most die-hard Python or Ruby fanboys have found it easy to go with #golang (and this after trying JavaScript/Node, Scala, Clojure and Erlang) – they loved how concurrency is a fundamental part of Go’s design. This helped some early adopters push through even though it was hard at the very beginning, what with the lack of community, success stories and case studies to lean on. There was an initial fear about whether companies who chose Go would be able to hire suitable talent to work on their projects, but the fear was unfounded, as these companies became inundated with applications from highly capable candidates (not a few armed with a diploma of information technology in Australia) who were only too eager to work with Go.

There are brave programmers, and then there are truly COURAGEOUS ones who saw a challenge in using Go even though they were used to benefitting from an expansive framework (such as Ruby on Rails) where a lot of tasks are made easy and convenient. Many rose to the challenge of working with Go from the core up, and found that they love the total freedom and control as they are able to include features they prefer and even have Go format their code, leaving them with more time to design and implement the webapp to perfection.

The future sure looks bright for Go and its users. Perhaps one of the deciding factors that persuaded programmers to switch to Go is the fact that it isn’t built primarily for web applications; it’s a whole new language for building applications in general.

What programmers are saying

Graham King, blogger and programmer at DarkCoding.net, expressed his thoughts on the future of Google Go articulately, saying that this is indeed a modern language to watch out for. It is serious, “adult-like”, and holds plenty of potential, especially for those who are already proficient in Java and C++, as Go would raise their productivity.

Darren Coxall at DarrenCoxall.com reported on his decision to choose Go over Ruby on Rails when working on a webapp consisting of an API and a JavaScript front-end. Many of Ruby on Rails’ features were not needed to create the API, so he favored the no-frills Go over the more versatile framework.

BONUS! Go crash course

Want to learn more about Go? Here are some links that will help you get started quickly

The final verdict

There are programmers who lean towards Python and are rather reluctant to ditch their favorite language in favor of Go, but most have expressed their admiration for #golang, saying they would still reach for Python before fully turning to Go, as the latter is a “young” language with limited library availability. Even so, there’s plenty of room for improvement and plenty of time to grow, and if Google saw the benefit of creating its own programming language and saw it fit to follow through, then it pays to be an early adopter and watch Go grow from raw to polished in the near future.

After all, it’s the fruit of Google’s labor.




Reducing MySQL’s memory usage on OS X Mavericks

Recently, I found myself re-installing everything from Homebrew and began to notice that MySQL was consuming nearly half a gig of memory. Given that I don’t do too much with MySQL on a regular basis, I opted to override a handful of default configuration options to reduce the memory footprint.

As you can see, a fresh MySQL install via Homebrew was consuming over 400 MB of memory.

Here is how I reduced my memory footprint:

$ mkdir -p /usr/local/etc

Unless you already have a custom MySQL config file, you will want to add one into this directory.

$ vim /usr/local/etc/my.cnf

We’ll then paste the following options into our file… and save it.

  # Robby's MySQL overrides
  [mysqld]
  max_connections       = 10

  key_buffer_size       = 16K
  max_allowed_packet    = 1M
  table_open_cache      = 4
  sort_buffer_size      = 64K
  read_buffer_size      = 256K
  read_rnd_buffer_size  = 256K
  net_buffer_length     = 2K
  thread_stack          = 128K

Finally, we’ll restart MySQL.

$ mysql.server stop

If you have MySQL set up in launchctl, it should restart automatically. After I did this, my MySQL instance was closer to 80 MB.

So far, this has worked out quite well for my local Ruby on Rails development. Mileage may vary…

Having said that, how much memory are you now saving?

Powering the Internet of Customers with Heroku1

Editor's Note: We are cross-posting this article from the Salesforce Blog. It shows how we are bringing Heroku to a new market and audience – Salesforce customers – using a new product and message. If you are a user of both Heroku and Salesforce and are interested in connecting them, check out Heroku1.


Apps are an essential part of the Internet of Customers. They are the dashboards to people’s lives. They allow your customers to be part of your business’ workflows, and for you to engage with them on an unprecedented level. Customer connected apps are the next phase of how companies are innovating and gaining competitive advantage.

Today, we are launching Heroku1, a complete service for building and scaling the next generation of customer connected apps.

Heroku1 is purpose-built for the Salesforce1 platform. It is a new, fully-loaded edition of Heroku’s Platform-as-a-Service designed specifically for building apps that are fully integrated with your Salesforce data. Because it leverages your existing investment in Salesforce.com, it’s now faster and easier than ever to build web, mobile, and connected-device apps that will delight your customers.

Heroku1 solves the three biggest problems that enterprises face in building connected applications:

  1. Connected data: Too many web and mobile applications are islands, and don’t connect with core customer and operational information.
  2. Speed to market: Managing infrastructure and using clumsy tools make it hard to get apps to market fast.
  3. Ongoing operations: The hardest part of having successful apps isn’t building them, it is managing them.

Heroku is already used by thousands of startups and cutting-edge enterprises. It runs some of the largest and most innovative apps on the Internet, including Paper by FiftyThree, Automatic, Urban Dictionary, Lyft, and Asics. Heroku1 makes the power of Heroku available to companies that have embraced the Salesforce1 Platform.

Customer Apps Done Right with Heroku Connect

The cornerstone of Heroku1 is the new Heroku Connect technology, which leverages your Salesforce investment by giving your developers a simple and efficient way to connect all your customer-facing Heroku apps to your Salesforce environment.

Heroku Connect:

  • Automatically synchronizes your Heroku Postgres database with your customer and operational data in Salesforce.
  • Provides your Force.com data in a SQL database, which is compatible with all leading web and mobile app dev environments, such as Ruby, Java, Node.js and PHP.
  • Seamlessly scales to handle even the most demanding traffic levels.

Now your customer-facing apps can be responsive to your business workflows. For example, your marketing team can prepare offers within Salesforce and have those offers automatically presented to customers inside your app. And the connection is two-way, so data that your customers provide by using your app is synced back to Salesforce records automatically, making it easy to take action and create dashboards based on how your customers use your app.
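
As a rough sketch of what the SQL side of this looks like from application code (shown in Go with the lib/pq driver purely for illustration; the offers table and its columns are hypothetical placeholders, not the mapping Heroku Connect actually produces):

  package main

  import (
    "database/sql"
    "fmt"
    "log"
    "os"

    _ "github.com/lib/pq" // Postgres driver
  )

  func main() {
    // DATABASE_URL is the standard Heroku Postgres connection string.
    db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
    if err != nil {
      log.Fatal(err)
    }
    defer db.Close()

    // "offers" is a hypothetical table standing in for whatever mapping
    // Heroku Connect has been configured to sync from Salesforce.
    rows, err := db.Query("SELECT name, discount FROM offers WHERE active = true")
    if err != nil {
      log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
      var name string
      var discount float64
      if err := rows.Scan(&name, &discount); err != nil {
        log.Fatal(err)
      }
      fmt.Printf("%s: %.0f%% off\n", name, discount)
    }
    if err := rows.Err(); err != nil {
      log.Fatal(err)
    }
  }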

Better Apps with the Cloud

Heroku1 lets your software developers focus on building great experiences for your customers, not on managing infrastructure. Heroku1 removes the need to requisition, set up, and manage servers. This allows developers to focus on what they do best: delivering better apps faster.

Heroku1 supports the most popular programming languages in use today, including Java, Ruby, Node.js, PHP, and Python. This means developers can choose the best tool for the job, delivering better software faster.

Sleep at Night, While We Manage Ongoing Operations

You know that launching an app is just the first step. After it’s live, you need to scale, monitor, and operate it. Heroku1 is a platform as a service, so app operations are built-in. We have a team of engineers who monitor your app to keep it up and running smoothly. Security updates and patches are handled automatically. And scaling is as simple as a single click to provision more resources.

Innovate with Heroku1

Are you ready to innovate? Heroku1 is in limited availability today to charter customers, and will be generally available in the first half of 2014. If you want to learn more about Heroku1, apply to be a charter customer, or be notified when it is generally available, please visit www.heroku.com/1.


Node.js the Right Way; Create Mobile Games with Corona now in print

Tools for integrating Heroku apps with Salesforce.com

At our core, Heroku's goal is to make it easier for developers to build great apps. We do this by creating tools which allow developers to focus on writing code, rather than wasting time on managing infrastructure. To coincide with this week's Dreamforce event, we are launching several tools targeted at developers who write apps on Heroku that integrate with Salesforce.com.

If you aren't part of the Salesforce world, don't worry. We remain 100% committed to our core audience of web and mobile developers and will continue to release great new features and functionality like websockets and high-availability databases.

Force.com, a full-stack platform for building employee-facing apps, provides a RESTful interface into Salesforce's sales, support, and marketing SaaS products. The three tools that we are launching today make it easier and more productive to build and connect to apps using Force.com. They are a Force.com CLI, Force.com Client Libraries for Ruby and Node.js, and Heroku Connect.

Force.com CLI

Previously, building Force.com apps required logging into Salesforce.com's web interface. But for developers who live on the command line, this can break flow. So we created the Force.com CLI. It allows you to interact directly with the data in Salesforce, in a lightweight and unobtrusive fashion:

$ force login matt@heroku.com

View the record types available:

$ force sobject list
Account
Campaign
Contact
Event
Group
Lead
Opportunity
Task

See information about a record type:

$ force field list Contact
AccountId: reference (Account)
AssistantName: string
AssistantPhone: phone
Birthdate: date
FirstName: string
LastName: string

Run a SOQL query:

$ force select id, name from user

 Id                 | Name          
--------------------+---------------
 005i0000002DYYQBB4 | Bob Smith 
--------------------+---------------
 (1 records)

The Force CLI is open source and is available to download now.

Force.com Client Libraries

In addition to the CLI tool, we are releasing Force.com Client Libraries for Ruby and Node.js. These libraries are based on existing open source efforts and are available on github.

Install the force.com ruby gem:

$ gem install force

… or the Node.js library:

 $ npm install force

Documentation on using the libraries is available on their Github pages.
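
Under the hood, these libraries and the CLI talk to the same Force.com REST API. As a rough sketch of the kind of request involved, here is a SOQL query issued as a plain HTTPS call in Go; the instance URL, API version, and access token are placeholders you would obtain from the OAuth login flow, and none of this is the announced libraries' own API:

  package main

  import (
    "fmt"
    "io"
    "log"
    "net/http"
    "net/url"
  )

  func main() {
    // Placeholders: a real app obtains both of these via the OAuth login flow.
    instance := "https://na1.salesforce.com"
    token := "REPLACE_WITH_ACCESS_TOKEN"

    q := url.QueryEscape("SELECT Id, Name FROM Contact LIMIT 5")
    req, err := http.NewRequest("GET", instance+"/services/data/v29.0/query?q="+q, nil)
    if err != nil {
      log.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer "+token)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
      log.Fatal(err)
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
      log.Fatal(err)
    }
    fmt.Println(string(body)) // JSON result set from the query endpoint
  }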

Introducing Heroku Connect

Our final announcement for Dreamforce is Heroku Connect. It syncs data from Salesforce into Heroku Postgres automatically, handling many of the common issues of using an API such as local caching and conflict resolution. Because many application frameworks are optimized for using an SQL database (Rails, Django, etc), this makes it incredibly fast and easy to build apps that connect with Salesforce.

Heroku Connect is available as part of our Heroku1 edition and is in limited availability now. It will be generally available in the first half of 2014. If you are interested in becoming an early customer, sign up here.

Come see us at Dreamforce

If you would like to learn more about Heroku1 or the Force.com tools, please visit us at Dreamforce this week. Herokai (Heroku employees) are stationed on the second floor of Moscone West at the Heroku Demo Station, Connected Devices Lab, and at the Hackathon.

Ruby Programming 48th Batch: Registrations are now open

Registrations are now open for RubyLearning’s popular Ruby programming course. This is an intensive, online course for beginners that helps you get started with Ruby programming. The course starts on Saturday, 30th Nov. 2013 and runs for eight weeks, with a one week break for Christmas and New Year.

Course Fee

Please create a new account first and then pay US$ 49.95 (for the first 15 participants, after which the course fee will be US$ 69.95) by clicking on the PayPal button.


Download ‘Advice for Ruby Beginners’ as a .zip file.

Here is what Sandra Randall (Butler), a participant who just graduated, has to say – “You kindly offered me the opportunity to join your Ruby course. I’m new to development and found the course, even though basic for programmers, a little tricky for me. I managed to complete all of the assessments and really learnt a lot. Thank you very much for the opportunity. It has really given me the push I needed to learn Ruby and I’m currently treading my way through both the pickaxe and Agile Development books and enjoying it. I’ve recently been offered a position as a Junior Systems Developer at a local Software house in South Africa – all thanks to the push you gave me which gave me the motivation and drive to get going.”

What’s Ruby?


According to http://www.ruby-lang.org/en/ – “Ruby is a dynamic, open source programming language with a focus on simplicity and productivity. Ruby’s elegant syntax is natural to read and easy to write.”

Yukihiro Matsumoto, the creator of Ruby, in an interview says –

I believe people want to express themselves when they program. They don’t want to fight with the language. Programming languages must feel natural to programmers. I tried to make people enjoy programming and concentrate on the fun and creative part of programming when they use Ruby.

What Will I Learn?

In the Ruby programming course, you will learn the essential features of Ruby that you will end up using every day. You will also be introduced to Git, GitHub, HTTP concepts, RubyGems, Rack and Heroku.

Some Highlights

RubyLearning’s IRC Channel

Some of the mentors and students hang out at RubyLearning’s IRC (irc.freenode.net) channel (#rubylearning.org) for both technical and non-technical discussions. Everyone benefits with the active discussions on Ruby with the mentors.

Google Hangouts

There is a Hangout Event that is open for students, for drop-in hangouts where students can pair program with mentors or with each other. This is often where you can get help with your system, editor, and general environment. Anything in your coding environment that you are having problems with is usually discussed interactively here.

Git Repositories

Shared (private) repositories are available for those who want to learn Git and a revision-controlled programming workflow. This lets students collaborate while learning, and it is a great way to record your progress while learning Ruby.

eBook

The course is based on The Ultimate Guide to Ruby Programming eBook. This book is priced at US$ 9.95. However, the Kindle edition of the eBook is available for US$ 6.

Challenges and Side Tracks

This is course material found neither in the RubyLearning Study Notes nor in the eBook! Depending on participation levels, we throw a Ruby coding challenge into the mix, right for the level we are at. We have been known to give out a prize or two for the ‘best’ solution.

Who’s It For?

A beginner with some knowledge of programming.

You can read what past participants / online magazines have to say about the course.

Mentors

Satish Talim, Michael Kohl, Satoshi Asakawa, Victor Goff III and others from the RubyLearning team.

Dates

The course starts on Saturday, 30th Nov. 2013 and runs for eight weeks, with a one week break for Christmas and New Year.

How do I register and pay the course fees?

  • You can pay the course fees either by PayPal, by sending cash via Western Union Money Transfer, or by bank transfer (if you are in India). The fees collected help RubyLearning maintain the site, this Ruby course, and the Ruby eBook, and provide quality content to you.

To pay the Course Fee:

Please create a new account first and then pay US$ 49.95 (for the first 15 participants, after which the course fee will be US$ 69.95) by clicking on the PayPal button.

How does the course work?

For details on how the course works, refer here.

At the end of this course you should have all the knowledge to explore the wonderful world of Ruby on your own.

Remember, the idea is to have fun learning Ruby.


Heroku at Dreamforce – Nov 18 – 21

It’s hard to believe the scale or imagine the energy that is Dreamforce. As part of the Salesforce Platform, a platform with a growing developer community and an amazing range of technologies, Heroku will join the party November 18-21 in San Francisco. This is a big deal for us.

DevZone

A few weeks ago we announced the Salesforce $1 Million Hackathon. By the way, that’s $1 million cash, the single largest hackathon prize in history. The response from our developer community has been fantastic – the winning app will undoubtedly be amazing.

Heroku will also be a big part of developer workshops, the genius bar, several demo stations and a whole list of sessions. Extra bonus, the DevZone includes an area dedicated to the Internet of Things. We won't give away all the details, but the Salesforce Platform is pretty excited to have joined forces with connected product gurus, Xively, and long-time Heroku partner Ionia, to create a connected vintage pinball arcade. Locks have been picked, sensors installed…Death Star ready for battle.


Heroku Sessions

There are over 1,250 sessions across the full event. Here are a few worthy session highlights:

– Application Security: Secure Heroku Apps with HP Fortify on Demand
Along with the release of their Heroku add-on, the HP Fortify on Demand team will walk developers through their trusted security monitoring cloud service.

– Using Heroku Postgres to Manage Your Salesforce Data
On the heels of the Heroku Postgres 2.0 release, we'll show developers how to leverage Heroku Postgres to bring new data management capabilities to Salesforce Platform integrations.

– 5 Ways Connected Products Will Transform Your Business
We've engineered a very cool connected product to demonstrate just why companies need to become connected companies. Here’s a sneak peek of some of our handiwork.

Dreamforce is going to be a great show, hope to see you there.

Now in print: Build Awesome Command-Line Applications; Developing Android on Android; The Dream Team Nightmare


OAuth as Single Sign On

Today, we're announcing the release of a key part of our
authentication infrastructure – id.heroku.com – under the MIT license.
This is the service that accepts passwords on login and manages all
things OAuth for our API. The repo is now world-readable at
https://github.com/heroku/identity. Pull requests welcome.

While OAuth was originally designed to allow service providers to
delegate some access on behalf of a customer to a third party, and we
do use it that way too, Heroku also uses OAuth for SSO. We'd like to
take this opportunity to provide a technical overview.

A quick bit of terminology

We use the term "properties" to refer to public-facing sites owned and
operated by Heroku. See our post on Fancy Pants SSL Certificates on how to identify a Heroku property. As a concrete example of a property in this document, I use addons.heroku.com.

When a server makes some form of API call to another server, and is doing so in response to a browser request, this server-to-server request is a "backpost". An example is popular cat-picture-injecting service Meowbify. Requests to http://cats.www.heroku.com.meowbify.com/ result in Meowbify making a backpost to www.heroku.com.

The colloquial term "OAuth Dance" refers to the sequence of browser redirects which communicate an OAuth authorization code from service provider to consumer. It should also be read to include the backpost from consumer to provider that exchanges the code for an access & refresh token.
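
To make that backpost concrete, here is a rough sketch in Go of the standard RFC 6749 code-for-token exchange; the endpoint URL and client credentials are placeholders rather than Heroku's actual values:

  package main

  import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "net/url"
  )

  // exchangeCode performs the backpost half of the dance: the consumer trades
  // the authorization code for an access (and refresh) token. The endpoint and
  // client credentials below are placeholders, not Heroku's actual values.
  func exchangeCode(code string) (map[string]interface{}, error) {
    form := url.Values{
      "grant_type":    {"authorization_code"},
      "code":          {code},
      "client_id":     {"CONSUMER_ID"},
      "client_secret": {"CONSUMER_SECRET"},
    }
    resp, err := http.PostForm("https://provider.example.com/oauth/token", form)
    if err != nil {
      return nil, err
    }
    defer resp.Body.Close()

    var tokens map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&tokens); err != nil {
      return nil, err
    }
    return tokens, nil
  }

  func main() {
    tokens, err := exchangeCode("CODE_FROM_THE_REDIRECT")
    if err != nil {
      log.Fatal(err)
    }
    fmt.Println(tokens["access_token"], tokens["refresh_token"])
  }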

Single Sign On

Login Flow

When you click "login" on addons.heroku.com, you've probably noticed you get redirected to id.heroku.com. This creates a longer-lived cookie on id.heroku.com, and kicks off an OAuth dance with addons.heroku.com. If you log in to dashboard.heroku.com first, you'll also notice you can go to addons.heroku.com, click "login", and it'll also trigger an OAuth dance with addons rather than prompt you again for your password.

SSO Flow

Our internal use of OAuth for SSO is necessary in part due to legacy concerns. For most of our history, Heroku's Aspen and Bamboo stacks served customer applications out of the *.heroku.com domain, the same domain we use ourselves for all public-facing Heroku properties like www.heroku.com and dashboard.heroku.com. This significantly complicates our use of cookie-based authentication, as discussed in [1]. While we have retired Aspen and disabled the creation of new Bamboo applications, existing Bamboo apps continue to pose cookie-stuffing concerns and curtail our use of domain-wide cookies for any sensitive information.

This brings us to our first debatable deviation from the spec: pre-approval. A normal OAuth flow requires the customer be prompted to approve or deny the grant to an OAuth consumer. All Heroku properties are whitelisted to suppress this prompt, and implicitly grant. Additions to the list are tightly controlled, and strictly limited to our own properties. The rationale is that asking a customer to approve each individual property is a bit silly, since there is no third party.

When addons completes the OAuth dance, the refresh token is immediately discarded. The access token is then saved in the _heroku_addons_session cookie. The payload is encrypted with AES-256-CBC, random IV. The ciphertext is then HMAC'd with SHA256 for tamper protection. Secrets are held only by the addons app itself. Also present in the payload is a timestamp, telling addons to expire the cookie in six hours. All interactions between addons and our main API server are authenticated using that access token. For example, if you use addons.heroku.com to add Redis To Go or New Relic to your app, that results in addons.heroku.com decrypting your session cookie (after verification of the HMAC tag), extracting the OAuth access token, and making a backpost to api.heroku.com to execute the change. We use this same "save token in encrypted cookie" approach across all Heroku properties that require API access.
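
As an illustration of that encrypt-then-MAC construction, here is a sketch in Go, purely for the sake of example; the key handling, payload layout, and cookie framing are simplified stand-ins and not Heroku's actual implementation:

  package main

  import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/hmac"
    "crypto/rand"
    "crypto/sha256"
    "encoding/base64"
    "fmt"
    "log"
  )

  // pkcs7 pads the plaintext to a multiple of the AES block size.
  func pkcs7(b []byte, size int) []byte {
    n := size - len(b)%size
    for i := 0; i < n; i++ {
      b = append(b, byte(n))
    }
    return b
  }

  // seal encrypts the payload with AES-256-CBC under a random IV, then
  // HMAC-SHA256s IV||ciphertext so tampering can be detected later.
  func seal(encKey, macKey, payload []byte) (string, error) {
    block, err := aes.NewCipher(encKey) // a 32-byte key selects AES-256
    if err != nil {
      return "", err
    }

    iv := make([]byte, aes.BlockSize)
    if _, err := rand.Read(iv); err != nil {
      return "", err
    }

    plain := pkcs7(payload, aes.BlockSize)
    ct := make([]byte, len(plain))
    cipher.NewCBCEncrypter(block, iv).CryptBlocks(ct, plain)

    msg := append(iv, ct...)
    mac := hmac.New(sha256.New, macKey)
    mac.Write(msg)

    // Cookie value: base64(IV || ciphertext || HMAC tag).
    return base64.StdEncoding.EncodeToString(mac.Sum(msg)), nil
  }

  func main() {
    encKey := make([]byte, 32) // AES-256 key
    macKey := make([]byte, 32) // separate HMAC key
    if _, err := rand.Read(encKey); err != nil {
      log.Fatal(err)
    }
    if _, err := rand.Read(macKey); err != nil {
      log.Fatal(err)
    }

    cookie, err := seal(encKey, macKey, []byte(`{"access_token":"...","expires_at":"..."}`))
    if err != nil {
      log.Fatal(err)
    }
    fmt.Println(cookie)
  }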

Addons Doing Stuff

Yo dawg, I heard you like oauth…

It's worth noting that identity itself also follows this access-token-in-session-cookie approach. Identity itself is "just" an OAuth consumer, albeit one that takes passwords. Due to the complexities of OAuth, we've split it off from the "real" service provider – api.heroku.com. Identity does not retain refresh or access tokens in a server-side database – those are held only by API. It does have two super-powers, though. Identity can manage OAuth grants and revocations on behalf of a user, but all such management requires the user's OAuth access token (again: that lives in the user's session cookie). It can also request and receive access tokens with an exceptionally long lifespan – up to 30 days. This extended lifespan allows the user to remain logged in for an extended period without the need to store or retain a never-expiring refresh token.

The other SSO

Logout
As you can imagine, all this introduces a single-sign-OUT problem. If I log in to dashboard, then visit addons, I now have two encrypted cookies with OAuth tokens in them, plus a third for Identity itself. If I click "logout" on dashboard, my expectation as a user is that this logs me out of all Heroku properties. For the most part, that's true. Regardless of what site you're logged in to, clicking logout results in the revocation of both your cookie-based session and its OAuth token on id.heroku.com.

This is our second spec deviation: on logout, Identity has API revoke the OAuth access tokens of all Heroku properties that were issued for that browser. This revocation happens server-side, which prevents any long redirect chains that the browser must follow (e.g., google.com's redirect to youtube.com on login/logout). The encrypted access tokens, of course, remain present in the session cookies of addons and other properties. The next time addons receives a request with this now-stale cookie that requires communicating with API, API will 403 the revoked token, which addons interprets as a logout. The customer is then thrown back into an OAuth dance with identity, and must retry the request. Currently, this "revoke the access tokens on logout" behavior is limited to Heroku properties, and is not publicly available.

However, if you replay an old/stolen cookie against an addons URL that does not perform a backpost to API, you still appear to be logged in. The most interesting case for addons is that this can be done to get a list of your applications. What's going on? For performance reasons, addons.heroku.com uses memcache to cache the list of applications owned by a user. The network round-trip to memcache is much faster than the round-trip to API plus a call to API's database. However, that means changes on API aren't immediately reflected on addons. Because the goal of the cache is to avoid an API call, not every URL on addons calls API, and addons doesn't realize the cookie holds a dead access token until it actually makes one. The result is that it's possible to appear still logged in to addons, even after you log out.

Enter the Nonce

Having to retry a request is a horrible user experience. As a secondary measure to mitigate the lack of a global single-sign-off, we use a second cookie, scoped to *.heroku.com, called heroku_session_nonce. This cookie is issued by Identity upon login, and as the name indicates it contains a random nonce. The session nonce is reset on logout. All Heroku properties, when completing their OAuth dance, observe the current heroku_session_nonce value and save it in their private session cookie. On all subsequent authenticated requests, the private nonce is compared with the domain-global heroku_session_nonce cookie. A mismatch is treated as an authentication failure, and the browser is redirected to id.heroku.com to do a fresh OAuth dance. The use of a domain-global cookie to indicate logout allows us to avoid any additional database round trips, and lets us avoid forcing a backpost to API on every request to every property.
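
A sketch of that per-request comparison, again in Go and purely illustrative (only the heroku_session_nonce cookie name comes from the description above):

  package sso

  import (
    "crypto/subtle"
    "net/http"
  )

  // nonceStillValid compares the nonce that was saved in this property's private
  // session cookie against the domain-wide heroku_session_nonce cookie. A
  // mismatch means a logout happened somewhere, so the caller should treat the
  // request as unauthenticated and restart the OAuth dance.
  func nonceStillValid(r *http.Request, privateNonce string) bool {
    c, err := r.Cookie("heroku_session_nonce")
    if err != nil {
      return false // global nonce missing: force a fresh login
    }
    return subtle.ConstantTimeCompare([]byte(c.Value), []byte(privateNonce)) == 1
  }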

Nonce Mismatch

But wait, I said at the start we're not using a global cookie because of our legacy of Bamboo, and untrusted people having access to all such cookies. Doesn't heroku_session_nonce suffer from the same problem? Of course it does! Controlling a user's nonce cookie has two noteworthy security implications:

  • Denial-of-Service: An attacker able to coerce a victim into visiting
    a site under their control can use a malicious Bamboo app to
    continuously delete or overwrite the nonce with gibberish. This can
    forcibly log out the victim, making it difficult to interact with
    Heroku. While this is undesirable, we've decided it's an acceptable
    risk given the complexity of any other solution.
  • Session Fixation (kinda-sorta): An attacker who observes a victim's
    nonce can set a tracking cookie on the browser to uniquely identify
    the victim. When the nonce changes (i.e., on logout), the attacker
    can continuously re-set the nonce to the old value. This would
    result in addons and other properties believing that, until the
    property's session cookie expires, the victim is still logged in.
    Without the presence of an XSS or similar vuln, however, the
    attacker is unable to leverage this further. In the shared-browser
    threat model (e.g., internet cafes in developing regions), this
    becomes slightly more interesting. However, a plethora of more
    serious attacks come into play in that case, such as keystroke
    logging. Given that, and the lack of a useful attack, we are again
    OK with this risk.

We realize this is still sub-optimal, and certainly aesthetically displeasing to security folks. An elegant, performant, provably secure solution to handling distributed cache invalidation is a special case of one of the two hard problems in computer science[2]. If you're able to solve it, you'll probably get a Turing Award, our field's closest thing to a Nobel Prize. Until this happens, we're stuck with a series of workarounds and complex interactions as I describe above.

Isn't this confidential??? What if there's a security vulnerability?

Heroku would not exist without open source. Other security-sensitive open source software we use includes "Rails" and "The Linux Kernel". While we use GitHub's issue tracker extensively, as always we ask security researchers to submit vulnerability reports to security@heroku.com. The Security Team's PGP key is available in our vulnerability reporting guidelines.

Vulns like this?

https://github.com/heroku/identity/pull/49 – we had a CSRF issue in the approve/deny page. This was originally reported to us by an independent security researcher. It's a good example of some ambiguity in the OAuth spec. From RFC 6749, sec 3.1:

The authorization server MUST support the use of the HTTP "GET"
method [RFC2616] for the authorization endpoint and MAY support the
use of the "POST" method as well.

There are two ways to read this:

  1. The OAuth consumer redirects the end user to identity to start the dance, and should be able to use an HTTP 302 temporary redirect to do so. Browsers do a GET for all 301 and 302's, so the identity needs to accept a GET to display the approve/deny page.

  2. By "endpoint", they mean the state-changing URL that the browser posts to when the end user pushes the "Approve" button. Normally, since this is a state-changing action, you'd only use POST (or arguably, PUT). But, the spec says supporting GET is an RFC MUST, so GET it is.

As you can imagine, an extra line in the spec to differentiate what "endpoint" it means would go a long way.

In closing

Given our limited resources, the low impact of the information available to an attacker, the compensating measure of the six-hour cookie lifetime, and the high difficulty of being able to execute an attack like this without also being able to gain greater access to a victim's computer that would result in more rewarding attacks (e.g., a keystroke logger to capture the user's password), we are comfortable releasing identity. As we continue to build and improve Heroku as a platform, we are constantly looking for how to incorporate security needs into our underlying architecture.

If you have any feedback or suggestions on this, we would be delighted to hear them. We have considered the "obvious" solution of having API do a callback to all properties to notify them to do any logout-related cleanup (i.e., flush their caches). We haven't gone down that path yet because this would be a moderately large endeavor, and something that should really be handled in the OAuth protocol specification itself. Because it's not complicated enough, you see.

Acknowledgments

While our adoption of OAuth for SSO was a major team-spanning effort, identity was principally written by my colleague Brandur Leach. All thanks go to him, but any factual errors here are mine alone.

Heroku Security Hall-of-Famer Tejash Patel's report to us in July 2013 was the impetus behind this blog post and the open-sourcing of identity.

The key idea of decentralizing credentials out to the browser, and thus making id.heroku.com a less tempting target, originally came from Scott Renfro, my former coworker and mentor in paranoia.

Notes

  1. "Origin Cookies: Session Integrity for Web Applications", Web 2.0 Security and Privacy Workshop, 2011. A. Bortz, A. Barth, A. Czeskis. http://w2spconf.com/2011/papers/session-integrity.pdf
  2. "There are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors." -Paraphrase of saying generally attributed to the late Phil Karlton.

-Tom Maher
Heroku Security Team


Functional Programming Patterns in print; Backbone Marionette now available

Interview with Heroku’s Mattt Thompson: The Incredibly True Story of Why an iOS Developer Dropped His CS Classes and Eventually Learned How to Fly

Editor's note: This is a guest post from Rikki Endsley.

In this exclusive interview, iOS developer Mattt Thompson opens up about the moment when he realized he'd become a programmer, why he dropped his computer science classes, and what he does AFK.

Mattt and a Unicorn

Had Mattt Thompson followed in his parents' footsteps, he'd be a musician now instead of a well-known iOS developer working as the Mobile Lead at Heroku. Matthew “Mattt” Thomas Thompson was born and raised in the suburbs of Pittsburgh, Pennsylvania, by parents who are both musicians, play in the symphony, and teach music. Whereas his sister took to music growing up, Mattt kept going back to his computer. He says he couldn't help it. “I'd spend hours in Photoshop and GoLive, learning all of the tricks to making websites. These were the days before CSS, when <table>,<font>, and spacer.gif were state of the art.” Back then, he pictured himself as a designer, which is what he did for the web design company he and a developer friend started while still in high school.

“Do a quick Google search, and you'll find that Matt Thompson is an extremely common name,” Mattt says. When he decided to get his own URL, Mattt wasn't surprised to discover that mattthompson.com was already registered, so the path of least resistance seemed clear—just add another “t”. Thus, Matt became Mattt and the owner of matttthompson.com.

“Weird as it is, the extra 't' has become a convenient low-pass filter for people getting in touch about some opportunity or another,” Mattt says. “It's like my own personal brown M&M rider, to see if people are paying attention. I didn't intend this at first, but it's remarkable how consistently it works.”

Along Came Ruby

“I spent the Summer of 2005 poring through a Ruby book that I bought randomly at a bookstore liquidation,” Mattt says. And that's when his interest shifted from design to programming. By the time Mattt started his freshman year in the Carnegie Mellon Computer Science department, he was staying up late into the night focusing on his new interest. He even remembers the moment when he realized he had become a programmer, and there was nothing standing in the way of him making whatever he wanted. “All of those apps and games that I had always wanted to make were now plausible,” he recalls.

Much of Mattt's early work was writing courseware, making apps for school projects, and hacking weekend projects. Although he enjoyed programming outside of class, doing it for class was different.

“Computer Science classes never caught my passion like this, so I decided to drop out of the program, pursuing undergraduate degrees in Philosophy, Linguistics, and Art,” Mattt explains. “I firmly believe that those disciplines specifically, and a liberal arts education in general, provide an intellectual rigor for understanding problems on a fundamental level—something that CS alone can't begin to approach.”

With a background in the web and Ruby communities, Mattt thinks programming is as much a profession as it is a passion, so developing open source software made sense. “When you make something cool, you should show it off and share it with others,” he says. “The first thing I made that got any attention was Chroma-Hash, a JavaScript visualization of password strength. That rush of hitting the top of Hacker News and attracting attention on Twitter was addictive, and it's been a contributing factor ever since.”

Attention is great, but that's not what keeps Mattt involved with open source. “What really keeps me active in the community is the opportunity to make things that make the lives of others better, no matter how niche the audience or marginal the improvement,” he says. “Code costs nothing to share, creates good will, and contributes to the gift economy. What's not to like?”

From Gowalla to Heroku

After college, Mattt worked as a Rails developer for iKnow, a Tokyo startup, which is where he got into iOS development. As his chops in iOS programming improved, his interest in another iPhone app, Gowalla, grew, which inspired him to cold call Gowalla co-founder and CTO Scott Raymond. Mattt joined Gowalla as an iOS developer in 2010. Facebook acquired Gowalla in late 2011.

“When it came time to look around for my next step, I asked myself: 'Which company is solving the most interesting problems?'” Mattt says. Heroku, which was acquired by Salesforce in 2010, had just announced the launch of its Cedar stack. Mattt thought the new launch put the company years ahead of anyone else in terms of understanding and executing the potential of cloud application platforms. “As a long-time customer myself, I had fallen in love with their design aesthetic and pragmatic approach to development, as articulated in co-founder Adam Wiggins' Twelve-Factor App manifesto,” Mattt says.

So he called someone he knew at the company and asked whether Heroku wanted an iPhone developer.

“When we talk about developing mobile applications,” Mattt explains, “What we're really talking about is cloud applications. Look at your phone's home screen. If you remove all of the apps that require the Internet to be useful, you're left with what? Phone? Clock? Calculator? It's an Internet connection that makes a phone smart.”

Most mobile clients communicate with servers over an API, and Mattt explains that those web applications are increasingly being deployed on cloud platforms like Heroku. “Mobile is not different in this respect, of course—rich web content, built in Ember.js or Backbone.js follows this same pattern.” He says that mobile exemplifies the case for cloud technologies. “Overnight, your mobile app might go from 100 to 100k users, with a few million by the end of the week. Rather than be a victim of your own success, spending your time fighting server fires while you attempt to keep pace, Heroku takes care of this for you, and allows you to focus on developing your product to make it even better.”

Since he joined Heroku, Mattt's responsibility has been to improve the Heroku mobile development experience. “Whether that meant working on open source projects like AFNetworking, Helios, Postgres.app, and Nomad; writing mobile development articles for the Heroku Dev Center; speaking at conferences and local meetups; helping out on support tickets; or working with Heroku Add-on Providers to deliver the essential services apps rely on,” he explains. “I became the point person for all things mobile at Heroku.”

Now Mattt has a new focus. “Heroku is the best way to develop, deploy, and scale software on the internet,” he says. “Salesforce, meanwhile, is known for being the world's premier CRM solution, but also happens to be built on top of a development platform, handling billions of requests every day.” He says there's a huge opportunity to bring the agility and flexibility of Heroku to the Salesforce platform.

Mattt is working with the community to produce a set of first-party libraries for languages such as Ruby, Node.js, and Objective-C, which developers can use to interact with the Force.com APIs. He's also working with a colleague, David Dollar, on a command-line utility for Salesforce. Mattt expects the client libraries and CLI to radically improve the development experience for millions of Salesforce developers.

Away From Keyboard

Although Mattt is enthusiastic about programming and his role at Heroku, his time away from the keyboard is particularly interesting. “Back in May, after an exhausting month-long trip through Europe, I decided that I was tired of waiting to do all of the things that I had wanted to do,” he says, “So I started getting my pilot's license, flying Cessna 172s out of San Carlos on weekends.” He also picked up the trumpet and practices every day to get his chops ready for sitting in at jazz clubs.

Cessna 172 Instruments

“I went sky diving to impress a girl, and I'm not sure if that crossed some wires in my head or what, but I've been hooked on air sports ever since, driving down to Hollister, California, to work on my hang-gliding certification.” And he likes to cook, which is why he's watched the entirety of Alton Brown's Good Eats a few times. “The perfect Sunday evening involves cooking up something new from whatever I picked up at the farmer's market, followed by a classic movie on Netflix with my girlfriend.” As passionate as he is about programming, Mattt sees a future away from it. “There's a very real chance that I'll eventually up and leave the tech world to be a pilot or flight instructor of some sort,” he admits.

Dirty Details for Developers

What does Mattt's workstation look like?

“I keep things simple. MacBook Air and an Apple Cinema Display when I'm at the office, or just the laptop and a pair of Bose QC 15s when working from a coffee shop. Typical setup for the hipster hacker set."

Japanese Keyboard

"The main difference that throws anyone else using my computer is its Japanese keyboard layout. I switched from QWERTY when I moved to Tokyo for my first job out of college, and have been loving it ever since. It's a bit disorienting at first, but the little touches, like Caps-Lock being shoved out of the home row, an over-sized Return key, or dedicated characters for @, ^, and other coding essentials that make it—in my opinion—the best keyboard layout for any programmer (especially for Objective-C). Also, each key has a Hiragana character next to the English, which looks really cool.”

Touch type? Or hunt and peck?

“Touch Typing. Mavis Beacon would be proud.”

If Mattt could contribute to another open source project, which would it be?

“I'm not really all that shy or afraid to get my hands dirty, so there aren't too many projects out there that I've felt held back from contributing to. That said, I'd love to do more with Go and Rust, which both seem like great new languages.”

Which project is Mattt most proud of?

“AFNetworking is by far the most substantial and popular thing I've ever made, and I take a certain amount of pride in that. I'm somebody that finds it easy to start new projects, but have difficulty following through past a certain point. So it's nice to be able to point to AFNetworking—a project I've actively maintained for the last two and a half years—as a counter-example.”

How does Mattt explain to non-technical friends and relatives what he does for a living?

“I have no idea. It's hard enough to get them to pronounce the name of the companies I've worked for. Seriously, Gowalla? Heroku? But honestly, saying that I work with computers is enough. I try not to talk much about work when I'm AFK.”

Meet Mattt at Dreamforce

If you want to meet Mattt in person, you will find him speaking in the Developer Zone at Dreamforce 2013, which will be held November 18-21 in San Francisco. First, you'll have to catch him. “Between helping out with the Developer Keynote, presenting a session about mobile development on Salesforce, and running a workshop on leveraging Heroku for Force.com, I'll be all over the place.”

Announcing Heroku Postgres 2.0

Today we’re excited to announce an evolution of what it means to be a database as a service provider – introducing Heroku Postgres 2.0.

Along with new features such as the ability to roll back your database to an arbitrary point in time and high availability, we now provide an entirely new level of operational expertise that’s built right in. This new level of service allows you to feel at ease while taking advantage of your database in ways you never expected. All of this is rolled into new tiers which make it as easy as ever to choose what’s right for you.

Heroku Postgres 2.0 brings new tiers, rollback to an arbitrary point in time, high availability, and built-in operational expertise.

You can read more about this announcement over on the Heroku Postgres blog.

Troubleshooting Down the Logplex Rabbit Hole

Adventures of a Heroku Routing Engineer

My name is Fred and I spend most of my time on Logplex. Since joining Heroku in March 2013, I've become the main developer on that product and handle most of the maintenance and support that goes with it. In this post, I'll explain what the Heroku routing team needed to do to make Logplex more stable, decrease our workload, and keep our mornings quiet and productive.

I'm a remote employee on the Heroku routing team, and I live on the East coast, which means I'm usually the first one in the routing room in the company's HipChat. Later, Tristan, who lives in Illinois, joins me. Then a few hours later, the rest of the team joins us from the West coast, where most of Heroku's employees are located. I usually have a few silent hours to work without interruptions, e-mails, questions, scheduled meetings, and other distractions, which is a relaxing and productive way to start the day.

The Routing Team Badge

Customer support tickets that get escalated to the team, or Nagios and PagerDuty alarms, which I try to intercept before they wake my on-call teammate, are the only interruptions to my otherwise productive mornings. Generally, the alarms are rare, and my mornings are becoming increasingly uneventful. Back in May and June, however, an array of alarm interruptions was exhausting our entire routing team.

Alarms were coming from all stacks, while customer support tickets also started rolling in. The problems compounded to make a bad situation worse. Alarms interrupt the team's workflow, which then slows down our ability to resolve the problems. Issues cropped up at all hours of the day and night, so the team, especially the engineer on call, was worn down and becoming less productive, and our time was consumed by interruptions and fires to extinguish.

Although we have other areas of focus, the routing team mainly works on Heroku's HTTP routing stack and the logging infrastructure, centered around Logplex. A sizeable number of alarms were related to individual Logplex nodes that kept crashing with increasing frequency, each time generating alerts in our chat system, via e-mail, and ultimately through the routing team's worst enemy, PagerDuty.

What Logplex Does

Before going further, I should explain what Logplex does, otherwise this entire post may be a bit confusing.

The simplest way to explain Logplex is that it takes log data coming from everywhere in Heroku (dynos, routers, and runtimes) and pushes the data around to other endpoints. More specifically, Logplex accepts syslog messages from either TCP or HTTP(s) streams on any node in the cluster. Each of these nodes will store the log messages in Redis buffers (1,500 lines) to be accessed by heroku logs, allow a distributed merge of all the incoming streams for a given application so they can be displayed live with heroku logs --tail, or forward them on through drains.

Drains are buffers on each of the Logplex nodes. They accumulate messages received for each node, and then forward them to remote endpoints designated for an application by a user or an add-on provider. This can be done over TCP syslog, or a custom HTTP(s) syslog format when required.

The workflow, to make it simple, looks a bit like this:

Logplex Data Flow

There's also an HTTP API that lets tools and users manipulate the drains, tails, and sessions to be used in the system.

When dealing with logs, we get unpredictable loads, which may be bursty (for example, when an application crashes and generates 50,000 lines of stack traces). Thus, overload is a constant issue with Logplex.

Generally speaking, overload can be dealt with in three ways:

  1. Scale up forever by adding machines.
  2. Block the input to slow down the producers.
  3. Shed load by dropping requests.

The first option is good when you are in over your head, because no matter what you do, the current system can never handle the load and the quality of service is bad.

The second option, on the other hand, is one we want to avoid within Logplex. We don't want an overloaded Logplex node to slow down Heroku applications that are trying to log data, or applications that are busy crashing.

This leaves us with the third option, shedding load. How to handle overload is an important decision to make when you are first designing a system, because it will impact the data flow throughout the process. A system that chooses to block must be synchronous in most of its operations so that the bottleneck located deep within the app can impact the accept rate of the API at the edge of the system. An app that chooses to shed load must be mostly asynchronous internally and its slowest components, where data tends to accumulate, must be able to drop the overload.
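
To illustrate what "mostly asynchronous and able to drop the overload" means in practice, here is a small sketch of a buffer that sheds load instead of blocking its producer; it is written in Go purely as an illustration and bears no relation to Logplex's actual code:

  package main

  import "fmt"

  // Drain buffers log lines bound for one remote endpoint. When the buffer is
  // full it drops the line and counts the loss instead of blocking the caller.
  type Drain struct {
    buf     chan string
    dropped int
  }

  func NewDrain(size int) *Drain {
    return &Drain{buf: make(chan string, size)}
  }

  // Offer never blocks: it either enqueues the line or sheds it.
  func (d *Drain) Offer(line string) {
    select {
    case d.buf <- line:
    default:
      d.dropped++ // overloaded: shed instead of back-pressuring the producer
    }
  }

  func main() {
    d := NewDrain(2)
    for i := 0; i < 5; i++ {
      d.Offer(fmt.Sprintf("log line %d", i))
    }
    fmt.Println("buffered:", len(d.buf), "dropped:", d.dropped)
  }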

For Logplex, which is written in Erlang, we do this load shedding in each process that writes to the outside world: the tail buffers, the Redis buffers we keep locally, and each individual drain. To work this way, the drain can be thought of as a combination of two state machines:

  1. a state machine representing the connection or protocol negotiation state for the socket to the remote endpoint;
  2. a state machine handling the buffering of log lines (both on input and output).

For the first type, the most basic machine is:

Connected-Disconnected FSM (simple)

This means that a drain can be disconnected or connected to a given endpoint and transition between the two.

The second type, for buffering, can be represented as:

Accumulating-Extracting FSM

Both machines are conceptually simple, and HTTP drains, for example, use one process for each of them. There is some message passing overhead, but the model is easy to reason about.

To conserve resources, TCP drains execute both state machines within a single Erlang process. The "accumulating" and "extracting" buffer operations can be made implicit to every operation, but because the input stream to a process never stops and is impossible to regulate directly, they should never be blocked by other operations without compromising the stability of the system. This means that operations such as sending a message must be made asynchronous, prompting an additional state:

Connected-Disconnected FSM (complex)

In practice, both drain implementations are far more complex as a result of a plethora of small business rules and behaviors we want to enforce, such as batching, retrying failed messages, loss reporting, send time outs, exponential backoffs, and so on. The key points are that HTTP buffers are conceptually simpler (and thus safer), but cost more in terms of resources; and TCP buffers are more efficient, but trickier to reason about and consequently easier to mess up by accident.

So that's the sky-high view of what I work on when I dive into Logplex.

Garfield Would Hate All Days if They Had PagerDuty Alarms

As I mentioned, alarms were rolling in, which was particularly frustrating to our team because the system had been designed so that its architecture could cope with individual node failures. The problem was that we received alerts that were useless because they were not actionable. When a Logplex node would go down, I'd get a page, acknowledge it, wait for it to clear up, then go to sleep until the next day, when I would start investigating.

We ended up turning off the alarms for node failures, but we still wanted to reduce the failure rates. Users' logs that were in transit over the node are lost when it crashes, and although losing logs (as opposed to losing customer information) isn't that big of a deal because you can always generate more, it still makes for a bad user experience.

To make a forced analogy, the system was on painkillers, but we still had to fix what was causing the pain. Thus, I needed to figure out why the nodes kept dying (almost randomly), and then fix the problem so it stopped happening so frequently.

Fixing Crashes Through Refactoring

First Attempt at Fixing the Problem

Investigating the Cause of the Error

I was able to wait for a day before investigating a crash because Erlang nodes that go down generate a crash dump, which is a file containing basic information (for example, a slogan that says why the node died, such as its inability to allocate memory) for the entire node. The dump also provides the state of all processes that were running at the time, minus a few details. Each dump can range from a few hundred MBs up to a few GBs in size, and rummaging through them for relevant details can be challenging.

At first the dumps are intimidating, and they remain so for a while. But as you get used to your nodes dying, you get a better understanding of crash dumps. Common patterns pop up, and often these are message queues of processes that are excessively large, the number of processes or open ports and sockets, the size of some processes in memory, and so on. Most failures are going to be caused by one of these conditions. Finding patterns in the crash dump will lead you to a specific part of the code, which will help you reproduce the bug later.

After finding the patterns, I wrote a script that quickly scans large crash dumps to find and report them as plain text. Using this script and manual digging in the Logplex crash dumps, I could see the most frequent pattern, which was a single process with a large mailbox causing the node to go out of memory. By looking at the faulty process's call stack and ancestors in the dump, I could see that the problem was almost always a TCP Syslog drain.

I had never seen an HTTP(s) drain at fault in these failures, but that could simply be because we have nearly an order of magnitude more TCP Syslog drains than HTTP(s) drains, or it could be due to our TCP Syslog drain implementation blocking.

Without knowing what to attack and experiment with, I couldn't do much debugging. The Splunk and Graphite data we have showed that a node's memory would climb progressively over a span of 10 to 40 minutes. This, combined with the large mailbox, told me that the problem was a slow, steady accumulation of messages behind something blocking, rather than a fast burst of input the drain couldn't keep up with. Given that data set, I decided to blame the TCP Syslog drain implementation for our problems.

Trying to Refactor Drains

Even though hunting down the individual blocking elements in the TCP Syslog drains was possible, the routing team and I decided to try refactoring them to use two distinct finite state machines, like the HTTP(s) drains did. This would make the code much simpler, and promised lower latency and memory use at the price of higher drop rates: if messages spend less time in a process's mailbox before they can be dropped, fewer of them are held back and gobbling memory at any point in time.

After spending a few days implementing the changes, and running tests and benchmarks, I returned to the team with a solution that seemed to achieve the goals we'd outlined. We agreed to send the code to staging, where it matured and proved to be absolutely stable for a while. The problem with staging, however, is that it does not need to sustain the same load production does, and generating the kind of daily traffic Logplex must deal with locally or in staging would be a serious undertaking. So far, generating that kind of traffic has not been worth it for the routing team in terms of developer resources.

I crossed my fingers and added one production node to run the modified code over a few days so I could monitor it and assess that the code could sustain production loads correctly.

Hiccuping so Hard You Die

When we looked at our team's internal tools that displayed live statistics for all Logplex nodes, the new one performed terribly. The node still had acceptable throughput, but we found a stop-and-go cycle roughly every 15 seconds, with pauses of a few milliseconds at a time. This cycle meant that the node was having a hard time scheduling all of its Erlang processes in time for them to do their necessary work. Under heavy load, this may be understandable, and eventually the node may recover, but the load was quite regular at this point, and my new node was doing considerably worse than all of its older siblings.

After a few hours, some of the freezes would last longer and longer, and eventually, the node would come crashing down.

Second Attempt at Fixing the Problem

Searching for the Cure (for Hiccups)

I wanted to fix things, so I dove into the application logs. There was a considerably long event sequence to comb through. A typical Logplex node will log anywhere between 100 and 350 messages a second internally. A crashing one can bring this up to multiple thousands, and the events that were most useful to diagnose the failure could have happened 15 to 30 minutes earlier.

After studying the logs for a while, I was able to attribute the initial failing calls to the acceptors of cowboy, the library we use for our HTTP input stream. In short, the cowboy server had an architecture a bit like this:

socket manager

This architecture is a regular pattern for TCP servers in Erlang. The manager (or a supervisor) opens up a listen socket, and shares it with a group of acceptors. Then each acceptor can concurrently accept new connections and handle them, or pass them off to a third party. This usually allows for far better connection times with shorter delays than doing things sequentially with one single process.

The special thing about this cowboy implementation, however, was that every time an acceptor accepted a connection, it reported back to the manager, a central process, to track the connection and check for configuration changes. This manager was a bottleneck in the execution of the program. Loïc, cowboy's maintainer, knew about this bottleneck, which also showed up in benchmarks. To work around the problem, albeit temporarily, he used a trick that few Erlang applications should ever use: he raised the process's priority.

To understand the implications of the fix, you should be aware that Erlang's VM does preemptive scheduling of all processes, and does so fairly across all processes based on a) the work they have accomplished, b) how busy the node is, and c) the work they will have to accomplish (e.g., if their mailbox is full). Because of this balancing, some important system processes often need a larger share of the processing power. The Erlang VM therefore supports four process priorities:

  1. low
  2. normal (default)
  3. high
  4. max

Every process that has a "high" priority and has to run code right now will do so before any of the "normal" processes can run. All the "normal" processes will run before the "low" processes can. This can work reasonably well when you know the load characteristics of the system you will have, and that higher priority tasks are truly higher priority; however, in the case of cowboy, the priorities were merely a work-around for speed in artificial benchmarks.
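
For reference, raising a priority is a one-liner, which is part of what makes it such an easy work-around to reach for (accept_loop/0 below is a made-up function, shown for illustration only):

%% Illustration only; raising priorities is rarely the right call.
%% From within the process itself:
process_flag(priority, high).

%% Or when spawning it (accept_loop/0 is hypothetical):
Pid = spawn_opt(fun accept_loop/0, [{priority, high}]).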

A likely explanation for all our woes became rather evident to me:

  • My refactoring nearly doubled the number of processes on the node, creating more contention for schedulers.
  • Cowboy was hogging the scheduler time to accept connections and communicate with acceptors.
  • Other processes (those handling the requests) were starved for CPU time.

Before I blame cowboy for all the issues, keep in mind that Logplex was running an old version of the server (0.6.1), and that newer versions (0.8+) have gone through architectural changes that entirely removed that central bottleneck for most calls and dropped the priority back to normal.

I first tested the hypothesis by manually changing the scheduler priority of the cowboy process, without otherwise altering the server's design. The crashes became a bit less frequent, but the new nodes were still not reliable enough to lead to solid conclusions from tests.

Despite our team's best efforts, I had little data and no way to reproduce the hiccups anywhere other than in production. I still held onto the idea that cowboy's architecture might be a problem; it still had a central bottleneck, despite the scheduler priorities being right. Now it was time to upgrade to a newer Cowboy version.

Upgrading the Servers and Nodes

We settled on Cowboy 0.8.5, which had a new internal architecture. Unfortunately, this required upgrading the Erlang version from R15 to R16. Erlang versions tend to be backwards compatible, and deprecations come only with a few warnings, which I wanted to eliminate. The biggest issues were the interface changes in the cowboy library itself, which required extensive testing to ensure full backwards compatibility for seamless deployments.

R16 also contained what promised to be sweet scheduler optimizations that we hoped to get our hands on. While I was at it, I optimized the work done with ETS tables to be more parallel and require less data copying, and shipped a few important bug fixes to make sure the upgrade was worth it. This ended up being a bit complex because some of the bugs were urgent and had to be deployed on the production nodes running R15, then ported forward to work on R16, which I then forked to make work with the new cowboy version, which had to be merged back with the other branch that … and so on. At one point I was running and monitoring four different versions of Logplex at the same time in production to see which one would win. This wasn't the smartest option, but, given how difficult isolating variables is, it was the most practical option at the time.

One apparent result was that the regular R16 nodes were doing well, without the stop-start cycles. Eventually, I replaced all R15 production nodes with R16 ones, and kept four nodes running R16 with the split TCP Syslog drains.

Don't Stop Me Now

After monitoring the system for a few weeks, I only saw minor improvements from the split-drain nodes. They seemed to lock less frequently, and far less violently. Then, every four or five days, one of the new nodes would go radio silent. They would appear to be running, but nothing would come in or out of the VM for 20-30 minutes at a time, then a thousand lines of logs, then nothing again. Connecting to the node through a remote shell or killing it in a way that generated a crash dump was impossible. Nodes would never recover fully from this, always flapping between "poorly usable" and "not usable at all."

I decided to give up on weeks' worth of work because there was no way to run the code at the time. Maybe running it would be possible now that the routing team has changed some internal mechanisms of the system and the VM has made more progress with its schedulers, but back then I was at a dead end. I didn't admit total defeat, however, because time proved that all the other nodes in the fleet (those updated to R16, but without the changed drain model) had stopped crashing almost entirely. I ported the rest of the optimizations that weren't there yet to that branch, and we made it the standard for the entire cluster. The boosts in reliability were such that I could archive the Trello cards related to the tasks and start working on other stuff.

Roughly three weeks later, Nagios started screaming in the routing team's internal chat room every five minutes, for days at a time. Some nodes in the cluster had their memory bubble up, and never gave it back to the OS. The nodes wouldn't crash as fast as before; instead, they'd grow close to the ulimit we'd set and hover there, taunting Nagios, and us by extension.

Clearly, I needed to do more work.

I Just Keep on Bleeding and I Won't Die

First Attempt at Fixing THAT Problem

Garbage Garbage Collection

At first I decided to wait it out and see when the node would crash, as it had to do at some point. I hoped that a crash dump would hold details about memory that could help. Maybe the Logplex nodes were resentful that I wanted to work on something else, but they wouldn't ever go over the memory limit. They'd wait and nag and die (or refuse to) for my attention.

Eventually I logged onto the misbehaving production node and ran the following expression within the Erlang shell:

[erlang:garbage_collect(Pid) || Pid <- processes()].

This effectively goes through all processes on the Erlang node, and forces a garbage collection on them. The alarms stopped.

Manually forcing the garbage collection turned out to work wonders. The question was: Why? Before answering that, though, I needed data. The node hadn't crashed, the garbage was gone, and it would take weeks before another node displayed that annoying behavior. I had little choice but to wait.

Fortunately, the phenomenon happened more frequently over time. A few nodes eventually managed to go overboard and die for me. The analysis of the crash dumps revealed that there was no single process holding a criminal amount of memory, nor was any single mailbox (largest: 26 messages) anywhere close to exploding. Some processes had high levels of memory, but live profiling didn't point to any single process as the culprit.

As the plot thickened, I kept manually calling for garbage collection when a Nagios memory alert cropped up, but something looked off. Here's a table of one of the nodes' memory consumption as reported by Erlang and by the OS before and after forced global garbage collection:

logplex.82559    erlang:memory(total)    beam process    node total
Pre GC           9.87 GB                 12.9 GB         14.15 GB
Post GC          5.89 GB                 11.8 GB         12.9 GB
Delta            -3.98 GB                -1.1 GB         -1.25 GB

And here's what a relatively fresh node looks like:

logplex.83017    erlang:memory(total)    beam process    node total
Memory           6.4 GB                  6.7 GB          7.6 GB

That was fairly weird and highlighted two big things:

  1. Garbage collection seems to have trouble doing its job for the entire node without being prompted.
  2. There is memory not allocated directly by Erlang (the language) that seems to stay around and grow with time.

I decided to focus on the first point (why GC was having trouble) because it was immediately actionable and showed tangible results that could prevent errors and crashes. I had no idea what the culprit was, but at some point, one of the nodes started triggering errors. I sat at the terminal, waiting to see changes in metrics. While the OS would report the following:

graphite memory chart

the VM would internally report the following (the scales are different and confusing):

splunk memory chart

Here are the numbers for the memory statistics over time:

(logplex@ip.internal)1> [{K,V / math:pow(1024,3)} || {K,V} <- erlang:memory()].
[{total,10.441928684711456},
 {processes,3.8577807657420635},
 {processes_used,3.8577076755464077},
 {system,6.584147918969393},
 {atom,3.5428348928689957e-4},
 {atom_used,3.483891487121582e-4},
 {binary,2.7855424359440804},
 {code,0.008501693606376648},
 {ets,3.745322160422802}]

%% waiting 20 minutes %%

(logplex@ip.internal)1> [{K,V / math:pow(1024,3)} || {K,V} <- erlang:memory()].
[{total,11.024032421410084},
 {processes,3.953512007370591},
 {processes_used,3.9534691693261266},
 {system,7.070520414039493},
 {atom,3.5428348928689957e-4},
 {atom_used,3.483891487121582e-4},
 {binary,3.2071433141827583},
 {code,0.008501693606376648},
 {ets,3.8099956661462784}]

This shows heavy memory growth, most of it for binary memory, going from 2.79GB up to 3.21GB. Remembering Mahesh Paolini-Subramanya's blog post on binary memory leaks, I decided to try confirming that this was indeed the root cause.

Erlang's binaries are of two main types: heap binaries and refc (reference-counted) binaries. Binaries up to 64 bytes are allocated directly on the process's heap and take up space there like any other term. Binaries bigger than that are allocated in a global heap for binaries only, and each process holds a small reference (a ProcBin) in its local heap. These binaries are reference-counted, and deallocation occurs only once all the references to a binary have been garbage-collected across every process that held one.

In 99% of the cases, this mechanism works entirely fine. In some cases, however, the process will either:

  • do too little work to warrant allocations and garbage collection;
  • eventually grow a large stack or heap with various data structures, collect them, then get to work with a lot of refc binaries. Filling the heap again with binaries (even though a virtual heap is used to account for the refc binaries' real size) may take a lot of time, giving long delays between garbage collections.

In the case of Logplex, the latter case was the one occurring. I confirmed it by polling the processes of a node with process_info(Pid, binary), which returns a list of all the binary references held by a process. The length of that list tells you which processes hold the most references, but that's not quite enough yet; these references may still be valid. The next step was to build that list, call for global garbage collection on the node, then build a new list and compute the delta between the two to find out which processes held the most out-of-date references:

(logplex@ip.internal)3> MostLeaky = fun(N) ->
(logplex@ip.internal)3>     lists:sublist(
(logplex@ip.internal)3>      lists:usort(
(logplex@ip.internal)3>          fun({K1,V1},{K2,V2}) -> {V1,K1} =< {V2,K2} end,
(logplex@ip.internal)3>          [try
(logplex@ip.internal)3>               {_,Pre} = erlang:process_info(Pid, binary),
(logplex@ip.internal)3>               erlang:garbage_collect(Pid),
(logplex@ip.internal)3>               {_,Post} = erlang:process_info(Pid, binary),
(logplex@ip.internal)3>               {Pid, length(Post)-length(Pre)}
(logplex@ip.internal)3>           catch
(logplex@ip.internal)3>               _:_ -> {Pid, 0}
(logplex@ip.internal)3>           end || Pid <- processes()]),
(logplex@ip.internal)3>      N)
(logplex@ip.internal)3> end,
(logplex@ip.internal)3> MostLeaky(10).
[{<0.9356.0>,-158680},
 {<0.4782.0>,-113260},
 {<0.21671.0>,-31253},
 {<0.17166.0>,-23081},
 {<0.16754.0>,-20260},
 {<0.5595.0>,-19292},
 {<0.7607.0>,-18639},
 {<0.716.0>,-18184},
 {<0.19981.0>,-17747},
 {<0.21843.0>,-15908}]

I have since added that function to the recon library so that nobody is required to do these calls by hand.

The little data dump above showed that some processes held more than 100,000 stale references to refc binaries, and a lot of them held more than 10,000. This told me that some processes held a lot of binaries, and investigating individual processes revealed they were all drains or buffers of some kind. This was bad news because it meant that the way Logplex is built is more or less playing right into one of the few rare cases where the Erlang GC isn't delivering results on par with what is promised for the general case. The Logplex application, which looked like a perfect match for Erlang, got trapped into an implementation detail that made it a pathological case for the VM.

Picking Up the Trash

Generally, refc binaries memory leaks can be solved in a few different ways:

  • call garbage collection manually at given intervals (icky);
  • manually track binary sizes and force GC, which defeats the purpose of having garbage collection in the first place and may do a worse job than the VM's virtual binary heap;
  • stop using binaries (not desirable);
  • or add hibernation calls when appropriate (possibly the cleanest solution).

I decided to put a quick fix in place, which still lives in production to this day.

The simple module basically loops on itself and at given intervals polls for the reported memory, checks to see whether it goes past a threshold, and if so, garbage collects the node. If required, the module also allows manual calls.
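
The production module isn't reproduced here, but a minimal sketch of the same idea (invented module name, illustrative threshold handling) looks like this:

%% Minimal sketch of the idea, not the module running in production.
-module(gc_watchdog_sketch).
-export([start_link/2, gc_all/0]).

start_link(IntervalMs, ThresholdBytes) ->
    {ok, spawn_link(fun() -> loop(IntervalMs, ThresholdBytes) end)}.

%% Also usable manually, like the shell expression shown earlier.
gc_all() ->
    [erlang:garbage_collect(Pid) || Pid <- processes()],
    ok.

loop(IntervalMs, ThresholdBytes) ->
    timer:sleep(IntervalMs),
    case erlang:memory(total) > ThresholdBytes of
        true  -> gc_all();   %% past the threshold: force a global collection
        false -> ok
    end,
    loop(IntervalMs, ThresholdBytes).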

The module worked as expected, and high memory warnings were quickly tamed, logged (with the deltas), and left waiting to be inspected later.

The Attempt Fails, Somewhat

After a few weeks of looking at the logs, no incident with the nodes could be related to memory issues; the logs showed garbage collections happening as required, and everything looked great. Resorting to emergency measures as the only way to get a node to drop high amounts of memory isn't ideal, however, and we never know how fast a spike in usage will happen. I decided to add a bunch of hibernation calls in non-intrusive locations (inactive drains, or when disconnecting from a remote endpoint), which would allow us to garbage collect globally much less frequently, while keeping memory lower overall when people register mostly inactive drains.
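
In OTP terms, those hibernation calls are as simple as returning hibernate from a callback. Here's a stripped-down, hypothetical example of a drain-like gen_server doing so when told it is inactive (the real drains are far more involved):

%% Hypothetical example only -- not the real drain code.
-module(idle_hibernate_sketch).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, []}.

handle_call(_Msg, _From, State) ->
    {reply, ok, State}.

%% When the drain becomes inactive, hibernating compacts the process and
%% runs a full garbage collection, dropping stale refc binary references.
handle_cast(inactive, State) ->
    {noreply, State, hibernate};
handle_cast(_Other, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.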

Everything went fine, except that after five weeks, one node crashed despite the fixes being in place. The global garbage collection didn't even get triggered. Looking at logs from the OS and from the Logplex node internally revealed that the OS allocated 15GB of RAM to the Logplex node, while it internally reported using less than half of that. We had a serious memory leak, which was incredibly frustrating.

Second Attempt at Fixing THAT Problem

How Erlang's Memory Stuff Works

At this point, I was hitting the limits of what I knew about the Erlang virtual machine, and I suspected that either memory leaks were going on outside of the memory the VM reported, or the nodes were victims of memory fragmentation. (We're using NIFs for lzf decompression, but the VM could have been at fault, too.)

Not knowing what to do, I contacted Lukas Larsson. A few years back, when I spent my first two weeks at Erlang Solutions Ltd. in London, Lukas was also there for a few days and acted as my guide through the city and company. Since then, Lukas has moved on (internally) to consult for the OTP team at Ericsson, and I moved on to AdGear, and then to Heroku. We still connect occasionally at conferences and over IRC, and Lukas has always helped answer my tricky questions about the Erlang VM.

I asked Lukas how I could pinpoint what was wrong (a memory leak or fragmentation), and I showed him some of my collected data. I'm sharing what I learned in the process because, in addition to being interesting, the information is not documented anywhere other than the source.

The amount returned by erlang:memory/0-1 is the amount of memory actively allocated, where Erlang terms are laid in memory; this amount does not represent the amount of memory that the OS has given to the virtual machine (and Linux doesn't actually reserve memory pages until they are used by the VM). To understand where memory goes, one must first understand the many allocators being used:

Allocators map

  1. temp_alloc: does temporary allocations for short use cases (such as data living within a single C function call).
  2. eheap_alloc: heap data, used for things such as the Erlang processes' heaps.
  3. binary_alloc: the allocator used for reference counted binaries (what their 'global heap' is).
  4. ets_alloc: ETS tables store their data in an isolated part of memory that isn't garbage collected, but allocated and deallocated as long as terms are being stored in tables.
  5. driver_alloc: used to store driver data; this doesn't keep drivers that generate Erlang terms from using other allocators for those terms. The driver data allocated here includes locks/mutexes, options, Erlang ports, etc.
  6. sl_alloc: short-lived memory blocks will be stored there, and include items such as some of the VM's scheduling information or small buffers used for some data types' handling.
  7. ll_alloc: long-lived allocations will be in there. Examples include Erlang code itself and the atom table, which stay there.
  8. fix_alloc: allocator used for frequently used fixed-size blocks of memory. One example of data used there is the internal processes' C struct, used internally by the VM.
  9. std_alloc: catch-all allocator for whatever didn't fit the previous categories. The process registry for named processes lives there.

The entire list of where given data types live can be found in the source.

By default, there will be one instance of each allocator per scheduler (and you should have one scheduler per core), plus one instance to be used by linked-in drivers using async threads. This ends up giving you a structure a bit like the drawing above, but split into N parts at each leaf.

Each of these sub-allocators will request memory from mseg_alloc and sys_alloc depending on the use case, and in two possible ways. The first way is to act as a multiblock carrier (mbcs), which will fetch chunks of memory that will be used for many Erlang terms at once. For each mbc, the VM will set aside a given amount of memory (~8MB by default in our case, which can be configured by tweaking VM options), and each term allocated will be free to go look into the many multiblock carriers to find some decent space in which to reside.

Whenever the item to be allocated is greater than the single block carrier threshold (sbct), the allocator switches this allocation into a single block carrier (sbcs). A single block carrier will request memory directly from mseg_alloc for the first 'mmsbc' entries, and then switch over to sys_alloc and store the term there until it's deallocated.

So looking at something such as the binary allocator, we may end up with something similar to:

binary allocators example

Whenever a multiblock carrier (or the first 'mmsbc' single block carriers) can be reclaimed, mseg_alloc will try to keep it in memory for a while so that the next allocation spike that hits your VM can use pre-allocated memory rather than needing to ask the system for more each time.

When we call erlang:memory(total), what we get isn't the sum of all the memory set aside for all these carriers and whatever mseg_alloc has set aside for future calls, but what actually is being used for Erlang terms (the filled blocks in the drawings above). This information, at least, explained that variations between what the OS reports and what the VM internally reports are to be expected. Now we needed to know why our nodes had such a variation, and whether it really was from a leak.

Fortunately, the Erlang VM allows us to get all of the allocator information by calling:

[{{A, N}, Data} || A <- [temp_alloc, eheap_alloc, binary_alloc, ets_alloc,
                          driver_alloc, sl_alloc, ll_alloc, fix_alloc, std_alloc],
                   {instance, N, Data} <- erlang:system_info({allocator,A})]

The call isn't pretty and the data is worse. That entire data dump retrieves the data for all allocators, with all kinds of blocks, sizes, and metrics to sift through. I will not dive into the details of each part; instead, refer to the functions I have put inside the recon library, which perform the diagnostics outlined in the next sections of this article.

To figure out whether the Logplex nodes were leaking memory, I had to check that all allocated blocks of memory summed up to something roughly equal to the memory reported by the OS. The function that performs this duty in recon is recon_alloc:memory(allocated). The function will also report what is being actively used (recon_alloc:memory(used)) and the ratio between them (recon_alloc:memory(usage)).

Fortunately for Logplex (and me), the memory allocated matched the memory reported by the OS. This meant that all the memory the program made use of came from Erlang's own term allocators, and that it was unlikely the leak came directly from C code.

The next suspected culprit was memory fragmentation. To check out this idea, you can compare the amount of memory consumed by actively allocated blocks in every allocator to the amount of memory attributed to carriers, which can be done by calling recon_alloc:fragmentation(current) for the current values, and recon_alloc:fragmentation(max) for the peak usage.
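
For reference, these calls are run from a remote shell on the node; output is omitted here:

%% recon must be available in the node's code path.
1> recon_alloc:memory(allocated).       % memory set aside by the allocators
2> recon_alloc:memory(used).            % memory actively used for data
3> recon_alloc:memory(usage).           % ratio of used to allocated
4> recon_alloc:fragmentation(current).  % per-allocator block vs. carrier sizes
5> recon_alloc:fragmentation(max).      % the same figures, at their peak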

By looking at the data dumps for these functions (or a similar one), Lukas figured out that binary allocators were our biggest problem. The carrier sizes were large, and their utilization was impressively low: from 3% in the worst case to 24% in the best case. In normal situations, you would expect utilization to be well above 50%. On the other hand, when he looked at the peak usage for these allocators, binary allocators were all above 90% usage.

Lukas drew a conclusion that turned out to match our memory graphs. Whenever the Logplex nodes have a huge spike in binary memory (which correlates with spikes in input, given that we deal with binary data for most of our operations), a bunch of carriers get allocated, giving something like this:

allocators full

Then, when memory gets deallocated, some remnants are kept in Logplex buffers here and there, leading to a much lower rate of utilization, looking similar to this:

allocators leak

The result is a bunch of nearly empty blocks that cannot be freed. The Erlang VM will never do defragmentation, and that memory keeps being hogged by binary data that may take a long time to go away; the data may be buffered for hours or even days, depending on the drain. The next time there is a usage spike, the nodes might need to allocate more into ETS tables or into the eheap_alloc allocator, and most of that memory is no longer free because of all the nearly empty binary blocks.

Fixing this problem is the hard part. You need to know the kind of load your system is under and the kind of memory allocation patterns you have. For example, I knew that 99% of our binaries would be 10 kB or smaller, because that's a hard cap we put on line length for log messages. You then need to know the different memory allocation strategies of the Erlang virtual machine:

  1. Best fit (bf)
  2. Address order best fit (aobf)
  3. Address order first fit (aoff)
  4. Address order first fit carrier best fit (aoffcbf)
  5. Address order first fit carrier address order best fit (aoffcaobf)
  6. Good fit (gf)
  7. A fit (af)

alloc examples #1

For best fit (bf), the VM builds a balanced binary tree of all the free blocks' sizes, and will try to find the smallest one that will accommodate the piece of data and allocate it there. In the drawing above, a piece of data that requires three blocks would likely end up in area 3.

Address order best fit (aobf) will work similarly, but the tree instead is based on the addresses of the blocks. So the VM will look for the smallest block available that can accommodate the data, but if many of the same size exist, it will favor picking one that has a lower address. If I have a piece of data that requires three blocks, I'll still likely end up in area 3, but if I need two blocks, this strategy will favor the first mbcs in the diagram above with area 1 (instead of area 5). This could make the VM have a tendency to favor the same carriers for many allocations.

Address order first fit (aoff) will favor the address order for its search, and as soon as a block fits, aoff uses it. Where aobf and bf would both have picked area 3 to allocate four blocks, this one will get area 2 as a first priority given its address is lowest. In the diagram below, if we were to allocate four blocks, we'd favor block 1 to block 3 because its address is lower, whereas bf would have picked either 3 or 4, and aobf would have picked 3.

alloc examples #2

Address order first fit carrier best fit (aoffcbf) is a strategy that will first favor a carrier that can accommodate the size and then look for the best fit within that one. So if we were to allocate two blocks in the diagram above, bf and aobf would both favor block 5, aoff would pick block 1. aoffcbf would pick area 2, because the first mbcs can accommodate it fine, and area 2 fits it better than area 1.

Address order first fit carrier address order best fit (aoffcaobf) will be similar to aoffcbf, but if multiple areas within a carrier have the same size, it will favor the one with the smallest address between the two rather than leaving it unspecified.

Good fit (gf) is a different kind of allocator; it will try to work like best fit (bf), but will only search for a limited amount of time. If it doesn't find a perfect fit there and then, it will pick the best one encountered so far. The value is configurable through the mbsd VM argument.

A fit (af), finally, is an allocator behavior for temporary data that looks for a single existing memory block, and if the data can fit, af uses it. If the data can't fit, af allocates a new one.

Each of these strategies can be applied individually to every kind of allocator, so that the heap allocator and the binary allocator do not necessarily share the same strategy.

The Memory Is Coming from Inside the House

Lukas recommended we go with the address order best fit strategy (aobf), along with a reduction in the size of our average mbcs for the binary allocator. With this strategy, we used more CPU to pick where data would go, but the VM hopefully would favor existing free blocks in more cases, meaning that we would have much fewer near-empty mbcs sitting around after a usage spike.
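
I won't claim the exact production values here, but settings like these take the form of erts_alloc flags on the erl command line (or in vm.args); something of this general shape, with illustrative numbers only:

# Illustrative values only -- not the exact production configuration.
# Use address order best fit for the binary allocator:
+MBas aobf
# Shrink the largest multiblock carrier size for binary_alloc (in KB):
+MBlmbcs 512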

I enabled these settings on a few nodes in production, and then I waited. The problem with these settings is that failures could take up to five weeks to show up on regular nodes when we have multiple dozens of them, and then slowly ramp up in frequency. Measuring the success of the experiments I put in production took an excessively long time, during which the nodes used more CPU to do their thing. After three or four weeks without a crash, I decided to push the experiment further. I pushed the change in production to all nodes, but for these options to kick in, each node needed to be restarted. Usually we use Erlang's hot code loading features to deploy software without terminating a single connection, so instead of restarting the nodes, I waited for them to crash, which took a few weeks. Until then, roughly 25% of the cluster was running with the new memory allocation options, and 75% ran with the old ones. The first few new nodes made it past the point where issues usually cropped up, and seemed stable.

All the new memory-related crashes were happening on the older nodes, and the new nodes (by then 30-35% of the cluster) never seemed to crash. I considered the experiment successful, while still knowing there was a non-negligible probability that age alone was the reason the older nodes, but not the new ones, kept dying. Eventually, new Erlang versions came out, and I gave it a final push with a cluster roll over the course of a few hours. All production Logplex nodes are now running the newest stable Erlang version with the tweaked memory allocation settings.

Problem: Solved

After a few weeks (or months, depending on the age of the node verified), I found out that in practice the reduction isn't perfect; fragmentation still occurs, but we see an improvement. Whereas most binary allocators that saw a significant amount of usage before the fixes would have usage rates between 3% and 25%, the new nodes seeing a significant amount of usage tend to have at least 35% to 40% usage, with some of them well above 90% across the cluster. This more than doubles our memory efficiency, and usage is now high enough that we have yet to lose a VM without first being able to trigger a global garbage collection call. In fact, we haven't lost a single node to out-of-memory errors directly attributable to drains and how Logplex is built. This doesn't mean the cluster components never fail anymore; I still see failures resulting from bad load-balancing between the nodes, or from connections and services going down that we incorrectly built around (and are now remediating).

I also saw cases in which the nodes would not fail, but have degraded quality of service. The most glaring example of this was some bad behavior from some nodes under heavy load, with high drop rates in Logplex messages to be delivered to users.

Nodes Are Blocking and See Poor Performance

We Solve This One Nice and Easy

The Structure of IO on a Node

During the investigation, our routing team learned that one of the processes on the Erlang node that tended to be at risk was the user process. Erlang's IO is based around the idea of "group leaders." A group leader is a process that will take charge of handling IO and forwarding it around. I already went through why this is useful for Erlang's shell to work, so I will focus on the state of affairs.

Every OTP application running on the VM has an application master, a process that acts as a secret top-level supervisor for each application. On top of the VM, there's a process named user (the at-risk process) that handles all of the IO and is in charge of the standard I/O ports. In between, a varying number of processes may or may not be present, depending on how you're interacting with the VM. Moreover, every IO call is synchronous between the IO client (the process that calls io:format/2) and the IO server (the application master, or ultimately, user).

In a default VM, logging with regular IO calls results in the following structure:

basic io

or, if your commands come from the Erlang shell:

shell io

For our log volume, we made the reasonable optimization of telling io:format to communicate directly with the user process (by calling io:format(user, Fmt, Args)), which removed middlemen and allowed faster communication with less overhead. At peak times, if the VM (or the OS, given we're in the cloud) hiccuped, many connections could time out at once. And if we logged all of these events, we'd get a storm in which the individual processes would block by waiting for the confirmation of messages, which also created a memory bubble. This meant that the logging we wanted was the source of the problems we had. With more waiting, the buffers would accumulate messages in drains that couldn't be sent in time before filling up, resulting in dropped messages.

Buffering Up

First we replaced all of our logging calls with the lager library, which is a fantastic library that I highly recommend. The lager library uses a middleman process, but this middleman acts as a buffer on its own and allows all communication to be asynchronous, up until a certain point. Switching to lager worked well, except that on a few nodes, occasionally (and unpredictably), IO slowed to a crawl, and lager would have thousands of messages backlogged. When that happened, lager switched to synchronous mode to avoid going out of memory. This is entirely sane behavior, but for Logplex it meant that critical paths in our code (for example, the TCP Syslog drains, which should never block) would suddenly lock up, endangering the node in other ways because there was now a risk of overflowing the mailboxes of additional processes.

Going synchronous simply shifted the danger around, which was trouble. I saw two options: log less, which would temporarily solve the problem at the cost of visibility; or try buffering and batching log messages asynchronously, which would let us keep all the logs but wasn't guaranteed to work. Batching has long been a recommended solution for throughput issues: raising latency a bit leads to better resource usage by regrouping similar operations together, yielding more throughput. I decided to try the buffering option because implementing it is not excessively complex, and it promised a quick fix that would be a win in terms of speed and data logged. The idea was to take all the log messages, send them to a middleman buffer process pair that would accumulate them (and optionally drop them), merge the log strings into larger pages, and send these larger pages to the user process.

I would be replacing the earlier diagram describing the IO flow with this one:

batchio io

If the root cause of the problem was the message passing overhead, this solution could work. If the problem was the total bandwidth of logs Logplex was producing, little could be done to help.
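
The buffer itself is conceptually tiny. Here's a rough sketch of the batching idea (invented module name, not BatchIO's actual implementation), which collapses the buffer/consumer pair into a single process for brevity and flushes on either a size cap or a time interval:

%% Rough sketch of the batching idea, not BatchIO's actual code.
-module(log_batch_sketch).
-export([start_link/1, log/2]).

-define(MAX_BATCH, 100).

start_link(FlushMs) ->
    Pid = spawn_link(fun() -> loop([], 0, FlushMs) end),
    register(?MODULE, Pid),
    {ok, Pid}.

%% Callers format and fire asynchronously; they never wait on 'user'.
log(Fmt, Args) ->
    ?MODULE ! {log, io_lib:format(Fmt, Args)},
    ok.

loop(Buf, N, FlushMs) when N >= ?MAX_BATCH ->
    flush(Buf),
    loop([], 0, FlushMs);
loop(Buf, N, FlushMs) ->
    receive
        {log, Msg} -> loop([Msg | Buf], N + 1, FlushMs)
    after FlushMs ->
        flush(Buf),
        loop([], 0, FlushMs)
    end.

%% Merge the accumulated messages into one big iolist and hand it to the
%% 'user' process as a single IO request.
flush([])  -> ok;
flush(Buf) -> io:put_chars(user, lists:reverse(Buf)).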

Problem Solved

The solution worked. I promptly released the BatchIO library, which contains the buffer process pair and wraps it behind a glorified io:format/2 call.

By batching operations, there's an estimated reduction in the number of messages sent across the VM for IO of as much as 75% (according to a back-of-the-envelope calculation of mine) without decreasing the amount of data logged. In case of overflow, the library will drop messages instead of blocking.

So far, no messages have needed to be dropped, even though we moved to a fully asynchronous model. We've also kept lager active on the node, instead of the default Erlang handlers, for error logs, which BatchIO couldn't handle; lager does a better job with error logs than the default handlers and prevents overload more gracefully.

Denouement

Metrics and Stuff

Through this months-long debugging session, the routing team has gained a much longer time between individual component failures. This means fewer interruptions for us, fewer silently lost logs for our users, and additional bandwidth to let Heroku focus on more pressing non-operational issues.

The improvements gleaned from this project went far beyond the direct reliability of the system; the additional performance came at the right time as demand increased, and the end results were satisfying. Over the course of the project, the number of log messages in transit through Logplex increased by nearly 50%, whereas the number of messages dropped was reduced by an order of magnitude during the same time period.

As a result of our efforts, I also have released a few libraries to the Erlang community:

  • Recon: Contains scripts and modules to help diagnose Erlang issues, including scripts to deal with crash dumps, memory allocation diagnostics, and other general debugging functions that can be safely used in production.
  • BatchIO: Our IO buffering library. Although this is not as good as a full logging library, I hope it can be used as a model to help with existing logging libraries, if they were to offer alternatives to going synchronous when overloaded.
  • POBox: A generalization of the Logplex buffers for HTTP(s) drains designed by Geoff Cant, the technical leader of the routing team. This library is used as the core of BatchIO, and we've started using it in other internal projects that require batching or load-shedding.

I still plan to keep working on Logplex. For example, I'm in the process of helping Logplex become more fault-tolerant to external service failures. And although overload can still take nodes down, at least it now takes more load to overload Logplex.

Conclusion

Every project that lives in production for a while and requires scaling up ends up having weird, complicated issues: issues that cannot be attributed to one root cause, that are difficult to reproduce, and that nobody else in the community seems to have encountered or can solve for you. These issues crop up no matter what language, stack, framework, or hardware you have, or how correct you think your software is.

These kinds of production problems are the modern version of bridges collapsing after too many people cross at once, creating heavy mechanical resonance. Although most modern bridges are able to handle these issues safely, past failures (and even newer failures) led to safer structures. Unlike bridge design, software engineering is still a young field. Hopefully this article offers a glimpse of the daily work Heroku engineers perform, and sharing our experience will be helpful to the greater development community.

$1 Million Hack – $99 pass FREE for a limited time

As we previously announced, salesforce.com is hosting a $1 Million Hackathon for the most awesome mobile app built using Salesforce Platform, which includes Heroku. It's taking place now and culminates at Dreamforce in San Francisco.

For a limited time we are making it FREE ($99 value) to participate in the Salesforce $1 Million Hackathon. Sign up for the Hacker Pass and you will get access to the Hackathon plus all the great content and activities in the Developer Zone at Dreamforce.

Additionally, the first 500 people to use the promo code HEROKU when registering will receive a $200 Heroku credit on-site at the Hackathon.

Get your FREE Hacker Pass and $200 credit when you register with code HEROKU for the Salesforce $1 Million Hackathon today!

HTML5 and CSS3: Level Up with Today’s Web Technologies 2nd Ed. in print, November PragPub on sale
