<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>app</title>
  <id>http://127.0.0.1</id>
  <updated>2011-03-26T00:00:00Z</updated>
  <author>
    <name>Santosh Kumar</name>
  </author>
  <entry>
    <title>Understanding the this reference in Javascript</title>
    <link href="http://127.0.0.1/2012/09/15/understanding-the-this-reference-in-javascript/" rel="alternate"/>
    <id>http://127.0.0.1/2012/09/15/understanding-the-this-reference-in-javascript/</id>
    <published>2012-09-15T00:00:00Z</published>
    <updated>2012-09-15T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;The &lt;em&gt;this&lt;/em&gt; reference in Javascript is probably one of the most confusing scoping concepts for people coming new to the language. Understanding, the &lt;em&gt;this&lt;/em&gt; reference lets you unlock the mysteries of Object Oriented Javascript and gain a better/more intuitive understanding of how the object model is structured&amp;hellip;&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;The &lt;em&gt;this&lt;/em&gt; reference in Javascript is probably one of the most confusing scoping concepts for people coming new to the language. Understanding, the &lt;em&gt;this&lt;/em&gt; reference lets you unlock the mysteries of Object Oriented Javascript and gain a better/more intuitive understanding of how the object model is structured.&lt;/p&gt;

&lt;h3&gt;Setting the this reference through a method call&lt;/h3&gt;

&lt;p&gt;Here&amp;rsquo;s a little code to get us going:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function setVar(arg) {
  this.var = arg;
}
var obj = Object.create(null);
obj.setVar = setVar;
obj.setVar('hello');
console.log(obj.var); // =&amp;gt; 'hello'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What&amp;rsquo;s going on here?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We declare a function called &amp;ldquo;setVar&amp;rdquo; that assigns the argument it receives to the &amp;ldquo;var&amp;rdquo; property of whatever the &amp;ldquo;this&amp;rdquo; reference is pointing to&lt;/li&gt;
&lt;li&gt;We then create an object called &amp;ldquo;obj&amp;rdquo; and assign the setVar function to its &amp;ldquo;setVar&amp;rdquo; property&lt;/li&gt;
&lt;li&gt;Finally, we call the setVar function &lt;em&gt;via&lt;/em&gt; the obj object&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Run this code and you&amp;rsquo;ll see that the &lt;em&gt;obj&lt;/em&gt; object now has a &lt;em&gt;var&lt;/em&gt; property set to &amp;lsquo;hello&amp;rsquo;, even though nowhere in our code do we write:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;obj.var = 'hello';
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;How does that happen? When you call &lt;strong&gt;object.method()&lt;/strong&gt;, the &lt;em&gt;this&lt;/em&gt; reference is set to &lt;em&gt;object&lt;/em&gt; within the confines of &lt;em&gt;method&lt;/em&gt;. In the above case, since we called &lt;em&gt;obj.setVar&lt;/em&gt;, the &lt;em&gt;this&lt;/em&gt; reference was set to point to &lt;em&gt;obj&lt;/em&gt; inside the setVar method. And since setVar adds a property called var to whatever its &lt;em&gt;this&lt;/em&gt; is, obj was gifted with the &lt;em&gt;var&lt;/em&gt; property.&lt;/p&gt;
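
&lt;p&gt;The key point is that &lt;em&gt;this&lt;/em&gt; is decided by &lt;em&gt;how&lt;/em&gt; a function is called, not by where it is defined. Here is a small sketch of that (the names first, second and detached are just for illustration): the same setVar function gets a different &lt;em&gt;this&lt;/em&gt; depending on which object it is called through. The sketch also adds a function-level &amp;lsquo;use strict&amp;rsquo; directive, which is not in the original example, so that calling the function with no object in front of it leaves &lt;em&gt;this&lt;/em&gt; undefined and the assignment throws a TypeError:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function setVar(arg) {
  'use strict'; // added for this sketch: a plain call gets an undefined this
  this.var = arg;
}
var first = { setVar: setVar };
var second = { setVar: setVar };

first.setVar('one');
second.setVar('two');
console.log(first.var);  // 'one'
console.log(second.var); // 'two'

// Pull the function off the object and call it on its own.
var detached = first.setVar;
var threw = false;
try {
  detached('three'); // no object before the dot, so this is undefined
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
&lt;/code&gt;&lt;/pre&gt;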

&lt;h3&gt;Setting the this reference using &amp;ldquo;call&amp;rdquo; or &amp;ldquo;apply&amp;rdquo;&lt;/h3&gt;

&lt;p&gt;Javascript lets you specify what you would like the &lt;em&gt;this&lt;/em&gt; reference to be when calling a function, by using either its &lt;em&gt;call()&lt;/em&gt; or its &lt;em&gt;apply()&lt;/em&gt; method. The format of &lt;em&gt;call()&lt;/em&gt; is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;someFunction.call(thisArg, arg1, arg2 /* , ... */)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Rewriting the previous example to use &lt;em&gt;call()&lt;/em&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function setVar(arg) {
  this.var = arg;
}
var obj = Object.create(null);
setVar.call(obj, 'hello');
console.log(obj.var); // =&amp;gt; 'hello'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can see, this is a little more flexible than the &lt;em&gt;object.method()&lt;/em&gt; approach, since obj no longer needs a setVar property at all. The &lt;em&gt;apply()&lt;/em&gt; method is identical to &lt;em&gt;call()&lt;/em&gt;, except that it takes the function&amp;rsquo;s arguments as a single array instead of a comma-separated list. Using &lt;em&gt;apply()&lt;/em&gt; instead, the call would be:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;setVar.apply(obj, ['hello']);
&lt;/code&gt;&lt;/pre&gt;
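
&lt;p&gt;Here is a small side-by-side sketch (describe and person are made-up names for illustration) showing that &lt;em&gt;call()&lt;/em&gt; and &lt;em&gt;apply()&lt;/em&gt; produce the same result, plus the classic case where &lt;em&gt;apply()&lt;/em&gt; shines: the arguments are already sitting in an array.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function describe(greeting, punctuation) {
  return greeting + ', ' + this.name + punctuation;
}
var person = { name: 'Ada' };

// call() takes the arguments one by one...
var viaCall = describe.call(person, 'Hello', '!');
// ...while apply() takes them as a single array.
var viaApply = describe.apply(person, ['Hello', '!']);
console.log(viaCall);              // 'Hello, Ada!'
console.log(viaCall === viaApply); // true

// Math.max expects separate arguments, so apply() can spread an array into it.
// Math.max does not use this, so null is passed as the this value.
var numbers = [3, 17, 8];
console.log(Math.max.apply(null, numbers)); // 17
&lt;/code&gt;&lt;/pre&gt;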

&lt;h3&gt;Capturing the this reference&lt;/h3&gt;

&lt;p&gt;You might often see code that does:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var that = this;
// ...code that follows uses that instead of this...
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And you might wonder why they are doing this. Every function call gets its own &lt;em&gt;this&lt;/em&gt; reference, so a nested function does not inherit the &lt;em&gt;this&lt;/em&gt; of the function it is defined in. For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function outer() {
  this.foo = 'blah';
  function inner() {
    console.log(this.foo);
  }
  inner();
}
var obj = Object.create(null);
outer.call(obj);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Running this logs &lt;em&gt;undefined&lt;/em&gt;. Why is that? The &lt;em&gt;this&lt;/em&gt; reference in outer is set to &lt;em&gt;obj&lt;/em&gt;, since we set it explicitly with &lt;em&gt;call()&lt;/em&gt;. However, &lt;em&gt;inner()&lt;/em&gt; is called as a plain function rather than as a method, so its &lt;em&gt;this&lt;/em&gt; is not obj: in non-strict code it is the global object, which normally has no &lt;em&gt;foo&lt;/em&gt; property, so logging &lt;em&gt;this.foo&lt;/em&gt; in &lt;em&gt;inner()&lt;/em&gt; results in undefined. (In strict mode, &lt;em&gt;this&lt;/em&gt; would be undefined and the same line would throw a TypeError instead.)&lt;/p&gt;

&lt;p&gt;How do we fix this? One way is to capture the &lt;em&gt;this&lt;/em&gt; reference and use it.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function outer() {
  var that = this;
  that.foo = 'blah';
  function inner() {
    console.log(that.foo);
  }
  inner();
}
var obj = Object.create(null);
outer.call(obj);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Another way would&amp;rsquo;ve been to call &lt;em&gt;inner()&lt;/em&gt; with the &lt;em&gt;call()&lt;/em&gt; method, passing it outer&amp;rsquo;s &lt;em&gt;this&lt;/em&gt;. Personally, I think this is more readable.&lt;/p&gt;
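
&lt;p&gt;For comparison, here is a sketch of that &lt;em&gt;call()&lt;/em&gt;-based alternative. It is adapted from the example above so that &lt;em&gt;inner()&lt;/em&gt; returns the value instead of logging it, which makes the result easy to check:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function outer() {
  this.foo = 'blah';
  function inner() {
    return this.foo;
  }
  // Hand outer's this to inner explicitly instead of capturing it in a variable.
  return inner.call(this);
}
var obj = Object.create(null);
var result = outer.call(obj);
console.log(result); // 'blah'
&lt;/code&gt;&lt;/pre&gt;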

&lt;h3&gt;Using the bind function&lt;/h3&gt;

&lt;p&gt;Javascript also lets you &lt;em&gt;bind&lt;/em&gt; the this reference: calling &lt;em&gt;bind()&lt;/em&gt; on a function returns a new function whose &lt;em&gt;this&lt;/em&gt; is permanently set to the object you pass in, so every future invocation of that new function uses that object. This is useful when you don&amp;rsquo;t want to keep remembering to use the &lt;em&gt;call()&lt;/em&gt; method. An example of &lt;em&gt;bind()&lt;/em&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function setFoo() {
  this.foo = 'bar';
}
var obj = Object.create(null);
var boundSetFoo = setFoo.bind(obj);
boundSetFoo();
console.log(obj.foo); // 'bar'
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here, we are creating a bound copy of the &lt;em&gt;setFoo()&lt;/em&gt; function called &lt;em&gt;boundSetFoo&lt;/em&gt;, with the object &lt;em&gt;obj&lt;/em&gt; as its &lt;em&gt;this&lt;/em&gt;. This way of capturing &lt;em&gt;this&lt;/em&gt; is particularly helpful when working with events and event handlers.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var events = require('events');

function eventHandler(evt) {
  if(!this.eventsReceived) {
    this.eventsReceived = [];
  }
  this.eventsReceived.push(evt);
  console.log('event is %s', evt);
}
var evtEmitter = Object.create(events.EventEmitter.prototype);

var obj = Object.create(null);
evtEmitter.on('some_event', eventHandler.bind(obj));

var someOtherObj = Object.create(null);
evtEmitter.on('some_event', eventHandler.bind(someOtherObj));

evtEmitter.emit('some_event', 'hi there');
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here we have two objects, &lt;em&gt;obj&lt;/em&gt; and &lt;em&gt;someOtherObj&lt;/em&gt;, both of which have subscribed to the &lt;em&gt;some_event&lt;/em&gt; event on the &lt;em&gt;evtEmitter&lt;/em&gt; object. When the event fires, each bound handler records it in its own object&amp;rsquo;s &lt;em&gt;eventsReceived&lt;/em&gt; array. Notice how we reuse the &lt;em&gt;eventHandler&lt;/em&gt; function but bind it to the right object just when we need it to. Neat, huh?&lt;/p&gt;
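
&lt;p&gt;Two more properties of &lt;em&gt;bind()&lt;/em&gt; are worth knowing, shown in this small sketch (whoAmI, alice and bob are illustrative names): once a function is bound, a later &lt;em&gt;call()&lt;/em&gt; cannot override its &lt;em&gt;this&lt;/em&gt;, and the original function is left untouched.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;function whoAmI() {
  return this.name;
}
var alice = { name: 'alice' };
var bob = { name: 'bob' };
var boundToAlice = whoAmI.bind(alice);

console.log(boundToAlice());         // 'alice'
// A later call() cannot change the bound this...
console.log(boundToAlice.call(bob)); // 'alice'
// ...and the original function is not bound at all.
console.log(whoAmI.call(bob));       // 'bob'
&lt;/code&gt;&lt;/pre&gt;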

&lt;h3&gt;TL;DR&lt;/h3&gt;

&lt;p&gt;Javascript is super malleable, letting you bend it to your will. This malleability lets you come up with incredibly elegant solutions to seemingly intractable problems. But it also means it&amp;rsquo;s pretty easy to write poor code.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Using the watch command to maintain consistency of your Redis dataset</title>
    <link href="http://127.0.0.1/2012/08/09/using-the-watch-command-to-maintain-consistency-of-your-redis-dataset/" rel="alternate"/>
    <id>http://127.0.0.1/2012/08/09/using-the-watch-command-to-maintain-consistency-of-your-redis-dataset/</id>
    <published>2012-08-09T00:00:00Z</published>
    <updated>2012-08-09T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;As you start using &lt;a href="http://redis.io"&gt;Redis&lt;/a&gt; more, you soon find yourself delving into &lt;a href="http://redis.io/topics/transactions"&gt;redis' transactions&lt;/a&gt;. A traditional RDBMS' view of CAS (compare-and-set) transactions is:&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;As you start using &lt;a href="http://redis.io"&gt;Redis&lt;/a&gt; more, you soon find yourself delving into &lt;a href="http://redis.io/topics/transactions"&gt;redis' transactions&lt;/a&gt;. A traditional RDBMS' view of CAS (compare-and-set) transactions is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lock the data you are working with against writes (in the crudest case, the entire database), so that only the connection running the transaction can modify it&lt;/li&gt;
&lt;li&gt;Perform a query to figure out which things need to change (Compare step)&lt;/li&gt;
&lt;li&gt;Change those things (Set step)&lt;/li&gt;
&lt;li&gt;Release the lock (this happens automatically when your transaction completes)&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Redis, on the other hand, uses &lt;a href="http://en.wikipedia.org/wiki/Optimistic_locking"&gt;Optimistic Locking&lt;/a&gt;, which makes CAS transactions look like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start tracking the stuff you think could change while you are in your transaction&lt;/li&gt;
&lt;li&gt;Perform a query to figure out which things need to change (Compare step)&lt;/li&gt;
&lt;li&gt;Execute the transaction to change those things (Conditional Set)&lt;/li&gt;
&lt;li&gt;Check whether the transaction completed successfully&lt;/li&gt;
&lt;li&gt;If it did not, either go back to step 1 and re-run the transaction, or just abort&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Step 1, where you start &lt;em&gt;tracking&lt;/em&gt; stuff before doing anything else, is where &lt;em&gt;Optimistic Locking&lt;/em&gt; and the more traditional &lt;em&gt;Pessimistic Locking&lt;/em&gt; diverge in a pretty big way.&lt;/p&gt;

&lt;h3&gt;Optimistic Locking&lt;/h3&gt;

&lt;p&gt;The general idea behind optimistic locking is that you need to know &lt;strong&gt;beforehand&lt;/strong&gt; what might change while you are performing a transaction, and watch out for it. Pessimistic locking, on the other hand, is a more heavy-handed approach where you don&amp;rsquo;t want &lt;em&gt;anything&lt;/em&gt; to change while you are in the middle of a transaction. Pessimistic locking, as you might&amp;rsquo;ve guessed, is harder on performance, and for &lt;em&gt;write-heavy&lt;/em&gt; datastores like Redis that need to maintain high performance, it just is not an option. The downside of optimistic locking, though, is that more of the heavy lifting falls on the engineer, who needs to put a little more thought into dealing with transactions. Redis ships with a &lt;a href="http://redis.io/commands/watch"&gt;watch command&lt;/a&gt; that lets you specify which keys you want to keep an eye on before running a MULTI/EXEC transaction. If any watched key is modified before EXEC runs, the whole transaction is aborted and EXEC returns a null reply, so your code can tell that it needs to retry.&lt;/p&gt;
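
&lt;p&gt;To make the five steps above concrete without needing a Redis server, here is a toy, in-memory sketch of optimistic locking in Javascript. Everything in it (createStore, watch, exec, incrementWithRetry) is hypothetical: it only mimics the spirit of WATCH/MULTI/EXEC by giving every key a version number, and a null result from exec stands in for an aborted transaction. It is not the API of any Redis client.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Toy in-memory store; NOT a Redis client. Every write bumps the key's version.
function createStore() {
  var data = {};
  var versions = {};
  function versionOf(key) {
    return versions[key] || 0;
  }
  var store = {
    get: function (key) {
      return data[key];
    },
    set: function (key, value) {
      data[key] = value;
      versions[key] = versionOf(key) + 1;
    },
    // Step 1: remember the current version of every key we care about.
    watch: function (keys) {
      var snapshot = {};
      keys.forEach(function (key) {
        snapshot[key] = versionOf(key);
      });
      return snapshot;
    },
    // Steps 3 and 4: apply the writes only if no watched key has changed.
    exec: function (snapshot, writes) {
      var conflict = Object.keys(snapshot).some(function (key) {
        return versionOf(key) !== snapshot[key];
      });
      if (conflict) {
        return null; // aborted, like EXEC returning a null reply
      }
      Object.keys(writes).forEach(function (key) {
        store.set(key, writes[key]);
      });
      return Object.keys(writes).length;
    }
  };
  return store;
}

// Increment a counter with compare-and-set, retrying on conflict (step 5).
// interfere is an optional hook used here to simulate another client.
function incrementWithRetry(store, key, maxAttempts, interfere) {
  for (var attempt = 1; attempt !== maxAttempts + 1; attempt++) {
    var snapshot = store.watch([key]);  // step 1: watch
    var current = store.get(key) || 0;  // step 2: compare (read)
    if (interfere) {
      interfere(attempt);
    }
    var writes = {};
    writes[key] = current + 1;
    if (store.exec(snapshot, writes) !== null) { // steps 3 and 4
      return attempt; // number of attempts it took
    }
  }
  return null; // gave up after maxAttempts
}

var store = createStore();
store.set('counter', 10);
// Another client overwrites 'counter' during our first attempt only.
var attemptsUsed = incrementWithRetry(store, 'counter', 3, function (attempt) {
  if (attempt === 1) {
    store.set('counter', 20);
  }
});
console.log(attemptsUsed);         // 2
console.log(store.get('counter')); // 21
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this run, the simulated write from another client lands during the first attempt, so that attempt is aborted; the second attempt re-reads the counter and succeeds. Re-reading on every attempt matters: retrying with the stale value would defeat the purpose of the check.&lt;/p&gt;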

&lt;h3&gt;An example&lt;/h3&gt;

&lt;p&gt;All this sounds great, but nothing beats a real-world example to see how to work with this and why it might be harder than you think. Recently, I was working on a task that required exactly this: when a user logs into our app, figure out which of that user&amp;rsquo;s Facebook friends are also logged in to our app, and send that list over.&lt;/p&gt;

&lt;h4&gt;The setup&lt;/h4&gt;

&lt;p&gt;The data in our app is structured in the following format:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users are stored as hashes under keys of the form &amp;ldquo;user|&amp;lt;user-id&amp;gt;&amp;rdquo;&lt;/li&gt;
&lt;li&gt;All currently logged-in users have their Facebook IDs stored in a &amp;ldquo;logged_in_fb_ids&amp;rdquo; set&lt;/li&gt;
&lt;li&gt;There is a Facebook ID to user ID reverse lookup hash, &amp;ldquo;fb_id_to_user_id_hash&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Every user has a set containing the Facebook IDs of the people they are friends with on Facebook, &amp;ldquo;fb_friend_ids_for_user|&amp;lt;user-id&amp;gt;&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;So finding the name of someone given their Facebook ID would look something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;fb_id = 123456
user_id = redis.hget("fb_id_to_user_id_hash", fb_id)
user_name = redis.hget("user|#{user_id}", "name")
&lt;/code&gt;&lt;/pre&gt;

&lt;h4&gt;First stab&lt;/h4&gt;

&lt;p&gt;Given this layout of the data in Redis, a first stab at fetching the user_ids of all my Facebook friends who are currently logged in to the app might look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def fetch_loggedin_fb_friends_for_user(user_id)
  fb_ids_of_my_friends_who_are_loggedin = redis.sinter("fb_friend_ids_for_user|#{user_id}", "logged_in_fb_ids")
  user_ids_of_my_fb_friends = redis.multi do |multi|
    fb_ids_of_my_friends_who_are_loggedin.each do |fb_id|
      multi.hget("fb_id_to_user_id_hash", fb_id)
    end
  end
  user_ids_of_my_fb_friends
end
&lt;/code&gt;&lt;/pre&gt;

&lt;h4&gt;Race condition&lt;/h4&gt;

&lt;p&gt;The race condition with this approach is that between the time I figure out the Facebook IDs of my friends who are currently logged in to my app:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;  fb_ids_of_my_friends_who_are_loggedin = redis.sinter("fb_friend_ids_for_user|#{user_id}", "logged_in_fb_ids")
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And the time I look up the user_ids for those Facebook IDs, a friend could have logged off, and I&amp;rsquo;d be incorrectly reporting them as logged in when they are not. Granted, in this case displaying slightly stale data is not a big deal, but there is an easy fix.&lt;/p&gt;

&lt;p&gt;What I really want is that between the time I start querying &lt;em&gt;logged_in_fb_ids&lt;/em&gt; and the time I compute the user_ids for those people, &lt;em&gt;no changes should have happened in the logged_in_fb_ids&lt;/em&gt; set. If no changes were made to this set while my transaction was running, I know that the user_ids_of_my_fb_friends data is accurate.&lt;/p&gt;

&lt;h4&gt;Fix&lt;/h4&gt;

&lt;p&gt;The fix in this case is as simple as &lt;strong&gt;WATCHing&lt;/strong&gt; the logged_in_fb_ids set before reading it with SINTER and kicking off the MULTI/EXEC transaction, and then re-running the transaction if something changed.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def fetch_loggedin_fb_friends_for_user(user_id)
  redis.watch("logged_in_fb_ids")
  fb_ids_of_my_friends_who_are_loggedin = redis.sinter("fb_friend_ids_for_user|#{user_id}", "logged_in_fb_ids")
  user_ids_of_my_fb_friends = redis.multi do |multi|
    fb_ids_of_my_friends_who_are_loggedin.each do |fb_id|
      multi.hget("fb_id_to_user_id_hash", fb_id)
    end
  end
  user_ids_of_my_fb_friends
end

failed_cnt = 0
RETRY = 3
my_loggedin_fb_friends = fetch_loggedin_fb_friends_for_user(my_user_id)
# multi returns nil when a watched key changed, so retry until it succeeds or we give up
while failed_cnt &amp;lt; RETRY &amp;amp;&amp;amp; !my_loggedin_fb_friends
  failed_cnt += 1
  my_loggedin_fb_friends = fetch_loggedin_fb_friends_for_user(my_user_id)
end
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here, we&amp;rsquo;ve chosen to retry up to 3 times; someone else might choose to simply abort the transaction. Your strategy for handling failed CAS transactions will depend on the situation.&lt;/p&gt;

&lt;h3&gt;TL;DR&lt;/h3&gt;

&lt;p&gt;Redis' &lt;a href="http://redis.io/commands/watch"&gt;watch command&lt;/a&gt; is what you should be using for CAS transactions.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Fight against Software Complexity</title>
    <link href="http://127.0.0.1/2012/05/20/fight-against-software-complexity/" rel="alternate"/>
    <id>http://127.0.0.1/2012/05/20/fight-against-software-complexity/</id>
    <published>2012-05-20T00:00:00Z</published>
    <updated>2012-05-20T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;Why does software &lt;em&gt;in the industry&lt;/em&gt;  suck so much? Having worked at couple of startups and big companies as well, I can quite honestly say that of all the projects/repos that I had worked on, there was but one that didn&amp;rsquo;t suck. The one that didn&amp;rsquo;t suck, was I&amp;rsquo;m sure in no small part due to the fact that it was written by just one person, the CTO of the company, and since it was a license-based SDK, there was a vested interest in keeping things organized. Open source software projects, when compared with industry projects, are way easier to understand and in much better shape&amp;hellip;&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;Why does software &lt;em&gt;in the industry&lt;/em&gt;  suck so much? Having worked at couple of startups and big companies as well, I can quite honestly say that of all the projects/repos that I had worked on, there was but one that didn&amp;rsquo;t suck. The one that didn&amp;rsquo;t suck, was I&amp;rsquo;m sure in no small part due to the fact that it was written by just one person, the CTO of the company, and since it was a license-based SDK, there was a vested interest in keeping things organized. Open source software projects, when compared with industry projects, are way easier to understand and in much better shape.&lt;/p&gt;

&lt;p&gt;Why is that? I believe there are a multitude of organizational factors that will conspire to make your company&amp;rsquo;s project complexity shoot through the roof if you &amp;lsquo;meh&amp;rsquo; the fight against complexity. And no, I&amp;rsquo;m not going to delve too deeply into micro things such as variable names, TDD, refactoring etc. Those things work great in the micro, but to be able to enforce them in the micro, you need some &lt;strong&gt;macro&lt;/strong&gt; things to be in place first. It&amp;rsquo;s those macro things that I&amp;rsquo;d like to take a step back and rant about.&lt;/p&gt;

&lt;h3&gt;The fight against complexity begins at the top&lt;/h3&gt;

&lt;p&gt;The CTO of your software company is probably the most important hire you can make. That person should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Know the technology stack you are going to be using like the back of their hand.&lt;/li&gt;
&lt;li&gt;Have had experience managing projects and watching them fall under the weight of their own complexity.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It&amp;rsquo;s really important that when you hand off technical responsibility for a project/product to someone, they have complexity really high on their list of things to watch out for. This is the &lt;strong&gt;foundation&lt;/strong&gt; on which your software is being built, so this person had better be someone who is paranoid about complexity and will not stand for it.&lt;/p&gt;

&lt;h3&gt;Hiring process&lt;/h3&gt;

&lt;p&gt;When hiring engineers, the focus should be on one thing and one thing only: &lt;strong&gt;code clarity&lt;/strong&gt;. No eff'ing puzzles, gotchas, or any other crap. Give them a simple problem that requires them to model a couple of objects (or compose a few functions, if it&amp;rsquo;s a functional language), and have them write up a program. Look at how readable the result is. Someone who takes their craft seriously almost always keeps things dead simple. They choose variable names that make sense. They are anal about indenting their code. All these things count &lt;strong&gt;a lot&lt;/strong&gt;, way more than some crazy probability question that you think is going to suss out the geniuses. You don&amp;rsquo;t need geniuses; leave them be in their university labs. You need someone who can write clear, well thought out code. If they have GitHub or other open source projects, take the time to read through them to get a sense of their skill. Bottom line, it&amp;rsquo;s really easy to open the door to someone and have them shit all over your code-base if you are not careful.&lt;/p&gt;

&lt;h3&gt;Team size&lt;/h3&gt;

&lt;p&gt;I&amp;rsquo;ve seen this time and again: the larger the team working on a single repo gets, the crappier the repo becomes. And it&amp;rsquo;s an exponential curve. If you are a startup, you are probably under some pressure from VC hot-shots to hire more and &lt;em&gt;look big&lt;/em&gt;. Screw that, keep your head count small. Hire only the people you &lt;strong&gt;really need&lt;/strong&gt;. If Instagram was able to get by with 7 people (not just engineers), you probably can as well. The more an engineer feels ownership over a repo/project, the more likely they are to take the time to do things right. I know there are some obvious arguments against having just one person or a few people work on a repo. Believe me, if you have been anal about your hiring process, they have probably kept things super organized in that repo, making a transition to a new engineer almost trivial. The new engineer would probably also get a really great first impression from working with a clean, well organized repo. This is crucial, since that new person would probably think twice about screwing things up after being handed a clean, well maintained repo.&lt;/p&gt;

&lt;h3&gt;Choice of language&lt;/h3&gt;

&lt;p&gt;I&amp;rsquo;m not interested in starting a flame war here (&lt;em&gt;Coffeescript rocks&lt;/em&gt; or &lt;em&gt;Ruby on Rails still rules&lt;/em&gt; or &lt;em&gt;Python is just great&lt;/em&gt; or &lt;em&gt;Clojure is awesome&lt;/em&gt;, whatever). My suggestion is simple: the higher the level of the programming language you choose when prototyping your app, the less complex it&amp;rsquo;s going to be. I personally prefer Ruby for its clarity, but you should choose the highest-level language that most people on your team know. Good engineers take their tools very seriously, so the better the tools you are using, the better the quality of engineers you are going to find. As speed becomes an issue, instrument first to confirm whether it&amp;rsquo;s the language, and then switch over to something a little lower in the stack. It&amp;rsquo;s way easier going from a higher to a lower level in the programming stack than the other way around.&lt;/p&gt;

&lt;h3&gt;Development speed&lt;/h3&gt;

&lt;p&gt;Finally, it&amp;rsquo;s often really tempting to give in to the &lt;strong&gt;mad dash&lt;/strong&gt; mode of software development, where all you are doing is cranking out code and hoping it sticks. If you are tempted to add a ton of features in one insane sprint just to get ready for usability testing, think about cutting some of the excess feature fat and focusing on just the core feature set instead. It&amp;rsquo;s a helluva lot easier to add features to a simple repo than to one that is a rat&amp;rsquo;s nest. All of the technical debt accrued in the mad dash sprint is going to slow you down sooner than you think and get your engineering team grumbling.&lt;/p&gt;

&lt;h3&gt;TL;DR&lt;/h3&gt;

&lt;p&gt;I think complexity creep can be nipped in the bud by a) being aware that it exists and b) taking some common-sense approaches to fight it.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Memory leaks during Event emission</title>
    <link href="http://127.0.0.1/2012/03/06/memory-leaks-during-event-emission/" rel="alternate"/>
    <id>http://127.0.0.1/2012/03/06/memory-leaks-during-event-emission/</id>
    <published>2012-03-06T00:00:00Z</published>
    <updated>2012-03-06T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;Javascript has garbage collection, so there&amp;rsquo;s no way we can leak memory right? Wrong. Memory leaks are pretty easy to create in Javascript much like in any other language that has &lt;em&gt;garbage collection&lt;/em&gt;. The problem isn&amp;rsquo;t that bad when running Javascript on the client &amp;ndash; unless you are having a blatantly obvious memory leak that kills your browser everytime the page loads. Writing Javascript on the server on the other hand, means you really have to focus on making sure you are not leaking any memory since your server process is going to be pretty long running&amp;hellip;&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;Javascript has garbage collection, so there&amp;rsquo;s no way we can leak memory right? Wrong. Memory leaks are pretty easy to create in Javascript much like in any other language that has &lt;em&gt;garbage collection&lt;/em&gt;. The problem isn&amp;rsquo;t that bad when running Javascript on the client &amp;ndash; unless you are having a blatantly obvious memory leak that kills your browser everytime the page loads. Writing Javascript on the server on the other hand, means you really have to focus on making sure you are not leaking any memory since your server process is going to be pretty long running.&lt;/p&gt;

&lt;h3&gt;Creating the leak&lt;/h3&gt;

&lt;p&gt;Here&amp;rsquo;s a little javascript snippet:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var events = require('events');
var emitter = Object.create(events.EventEmitter.prototype);

var i = 0;
(function createObject() {
    var temp_func = function() { console.log('hi'); };
    emitter.on('some_event', temp_func);
    if(++i &amp;lt; 10) {
        process.nextTick(createObject);
    }
}());
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When you run this, everything looks just fine, but believe me, it leaks memory. The line with the leak is:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;emitter.on('some_event', temp_func);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Basically, this line grabs the function object &lt;em&gt;temp_func&lt;/em&gt; and pushes it onto an array stored somewhere deep in the bowels of the emitter object. What this means is that &lt;em&gt;temp_func&lt;/em&gt; can never be garbage collected for as long as the &lt;em&gt;emitter&lt;/em&gt; object is alive, because the emitter keeps a reference to it. Now you might think temp_func is a pretty small function, but remember, it&amp;rsquo;s also a closure, so it can keep alive variables from &lt;em&gt;every function&lt;/em&gt; it&amp;rsquo;s nested in. This is fine if that&amp;rsquo;s what you want. Oftentimes, though, you find your objects have a &lt;em&gt;lifespan&lt;/em&gt; after which they really needn&amp;rsquo;t be subscribed to events. In that case, you&amp;rsquo;ll find your unneeded objects still hanging around eating memory. If you know when to unsubscribe from an event, it&amp;rsquo;s as simple as:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;emitter.removeListener('some_event', temp_func);
&lt;/code&gt;&lt;/pre&gt;
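
&lt;p&gt;Here is a small sketch of that pattern (onSomeEvent and received are made-up names), reusing the Object.create style from the snippet above. It keeps a named reference to the handler, since &lt;em&gt;removeListener()&lt;/em&gt; needs that exact function object to find it, uses the &lt;em&gt;listenerCount()&lt;/em&gt; method available on emitters in current Node versions to confirm the handler is gone, and also shows &lt;em&gt;once()&lt;/em&gt;, which removes a listener automatically after its first call:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var events = require('events');
var emitter = Object.create(events.EventEmitter.prototype);

var received = [];
function onSomeEvent(msg) {
  received.push(msg);
}

emitter.on('some_event', onSomeEvent);
emitter.emit('some_event', 'first');
console.log(emitter.listenerCount('some_event')); // 1

// The subscriber's lifespan is over: drop the emitter's reference to the handler.
emitter.removeListener('some_event', onSomeEvent);
emitter.emit('some_event', 'second'); // nobody is listening any more
console.log(emitter.listenerCount('some_event')); // 0

// For one-shot handlers, once() unsubscribes right after the first call.
emitter.once('some_event', onSomeEvent);
emitter.emit('some_event', 'third');
emitter.emit('some_event', 'fourth'); // ignored, the once() listener is gone
console.log(emitter.listenerCount('some_event')); // 0
console.log(received.join(', ')); // first, third
&lt;/code&gt;&lt;/pre&gt;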

&lt;h3&gt;TL;DR&lt;/h3&gt;

&lt;p&gt;Any time you subscribe to an event, try figuring out when to unsubscribe at the same time. Remember, &lt;strong&gt;removeListener&lt;/strong&gt; is your friend.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Custom events in NodeJS using prototypal OOP</title>
    <link href="http://127.0.0.1/2012/02/07/custom-events-in-nodejs-using-prototypal-oop/" rel="alternate"/>
    <id>http://127.0.0.1/2012/02/07/custom-events-in-nodejs-using-prototypal-oop/</id>
    <published>2012-02-07T00:00:00Z</published>
    <updated>2012-02-07T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;I&amp;rsquo;ve been playing around recently with NodeJS and have found the sheer performance to be insane. Apparently it &lt;a href="http://venturebeat.com/2011/08/16/linkedin-node/"&gt;isn&amp;rsquo;t just me&lt;/a&gt; and a lot of other places have been able to significantly reduce their bloated infrastructures down by a lot by making the switch&amp;hellip;&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;I&amp;rsquo;ve been playing around recently with NodeJS and have found the sheer performance to be insane. Apparently it &lt;a href="http://venturebeat.com/2011/08/16/linkedin-node/"&gt;isn&amp;rsquo;t just me&lt;/a&gt; and a lot of other places have been able to significantly reduce their bloated infrastructures down by a lot by making the switch.&lt;/p&gt;

&lt;p&gt;Anyway, the point of this post is not to sell NodeJS, but rather to show how to use its events API. I confess, I&amp;rsquo;m not a fan of the &lt;strong&gt;Pseudo Classical OOP&lt;/strong&gt; pattern in Javascript. It just doesn&amp;rsquo;t feel right, whereas prototypal OOP works and feels right. Basically, if you are doing something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var honda = new Car();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You are doing it the pseudo classical way. If you are doing something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var honda = car.create();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You are doing prototypal inheritance and you rock!&lt;/p&gt;

&lt;p&gt;That said, trying to figure out how in the hell to emit events in NodeJS sent me on a crazy wild goose chase (&lt;a href="http://stackoverflow.com/questions/6892428/node-js-best-method-for-emitting-events-from-modules"&gt;look at this Stack Overflow question&lt;/a&gt;), since all the answers required the hated &lt;strong&gt;new&lt;/strong&gt; keyword. Instead, just do the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var events = require('events');
var objectWithEvents = Object.create(events.EventEmitter.prototype);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And then you can do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;objectWithEvents.on("yay", function(){
  console.log("Feign enthusiasm!");
});
objectWithEvents.emit("yay");
&lt;/code&gt;&lt;/pre&gt;
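
&lt;p&gt;To tie this back to the car example from the start of the post, here is a sketch of a prototype object that inherits the EventEmitter methods and hands out instances through a &lt;em&gt;create()&lt;/em&gt; function, in the &lt;em&gt;car.create()&lt;/em&gt; style shown earlier. The car, honk and create names are purely illustrative, and the sketch relies on the fact that, in Node&amp;rsquo;s implementation, an emitter&amp;rsquo;s listener storage is created the first time a listener is added to an object:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;var events = require('events');

// car is a plain prototype object that inherits on(), emit(), etc.
var car = Object.create(events.EventEmitter.prototype);

car.honk = function () {
  this.emit('honk', this.name);
};

// A factory in the car.create() style, so no constructor or new is needed.
car.create = function (name) {
  var instance = Object.create(car);
  instance.name = name;
  return instance;
};

var honda = car.create('honda');
var heard = [];
// Add listeners to instances only: a listener added to car itself would
// be stored on car and shared by every instance through the prototype chain.
honda.on('honk', function (name) {
  heard.push(name);
});
honda.honk();
console.log(heard.join(', ')); // honda
&lt;/code&gt;&lt;/pre&gt;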

&lt;p&gt;Hope this helps, and yeah, stick with prototypal inheritance in JS. It&amp;rsquo;s much simpler.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>On the future of Software Engineering</title>
    <link href="http://127.0.0.1/2012/01/28/on-the-future-of-software-engineering/" rel="alternate"/>
    <id>http://127.0.0.1/2012/01/28/on-the-future-of-software-engineering/</id>
    <published>2012-01-28T00:00:00Z</published>
    <updated>2012-01-28T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;Silicon Valley and the rest of the tech industry is going through a &lt;a href="http://www.google.com/search?sourceid=chrome&amp;amp;ie=UTF-8&amp;amp;q=tech+crunch+shortage+of+talent#pq=tech+crunch+shortage+of+talent&amp;amp;hl=en&amp;amp;sugexp=pfwl&amp;amp;cp=31&amp;amp;gs_id=1s&amp;amp;xhr=t&amp;amp;q=shortage+of+talent+silicon+valley&amp;amp;pf=p&amp;amp;sclient=psy-ab&amp;amp;source=hp&amp;amp;pbx=1&amp;amp;oq=shortage+of+talent+silicon+vall&amp;amp;aq=0w&amp;amp;aqi=q-w1&amp;amp;aql=&amp;amp;gs_sm=&amp;amp;gs_upl=&amp;amp;bav=on.2,or.r_gc.r_pw.,cf.osb&amp;amp;fp=e589aff7775bf00f&amp;amp;biw=1319&amp;amp;bih=670"&gt;very well documented crisis&lt;/a&gt; and no, I&amp;rsquo;m not referring to another tech bubble (although it may be happening). What I&amp;rsquo;m instead talking about is a chronic shortage of engineers and designers&amp;hellip;&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;Silicon Valley and the rest of the tech industry is going through a &lt;a href="http://www.google.com/search?sourceid=chrome&amp;amp;ie=UTF-8&amp;amp;q=tech+crunch+shortage+of+talent#pq=tech+crunch+shortage+of+talent&amp;amp;hl=en&amp;amp;sugexp=pfwl&amp;amp;cp=31&amp;amp;gs_id=1s&amp;amp;xhr=t&amp;amp;q=shortage+of+talent+silicon+valley&amp;amp;pf=p&amp;amp;sclient=psy-ab&amp;amp;source=hp&amp;amp;pbx=1&amp;amp;oq=shortage+of+talent+silicon+vall&amp;amp;aq=0w&amp;amp;aqi=q-w1&amp;amp;aql=&amp;amp;gs_sm=&amp;amp;gs_upl=&amp;amp;bav=on.2,or.r_gc.r_pw.,cf.osb&amp;amp;fp=e589aff7775bf00f&amp;amp;biw=1319&amp;amp;bih=670"&gt;very well documented crisis&lt;/a&gt; and no, I&amp;rsquo;m not referring to another tech bubble (although it may be happening). What I&amp;rsquo;m instead talking about is a chronic shortage of engineers and designers.&lt;/p&gt;

&lt;p&gt;While a lot of well-known VCs and angel investors (insert other hot-shot name here) think it&amp;rsquo;s simply going to go away, the same way the dot-com bubble burst, I&amp;rsquo;m not so sure.&lt;/p&gt;

&lt;p&gt;This is not 2000, and thinking of today in terms of the 2000 dot-com bubble is wrong. Here&amp;rsquo;s why&amp;hellip;&lt;/p&gt;

&lt;h4&gt;No cloud computing&lt;/h4&gt;

&lt;p&gt;This is an ENORMOUS deal. Server virtualization was in its infancy back then: no Xen, no mature VMware. As a result, there were no infrastructure-as-a-service products. No EC2, no S3, no Rackspace Cloud. Nothing. Nada. Web-scale companies such as Facebook, Twitter and &lt;a href="http://www.heroku.com"&gt;Heroku&lt;/a&gt; (on which I am hosting this blog) just wouldn&amp;rsquo;t have been able to get off the ground, or if they did, it would have been a super rocky path.&lt;/p&gt;

&lt;h4&gt;Modern tools and methods&lt;/h4&gt;

&lt;p&gt;If you were working back in 2000, you were probably using a bloated piece of crappy software like Java. There are still a lot of places that just don&amp;rsquo;t get what it means to get things done quickly using productive tools, but I&amp;rsquo;ll leave that rant aside for now. Working with a bloated, crappy tool meant slower turnaround between someone from product asking for something, engineering building it and QA finally spitting it out.&lt;/p&gt;

&lt;p&gt;If you&amp;rsquo;ve worked or spoken with anyone working with Ruby/Python/Javascript, I&amp;rsquo;m sure you&amp;rsquo;ve been amazed at how fast they get things done. This isn&amp;rsquo;t because they are some magical breed of human beings able to operate at Vulcan speed. It&amp;rsquo;s because the tools they use are just that good. Look at &lt;a href="http://www.paulgraham.com/avg.html"&gt;Paul Graham&amp;rsquo;s essay&lt;/a&gt; on how his company was able to get stuff shipped before its competitors just by choosing the right tools.&lt;/p&gt;

&lt;p&gt;Here are just a few of the awesome tools we have now that didn&amp;rsquo;t exist back then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Git/Mercurial (distributed version control eff&amp;rsquo;ing rocks!)&lt;/li&gt;
&lt;li&gt;GitHub and its enormous contribution to open source software&lt;/li&gt;
&lt;li&gt;Rails and similar frameworks for Python&lt;/li&gt;
&lt;li&gt;The iOS SDK and the App Store&lt;/li&gt;
&lt;li&gt;The Agile/Scrum way of working instead of waterfall&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Couple these with the cloud computing power from the first point and you have a recipe for building, arguing over and shipping your product in days. The &lt;em&gt;arguing&lt;/em&gt; step could slow down the workflow just a tad, but that&amp;rsquo;s an unknown :). This stuff takes months at bigger companies, banks and other obsolete corporations!&lt;/p&gt;

&lt;h4&gt;The employment bottle-neck&lt;/h4&gt;

&lt;p&gt;Since launching a company is so cheap and raising capital so easy, a lot of engineers are going it alone and launching their own startups. You are not going to like any boss as much as yourself, am I right? :)&lt;/p&gt;

&lt;h4&gt;What now, what comes next&lt;/h4&gt;

&lt;p&gt;I think the way this shortage resolves itself is that startups find a &lt;strong&gt;new normal&lt;/strong&gt; (to borrow an overused phrase from the financial world). What this means is that startups will get used to having just one person do the work of five from the past. This is made possible by today&amp;rsquo;s tools and tech. I spend my day doing sysadmin, development and machine learning work, amongst other things. Just a couple of years back, this would have been done by at least 3 different people. The tools and tech will evolve further to make this &lt;strong&gt;new normal&lt;/strong&gt; possible and efficient.&lt;/p&gt;

&lt;p&gt;Once startups get used to running with a super minimal headcount, the demand for new talent will ease up and the market will &lt;strong&gt;correct itself&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;Who gains the most&lt;/h4&gt;

&lt;p&gt;The gulf between what it means to be a startup employee and a non-startup employee will widen. If you are a startup founder looking for one tech person, you want that one person to have already solved a lot of the tech problems you will probably run into. In other words, you need someone who has worked in a startup before. This demand for someone with a specific set of up-to-date skills creates what economists would call a &lt;strong&gt;specialized economy&lt;/strong&gt;, and the ones benefiting most from it will be those who can meet those specialized demands, i.e. startup engineers and designers.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Comparison of MRI and JRuby GCs for an MMO Game Server</title>
    <link href="http://127.0.0.1/2012/01/02/comparison-of-mri-and-jruby-gcs-for-a-mmo-game-server/" rel="alternate"/>
    <id>http://127.0.0.1/2012/01/02/comparison-of-mri-and-jruby-gcs-for-a-mmo-game-server/</id>
    <published>2012-01-02T00:00:00Z</published>
    <updated>2012-01-02T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;I&amp;rsquo;ve been looking at using Ruby to build a stateless game server in an MMO setting. To many people, Ruby means Rails, which in turn means a web app. Viewed through the lens of memory (allocation &amp;amp; deallocation) and IO (DB writes &amp;amp; reads), a typical web app request looks as follows:&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;I&amp;rsquo;ve been looking at using Ruby to build a stateless game server in an MMO setting. To many people, Ruby means Rails, which in turn means a web app. Viewed through the lens of memory (allocation &amp;amp; deallocation) and IO (DB writes &amp;amp; reads), a typical web app request looks as follows:&lt;/p&gt;

&lt;p&gt;&lt;img src="https://img.skitch.com/20120103-cqcbeqxe8je3m28c8px95tt32h.medium.jpg" alt="Web Request Profile" /&gt;&lt;/p&gt;

&lt;p&gt;Depending on the app you are building, the database you are using and the language you are using, you &lt;em&gt;may&lt;/em&gt; hit a bottleneck either while allocating or deallocating memory (GC slowness) or while doing IO (latency, DB size etc.). A request to a game server, on the other hand, looks something like this:&lt;/p&gt;

&lt;p&gt;&lt;img src="https://img.skitch.com/20120103-p8rntmtm7iceny388c3kn4r4yc.medium.jpg" alt="Game Server Request Profile" /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, in the case of a game server the garbage collector (GC) is really under the gun, having to perform under extreme stress. With this in mind, I decided to evaluate jruby-1.6.5 and ruby-1.9.2 to see which of these two Rubies performs better under the kind of load the game server is going to be put through. I created two scripts: one for &lt;a href="https://gist.github.com/1553926"&gt;MRI&lt;/a&gt; and the other for &lt;a href="https://gist.github.com/1553930"&gt;JRuby&lt;/a&gt;. The only difference is that in JRuby I used threads, while in MRI I used plain OS processes, since I wanted real parallelism and MRI threads can&amp;rsquo;t run Ruby code in parallel: MRI still has a GIL (global interpreter lock).&lt;/p&gt;
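
&lt;p&gt;The gists above hold the actual scripts. As a rough illustration of the same idea, here is a minimal sketch (not the original code): each simulated request allocates a batch of short-lived objects and throws them away, and workers run as threads on JRuby but as forked processes on MRI. The &lt;em&gt;Player&lt;/em&gt; struct, the method names and the request count are all illustrative assumptions.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical sketch, not the original gists: every simulated request
# allocates a batch of short-lived objects and then drops them.
Player = Struct.new(:id, :x, :y)

def handle_request(objects_per_request)
  batch = Array.new(objects_per_request) { |i| Player.new(i, i * 2, i * 3) }
  batch.size # the whole batch becomes garbage once this returns
end

# Time a fixed number of requests and report requests per second.
def requests_per_second(objects_per_request, requests = 100)
  started = Time.now
  requests.times { handle_request(objects_per_request) }
  elapsed = [Time.now - started, 1e-6].max # avoid dividing by zero
  (requests / elapsed).round
end

# Sum throughput across workers: threads on JRuby, processes on MRI.
def run_workers(workers, objects_per_request)
  if RUBY_ENGINE == 'jruby'
    # JRuby threads are JVM threads, so they can run in parallel.
    threads = Array.new(workers) do
      Thread.new { requests_per_second(objects_per_request) }
    end
    threads.map { |t| t.value }.inject(0) { |sum, rps| sum + rps }
  else
    # MRI threads are serialized by the GIL, so fork processes instead
    # and collect each child's result through a pipe.
    readers = Array.new(workers) do
      reader, writer = IO.pipe
      fork do
        reader.close
        writer.puts(requests_per_second(objects_per_request))
        writer.close
        exit!(0) # skip at_exit handlers inherited from the parent
      end
      writer.close
      reader
    end
    Process.waitall
    readers.map { |r| r.read.to_i }.inject(0) { |sum, rps| sum + rps }
  end
end

puts "total req/sec: #{run_workers(2, 3_000)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Forking gives each MRI worker its own heap and its own GC, which is one more reason to treat numbers from a sketch like this as rough.&lt;/p&gt;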

&lt;h3&gt;Results&lt;/h3&gt;

&lt;p&gt;I knew the JVM had a much better GC, but boy did I underestimate it:&lt;/p&gt;

&lt;p&gt;&lt;img src="https://img.skitch.com/20120103-814kaae48a55chs8511fhcyj7d.medium.jpg" alt="Results" /&gt;&lt;/p&gt;

&lt;p&gt;When 20,000 objects get created and reaped in every request, MRI 1.9.2 breaks down at 300 requests per second, while JRuby manages a swaggering 800 req/sec. Down in the region of 3,000 objects created &amp;amp; reaped per request, JRuby really differentiates itself at 3,600 req/sec, while MRI stalls at 1,760 req/sec. The JVM&amp;rsquo;s GC is just way more performant!&lt;/p&gt;

&lt;h3&gt;Settings&lt;/h3&gt;

&lt;p&gt;I ran these using &lt;a href="https://rvm.beginrescueend.com/"&gt;RVM&lt;/a&gt; on a MacBook Air (1.6 GHz Intel i5, 4 GB RAM), fwiw.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Exploring Java's concurrent packages via JRuby</title>
    <link href="http://127.0.0.1/2011/12/28/exploring-javas-concurrent-packages-via-jruby/" rel="alternate"/>
    <id>http://127.0.0.1/2011/12/28/exploring-javas-concurrent-packages-via-jruby/</id>
    <published>2011-12-28T00:00:00Z</published>
    <updated>2011-12-28T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;Threads are &lt;a href="http://confreaks.net/videos/709-rubyconf2011-threading-versus-evented"&gt;back in style&lt;/a&gt; these days. If you were like me, a Java engineer getting into Ruby a couple of years back, you probably saw your fair share of JVM bashing by a few people who relished beating on all things Java, while peddling their half-assed gems as software masterpieces. This is not to deny that Java, and especially its community, loves to over-engineer the crap out of everything it can lay its hands on, but (and this is a pretty big BUT) the JVM &lt;strong&gt;is&lt;/strong&gt; a seminal piece of software. Okay, if you are still hanging around after this rant, I thank you for affording me this indulgence&amp;hellip;&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;Threads are &lt;a href="http://confreaks.net/videos/709-rubyconf2011-threading-versus-evented"&gt;back in style&lt;/a&gt; these days. If you were like me, a Java engineer getting into Ruby a couple of years back, you probably saw your fair share of JVM bashing by a few people who relished beating on all things Java, while peddling their half-assed gems as software masterpieces. This is not to deny that Java, and especially its community, loves to over-engineer the crap out of everything it can lay its hands on, but (and this is a pretty big BUT) the JVM &lt;strong&gt;is&lt;/strong&gt; a seminal piece of software. Okay, if you are still hanging around after this rant, I thank you for affording me this indulgence.&lt;/p&gt;

&lt;p&gt;Having gotten that out of the way: over the next couple of posts I&amp;rsquo;ll explore some of the concurrency packages Java ships with and discuss the Java concurrency model, all the while using &lt;a href="http://www.jruby.org"&gt;JRuby&lt;/a&gt; to drive my code.&lt;/p&gt;

&lt;h4&gt;Why JRuby?&lt;/h4&gt;

&lt;p&gt;As much as I hate the irrational bashing of another language, I&amp;rsquo;ve really grown to love working with Ruby. It&amp;rsquo;s just great for &amp;ldquo;putting your ideas down on paper&amp;rdquo;, with minimal ceremony. While for many this ceremony may not feel like a big deal, &lt;em&gt;it actually is&lt;/em&gt;. &lt;a href="http://confreaks.net/videos/724-rockymtnruby2011-cognitive-psychology-and-the-zen-of-code"&gt;This talk on cognitive psychology&lt;/a&gt; goes into further detail on how our brain processes information, and how boilerplate code gets in the way of other engineers understanding your code. Using JRuby also has the unintended benefit of forcing me to concentrate just on the &lt;em&gt;concurrency and parallelism&lt;/em&gt; concepts, without wasting a lot of time on all &lt;em&gt;those&lt;/em&gt; Java'ish things such as &lt;em&gt;final&lt;/em&gt; variables or &lt;em&gt;private static&lt;/em&gt; methods, amongst others. It&amp;rsquo;s also going to be a lot more fun focusing only on the awesome parts of the Java concurrency packages, while skipping all of the code bloat that is part and parcel of working with Java.&lt;/p&gt;

&lt;h4&gt;Concurrency and Parallelism&lt;/h4&gt;

&lt;p&gt;The terms concurrency and parallelism get thrown around a lot, and I for one didn&amp;rsquo;t really understand the difference until I bothered looking it up recently. Here&amp;rsquo;s a diagram that should help clear it up:&lt;/p&gt;

&lt;p&gt;&lt;img src="https://img.skitch.com/20111228-tar8qi6crxbmdkwwu795nqt7d9.medium.jpg" alt="Concurrency Diagram" /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, concurrency on its own doesn&amp;rsquo;t mean squat when it comes to doing things at the same time. In the concurrent graph, Thread A runs from time 0-10, then the CPU scheduler schedules Thread B to run from time 10-20, after which it switches back to Thread A, and so on. So the machine is not actually executing instructions from both threads &lt;em&gt;at the same time&lt;/em&gt;. In the parallel graph, the machine &lt;strong&gt;does execute instructions on both threads at the same time&lt;/strong&gt; between time 10 and time 30. All this assumes you have a multi-core CPU, of course (if you don&amp;rsquo;t, you probably don&amp;rsquo;t really want to be reading this or any other blog post on parallelizing work).&lt;/p&gt;
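
&lt;p&gt;One rough way to see the difference on your own machine is to time some CPU-bound work run twice in a row versus in two threads at once. This is only an illustrative sketch: the &lt;em&gt;busy_work&lt;/em&gt; method and the iteration count are made up, and the exact numbers will vary. On MRI, where the GIL lets only one thread run Ruby code at a time, expect a speedup near 1.0 (concurrency); on JRuby with a multi-core CPU, it should get closer to 2.0 (parallelism).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require 'benchmark'

# Hypothetical CPU-bound work that never waits on IO.
def busy_work(iterations)
  total = 0
  iterations.times { |i| total += i % 7 }
  total
end

work = 500_000

# Run the work twice, one call after the other.
sequential = Benchmark.realtime { 2.times { busy_work(work) } }

# Run the same two calls in two threads at the same time.
threaded = Benchmark.realtime do
  threads = Array.new(2) { Thread.new { busy_work(work) } }
  threads.each { |t| t.join }
end

speedup = (sequential / threaded).round(2)
puts "speedup from threads: #{speedup}"
&lt;/code&gt;&lt;/pre&gt;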

&lt;h4&gt;Thread Safety&lt;/h4&gt;

&lt;p&gt;Now that we understand that parallelism is really what we are after, we can look at the various &lt;em&gt;problems&lt;/em&gt; that arise when your code runs in a multi-threaded, parallel environment. These &lt;em&gt;problems&lt;/em&gt; are all grouped under the broad umbrella of &lt;em&gt;thread safety&lt;/em&gt;. So before we look at each of these problems, let&amp;rsquo;s first try to get a sense of what it means to be thread safe.&lt;/p&gt;

&lt;p&gt;The simplest way to define thread safety is that the &lt;strong&gt;invariant&lt;/strong&gt; of your code is preserved even while running in a multi-threaded environment. In other words, whatever behavior your code promises to exhibit does not change whether it&amp;rsquo;s running in a single- or multi-threaded environment. An example will help make this clear:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class Incrementer
  def increment
    @val ||= 0
    @val += 1
  end
end

incrementer = Incrementer.new
incrementer.increment
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can see, the behavior this class promises to uphold is an &lt;a href="http://en.wikipedia.org/wiki/Arithmetic_progression"&gt;arithmetic progression&lt;/a&gt;, where every value the &lt;em&gt;increment&lt;/em&gt; method returns is 1 greater than the last value it returned. This is the &lt;em&gt;invariant&lt;/em&gt; of this code. Now, with a little unlucky timing in a multi-threaded environment, this invariant can be broken, and this method can skip numbers or return the same number it did in its previous call. The following line:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    @val += 1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;is not actually an atomic operation, i.e. an operation that completes as one indivisible step, with no chance for another thread to run in the middle of it. Rather, it can be thought of as two discrete operations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;    temp = @val + 1   # (1)
    @val = temp       # (2)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With a little unlucky timing, Thread A could complete line (1), and then Thread B could run line (1) before Thread A gets to line (2). Thread B is then reading a &lt;em&gt;stale&lt;/em&gt; value of @val, and both threads end up writing back the same number, so one increment is lost. There is also the initialization of the variable, which is not thread safe either, but I&amp;rsquo;ll leave that for a later discussion. So how do we make this code thread safe?&lt;/p&gt;
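
&lt;p&gt;To make that unlucky interleaving concrete, here is a small, deterministic sketch that replays it by hand in a single thread. The &lt;em&gt;Counter&lt;/em&gt; class and its &lt;em&gt;read_plus_one&lt;/em&gt; and &lt;em&gt;write&lt;/em&gt; methods are hypothetical: they simply split &lt;em&gt;@val += 1&lt;/em&gt; into steps (1) and (2) so we can choose the order in which they run, instead of waiting for the CPU scheduler to get unlucky.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical counter that exposes the two halves of @val += 1.
class Counter
  attr_reader :val

  def initialize
    @val = 0
  end

  # Step (1): temp = @val + 1
  def read_plus_one
    @val + 1
  end

  # Step (2): @val = temp
  def write(temp)
    @val = temp
  end
end

counter = Counter.new
temp_a = counter.read_plus_one # Thread A runs step (1) and sees 0
temp_b = counter.read_plus_one # Thread B runs step (1) and also sees 0
counter.write(temp_a)          # Thread A runs step (2): @val is now 1
counter.write(temp_b)          # Thread B runs step (2): @val is 1 again

# Two increments ran, but the second one was lost.
puts counter.val # 1
&lt;/code&gt;&lt;/pre&gt;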

&lt;h4&gt;Synchronization and locks&lt;/h4&gt;

&lt;p&gt;One of the easiest ways to make this code thread safe is to require that a thread hold a &lt;em&gt;lock&lt;/em&gt; before it runs the code that touches @val. If another thread already holds the lock, it has to wait until that lock is released. Java has a keyword called &lt;em&gt;synchronized&lt;/em&gt; that lets a thread acquire the lock associated with an object. With this in place, the thread safe version of the incrementer looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;require 'java'

class Incrementer &amp;lt; java.lang.Object
  def increment
    self.synchronized do
      @val ||= 0
      @val += 1
    end
  end
end

incrementer = Incrementer.new
incrementer.increment
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The call to &lt;strong&gt;self.synchronized&lt;/strong&gt; tries to acquire a lock on &lt;em&gt;self&lt;/em&gt;, which in this case is the instance &lt;em&gt;incrementer&lt;/em&gt; of the &lt;em&gt;Incrementer&lt;/em&gt; class. A couple of things to note here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In order to be able to synchronize on an object, that object should have &lt;em&gt;java.lang.Object&lt;/em&gt; somewhere in its ancestry tree&lt;/li&gt;
&lt;li&gt;Ensure that all your threads synchronize on the same object, or they&amp;rsquo;ll be acquiring and releasing different locks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This code is actually a case of over-synchronization, and over-synchronization is bad because it is slow: every single call to &lt;em&gt;increment&lt;/em&gt; has to take the lock, so threads end up queuing behind each other. To give you a taste of some of the cool Java packages we can use, here&amp;rsquo;s an altered version:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;  def increment
    self.synchronized do
      @val ||= java.util.concurrent.atomic.AtomicInteger.new
    end
    @val.incrementAndGet
  end
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here we still need to guard the section of our code where we initialize our instance variable. But as soon as we have verified that we have a valid reference, we release the lock and call the &lt;em&gt;incrementAndGet&lt;/em&gt; method made available to us by Java&amp;rsquo;s &lt;em&gt;AtomicInteger&lt;/em&gt; objects, which performs the read-increment-write as a single atomic operation without taking a lock.&lt;/p&gt;
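
&lt;p&gt;If you want to experiment with the same locking idea on plain MRI Ruby, where &lt;em&gt;java.lang.Object&lt;/em&gt; is not available, Ruby&amp;rsquo;s built-in &lt;em&gt;Mutex&lt;/em&gt; plays a similar role. This is a swapped-in technique, not Java&amp;rsquo;s &lt;em&gt;synchronized&lt;/em&gt;, and the &lt;em&gt;SafeIncrementer&lt;/em&gt; name is made up for illustration. It also initializes @val eagerly in the constructor, which sidesteps the lazy-initialization race mentioned earlier instead of guarding it with a lock.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Plain-Ruby analogue of the synchronized incrementer (not JRuby's API):
# a Mutex makes the read-modify-write run one thread at a time.
class SafeIncrementer
  def initialize
    @val = 0          # eager initialization: no lazy-init race
    @lock = Mutex.new # every thread must share this one lock
  end

  def increment
    @lock.synchronize { @val += 1 }
  end
end

incrementer = SafeIncrementer.new
threads = Array.new(4) do
  Thread.new { 1_000.times { incrementer.increment } }
end
threads.each { |t| t.join }

# 4 threads x 1,000 increments each, so the next call returns 4001.
puts incrementer.increment # 4001
&lt;/code&gt;&lt;/pre&gt;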

&lt;h4&gt;Conclusion&lt;/h4&gt;

&lt;p&gt;Thread safety is when the behavior of your code does not change whether it&amp;rsquo;s run in a single- or multi-threaded environment. Java&amp;rsquo;s synchronized keyword is the simplest form of thread safety; it uses a concept called &lt;strong&gt;locking&lt;/strong&gt;, where only a thread that has acquired a lock is permitted to run the guarded code. JRuby gets a lot of the Java concurrent packages for free, and this is great news for engineers looking to build parallelizable code who are not yet ready to give up on Ruby.&lt;/p&gt;

&lt;h4&gt;Up next&lt;/h4&gt;

&lt;p&gt;Sharing data, thread visibility, volatile variables, thread safe initialization and immutable objects and how they rock!&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Philosophy and Test Driven Development</title>
    <link href="http://127.0.0.1/2011/11/23/philosophy-and-test-driven-development/" rel="alternate"/>
    <id>http://127.0.0.1/2011/11/23/philosophy-and-test-driven-development/</id>
    <published>2011-11-23T00:00:00Z</published>
    <updated>2011-11-23T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;For someone coming to test driven development for the first time, and even for those who have been dogmatically following it in their work, the question &amp;ldquo;What do I test?&amp;rdquo; still remains. In this post, I am going to take a shot at answering that question using tools that philosophy provides, specifically those employed by &lt;a href="http://en.wikipedia.org/wiki/Immanuel_Kant"&gt;Immanuel Kant&lt;/a&gt; in his ground-breaking &lt;a href="http://en.wikipedia.org/wiki/Critique_of_Pure_Reason"&gt;Critique of Pure Reason&lt;/a&gt; (the First Critique). Those coming from a philosophical background will see me drawing a hugely controversial line in the sand with that statement, and choosing to remain on the side &lt;em&gt;against&lt;/em&gt; the &lt;a href="http://plato.stanford.edu/entries/hume"&gt;empiricists&lt;/a&gt;. Over the course of this post, I will go on to defend this position and expand on why an empirical approach to test driven development is flawed&amp;hellip;&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;For someone coming to test driven development for the first time, and even for those who have been dogmatically following it in their work, the question &amp;ldquo;What do I test?&amp;rdquo; still remains. In this post, I am going to take a shot at answering that question using tools that philosophy provides, specifically those employed by &lt;a href="http://en.wikipedia.org/wiki/Immanuel_Kant"&gt;Immanuel Kant&lt;/a&gt; in his ground-breaking &lt;a href="http://en.wikipedia.org/wiki/Critique_of_Pure_Reason"&gt;Critique of Pure Reason&lt;/a&gt; (the First Critique). Those coming from a philosophical background will see me drawing a hugely controversial line in the sand with that statement, and choosing to remain on the side &lt;em&gt;against&lt;/em&gt; the &lt;a href="http://plato.stanford.edu/entries/hume"&gt;empiricists&lt;/a&gt;. Over the course of this post, I will go on to defend this position and expand on why an empirical approach to test driven development is flawed.&lt;/p&gt;

&lt;h3&gt;The Scientific Method and Test Driven Development&lt;/h3&gt;

&lt;p&gt;Over the course of its history, the term &amp;ldquo;scientific method&amp;rdquo; has become horribly and inexorably linked with empiricism. Here&amp;rsquo;s why &amp;ndash; the scientific method, to most people, means measuring your way to some kind of truism. The empiricists rejoice at this incorrect linkage, since it allows them to quite conveniently slide a bunch of probabilities into the mix and state confidently that somehow those probabilities make that truism all the more true! The fact remains that science (outside the realm of quantum mechanics, which is still under &amp;ldquo;study&amp;rdquo;) follows Newtonian physics, which is distinctly anti-empirical. You don&amp;rsquo;t remember &amp;ldquo;A body continues in its state of rest or constant motion unless an external force acts on it &lt;em&gt;with 99% probability&lt;/em&gt;&amp;rdquo;. Instead, if you actually peel back a couple of layers of the &amp;ldquo;scientific method&amp;rdquo; onion, you&amp;rsquo;ll see that scientists quite often have &amp;ldquo;theories&amp;rdquo; (which is a polite way of saying they make stuff up) and then look at measurements as a way of testing the viability of those theories. Only then do they try getting the math right to make them acceptable. In many cases, things just remain unprovable (Euclid&amp;rsquo;s parallel postulate, on which the equality of alternate interior angles rests, for instance) and stay as something we see and &lt;em&gt;feel&lt;/em&gt; to be true, hence they are true.&lt;/p&gt;

&lt;p&gt;So how does all this tie in with test driven development? The answer is painfully simple &amp;ndash; tests are &lt;em&gt;measurements&lt;/em&gt;. So, when working with measurements, we can choose either to follow the empirical approach of keeping the focus on measurements, or the Kantian, common-sense approach of giving the benefit of the doubt to the thing we think is true, unless we can measurably prove that it isn&amp;rsquo;t.&lt;/p&gt;

&lt;p&gt;So where does this lead us? It leads us to writing more &lt;em&gt;negative&lt;/em&gt; tests that cover the edge cases where the behavior of the system we are building is undefined, or at the very least not what you would expect. In mathematics parlance, if you are modelling a &lt;a href="http://en.wikipedia.org/wiki/Continuous_function"&gt;non-continuous function&lt;/a&gt; in software, you need to be writing tests for the points where the continuity breaks down. An example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;F(x) = { 2,   if x != 1
       { 10,  if x = 1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A quick refresher: this is not a continuous function because the limit of F(x) as x approaches 1 from either side is 2, while the actual value at 1 is 10. Using this as a guide, we see that our test for this function should explicitly state the behavior of this function at 1.&lt;/p&gt;
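
&lt;p&gt;Here is roughly what that looks like as a test, sketched in plain Ruby with bare &lt;em&gt;raise&lt;/em&gt; checks rather than any particular test framework. The method name &lt;em&gt;f&lt;/em&gt; and the sample points are illustrative choices: the points just either side of 1 only show the jump, and the check that really matters pins down the value at the break point.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# F(x) from above: 2 everywhere except x = 1, where it jumps to 10.
def f(x)
  x == 1 ? 10 : 2
end

# Just either side of 1, F still behaves like the constant 2...
[0.999, 1.001].each do |x|
  raise "F(#{x}) should be 2" unless f(x) == 2
end

# ...so the test that earns its keep pins down the value at the break.
raise 'F(1) should be 10' unless f(1) == 10
puts 'discontinuity at x = 1 is covered'
&lt;/code&gt;&lt;/pre&gt;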

&lt;h3&gt;Tying it all together&lt;/h3&gt;

&lt;p&gt;So how do we employ the Kantian world view in our software tests to make more maintainable, understandable software? Don&amp;rsquo;t write tests for the obvious. If you are writing a function that sums up two numbers, don&amp;rsquo;t write a test for that. Let that remain a law that&amp;rsquo;s just understood to be true within the confines of your system. Software engineering, after all, is making abstract representations of reality. Reality is that which impinges on our senses and gives us a model in space-time of our surroundings. How do we know a table is a table? Does it fulfill some measurements that deem it to be a table? Nope, it just &lt;strong&gt;is&lt;/strong&gt; a table. Similarly, how do we know that when we call a sum() function we have written into our software system, it actually adds two numbers? We don&amp;rsquo;t, and we shouldn&amp;rsquo;t foolishly write tests to state the obvious; it just &lt;strong&gt;is&lt;/strong&gt;. On the other hand, given the following function:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def sum(a, b)
  if a != 0 &amp;amp;&amp;amp; b != 0
    a + b
  else
    1_000
  end
end
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is most definitely not a continuous function, and not something that we can just leave be. Write a test for the cases where a, b or both are zero.&lt;/p&gt;
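
&lt;p&gt;Following that advice, a sketch of the tests might cover only the zero cases and skip the &amp;ldquo;obvious&amp;rdquo; a + b path. It restates &lt;em&gt;sum&lt;/em&gt; with a guard clause so the snippet runs on its own (same rule as the function above: 1,000 whenever a or b is zero) and uses bare &lt;em&gt;raise&lt;/em&gt; checks instead of a test framework; the &lt;em&gt;cases&lt;/em&gt; table is illustrative.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Same rule as above: add, unless a or b is zero, in which case
# return 1,000 (the discontinuity we actually care about).
def sum(a, b)
  return 1_000 if a.zero? || b.zero?
  a + b
end

# Only the non-obvious edge cases get tests: [a, b, expected].
cases = [
  [0, 5, 1_000], # a is zero
  [5, 0, 1_000], # b is zero
  [0, 0, 1_000]  # both are zero
]

cases.each do |a, b, expected|
  actual = sum(a, b)
  raise "sum(#{a}, #{b}) was #{actual}, expected #{expected}" unless actual == expected
end
puts 'zero-input edge cases pass'
&lt;/code&gt;&lt;/pre&gt;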

&lt;h3&gt;TL;DR&lt;/h3&gt;

&lt;p&gt;As software engineers, we spend way too much time buried in the minutiae of our world. Every once in a while, step back several million feet and explore how philosophers &amp;amp; thinkers grappled with such fundamental concepts as reality &amp;amp; truth, thereby laying the foundation for modern science. I can guarantee you will get a far better perspective on what it is you and I are doing and, more importantly, on what we can take for granted and what we shouldn&amp;rsquo;t.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Persistence options with Redis</title>
    <link href="http://127.0.0.1/2011/09/23/persistance-options-with-redis/" rel="alternate"/>
    <id>http://127.0.0.1/2011/09/23/persistance-options-with-redis/</id>
    <published>2011-09-23T00:00:00Z</published>
    <updated>2011-09-23T00:00:00Z</updated>
    <author>
      <name>Santosh Kumar</name>
    </author>
    <summary type="html">&lt;p&gt;Most people think of Redis as an in-memory datastore. This is totally true. However, there is a common misconception about the &amp;ldquo;in memory&amp;rdquo; part: that if my Redis server crashes, I lose all of my data. This part is most definitely &lt;strong&gt;not&lt;/strong&gt; true. Redis persists your data to disk and provides you with all of the knobs you are going to need to fine-tune how often you&amp;rsquo;d like this persisting to take place, while still eking out the performance you&amp;rsquo;d like to get out of Redis&amp;hellip;&lt;/p&gt;
</summary>
    <content type="html">&lt;p&gt;Most people think of Redis as an in-memory datastore. This is totally true. However, there is a common misconception about the &amp;ldquo;in memory&amp;rdquo; part: that if my Redis server crashes, I lose all of my data. This part is most definitely &lt;strong&gt;not&lt;/strong&gt; true. Redis persists your data to disk and provides you with all of the knobs you are going to need to fine-tune how often you&amp;rsquo;d like this persisting to take place, while still eking out the performance you&amp;rsquo;d like to get out of Redis.&lt;/p&gt;

&lt;h3&gt;The two storage options&lt;/h3&gt;

&lt;p&gt;A picture speaks a thousand words, so here goes:&lt;/p&gt;

&lt;h4&gt;Option 1: Binary .rdb file&lt;/h4&gt;

&lt;pre&gt;&lt;code&gt; ----- 
|  R  |
|  E  |    Option 1     ---------------------
|  D  | -------------&amp;gt; | Binary File (.rdb)  |
|  I  |                 ---------------------
|  S  |
 ----- 
&lt;/code&gt;&lt;/pre&gt;

&lt;h4&gt;Option 2: A text file popularly known as an Append-Only File (AOF)&lt;/h4&gt;

&lt;pre&gt;&lt;code&gt; ----- 
|  R  |
|  E  |    Option 2     ---------------------
|  D  | -------------&amp;gt; |  Text File (.aof)  |
|  I  |                 ---------------------
|  S  |
 ----- 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With both these options, you can create a brand new Redis server with the data from your old Redis server by simply copying the .rdb or .aof file and pointing your new Redis server at the copied file.&lt;/p&gt;

&lt;h3&gt;What&amp;rsquo;s the difference?&lt;/h3&gt;

&lt;p&gt;Besides the obvious fact that one is in binary and the other in text format, you may ask what the difference between them is. The Append Only File (AOF) is basically a log of all the write commands your redis-server has run. Every single operation that modifies your data gets written to the AOF. So if you did:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;INCR USER_COUNT
INCR USER_COUNT
INCR USER_COUNT
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In your AOF, you will see all three increment operations. The binary .rdb file, on the other hand, is basically a snapshot of all the keys and values in your Redis server at a point in time.&lt;/p&gt;

&lt;p&gt;As you may have guessed, the AOF can grow really large since it&amp;rsquo;s logging every write operation, which is why Redis has a command called &lt;a href="http://redis.io/commands/bgrewriteaof"&gt;BGREWRITEAOF&lt;/a&gt; that, as the name suggests, rewrites the AOF in the background. The rewrite results in a much smaller file. Taking the example from above, let&amp;rsquo;s say that after the third INCR, USER_COUNT is now 10. The rewritten AOF will contain something along the lines of:&lt;/p&gt;
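
&lt;p&gt;Strictly speaking, the AOF does not store those commands as the plain lines shown above: Redis appends each write command in the same format as its own wire protocol (RESP), i.e. an array of length-prefixed bulk strings. Real files may also contain bookkeeping commands such as SELECT, and newer Redis versions can start the file with an RDB-format preamble. The Ruby sketch below, with a made-up &lt;em&gt;resp_encode&lt;/em&gt; helper, shows roughly what the three INCRs turn into on disk.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Made-up helper: encode one command as a RESP array of bulk strings,
# the format Redis uses when appending commands to the AOF.
def resp_encode(*args)
  parts = ["*#{args.size}\r\n"]
  args.each do |arg|
    s = arg.to_s
    parts.push("$#{s.bytesize}\r\n#{s}\r\n")
  end
  parts.join
end

# Three INCRs, appended one after another.
aof = Array.new(3) { resp_encode('INCR', 'USER_COUNT') }.join

p resp_encode('INCR', 'USER_COUNT') # prints "*2\r\n$4\r\nINCR\r\n$10\r\nUSER_COUNT\r\n"
puts aof.bytesize                   # 93
&lt;/code&gt;&lt;/pre&gt;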

&lt;pre&gt;&lt;code&gt;SET USER_COUNT 10
&lt;/code&gt;&lt;/pre&gt;
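
&lt;p&gt;Conceptually, a rewrite ends up with just enough commands to recreate the current data. The toy Ruby model below illustrates that idea by replaying the log; it is not how Redis implements BGREWRITEAOF (Redis writes out the data it already holds in memory rather than re-reading the old log). It assumes USER_COUNT started at 7 so that three INCRs bring it to 10, and it only understands SET and INCR.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy model of an AOF rewrite: replay SET/INCR commands into a hash,
# then emit a single SET per key holding its final value.
def replay(log)
  db = Hash.new(0)
  log.each do |cmd, key, value|
    case cmd
    when 'SET'  then db[key] = value.to_i
    when 'INCR' then db[key] += 1
    end
  end
  db
end

def rewrite(log)
  replay(log).map { |key, value| "SET #{key} #{value}" }
end

log = [['SET', 'USER_COUNT', '7']] + Array.new(3) { ['INCR', 'USER_COUNT'] }
puts rewrite(log) # SET USER_COUNT 10
&lt;/code&gt;&lt;/pre&gt;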

&lt;h3&gt;Why even have two formats?&lt;/h3&gt;

&lt;p&gt;I mean, isn&amp;rsquo;t it going to be a pain to have to remember to keep running BGREWRITEAOF, or risk running out of disk space? The AOF lets you do something special which the binary .rdb file just can&amp;rsquo;t. Again, a picture will explain this way more clearly:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt; R1  --&amp;gt; /var/db/redis/file_one.aof
 R2  --&amp;gt; /var/db/redis/file_two.aof
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So you have two Redis servers, each writing to its own file. You can now do:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;cat /var/db/redis/file_one.aof /var/db/redis/file_two.aof &amp;gt;&amp;gt; /var/db/redis/file_three.aof
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And now bring up a new Redis server pointing at file_three.aof:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;R3  --&amp;gt; /var/db/redis/file_three.aof
&lt;/code&gt;&lt;/pre&gt;

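&lt;p&gt;And this new Redis server will have all of the data from servers R1 &amp;amp; R2 (keep in mind that if both servers hold the same key, R2&amp;rsquo;s commands for it are replayed on top of R1&amp;rsquo;s). How cool is that?!&lt;/p&gt;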
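
&lt;p&gt;Why does plain concatenation work? Loading an AOF just means replaying its commands in order, so replaying file_one followed by file_two rebuilds both datasets. The toy Ruby model below (simplified: every entry is a SET, and the keys and values are made up) also shows the catch when both servers hold the same key: whichever file comes last decides the final value.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy model: each AOF is a list of [command, key, value] entries.
file_one = [['SET', 'USER_COUNT', '10'], ['SET', 'GAME_COUNT', '3']]
file_two = [['SET', 'USER_COUNT', '42'], ['SET', 'LOBBY_COUNT', '5']]

# cat file_one.aof file_two.aof is just list concatenation...
file_three = file_one + file_two

# ...and loading it replays every command in order.
db = {}
file_three.each { |_cmd, key, value| db[key] = value }

p db.keys          # ["USER_COUNT", "GAME_COUNT", "LOBBY_COUNT"]
p db['USER_COUNT'] # "42", because file_two was replayed last
&lt;/code&gt;&lt;/pre&gt;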

&lt;h3&gt;Merging data across all of your redis servers&lt;/h3&gt;

&lt;p&gt;Going off on a tangent for a bit here, let&amp;rsquo;s re-hash the trick discussed above. Say you have multiple Redis servers running, and they are storing their data in binary format. You would now like to merge all of their data together while the servers keep running, with zero downtime. Here&amp;rsquo;s one way to go about it:&lt;/p&gt;

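&lt;pre&gt;&lt;code&gt;# Turning AOF on makes Redis start (or schedule) a background AOF rewrite
redis-cli -p &amp;lt;first-server-port&amp;gt; CONFIG SET appendonly yes
# Wait until no rewrite is running or scheduled; turning AOF off mid-rewrite aborts it
until [ "$(redis-cli -p &amp;lt;first-server-port&amp;gt; INFO persistence | grep -cE 'aof_rewrite_(in_progress|scheduled):0')" = 2 ]; do sleep 1; done
redis-cli -p &amp;lt;first-server-port&amp;gt; CONFIG SET appendonly no

redis-cli -p &amp;lt;second-server-port&amp;gt; CONFIG SET appendonly yes
until [ "$(redis-cli -p &amp;lt;second-server-port&amp;gt; INFO persistence | grep -cE 'aof_rewrite_(in_progress|scheduled):0')" = 2 ]; do sleep 1; done
redis-cli -p &amp;lt;second-server-port&amp;gt; CONFIG SET appendonly no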
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;All this is doing is flipping the AOF switch on, which forces your Redis servers to rewrite their AOFs, and then flipping it back to the binary storage format (make sure the rewrite has finished first: turning appendonly off while a rewrite is still running aborts it). You can now merge these two AOF files by cat'ing them out to a third file as shown before and voila, you have all of the data from your two Redis servers in one AOF file that you can use to bring up a new Redis server (or as a backup).&lt;/p&gt;

&lt;h3&gt;Which one is better?&lt;/h3&gt;

&lt;p&gt;For most situations, using the binary format is better. When in doubt, avoid using the AOF &amp;ndash; you run the risk of running out of diskspace if you forget to do a BGREWRITEAOF. And if you do end up having to merge data across multiple redis-servers, you could always do the trick shown above. The binary format is also a little faster in couple of redis-benchmarks that I had run.&lt;/p&gt;

&lt;h3&gt;Fine tuning binary format save strategy&lt;/h3&gt;

&lt;p&gt;Redis lets you specify how often you&amp;rsquo;d like it to take a snapshot of your DB and persist it to disk. A typical redis.conf file (&lt;a href="https://github.com/santosh79/dot-files/blob/master/redis-dot.conf"&gt;here&amp;rsquo;s an example of one&lt;/a&gt;) will have entries in its SNAPSHOTTING section that look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;save 900 1
save 300 10
save 60 10000
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Read the first line as &amp;ldquo;Snapshot the in-memory DB to disk after 900 seconds if at least 1 key has changed&amp;rdquo;. Read the second line as &amp;ldquo;Snapshot the in-memory DB to disk after 300 seconds if at least 10 keys have changed&amp;rdquo;. And read the last line as (you guessed it) &amp;ldquo;Snapshot the in-memory DB to disk after 60 seconds if at least 10000 keys have changed&amp;rdquo;. The time is counted from the last snapshot, and Redis takes a snapshot as soon as any one of these rules is satisfied. You can play with these parameters and tweak them as you see fit. For instance you could have something like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;save 5 1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With this rule you&amp;rsquo;ll be taking a snapshot of your DB as often as every 5 seconds, as long as at least a single key has changed. Obviously this comes with a performance impact, since writing the DB out to disk (and the fsyncs that make sure it actually lands there) are time-consuming operations. But Redis &lt;strong&gt;does&lt;/strong&gt; support storing your stuff on disk even at this paranoid level! Finally, you can force a snapshot to happen at any time with the BGSAVE command.&lt;/p&gt;
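
&lt;p&gt;To make the rule semantics concrete, here is a small Python model of the check. It is a simplified sketch rather than redis&amp;rsquo;s own code, and the function name &lt;code&gt;should_snapshot&lt;/code&gt; is hypothetical. Each &lt;strong&gt;save&lt;/strong&gt; line becomes a (seconds, changes) pair, and a snapshot is due as soon as any one pair is satisfied.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def should_snapshot(save_rules, seconds_since_last_save, changes_since_last_save):
    """Return True if any (seconds, changes) save rule is satisfied.

    Simplified model: a rule fires once more than `seconds` have passed
    since the last snapshot AND at least `changes` writes happened since.
    """
    return any(
        seconds_since_last_save &amp;gt; seconds and changes_since_last_save &amp;gt;= changes
        for seconds, changes in save_rules
    )


# The three rules from the redis.conf excerpt above.
DEFAULT_RULES = [(900, 1), (300, 10), (60, 10000)]

print(should_snapshot(DEFAULT_RULES, 120, 5))   # False: too soon for 900/1 and 300/10, too few writes for 60/10000
print(should_snapshot(DEFAULT_RULES, 301, 10))  # True: "save 300 10" fires
print(should_snapshot([(5, 1)], 6, 1))          # True: "save 5 1" fires
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because the rules are ORed together, adding a line like &lt;strong&gt;save 5 1&lt;/strong&gt; can only make snapshots more frequent, never less.&lt;/p&gt;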

&lt;h3&gt;Fine tuning AOF save strategy&lt;/h3&gt;

&lt;p&gt;In AOF mode, redis logs every write operation it performs, first to an in-memory buffer and then to the AOF. You can tell redis how often you&amp;rsquo;d like it to make sure this log has actually hit the disk (with an fsync). In the APPEND ONLY MODE section of a redis.conf file you will see an entry for &lt;strong&gt;appendfsync&lt;/strong&gt;. Its possible values are:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;appendfsync always
appendfsync everysec
appendfsync no
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;strong&gt;always&lt;/strong&gt; (fsync after every write) and &lt;strong&gt;everysec&lt;/strong&gt; (fsync once per second) options are pretty self-explanatory. The &lt;strong&gt;no&lt;/strong&gt; option is kind of a misnomer. With &lt;strong&gt;no&lt;/strong&gt; set, redis takes no responsibility for fsyncing the AOF itself and lets the Operating System decide when the data gets flushed to disk. Of the three, &lt;strong&gt;no&lt;/strong&gt; is generally the most performant, while at the same time being the most risky, and &lt;strong&gt;always&lt;/strong&gt; is the safest but slowest. The &lt;strong&gt;everysec&lt;/strong&gt; option seems the most popular, giving the best of both worlds.&lt;/p&gt;
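
&lt;p&gt;To get a feel for the trade-off, here is a deliberately crude Python model of how much of the AOF would survive a machine crash under each policy. It is a sketch, not how redis is implemented, and it rests on simplifying assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;everysec&lt;/strong&gt; is modeled as an fsync on every whole second.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;no&lt;/strong&gt; is modeled as the OS flushing every 30 seconds. That is a common Linux default, but your kernel settings may differ.&lt;/li&gt;
&lt;li&gt;Anything not yet fsynced is lost.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The function names and write timestamps are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import math
from bisect import bisect_right


def last_fsync_before(crash_time, policy, os_flush_interval=30):
    """Time of the last fsync at or before crash_time, under a toy model."""
    if policy == "always":
        return crash_time  # model: every write is fsynced before it is acknowledged
    if policy == "everysec":
        return math.floor(crash_time)  # model: one fsync on every whole second
    if policy == "no":
        # model: the OS flushes dirty pages every os_flush_interval seconds
        return math.floor(crash_time / os_flush_interval) * os_flush_interval
    raise ValueError("unknown appendfsync policy: " + policy)


def surviving_writes(write_times, crash_time, policy):
    """Writes that would already be on disk if the machine died at crash_time.

    write_times must be sorted, which a log of writes naturally is.
    """
    synced_up_to = last_fsync_before(crash_time, policy)
    return write_times[:bisect_right(write_times, synced_up_to)]


writes = [0.5, 3.2, 10.0, 18.4, 25.6, 28.1]  # made-up write timestamps, in seconds
for policy in ("always", "everysec", "no"):
    kept = surviving_writes(writes, 28.7, policy)
    print(policy, len(kept), "of", len(writes))
# always 6 of 6
# everysec 5 of 6
# no 0 of 6
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Under these assumptions, the worst case is losing up to about a second of writes with &lt;strong&gt;everysec&lt;/strong&gt&gt; and up to the OS flush interval with &lt;strong&gt;no&lt;/strong&gt;.&lt;/p&gt;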

&lt;h3&gt;TL;DR&lt;/h3&gt;

&lt;p&gt;Redis &lt;strong&gt;does&lt;/strong&gt; store your data to disk and has numerous options that let you control how often you&amp;rsquo;d like it to persist your data. So if this is what is holding you back from using Redis, don&amp;rsquo;t let it anymore.&lt;/p&gt;
</content>
  </entry>
</feed>
