Sunday, November 8, 2015

Node.js handling results from multiple async functions

I've been using node.js regularly for a couple of months now, and have just started trying to answer questions about it on Stack Overflow.  I frequently see questions about handling multiple asynchronous requests, i.e. starting multiple async requests in response to some sort of event, waiting until ALL requests have finished, and then performing some action with the combined results.  This blog post explains a couple of strategies for dealing with this.

The problem:

For this post I'm going to assume we have a node.js express HTTP server with a single registered route.  The route makes a series of external API calls, waits for their responses, then processes all the data and sends a response to the user.

Naive 'Synchronous' looking approach:

The goal is to get the result of each API call into the `results` array and then aggregate the results.  A naive implementation that does NOT work is:
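The original code sample didn't survive, so here is a sketch of what the naive approach looks like.  `fakeApiCall` and the URLs are illustrative stand-ins for the `request` library hitting real endpoints; what matters is that the callback fires asynchronously, just like a real HTTP request would.

```javascript
// fakeApiCall stands in for request(url, callback): it invokes its
// callback asynchronously, some time after it is called.
function fakeApiCall(url, callback) {
  setTimeout(function () {
    callback(null, 'data from ' + url);
  }, 10);
}

var apiUrls = ['/api/one', '/api/two', '/api/three'];
var results = [];

apiUrls.forEach(function (url) {
  fakeApiCall(url, function (err, body) {
    results.push(body); // runs LATER, after the aggregation below
  });
});

// This line runs before ANY callback has fired:
console.log('results so far: ' + results.length); // 0 -- the bug
```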

Why isn't this correct?  The above code sends off 3 requests to the API endpoints, saves their results, performs aggregation, and then sends a response to the client.  If this code executed synchronously it would work as expected, but since `request` is asynchronous, `results` will be empty when it is aggregated.  Let's trace the code path.

1. all urls are initialized in apiUrls
2. results array is initialized, which will hold all data returned by api calls
3. apiUrls is iterated over, a request to each url is made, a callback is registered to handle the response, the callback will add the response to results array
4. the results are aggregated
5. the aggregated results are sent to the client.

Since request() is asynchronous there is no guarantee of when the callback will be called.  This is the difficulty of asynchronous programming, and node in general.  If the code executed the way it reads, it would be correct, but because it is asynchronous, we have absolutely no idea when, or even IF, the call to request() will finish (though there are certainly ways to bound this through the use of a timeout).  The program will actually execute like:

1. all urls are initialized in apiUrls
2. results array is initialized, which will hold all data returned by api calls, (results = [])
3. apiUrls is iterated over, a request to each url is made, a callback is registered to handle the response, the callback will add the response to results array (results = [])
4. results is still equal to [] because no data has been retrieved!  The requests were only SENT, and a function was registered to handle the responses WHEN THEY OCCUR, which could be any time in the future!
5. the results are aggregated, (still an empty list)
6. the aggregated results are sent to the client

Keeping track of the responses:

To be correct the program needs to aggregate the results only AFTER all requests have been made and returned.  That means the program needs to keep track of how many requests are going to be made and how many requests have been completed. When all expected requests have been completed, THEN the results should be aggregated and the response should be sent to the client.

A correct implementation requires that the program keep track of how many responses have been received, which makes it significantly more complicated.  Only when all responses have been received are the results aggregated and the response sent to the client.  While this program handles the case of all requests succeeding, it is extremely deficient in its handling of errors.  Should the results be aggregated if one of the requests times out, or if the API server returns an error?
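The counter-based code was also lost; here is a sketch of the shape it takes.  As before, `fakeApiCall`, `aggregate`, and the URLs are illustrative, not the original code.

```javascript
// fakeApiCall simulates an asynchronous HTTP request (real code
// would use the `request` library).
function fakeApiCall(url, callback) {
  setTimeout(function () {
    callback(null, 'data from ' + url);
  }, 10);
}

function aggregate(results) {
  return results.join(', ');
}

function fetchAll(urls, done) {
  var results = [];
  var completed = 0;
  urls.forEach(function (url) {
    fakeApiCall(url, function (err, body) {
      if (err) {
        return done(err); // deficient: with 2+ failures, done() runs twice
      }
      results.push(body);
      completed += 1;
      if (completed === urls.length) {
        // only the LAST response to arrive triggers aggregation
        done(null, aggregate(results));
      }
    });
  });
}

fetchAll(['/api/one', '/api/two'], function (err, aggregated) {
  console.log(aggregated); // both responses, in arrival order
});
```

Note how the per-response callback is the thing that decides when to aggregate and respond, which is exactly the coupling discussed below.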

An important thing to notice is that the API client request callback is responsible for triggering the aggregation of the data and sending the response to the client.  There is A LOT going on here.  Tracing the flow of this program can be complicated.  If we add in error handling (or short-circuiting of the requests) things get even more complicated!  Finally, we are laying the base for a nice callback pyramid of doom.  The top-level code queues the API requests and the callbacks to be executed when the requests finish, and then those callbacks are responsible for finalizing the express get request and sending a response to the client.  I would certainly prefer that the callback NOT be responsible for this.  I feel the callback should only be responsible for handling an individual API response.  Focused (single-responsibility) functions are generally easier to reason about, and usually easier to test.

async, A level of abstraction:

Using the wildly popular async library lets us separate processing the results and sending the response to the client from making the API requests.
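The async-based sample was lost as well, so here is a sketch of its shape using `async.each`.  We try to load the real `async` package; if it isn't installed, a minimal stand-in with the same signature (written here for illustration only) keeps the example runnable.  `fakeApiCall` and the URLs are, again, illustrative.

```javascript
var async;
try {
  async = require('async'); // the real library, if installed
} catch (e) {
  // minimal stand-in with async.each's signature, for illustration only
  async = {
    each: function (items, iteratee, done) {
      var remaining = items.length;
      var failed = false;
      items.forEach(function (item) {
        iteratee(item, function (err) {
          if (failed) return;
          if (err) { failed = true; return done(err); }
          remaining -= 1;
          if (remaining === 0) done(null);
        });
      });
    }
  };
}

function fakeApiCall(url, callback) {
  setTimeout(function () { callback(null, 'data from ' + url); }, 10);
}

var results = [];

async.each(['/api/one', '/api/two', '/api/three'], function (url, cb) {
  // this callback handles ONE response -- single responsibility
  fakeApiCall(url, function (err, body) {
    if (err) return cb(err);
    results.push(body);
    cb();
  });
}, function (err) {
  // runs exactly once: after every request finishes, or on the first error
  console.log(err ? 'failed: ' + err : results.join(', '));
});
```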

The above code may look more complicated, but it can be easier to test, since the callback responsible for aggregating results and sending the response is no longer nested inside an API response callback.  The requests to the API are now triggered by the async library and run in parallel.  When all requests have completed (each has called its callback), or when one request calls its callback with an error, the function passed as the third parameter to async.each is executed.

This is great because the API response callback is no longer directly responsible for aggregating results and sending the response to the client.  Internally the async library keeps track of the number of requests, similar to the way we did in our first correct example.  I would argue that making these requests and performing an action when all responses have completed is significantly cleaner using the async library.

Another approach using promises.... to be continued.....

Friday, June 7, 2013

How much experience does a Computer Science Degree provide?

Many companies' employment requirements include a computer science degree or "Equivalent Experience".  In reality, the "Equivalent Experience" of a college computer science degree is much lower than employers believe.

Most of the time, companies quantify equivalent experience as an amount of time: 4 years.  But just because school normally lasts 4 years, does that mean an individual gains 4 years' worth of computer experience from a computer science degree?

A core component of the US bachelor's degree is a humanities education.  This comes in the form of required courses in art, English, history, philosophy, mathematics, and social studies, and often a physical education requirement.  The vast majority of 4-year programs require courses from the above disciplines, in hopes of creating "well rounded" students.  These courses are required even in technical programs.

I looked at the requirements for a computer science degree at UMBC (major requirements).  70 computer science and math credits are required for the degree.  In my browsing, I have seen anywhere between 70 and 90 credits required in the field.  For calculation purposes I will assume 80 credits.

80 credits total / 15 credits per semester (average) = 5.3 semesters of school

A semester is roughly 3.5 months of school.

5.3 semesters * 3.5 months per semester = 18.5 months

That's roughly 1.5 years of 15-hours-per-week instruction:

78 weeks (1.5 years) * 15 hours of instruction per week = 1170 hours of instruction

1170 hours is about 29 forty-hour weeks, significantly less than 1 full year.

Friday, May 31, 2013

Django Request/Response Cycle, How Requests Die and Responses Are Born

This summer I decided to start contributing to django.  Looking through the easy-pickings tickets, I realized it would be beneficial to develop an understanding of some of the core components of the django framework before jumping into tickets.

I plan on writing blog posts about the internals of django, starting with today's: what is the path of a request through the django framework, and what is the path of a response out of the framework?

Let's begin by describing the entry point, and all the major functions that are called for a request/response.  We can then elaborate on the important functions.  Django uses the WSGI standard to interact with web servers; there is a lot of information available about this standard on the web.

Basically, django will be returning a callable application with the same signature and return value as:

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return ['Hello world!\n']

When deploying django, the webserver needs to know the location of the wsgi.py file that belongs to your project; this is where it retrieves the callable application (like the one above).

If we look in the wsgi.py file that is included with django's project template, we see:

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

Ah HA!

So let's start tracing how our application is built!

1. get_wsgi_application is a public wrapper function that returns a WSGIHandler instance.
    Notice how WSGIHandler has a lot in common with the sample application above:
    Django calculates the correct response status code and headers and calls the `start_response` function that was passed into the app (by the webserver) when the application is called.  It then returns the response.

2. So we are already at a dead end!  Django returned our app to the webserver.  We don't know exactly when the webserver calls our app or what start_response actually does, and we don't need to; we can trust that the wsgi modules/servers do things right!  The response is calculated when our application (a WSGIHandler instance) is called.  This takes place in WSGIHandler.__call__.

3. The first thing WSGIHandler does is attempt to load the middleware registered in the settings file.

4. The next significant action is the instantiation of a new WSGIRequest.

5. The last main action is retrieving a response for the new request object in get_response.  This is where the bulk of the request logic comes into play.  This function is responsible for calling all middleware hooks at the correct times, for resolving a url to a view function, and for executing that view function.  This is where the nice "The view didn't return an HttpResponse object" error messages come from!

This has been a quick, pretty high-level architecture overview of the main functions django calls to turn a request into a response.  There ARE other things going on, like emitting signals along the way and a couple of config options to prepare a request for processing, but this is pretty much the gist of it.

Friday, April 26, 2013

Loading jQuery plugins using require.js and backbone.js

I recently began using backbone.js through the package grunt-bbb.  I ran into a little bit of trouble loading external javascript libraries through require.js, specifically jQuery plugins.

For anyone new to backbone.js development, I would strongly advise you check out grunt-bbb.  It is a framework and build system for backbone.js.  It provides commands to generate scaffolding and for building/testing/linting your project.

Below are instructions specific to grunt-bbb, but they can easily be adapted to any project using require.js.

require.js configuration provides a `shim` option. According to the docs:

Configure the dependencies and exports for older, traditional "browser globals" scripts that do not use define() to declare the dependencies and set a module value. 

This allows you to define what order packages are loaded in.  This is necessary because your jQuery plugin relies on jQuery.

In your require.js config it is important to provide both the location of your plugin and to specify that your plugin depends on jQuery.
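The config sample was lost; a hypothetical require.js config (the paths and plugin filenames here are illustrative, not from the original post) looks like:

```javascript
require.config({
  paths: {
    // plugin locations, without the .js extension
    prettyPhoto: "../assets/js/libs/jquery.prettyPhoto",
    jqueryui: "../assets/js/libs/jquery-ui"
  },
  shim: {
    // each plugin depends on jQuery, so jQuery is loaded first
    prettyPhoto: ["jquery"],
    jqueryui: ["jquery"]
  }
});
```

With the `shim` entries in place, require.js loads jQuery before either plugin executes.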

The above code points to where the js plugins are located (without the .js extension) and registers them as dependent on jQuery.
Now all you have to do is load the scripts in your module's define statement:

define(["app", "prettyPhoto", "jqueryui"], function (app) {
    // the plugins register themselves on jQuery, so no arguments are needed
});

The plugins are loaded by the keys you defined in the `paths` object!

Thursday, March 21, 2013

The Cult of Twitter

Twitter is a very unique product.  It comes as no surprise that it has a unique relationship with its users.  This relationship really shines when the service goes down.  The twitter fail whale is iconic: little blue birds trying to carry a whale.  As someone who doesn't even use twitter, I have seen this image in many places, many times.  What is most intriguing to me is how Twitter branded itself to be so cute.  By using imagery like that found on the fail whale, they have taken the absolute worst possible thing that could happen to a web service (i.e. not functioning) and turned it into something cute and tolerable.

When twitter goes down, the site is filled with the image of small blue birds carrying a whale.  The second twitter goes down, the internet blows up with articles about twitter's outage.  This is no different than other mega services like gmail or facebook.  The difference is how users react to it.  No other service has this relationship with its customers.  Twitter failing has become "cute" because of the fail whale.

The reason people can interpret this as cute is that twitter is not a necessary service.  It is a leisure service.  Even though companies can indirectly use it to gain customers or advertise, it is still primarily recreation.  This becomes apparent when, by contrast, a service like gmail goes down.  Personally, when gmail goes down my work stops.  I don't even receive that many emails per day (probably only a few per hour), but when gmail goes down I get up from the computer, because it has become synonymous with work.  I don't think twitter will ever have this relationship with its customers.

What do you think?  Do you get angry when you see the fail whale, or is something you can shrug off until the service comes back up? 

Friday, September 21, 2012

Introduction to Unit Testing Using Python Unittest

Unit testing is an extremely powerful tool.  It directly helps to ensure the 3 aspects of good software: verifiability, maintainability, and extensibility.  Unit testing is as much a process of software design as it is a tool.  There are many great tutorials on HOW to use python's unittest, and of course the python documentation is an excellent resource.  I am going to focus on WHY to test your software.  Below are a couple of short unit testing examples using python's built-in unittest package.


How does one verify that their program works?  With unit testing we can create specific functions or groups of functions to target our code.  For example:

def sum_numbers(num_one, num_two):
   """return an integer, the sum of two numbers"""
   return num_one + num_two

We can easily create a test for this using python's built-in unittest module.  Tests are created by subclassing unittest.TestCase:

import unittest

from mymodule.functions import sum_numbers

class TestFunctions(unittest.TestCase):

    def test_add_numbers_success(self):
        self.assertEqual(sum_numbers(2, 2), 4)

On python 2.7+, running python -m unittest discover in our package will automatically find all test files and run the TestCase classes.  There are a couple of things to note in the above example.  All test classes must subclass unittest.TestCase.  The focal point of any test is its assertions; the assertions dictate whether a test passes or fails.  Although this is a contrived example, it shows how easy it is to isolate our methods and control exactly what inputs they receive!  If we create one or more test methods for every method in our project, we quickly grow a test suite.  When making ANY change to our code it becomes trivial to run through every single function in our project and verify that nothing has broken!  Imagine if we had a web app and every time we added a feature we had to run through EVERY possible page/action!?  It could take a long time doing it manually.


Bugs are a part of software development.  It is extremely important to minimize bugs, but when they do happen it is important to create fixes very quickly.  Because bugs will occur in code, it is important to have a process set up that helps to isolate the bug so that it is easy to reproduce, easy to correct, and easy to verify the bug has been fixed.  Unit testing helps do all of these.  Suppose the sum_numbers function is at the heart of a website.  It gets all sorts of user input data, and occasionally some faulty data slips through.  If a string is passed as one of the parameters it will result in a TypeError!  It is trivial to isolate and reproduce this bug.  We need to decide what should be returned for invalid input; for this example let's return None.  We can then create another test method:

class TestFunctions(unittest.TestCase):

    def test_add_numbers_success(self):
        self.assertEqual(sum_numbers(2, 2), 4)

    def test_add_numbers_string_bug(self):
        self.assertEqual(sum_numbers('a', 2), None)

Running the above code reproduces the string error.  Since we haven't fixed our code yet, this test will fail.  We can then change our sum_numbers method to handle invalid input:

def sum_numbers(num_one, num_two):
   """return an integer, the sum of two numbers; can't trust user input"""
   try:
       return int(num_one) + int(num_two)
   except (ValueError, TypeError):
       return None

Running our test again results in two passing tests.  We successfully isolated the bug, reproduced the bug, and verified the bug has been fixed!  We now also have a test trail assuring us the bug has been addressed.  Pretty cool.


With tests it becomes very easy to help an app grow.  A test suite provides a safety net for an application.  We can programmatically run through every function of an app in a short amount of time; doing this manually could take hours.  Some test suites take hours to run, so it wouldn't even be feasible to manually test a large codebase!  As long as we keep designing our apps in a modular, unit-based way, we can easily add functions and tests for those individual functions.  Another benefit of unit testing is how easy it makes refactoring code.  Suppose we had thought it was a good idea at the time to write our original function like:

def sum_numbers(num_one, num_two):
   """return an integer, the sum of two numbers"""
   return sum([num_one, num_two])

Assuming we had the same test method as before:

def test_add_numbers_success(self):
    self.assertEqual(sum_numbers(2, 2), 4)

This test is focused on the output of our function; it assures us the output is as expected.  This allows us to easily change what happens inside the function and still have the test acting as a safety net.  We can rewrite (refactor) the internals of our methods and guarantee they still function the way we originally tested them!  Our method passes the test because it performs the action we want it to.  We can change the method to remove the list and the sum function:

def sum_numbers(num_one, num_two):
   """return an integer, the sum of two numbers"""
   return num_one + num_two

We cleaned up our function and ensured that it still functions the way we designed it to!

Testing is a powerful tool that should be very heavily considered.  It helps verify our functions work the way we intended, helps us easily maintain our applications, and helps us extend them.  Correct application design will help us isolate our problems.  Testing can take a significant amount of time, but the benefits it offers far outweigh any downsides.

Sunday, September 9, 2012

Why Unit Testing is important

One of the most controversial topics in programming is unit testing, or testing in general.  There are a number of strong arguments on both sides of the issue.

For the past 10 months I have been freelancing.  During this time I have been exposed to a wide variety of code created by many individuals of vastly varying skill levels.  All of these projects have been php websites and webapps.  Of course this has led to tons and tons of different architectures.  All of these projects have had one troubling thing in common: they were created with absolutely no thought about maintainability.  This is, in part, because of the industry.  Boutiques and contractors are not maintaining the app.  The goal is to ship a working product in as little time as possible.  Unit testing doesn't play into this because of the time investment involved.  A significant time investment is required to determine a testing strategy and to write the actual tests.  I have had many instances where writing tests takes AS LONG AS writing my actual functions!  Having to budget up to 50% more time to write tests is undesirable for every party involved.

This test-less, unmaintainable strategy actually works pretty well (it is an industry standard) as long as the sites never need new features.  During a 3-month contract with a php boutique we had a number of recurring contracts.  This involved maintaining web sites which were created 5-10 years ago.  Many of these sites were from a different era, relying on register_globals and completely prone to sql injection.  So the solution is simple, right?  Fix the security holes, add new features, and deploy!?  No.  Many of the sites did not have any sort of structure.  Each file generally had one function or none, and hundreds of lines of code.  Fixing bugs meant wading through lots of code that was more or less unrelated to the problem.  Why not fix this code soup?  The issue is that there are set deadlines, and people don't see refactoring the whole site into something maintainable as a good use of time.

Writing code as if it were going to be unit tested will resolve many issues.  Unit testing is important because it helps us think in terms of maintaining code.  The most important aspects of unit testing are usually overlooked:

Thinking about program structure.  This NEEDS to be thought about, even for small websites.  Sitting down and writing whatever comes to one's head is a sure way to reduce the quality of code and to remain a mediocre programmer.

Designing units: what are the logical sections?

I think a simple way to do this is to go through long code and comment what each section does.  For example:

// log in a user
// check user permissions
// get friends of user
// etc

When doing this it becomes very apparent what should comprise a "section".  It would make sense to have a login_user function, or a check_permission function.  Even if they are only used one time, it still makes sense to create functions for these.  This helps with the next point.

Thinking in terms of maintaining code.
    - how will this code be added to?

Actually thinking about program design will make maintaining and extending your code so much easier.  Say that, in addition to the facebook login that version 1 of the site uses, a client wants to add twitter auth too.  With a login_user function this is pretty easy: all login code is located in that one function.

Learning to create software as a series of related units takes practice, but it pays off in the long run, as code becomes easier to work on and easier to extend.