The problem:
For this post I'm going to assume we have a node.js HTTP server built with express. It has a single route registered. The route makes a series of external API calls, waits for their responses, then processes all the data and sends a response to the user:
    /**
     * Aggregates results from the API calls
     */
    function aggregateResults(results) {}

    app.get('/endpoint', function(req, res) {
      var apiUrls = [
        'http://api/endpoint1',
        'http://api/endpoint2',
        'http://api/endpoint3',
      ];
      var results = [];

      // make requests to all apiUrls and save result of each call in `results`
      // ??????????????????

      // do something with all the results
      var aggregatedResults = aggregateResults(results);

      // send processed results to client
      res.send(aggregatedResults);
    });
Naive 'Synchronous' looking approach:
The goal is to get the result of each api call in the `results` array and then aggregate the results. A naive implementation that Does Not Work is:
    /**
     * Aggregates results from the API calls
     */
    function aggregateResults(results) {}

    app.get('/endpoint', function(req, res) {
      var apiUrls = [
        'http://api/endpoint1',
        'http://api/endpoint2',
        'http://api/endpoint3',
      ];
      var results = [];

      // make requests to all apiUrls and save result of each call in `results`
      apiUrls.forEach(function(url) {
        request(url, function(error, response, body) {
          // assume all requests always succeed, no error checking
          results.push(body);
        });
      });

      // do something with all the results
      var aggregatedResults = aggregateResults(results);

      // send processed results to client
      res.send(aggregatedResults);
    });
Why isn't this correct? The code above sends off 3 requests to the API endpoints, saves their results, performs the aggregation, and then sends a response to the client. If this code executed synchronously it would work as expected, but since `request` is asynchronous, `results` will still be empty when it is aggregated. Let's trace the code path as it reads:
1. all urls are initialized in apiUrls
2. results array is initialized, which will hold all data returned by api calls
3. apiUrls is iterated over, a request to each url is made, a callback is registered to handle the response, the callback will add the response to results array
4. the results are aggregated
5. the aggregated results are sent to the client.
Since request() is asynchronous, there is no guarantee of when its callback will be called. This is the difficulty of asynchronous programming, and of node in general. If the code executed in the order it reads, it would be correct; because request() is asynchronous, we have no idea when, or even IF, a given call will finish (a timeout is one way to guarantee that it eventually does). One thing is certain, though: no callback can run until the route handler has returned, so every callback runs after the response has already been sent. The program actually executes like this:
1. all urls are initialized in apiUrls
2. results array is initialized, which will hold all data returned by api calls, (results = [])
3. apiUrls is iterated over, a request to each url is made, a callback is registered to handle the response, the callback will add the response to results array (results = [])
4. results is still equal to [] because no data has been retrieved! The requests were only SENT, and a function was registered to handle the responses WHEN THEY ARRIVE, which could be any time in the future.
5. the results are aggregated, (still an empty list)
6. the aggregated results are sent to the client
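The execution order traced above can be reproduced without a network. The following is a minimal sketch, assuming a hypothetical `fakeRequest` helper (not part of any library) that stands in for `request` and answers after a short timer:

```javascript
// Hypothetical stand-in for `request`: it "responds" asynchronously
// after 10ms instead of making a real HTTP call.
function fakeRequest(url, callback) {
  setTimeout(function() {
    callback(null, { statusCode: 200 }, 'body of ' + url);
  }, 10);
}

// Same shape as the naive route handler: start every request, then
// immediately use `results` without waiting for the callbacks.
function naiveFetch(urls) {
  var results = [];
  urls.forEach(function(url) {
    fakeRequest(url, function(error, response, body) {
      results.push(body);
    });
  });
  // forEach has returned, but none of the callbacks has run yet
  return results;
}

var results = naiveFetch(['http://api/endpoint1', 'http://api/endpoint2']);
console.log('right after the loop:', results.length); // 0

setTimeout(function() {
  // by now both 10ms timers have fired and pushed their bodies
  console.log('after the callbacks ran:', results.length); // 2
}, 50);
```

The array does fill up eventually, but only after the code that wanted to use it has already finished.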
Keeping track of the responses:
To be correct the program needs to aggregate the results only AFTER all requests have been made and returned. That means the program needs to keep track of how many requests are going to be made and how many requests have been completed. When all expected requests have been completed, THEN the results should be aggregated and the response should be sent to the client.
    /**
     * Aggregates results from the API calls
     */
    function aggregateResults(results) {}

    app.get('/endpoint', function(req, res) {
      var apiUrls = [
        'http://api/endpoint1',
        'http://api/endpoint2',
        'http://api/endpoint3',
      ];

      // we are expecting a response for each url
      var NUM_RESPONSES_EXPECTED = apiUrls.length;

      // keep track of how many responses have been received
      var numResponsesReceived = 0;

      // store the data of each response
      var results = [];

      // make requests to all apiUrls and save result of each call in `results`
      apiUrls.forEach(function(url) {
        request(url, function(error, response, body) {
          // assume all requests always succeed, no error checking
          results.push(body);

          // keep track that we received a response
          numResponsesReceived++;

          // have all responses completed?
          if (numResponsesReceived === NUM_RESPONSES_EXPECTED) {
            // all responses have completed, only now should we aggregate
            // the responses and send results to client.
            var aggregatedResults = aggregateResults(results);

            // send processed results to client
            res.send(aggregatedResults);
          }
        });
      });
    });
An important thing to notice is that the API request callback is responsible for triggering the aggregation of the data and sending the response to the client. There is A LOT going on here, and tracing the flow of this program can be complicated. If we add in error handling (or short-circuiting of the requests), things get even more complicated! Finally, we are laying the base for a nice callback pyramid of doom. The top-level code queues the API requests and the callbacks to be executed when the requests finish, and then the callbacks are responsible for finalizing the express get request and sending a response to the client. I would certainly prefer that the callback is NOT responsible for this; I feel the callback should only be responsible for handling an individual API response. Very focused (single responsibility) functions are generally easier to reason about, and usually easier to test.
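To make the "even more complicated" point concrete, here is a rough sketch of the counter approach with error handling and short-circuiting bolted on. The names `fetchAndRespond` and `responded`, the 502 status code, and the placeholder body of `aggregateResults` are my own assumptions for illustration; `request` and `res` are passed in as arguments only so the sketch can run without a network or a real express app:

```javascript
// Placeholder aggregation (assumption): returns the results untouched.
function aggregateResults(results) {
  return results;
}

// Hypothetical helper: same counter logic as before, plus error handling.
function fetchAndRespond(apiUrls, request, res) {
  var results = [];
  var numResponsesReceived = 0;
  var responded = false; // guards against answering the client twice

  apiUrls.forEach(function(url, index) {
    request(url, function(error, response, body) {
      if (responded) {
        return; // another request already failed and we already answered
      }
      if (error) {
        // short circuit: answer immediately, ignore later callbacks
        responded = true;
        res.status(502).send('upstream request failed: ' + url);
        return;
      }
      // store by index so results keep the order of apiUrls
      results[index] = body;
      numResponsesReceived++;
      if (numResponsesReceived === apiUrls.length) {
        responded = true;
        res.send(aggregateResults(results));
      }
    });
  });
}
```

The callback now counts responses, stores data, detects errors, prevents double responses, and finishes the HTTP request. That is a lot of responsibility for one function.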
async, A level of abstraction:
Using the wildly popular async library allows us to separate processing the results and sending response to client from making the api requests.
    var async = require('async');

    /**
     * Aggregates results from the API calls
     */
    function aggregateResults(results) {}

    app.get('/endpoint', function(req, res) {
      var apiUrls = [
        'http://api/endpoint1',
        'http://api/endpoint2',
        'http://api/endpoint3',
      ];

      // async.map calls the function below for every url in parallel and
      // collects the value each call passes to its callback. (async.each
      // only reports an error to the final callback, never any results.)
      async.map(apiUrls, function(url, callback) {
        // Signal completion by calling `callback`, which is provided by
        // the async library.
        request(url, function(error, response, body) {
          // A non-null first argument tells async that this request failed.
          callback(error, body);
        });
      },
      function(err, results) {
        // All requests have completed OR one of them has reported an error
        if (err) {
          return res.status(500).send('An API request failed');
        }
        var aggregatedResults = aggregateResults(results);

        // send processed results to client
        res.send(aggregatedResults);
      });
    });
This is great because the API response callback is no longer directly responsible for aggregating results and sending a response to the client. Internally the async library keeps track of the number of requests, much as we did in the counter-based example. I would argue that making these requests and performing an action once all responses have completed is significantly cleaner with the async library.
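To give a feel for the bookkeeping a library can take off our hands, here is a simplified sketch of a parallel map helper. This is NOT the async library's source code; `mapParallel` and everything inside it are hypothetical names, and the timers merely stand in for API calls:

```javascript
// Hypothetical helper: run `iteratee` on every item in parallel and call
// `done` exactly once, either with the first error or with all results.
function mapParallel(items, iteratee, done) {
  var results = new Array(items.length);
  var pending = items.length;
  var finished = false;

  if (pending === 0) {
    return done(null, results);
  }

  items.forEach(function(item, index) {
    iteratee(item, function(err, result) {
      if (finished) {
        return; // `done` was already called because of an earlier error
      }
      if (err) {
        finished = true;
        return done(err);
      }
      results[index] = result; // keep input order, not completion order
      pending--;
      if (pending === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}

// Usage: the delays finish out of order, the results do not.
mapParallel([30, 10, 20], function(delay, callback) {
  setTimeout(function() {
    callback(null, 'waited ' + delay + 'ms');
  }, delay);
}, function(err, results) {
  console.log(results); // [ 'waited 30ms', 'waited 10ms', 'waited 20ms' ]
});
```

The counter and the "already finished" flag are still there, but they live in one small, reusable function instead of inside the route handler.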