This post assumes a basic understanding of Node streams. If you aren’t already familiar with streams, there are a plethora of great resources for learning about them. I personally recommend Substack’s Stream Handbook.
What is backpressure?
When one stream pushes data to another faster than the receiving stream can process it, the excess data piles up in memory. This creates a vicious cycle: the garbage collector has to sift through more data to determine what can be cleaned up, which slows down the actual processing of data, which in turn lets even more data accumulate in memory as the preceding stream continues to push. In addition to the excess memory consumption and the strain on the garbage collector, this can also slow down other processes in general. Backpressure is a way of mitigating and managing this problem.
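To make the problem concrete, here is a minimal sketch of a producer that ignores backpressure. The slow destination, the tiny highWaterMark, and the chunk sizes are all assumptions chosen for the demo, not anything Node prescribes. It writes in a tight loop without checking the return value of write(), then reports how much data is sitting in the writable's internal buffer.

```javascript
const { Writable } = require('node:stream');

// Hypothetical slow destination: each chunk takes about 1 ms to "process".
const slowWritable = new Writable({
  highWaterMark: 16, // bytes; deliberately tiny for this demo
  write(chunk, encoding, callback) {
    setTimeout(callback, 1);
  },
});

// Write 50 chunks of 8 bytes each and never react to the return value.
let refusals = 0;
for (let i = 0; i < 50; i += 1) {
  const ok = slowWritable.write(Buffer.alloc(8));
  if (!ok) refusals += 1; // write() asked us to stop, but we keep going anyway
}

// Everything that could not be processed yet is held in memory.
const buffered = slowWritable.writableLength;
console.log({
  buffered,
  highWaterMark: slowWritable.writableHighWaterMark,
  refusals,
});

slowWritable.end();
```

The buffered byte count ends up well above the highWaterMark, which is exactly the excess memory that backpressure is meant to prevent.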
What does it look like when backpressure is applied and everything works correctly?
Standard interaction between Node streams
Node streams have a basic workflow for situations that require backpressure. If you’re using .pipe(), this workflow is already taken care of for you. The steps are as follows:
For the sake of this example, I’m assuming a simple Readable-to-Writable stream setup. R is a readable stream and W is a writable stream.
- R begins reading from a data source and passes chunks of data into W.write().
- W begins writing the data received from R.
- Due to either W’s highWaterMark being reached or the write queue being busy, W.write() returns false.
- R receives false and in response it calls R.pause() to prevent more data from being pushed to W.
- W finishes writing its current chunk of data. It now emits a drain event to signal that it is ready to process another chunk.
- In response to the drain event, R.resume() is called and more data begins to flow into W.
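The steps above can be sketched by hand, without .pipe(). This is only an illustration under some assumptions: pump is a hypothetical helper name, the source is a small in-memory list, and the sink is made artificially slow with a 1-byte highWaterMark so that write() returns false early. It is not how .pipe() is implemented internally.

```javascript
const { Readable, Writable } = require('node:stream');

// Hypothetical helper: forwards chunks from `readable` to `writable`,
// pausing when write() returns false and resuming on 'drain'.
function pump(readable, writable) {
  return new Promise((resolve, reject) => {
    let pauses = 0;

    readable.on('data', (chunk) => {
      const ok = writable.write(chunk);
      if (!ok) {
        // W is not ready for more data: stop R until W has drained.
        pauses += 1;
        readable.pause();
        writable.once('drain', () => readable.resume());
      }
    });

    readable.on('end', () => writable.end());
    writable.on('finish', () => resolve(pauses));
    readable.on('error', reject);
    writable.on('error', reject);
  });
}

// Demo: a fast source (R) and a deliberately slow sink (W).
const received = [];
const source = Readable.from(['a', 'b', 'c', 'd', 'e']);
const sink = new Writable({
  highWaterMark: 1, // bytes; tiny on purpose so backpressure kicks in
  write(chunk, encoding, callback) {
    received.push(chunk.toString());
    setTimeout(callback, 5); // simulate slow processing
  },
});

const done = pump(source, sink);
done.then((pauses) => {
  console.log(`received "${received.join('')}" with ${pauses} pause(s)`);
});
```

Pausing inside the data handler and resuming from a one-time drain listener keeps at most a small amount of data in flight, at the cost of the source sitting idle while the sink catches up.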
Cardinal rules for building custom streams
As already mentioned, .pipe() handles the above steps right out of the box. If you’re building custom streams, however, you’ll need to implement backpressure mechanisms yourself. Keeping in mind two rules will make this easier:
- For Writable streams, always return false if either the highWaterMark has been met or the write queue is already in use.
- For Readable streams, never ignore the return value of Writable.write(). When it returns false, the Writable is not yet ready for more data, so call Readable.pause(). Put more succinctly, do not call Writable.write() or Readable.push() unconditionally.
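Here is a rough sketch of both rules applied to custom streams built on Node’s base classes. Counter and SlowSink are hypothetical names, and the highWaterMark values and the 2 ms delay are arbitrary. One assumption worth stating: when you extend stream.Writable, the inherited write() already works out the false return value from the highWaterMark, so the custom part is mostly about calling the _write() callback only once a chunk has really been handled. On the Readable side, _read() stops pushing as soon as push() returns false.

```javascript
const { Readable, Writable } = require('node:stream');

// Hypothetical Readable that emits the numbers 1..limit as strings.
class Counter extends Readable {
  constructor(limit, options) {
    super(options);
    this.limit = limit;
    this.current = 0;
  }

  _read() {
    // Never push unconditionally: stop as soon as push() returns false.
    // Node calls _read() again once the consumer wants more data.
    let keepGoing = true;
    while (keepGoing && this.current < this.limit) {
      this.current += 1;
      keepGoing = this.push(String(this.current));
    }
    if (this.current >= this.limit) {
      this.push(null); // signal the end of the data
    }
  }
}

// Hypothetical Writable that takes a little while to handle each chunk.
class SlowSink extends Writable {
  constructor(options) {
    super(options);
    this.chunks = [];
  }

  _write(chunk, encoding, callback) {
    this.chunks.push(chunk.toString());
    // Call back only when the chunk is done, so the inherited write()
    // can keep reporting false while the buffer is above highWaterMark.
    setTimeout(callback, 2);
  }
}

const counter = new Counter(20, { highWaterMark: 4 });
const slowSink = new SlowSink({ highWaterMark: 4 });

const finished = new Promise((resolve, reject) => {
  slowSink.on('finish', () => resolve(slowSink.chunks.join('')));
  slowSink.on('error', reject);
});

counter.pipe(slowSink);
finished.then((text) => console.log(text));
```

Because both classes cooperate with the base-class buffering, .pipe() can connect them without either side holding more than a few bytes at a time.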
Conclusion
Node provides a great deal of convenience and abstraction, allowing developers to get their projects up and running quickly. Although it’s easy to get far with the basics alone, it’s also important to peek under the hood now and then and gain a deeper knowledge of how things run. Hopefully this post has given you an understanding of backpressure, what problem it seeks to solve, how it’s handled in Node, and how to leverage the same logic when building custom streams.
Additional Reading
- https://nodejs.org/api/stream.html
- https://nodejs.org/en/docs/guides/backpressuring-in-streams/
- https://www.bennadel.com/blog/3237-managing-stream-back-pressure-during-asynchronous-tasks-using-readable-and-data-events-in-node-js.htm
- http://ey3ball.github.io/posts/2014/07/17/node-streams-back-pressure/