Node.js Buffers and Streams


In the previous tutorials, we saw how to read and write files directly. That approach assumes the data is available all at once at the time of processing. But consider the case where the file to be read is very large, say a few hundred MB. How can you read such large files efficiently? This tutorial introduces the concepts of Buffers and Streams in NodeJS.

The problem

With today's growing rate of data creation and consumption, storing and processing data efficiently is an important task. A typical server's log files alone can take up hundreds of MB, if not GB, of space, and reading such a file into memory in one go is slow and wasteful.

The solution - Streams

Streams provide a way to read or write data in chunks rather than all at once. Instead of loading an entire 1 GB text file into memory and processing it in one go, you can stream the data in smaller, more manageable portions.

Before we move on to streams, we need to understand how data is stored on the disk and moved into memory.

Buffers

Data on the disk is stored in the form of binary bits - 1s and 0s. To read a single character, the computer reads 8 consecutive bits (one byte) and decodes their value. Reading from permanent storage is time consuming, so the operating system brings chunks of data from the hard disk (secondary storage) into RAM (main memory) for faster access. These chunks of data held in main memory are called buffers.

[Image: nodejs buffers]

Buffers provide lower-level access to data in memory than the JavaScript engine itself provides. Effectively, they let you work directly with the raw binary data that makes up a string, rather than with its encoded value. The main benefit is higher performance, since you deal with the stored bytes directly and bypass the higher-level Node wrappers.

"use strict"; 
const str = "Hello!!! I am learning NodeJS from A4academics";
let buf1 = new Buffer(str);
console.log("Buffer 1:");
console.log(buf1);
console.log("toString: "+ buf1.toString());
console.log();

let buf2 = new Buffer(10);
let length = buf2.write("small text");

console.log("Buffer 2:");
console.log(buf2);
console.log("toString: "+ buf2.toString());
console.log();

console.log("Buffer 2 toJSON:");
console.log(buf2.toJSON());
console.log();

let buf3 = Buffer.concat([buf1, buf2]);
console.log("Buffer 3:");
console.log("toString: "+ buf3.toString());

Let’s see some code:

In this example, we first create a string str, which we then convert to a Buffer. Buffer is a global class, which means you can use it without requiring any module.

We convert str to a buffer by passing it to Buffer.from() (the older new Buffer() constructor does the same thing but is deprecated). Then we print the content of the buffer using console.log(buf1). See the output below.

Note: If we write console.log("Buffer 1:" + buf1), i.e. concatenate the buffer with a string, then the buffer's toString() method is implicitly called and the string version of the buffer is displayed. Hence, to display the raw buffer content we need a separate console.log().
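
As a quick sketch of the difference (the bytes shown depend on the string you use):

"use strict";
let buf = Buffer.from("Hi");
console.log(buf);              // prints the raw bytes: <Buffer 48 69>
console.log("Buffer: " + buf); // implicit toString(), prints: Buffer: Hi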

We then create a new empty buffer of 10 bytes with Buffer.alloc(10), i.e. it can hold only 10 characters. After this we write a string into buf2 using write(). If the string is longer than the buffer, it is truncated. Then we show the content of the buffer as a string and as JSON.

Buffers also behave somewhat like arrays. You can concatenate two or more buffers into a new buffer, slice a buffer, get its length, and so on. This is illustrated in the example with the variable buf3, which is the concatenation of buf1 and buf2. See the output below:

[Image: concatenate buffer output]
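
Here is a small sketch of a few other array-like operations, reusing the same string from the example above:

"use strict";
let buf1 = Buffer.from("Hello!!! I am learning NodeJS from A4academics");
console.log(buf1.length);                 // number of bytes in the buffer
console.log(buf1.slice(0, 8).toString()); // "Hello!!!"
console.log(buf1[0]);                     // 72, the byte value of 'H'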

Streams

Think of it in plain English: you have a water tank somewhere, and when you open a tap, water flows out of it as a stream. A stream is a steady flow of something from a larger reservoir.

In computers, a data stream is a flow of bytes from a source to a destination in the form of chunks or packets; the data flows in pieces instead of arriving all at once. This kind of transfer happens in networking, where the data is divided into packets that may travel along different paths, and each chunk is buffered in main memory while the earlier chunks are being processed.

In NodeJS, streams are event emitters and emit different events as data flows through them. The most important ones are:

  1. data - emitted by a readable stream whenever a chunk of data is available to be processed

  2. error - emitted when something goes wrong while reading or writing

  3. finish - emitted by a writable stream once all data has been flushed after end() is called

  4. end - emitted by a readable stream when there is no more data to read

All of these events have specific usages. Let’s see with the help of an example.

Say we have a text file myTextFile.txt in our root folder which contains a lot of text. (You can copy placeholder text from http://www.lipsum.com/.) Let us read this file using streams:

Create an index.js file in your root directory with the following code.

"use strict"; 
const fs = require('fs'); 
const path = require('path'); 
let readStream = fs.createReadStream(path.join(__dirname,"myTextFile.txt"));
let __data = "";
readStream.on('data', function(data)
{
	__data = data;
});
readStream.on('end',function()
{
	console.log("Finished reading data");
});

Here, we first create a readable stream from myTextFile.txt using the fs module. Whenever a chunk is ready to be processed, the stream emits a 'data' event, so on every data event we append the current chunk to the __data variable. After the stream has read all the data, it emits an 'end' event, and once that happens we display the data on the console. Here is the output.

[Image: readstream output]

Yes, this example does not show the full power of streams. Streams in NodeJS are an advanced and very useful topic, but while developing a web server you will rarely use the stream library directly. Most of the time, the interfaces are designed so that you won't even realise that what you are using is a stream.
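
For instance, the request and response objects of Node's built-in http server are themselves streams. A minimal sketch (assuming the same myTextFile.txt as above, with port 3000 chosen arbitrarily) could stream a file straight to every client:

"use strict";
const http = require('http');
const fs = require('fs');
const path = require('path');
http.createServer(function(req, res)
{
	// res is a writable stream, so we can write chunks to it as they are read
	let readStream = fs.createReadStream(path.join(__dirname, "myTextFile.txt"));
	readStream.on('data', function(chunk)
	{
		res.write(chunk);
	});
	readStream.on('end', function()
	{
		res.end(); // finish the HTTP response
	});
}).listen(3000);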

Similar to the readable stream, the fs module also provides a writable stream, which uses the stream module internally. One benefit you might already have thought of: if you can read data in small chunks, you can also write data in small chunks. This is very helpful when data is transferred over a network and needs to be written to a file; instead of waiting for the data to fully arrive, you can write it to the file as soon as each chunk arrives.

"use strict"; 
const fs = require('fs'); 
const path = require('path'); 
let writeStream = fs.createWriteStream(path.join(__dirname,"output.txt"));
writeStream.write("Hello world!!!");
writeStream.end();

Here is an example of the writeable stream:

Here we create a writable stream which will write data into output.txt .

Then we write data to the writeable stream using writeStream.write() function. After writing, we need to emit the end event so that other scripts which are reading this file can detect the end event.
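
To make the earlier point about network data concrete, here is a small sketch (the URL http://example.com/ and the file name downloaded.txt are just placeholders) that writes an HTTP response to disk chunk by chunk and listens for the 'finish' event:

"use strict";
const http = require('http');
const fs = require('fs');
const path = require('path');
http.get("http://example.com/", function(response)
{
	let fileStream = fs.createWriteStream(path.join(__dirname, "downloaded.txt"));
	// Write each chunk to the file as soon as it arrives
	response.on('data', function(chunk)
	{
		fileStream.write(chunk);
	});
	response.on('end', function()
	{
		fileStream.end(); // no more data; the stream will emit 'finish' once flushed
	});
	fileStream.on('finish', function()
	{
		console.log("Finished writing downloaded.txt");
	});
});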

Shortcut

With the readable and writable streams, you can read chunks of data from one file and write those chunks to another file. That is a lot of manual work, so NodeJS provides a mechanism to simplify it, called piping, which sends the data from a readable stream directly to a writable stream.

"use strict"; 
const fs = require('fs'); 
const path = require('path'); 
let readStream = fs.createReadStream(path.join(__dirname,"myTextFile.txt"));
let writeStream = fs.createWriteStream(path.join(__dirname,"output.txt"));
readStream.on('data', function(data)
{
	writeStream.write(data);
});
readStream.on('end',function()
{
	writeStream.end();
	console.log("Finished writing");
});

Here is an example of how you can write from readable stream using both the methods:

  • Using separate read and write commands:
  • Using pipe
"use strict"; 
const fs = require('fs'); 
const path = require('path'); 
let readStream = fs.createReadStream(path.join(__dirname,"myTextFile.txt"));
let writeStream = fs.createWriteStream(path.join(__dirname,"output.txt"));
readStream.pipe(writeStream);

As you can see, piping reduces the amount of code drastically. You can even chain multiple pipes in a single line, provided each pipe() call returns a readable stream (for example, a transform stream).
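
As a quick sketch of chaining (assuming the built-in zlib module and the same myTextFile.txt; the .gz output name is arbitrary), you could compress a file while copying it:

"use strict";
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');
// zlib.createGzip() returns a transform stream, which is both writable and readable,
// so the result of the first pipe() can be piped again.
fs.createReadStream(path.join(__dirname, "myTextFile.txt"))
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream(path.join(__dirname, "myTextFile.txt.gz")));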

Streams have a few other methods which are used frequently:

  1. pause()
    This method pauses a readable stream. When it is called, the stream stops emitting 'data' events until it is resumed.

  2. resume()
    This method resumes a stream paused by pause(). Once it is called, the stream starts emitting 'data' events again and continues processing, as shown in the sketch below.
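
Here is a small sketch (reusing the hypothetical myTextFile.txt) that pauses the stream after every chunk and resumes it a second later:

"use strict";
const fs = require('fs');
const path = require('path');
let readStream = fs.createReadStream(path.join(__dirname, "myTextFile.txt"));
readStream.on('data', function(chunk)
{
	console.log("Got " + chunk.length + " bytes, pausing for a second...");
	readStream.pause();        // stop emitting 'data' events
	setTimeout(function()
	{
		readStream.resume();   // start emitting 'data' events again
	}, 1000);
});
readStream.on('end', function()
{
	console.log("Finished reading");
});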

One example where you can use streams

If you have done competitive programming, streams are what let you read the input when solving problems in JavaScript.

Here is one solution to the problem Life, Universe and Everything on Codechef (https://www.codechef.com/problems/TEST)

Here is the execution:

http://ideone.com/7ynGUT

"use strict"; 
const fs = require('fs'); 
process.stdin.resume();
process.stdin.setEncoding('utf8');
var __data = "";
process.stdin.on('data', function(d)
{
	__data += d;
});
process.stdin.on('end',function()
{
	processInput();
});
function processInput()
{
var lines = __data.split("\n");
var write = true;
 for(var line of lines)
 {
	if(Number(line) == 42)
	{
	  write = true;
	}
	if(write)
	{
	  console.log(line);
	}
	else
	{
	process.exit(0);
	}
  }
}

If you remember the process global variable, it contains an object stdin which points to standard input, and this is a readable stream. Since the input arrives in chunks, we use the data event to assemble the complete input, and when the input has ended we run our logic.

This is a simple example of a readable stream which you can use in your day-to-day work.

