Frequently asked HBase interview questions and answers for freshers and Hadoop developers with 2-5 years of experience, covering HBase features, the CAP theorem, column families, data flow, HMaster, HRegion, HLog, etc.
HBase is a column-oriented distributed database that runs within the Hadoop environment. It can store huge amounts of data, from terabytes to petabytes. HBase provides scalable, distributed big data storage on top of the Hadoop ecosystem. It has some special features compared with traditional relational database models:
- Storage system capable of storing huge volumes of data
- Distributed design to handle large tables
- Horizontally Scalable
- Column Oriented Stores
The four major components of the HBase architecture are HMaster, HRegion, HRegionServer and ZooKeeper.
4) Explain what HBase consists of?
- HBase consists of a set of tables
- Like any traditional database, an HBase table contains rows and columns
- In an HBase table, every row must have a row key, which plays the role of a primary key
- Each column in an HBase table corresponds to an attribute of an object
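One way to picture this layout is as a nested map. The sketch below is a plain Python model, not the HBase API, and the table, row keys and column names are made up for illustration.

```python
# Toy model of an HBase table: row key -> column family -> qualifier -> value.
# All names here (users, info, contact, ...) are hypothetical.
users = {
    b"user#001": {
        "info": {"name": b"Asha", "city": b"Pune"},
        "contact": {"email": b"asha@example.com"},
    },
    b"user#002": {
        # Rows need not share the same columns.
        "info": {"name": b"Ravi"},
    },
}


def read_cell(table, row_key, family, qualifier):
    """Return the value of one cell, or None if the cell does not exist."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


print(read_cell(users, b"user#001", "info", "city"))      # b'Pune'
print(read_cell(users, b"user#002", "contact", "email"))  # None
```

The second lookup returns None because that row simply has no such column; nothing had to be declared in advance.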
5) Explain CAP theorem?
CAP stands for Consistency, Availability and Partition tolerance. Let us have a look at the real meaning of each term.
Consistency implies that at any moment of time, all the nodes see the same set of data. Availability makes sure that we receive a response for every request, whether the request succeeded or failed. It's like any handshaking mechanism in computer networks. Partition tolerance makes sure that the system continues to work despite occasional message loss or failure of part of the system. Systems with partition tolerance work well regardless of physical network partitions.
6) Which features of the CAP theorem does HBase follow?
A column-oriented database like HBase provides consistency and partition tolerance.
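Choosing consistency over availability can be pictured with a toy model in which each piece of data has exactly one serving copy. This is only a sketch under that assumption; the class and exception names are hypothetical and not part of HBase.

```python
class RegionUnavailable(Exception):
    """Raised when the only server holding the data cannot be reached."""


class ToyCluster:
    def __init__(self):
        self.data = {}        # the single authoritative copy
        self.owner_up = True  # whether the owning server is reachable

    def write(self, key, value):
        if not self.owner_up:
            raise RegionUnavailable("owner is down, write refused")
        self.data[key] = value

    def read(self, key):
        # Refuse to answer rather than risk returning stale data.
        if not self.owner_up:
            raise RegionUnavailable("owner is down, read refused")
        return self.data.get(key)


cluster = ToyCluster()
cluster.write("k", "v1")
print(cluster.read("k"))  # v1

cluster.owner_up = False  # simulate a failure or network partition
try:
    cluster.read("k")
except RegionUnavailable as err:
    print("unavailable:", err)
```

Every successful read sees the latest write, at the price of refusing requests while the owner is unreachable.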
7) Explain two differences between HDFS and HBase in terms of storage?
The key difference between HDFS and HBase lies in how the data is operated on and processed. HDFS is a distributed file system, best suited to batch processing and high-latency operations. HBase is a database built on top of it and is well suited to low-latency operations. In HDFS, we access data primarily through MapReduce jobs. HBase, in contrast, provides access to single rows out of billions of records.
8) State two differences between HBase and RDBMS?
- RDBMS is a row oriented database having a fixed schema.
- HBase is a column oriented storage having a flexible schema.
9) Name some column oriented databases other than HBase?
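Cassandra is a popular column-oriented (wide-column) database besides HBase; Google Bigtable and Apache Accumulo belong to the same family. MongoDB and CouchDB are often mentioned alongside them, but they are document-oriented stores rather than column-oriented ones. All of these databases can store large data sets and provide random access to the stored data.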
10) Explain the column families in HBase?
Column families are the basic unit of physical storage in HBase. Features such as compression are applied per column family.
11) Explain the row key in HBase?
Row keys are byte arrays supplied by the application to identify a row. Each row has a unique row key. A row key does not have a data type. Rows are stored sorted by row key, so the choice of row key determines the sort order. The row key also makes sure that all cells with the same row key are located on the same server.
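Because row keys are untyped byte arrays, they sort byte by byte rather than numerically. The short sketch below shows the effect with plain Python bytes; the `padded_key` helper and the `row` prefix are hypothetical choices made for illustration.

```python
# Row keys compare as raw bytes, so "row10" sorts before "row2".
keys = [b"row2", b"row10", b"row1"]
print(sorted(keys))  # [b'row1', b'row10', b'row2']


def padded_key(n, width=5):
    """Build a zero-padded row key so that byte order matches numeric order."""
    return b"row" + str(n).zfill(width).encode("ascii")


padded = [padded_key(n) for n in (2, 10, 1)]
print(sorted(padded))  # [b'row00001', b'row00002', b'row00010']
```

Zero-padding is one common way to make the stored order match the order the application expects.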
12) Explain HBase data flow in brief?
The client communicates in both directions with the HBase Master (HMaster), which handles DDL operations, region assignment, etc., and with ZooKeeper, which acts as the distributed coordination service that maintains the live cluster state.
For read and write operations, the client contacts the HRegion servers. An HBase architecture can have multiple region servers. The master is responsible for assigning regions to region servers. It also checks the health of the region servers periodically. Each region server records its changes in its own HLog.
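The routing step of this flow can be sketched as a lookup from row key to region server, assuming each region covers a contiguous range of row keys. The region boundaries and host names below are hypothetical, and the code is a toy model rather than the HBase client.

```python
import bisect

# Hypothetical region map: (start_key, region_server).
# The first region starts at the empty key and so covers the smallest keys.
REGIONS = [
    (b"", "rs1.example.com"),
    (b"g", "rs2.example.com"),
    (b"p", "rs3.example.com"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate_region_server(row_key):
    """Return the server whose region's key range contains row_key."""
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]


print(locate_region_server(b"apple"))  # rs1.example.com
print(locate_region_server(b"g"))      # rs2.example.com
print(locate_region_server(b"zebra"))  # rs3.example.com
```

Once the client knows which server holds the row, it can send the read or write there directly.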
13) Explain about HMaster?
HMaster is one of the key services of HBase. Its major tasks are to keep cluster performance stable and to maintain the nodes in the cluster. It also executes administrative operations and assigns regions to the different region servers.
14) Explain how HRegion servers work in HBase?
A region server performs the following functions in communication with HMaster and ZooKeeper.
- Hosting and managing regions
- Splitting regions automatically
- Handling read and write operations
- Communicating directly with the client
15) Explain about ZooKeeper in HBase?
ZooKeeper acts as a monitoring and coordination server. It provides synchronization in a distributed environment and stores configuration information. When a client wants to communicate with region servers, it first contacts ZooKeeper to find out how to reach them.
16) Explain the different types of read and write operations provided by HBase?
HBase provides random read and write operations. Data can be accessed through shell commands and through several APIs: the REST API, the Java client API, Avro and Thrift.
17) Explain the partitioning provided by HBase?
HBase provides automatic partitioning: a table is divided into regions, and regions are split automatically as they grow.
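Automatic partitioning can be pictured as splitting a sorted range of row keys once it grows too large. The sketch below is a toy model: the row-count threshold is a hypothetical stand-in (a real cluster decides based on data size), and the function names are not HBase APIs.

```python
import bisect

MAX_ROWS_PER_REGION = 4  # hypothetical threshold used only in this toy model


def split_if_needed(region):
    """Return the region unchanged, or its two halves if it is too large."""
    if len(region) <= MAX_ROWS_PER_REGION:
        return [region]
    mid = len(region) // 2
    return [region[:mid], region[mid:]]


def insert(regions, row_key):
    """Insert row_key into the region covering it, then split if needed."""
    # Pick the last region whose first key is <= row_key;
    # the first region takes any key smaller than all others.
    target = 0
    for i, region in enumerate(regions):
        if region and region[0] <= row_key:
            target = i
    bisect.insort(regions[target], row_key)
    regions[target:target + 1] = split_if_needed(regions[target])
    return regions


regions = [[]]
for key in (b"a", b"b", b"c", b"d", b"e"):
    insert(regions, key)
print(regions)  # [[b'a', b'b'], [b'c', b'd', b'e']]
```

The application never asks for the split; it happens as a side effect of inserting the fifth row.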
18) What is the difference between row and column oriented databases?
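Row-oriented databases store all the values of a record together, which makes reading and writing whole records easy. They are well suited to OLTP systems. Column-oriented databases store the values of each column together, which suits the analytical queries of OLAP systems.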
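The difference is easiest to see by laying out the same records both ways. The sketch below uses plain Python structures and made-up sales data; it illustrates the two layouts in general, not HBase's on-disk format.

```python
# The same three records in two layouts (hypothetical sales data).
row_store = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 200},
]
column_store = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "amount": [120, 80, 200],
}

# OLTP-style access: fetch one whole record. The row layout has it in one place.
record = row_store[1]

# OLAP-style access: aggregate one column. The column layout touches only that column.
total = sum(column_store["amount"])

print(record)  # {'id': 2, 'region': 'west', 'amount': 80}
print(total)   # 400
```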
19) What is HLog in HBase?
HLog is the write-ahead log (WAL) of HBase. Every region server has its own HLog, which records each change before it is applied, so that the data can be recovered if the server fails.
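The role of a per-server log can be sketched with a toy write path that appends to the log before updating memory, and replays the log after a crash. The class below is a hypothetical illustration, not HBase code; a Python list stands in for the log file.

```python
class ToyRegionServer:
    """Toy write path: append to the log first, then update the in-memory store."""

    def __init__(self):
        self.log = []       # stands in for the server's log file
        self.memstore = {}  # in-memory state, lost if the server crashes

    def put(self, row_key, value):
        self.log.append((row_key, value))  # 1. record the change in the log
        self.memstore[row_key] = value     # 2. apply it in memory

    @classmethod
    def recover(cls, surviving_log):
        """Rebuild the in-memory state by replaying a surviving log in order."""
        server = cls()
        for row_key, value in surviving_log:
            server.put(row_key, value)
        return server


server = ToyRegionServer()
server.put(b"r1", b"v1")
server.put(b"r1", b"v2")
server.put(b"r2", b"x")

# Simulate a crash: only the log survives.
recovered = ToyRegionServer.recover(server.log)
print(recovered.memstore)  # {b'r1': b'v2', b'r2': b'x'}
```

Replaying the entries in their original order is what makes the later value of `r1` win.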
20) Explain about HRegions in HBase?
When a region server receives a write request, it passes the request to the appropriate region. A region has multiple stores, one per column family. On receipt of a write request, the region routes it to the appropriate store based on the column family. The two main components of an HStore are the MemStore and HFiles. The MemStore is kept in the region server's main memory, while HFiles are written to HDFS.
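The store structure described above can be sketched as follows. This is a toy model under simplifying assumptions: the flush threshold counts rows and is hypothetical, the "HFiles" are in-memory sorted lists rather than files on HDFS, and the class names are made up.

```python
class ToyStore:
    """One store per column family: a MemStore plus a list of immutable 'HFiles'."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical row-count limit
        self.memstore = {}
        self.hfiles = []  # each entry is a sorted list of (row_key, value)

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a sorted, immutable file and clear it."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key):
        """Check the MemStore first, then the files from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


class ToyRegion:
    """Routes each write to the store of its column family."""

    def __init__(self, families):
        self.stores = {family: ToyStore() for family in families}

    def put(self, row_key, family, value):
        self.stores[family].put(row_key, value)


region = ToyRegion(["info", "contact"])
for key in (b"r3", b"r1", b"r2"):
    region.put(key, "info", b"old")
region.put(b"r1", "info", b"new")

info = region.stores["info"]
print(len(info.hfiles))  # 1
print(info.get(b"r1"))   # b'new'
print(info.get(b"r2"))   # b'old'
```

The third write fills the MemStore and triggers a flush, so the later write to `r1` lives in memory while the older value sits in the flushed file.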
21) Which types of data can HBase process?
HBase can store and process structured and semi-structured data.
22) How many filters are available in Apache HBase?
Apache HBase provides 18 filters, though the exact number depends on the HBase version. Some of them are PageFilter, ColumnPrefixFilter and TimestampsFilter.
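The idea behind two of these filters can be sketched in a few lines. The functions below are hypothetical stand-ins written in plain Python, not the HBase filter classes: one keeps only columns whose name starts with a prefix, the other caps the number of rows a scan returns.

```python
def column_prefix_filter(row, prefix):
    """Keep only the columns whose qualifier starts with prefix."""
    return {q: v for q, v in row.items() if q.startswith(prefix)}


def scan(table, prefix=None, page_size=None):
    """Scan rows in row-key order, optionally filtering columns and capping the row count."""
    results = []
    for row_key in sorted(table):
        row = table[row_key]
        if prefix is not None:
            row = column_prefix_filter(row, prefix)
            if not row:
                continue  # no matching columns, so skip the row
        results.append((row_key, row))
        if page_size is not None and len(results) >= page_size:
            break  # stop once a full page has been collected
    return results


# Hypothetical table: row key -> {qualifier: value}
table = {
    b"r1": {"addr_city": "Pune", "name": "Asha"},
    b"r2": {"name": "Ravi"},
    b"r3": {"addr_city": "Delhi", "addr_zip": "110001"},
}

print(scan(table, prefix="addr_"))
# [(b'r1', {'addr_city': 'Pune'}), (b'r3', {'addr_city': 'Delhi', 'addr_zip': '110001'})]
print(scan(table, page_size=2))
# [(b'r1', {'addr_city': 'Pune', 'name': 'Asha'}), (b'r2', {'name': 'Ravi'})]
```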
23) What are data model operations in HBase?
The four data model operations present in HBase are Get, Put, Scan and Delete.
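What each of the four operations does can be shown with a small in-memory table. This is a toy model with hypothetical names, not the HBase client API; it assumes rows are kept sorted by row key and that a scan includes its start key and excludes its stop key.

```python
import bisect


class ToyTable:
    def __init__(self):
        self._keys = []  # row keys kept in sorted byte order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        """Insert or overwrite one cell."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Return a copy of one row, or an empty dict if it does not exist."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start=b"", stop=None):
        """Yield rows with start <= key < stop, in sorted key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and (stop is None or self._keys[i] < stop):
            key = self._keys[i]
            yield key, dict(self._rows[key])
            i += 1

    def delete(self, row_key):
        """Remove a whole row if it exists."""
        if row_key in self._rows:
            del self._rows[row_key]
            self._keys.remove(row_key)


t = ToyTable()
t.put(b"r2", "info:name", "Ravi")
t.put(b"r1", "info:name", "Asha")
t.put(b"r3", "info:name", "Meera")

print(t.get(b"r1"))                  # {'info:name': 'Asha'}
print([k for k, _ in t.scan()])      # [b'r1', b'r2', b'r3']
print([k for k, _ in t.scan(b"r2", b"r3")])  # [b'r2']
t.delete(b"r2")
print([k for k, _ in t.scan()])      # [b'r1', b'r3']
```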