Hadoop FS & DFS Commands

The Hadoop Distributed File System (HDFS) lets us store data in a distributed environment. Thanks to its design, it can store and process data on standard commodity machines, whereas many existing distributed systems require high-end machines for storage and processing. It also provides a high degree of fault tolerance: by default, data is replicated across three nodes, so even if one node goes down, the remaining replicas act as a backup recovery mechanism.

So HDFS provides concepts like a configurable replication factor and a large block size, and it can scale out to several thousand nodes. Hadoop ships with its own distributed file system and provides the usual file-system operations on top of it.

You can see the definitions of the two commands (hadoop fs and hadoop dfs) in $HADOOP_HOME/bin/hadoop.

Hadoop FS and DFS commands

The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS) as well as other file systems that Hadoop supports, such as Local FS, HFTP FS, S3 FS, and others.

The FS shell is invoked by:

bin/hadoop fs <args>

The Hadoop file system uses the following commands to read, write, and modify data. These are shell-based commands that developers use to communicate with the Hadoop file system.

appendToFile

hdfs dfs -appendToFile <localsrc> ... <dst>

Appends a single source, or multiple sources, from the local file system to the destination file system. Also reads input from stdin and appends it to the destination file system.

Example:-

hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile
hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile
hdfs dfs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile
hdfs dfs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)

cat

hdfs dfs -cat URI [URI ...]

Copies source paths to stdout.

Example:-

hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2

chgrp

hdfs dfs -chgrp [-R] GROUP URI [URI ...]

Change group association of files. The user must be the owner of files, or else a super user.

Options

The -R option will make the change recursively through the directory structure.
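
As an illustration, changing the group of a single file and, recursively, of a directory tree might look like this (the group name analysts and the paths are assumed):

hdfs dfs -chgrp analysts /user/hadoop/file1
hdfs dfs -chgrp -R analysts /user/hadoop/dir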

chmod

hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]

Change the permissions of files. The user must be the owner of the file, or else a super user.

Options

The -R option will make the change recursively through the directory structure.
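
For instance, setting permissions with an octal mode or a symbolic mode might look like this (the paths are assumed):

hdfs dfs -chmod 755 /user/hadoop/file1
hdfs dfs -chmod -R u+x,g-w /user/hadoop/dir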

chown

hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ...]

Change the owner of files. The user must be a super user.

Options: The -R option will make the change recursively through the directory structure.
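
As a sketch, changing the owner of a file and the owner and group of a directory tree might look like this (the user hduser, the group analysts, and the paths are assumed):

hdfs dfs -chown hduser /user/hadoop/file1
hdfs dfs -chown -R hduser:analysts /user/hadoop/dir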

copyFromLocal

hdfs dfs -copyFromLocal <localsrc> URI

This command is very similar to the put command, except that the source is restricted to a local file reference.

Options

The -f option will overwrite the destination if it already exists.
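
A typical invocation, with and without overwrite (the file names are assumed):

hdfs dfs -copyFromLocal localfile /user/hadoop/file1
hdfs dfs -copyFromLocal -f localfile /user/hadoop/file1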

copyToLocal

hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

It is similar to the get command, except that the destination is restricted to a local file reference.
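
A minimal example (the paths are assumed):

hdfs dfs -copyToLocal /user/hadoop/file1 localfile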

count

hdfs dfs -count [-q] <paths>

Count the number of directories, files, and bytes under the paths that match the specified file pattern. The output columns with count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME. With the -q option, quota columns (QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA) are reported as well.

Example:-

hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hdfs dfs -count -q hdfs://nn1.example.com/file1

cp

hdfs dfs -cp [-f] URI [URI ...] <dest>

Copy files from source to destination. This command allows multiple sources as well. In that scenario, the destination must be a directory.

Options: The -f option will overwrite the destination if it already exists.

Example:-

hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2
hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir

get

hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst>

Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the -crc option.

Example:-

hdfs dfs -get /user/hadoop/file localfile
hdfs dfs -get hdfs://nn.example.com/user/hadoop/file localfile

getmerge

hdfs dfs -getmerge <src> <localdst> [addnl]

Takes a source directory and a destination file as input and concatenates the files in src into the destination local file. Optionally, addnl can be set to add a newline character at the end of each file.
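
For example, merging a directory of part files into a single local file, without and with trailing newlines (the paths are assumed):

hdfs dfs -getmerge /user/hadoop/dir output.txt
hdfs dfs -getmerge /user/hadoop/dir output.txt addnl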

ls

hdfs dfs -ls <args>

For a file returns stat on the file with the following format:

permissions number_of_replicas userid groupid filesize modification_date modification_time filename

For a directory, it returns the list of its direct children, as in Unix. A directory is listed as:

permissions userid groupid modification_date modification_time dirname

Example:-

hdfs dfs -ls /user/hadoop/file1

lsr

hdfs dfs -lsr <args>

Recursive version of ls. Similar to Unix ls -R.
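
For instance, listing a user's home directory tree recursively (the path is assumed):

hdfs dfs -lsr /user/hadoop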

mkdir

hdfs dfs -mkdir [-p] <paths>

Takes path URIs as arguments and creates directories.

Options: The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.

Example:-

hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir

moveFromLocal

hdfs dfs -moveFromLocal <localsrc> <dst>

Similar to the put command, except that the source localsrc is deleted after it is copied.
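
A minimal example (the paths are assumed):

hdfs dfs -moveFromLocal localfile /user/hadoop/hadoopfile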

moveToLocal

hdfs dfs -moveToLocal [-crc] <src> <dst>

Displays a "Not implemented yet" message.

mv

hdfs dfs -mv URI [URI ...] <dest>

Moves files from source to destination. This command allows multiple sources as well, but in that case the destination needs to be a directory. Moving files across file systems is not permitted.

Example:-

hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2
hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1

put

hdfs dfs -put <localsrc> ... <dst>

Copies a single source or multiple sources from the local file system to the destination file system. Also reads input from stdin and writes it to the destination file system.

Example:-

hdfs dfs -put localfile /user/hadoop/hadoopfile
hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
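
The stdin form mentioned above passes - as the source (the destination path is assumed):

hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)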

rm

hdfs dfs -rm [-skipTrash] URI [URI ...]

Deletes the files specified as args. Only files and empty directories are deleted. If the -skipTrash option is specified, the trash, if enabled, will be bypassed and the specified file(s) will be deleted immediately. This can be useful when it is necessary to delete files from an over-quota directory. Refer to rmr for recursive deletes.

Example:-

hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir

rmr

hdfs dfs -rmr [-skipTrash] URI [URI ...]

Recursive version of delete. If the -skipTrash option is specified, the trash, if enabled, will be bypassed and the specified file(s) will be deleted immediately. This can be useful when it is necessary to delete files from an over-quota directory.

Example:-

hdfs dfs -rmr /user/hadoop/dir

test

hdfs dfs -test -[ezd] URI

Options:

The -e option will check to see if the file exists, returning 0 if true.
The -z option will check to see if the file is zero length, returning 0 if true.
The -d option will check to see if the path is a directory, returning 0 if true.

Example:-

hdfs dfs -test -e filename

text

hdfs dfs -text <src>

Takes a source file and outputs the file in text format. The allowed formats are zip and TextRecordInputStream.
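
As an illustration, viewing a SequenceFile as text might look like this (the path is assumed):

hdfs dfs -text /user/hadoop/data.seq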

fsck

The command for checking file system health is given below:

hadoop fsck <path>

Example:-

hadoop fsck /test.txt -files -blocks -locations

This command checks the entire file system under the given input path and reports the following details:

  • Total size
  • Total dirs.
  • Total files
  • Total blocks (validated)
  • Minimally replicated blocks
  • Over-replicated blocks
  • Under-replicated blocks
  • Mis-replicated blocks
  • Default replication factor
  • Average block replication
  • Corrupt blocks 
  • Missing replicas
  • Number of data-nodes 
  • Number of racks