- HDFS commands
- hadoop fs -ls <path>
- hadoop fs -mkdir <path>
- hadoop fs -rmdir <path>
- Copy a file from the edge node to HDFS
- hadoop fs -put /home/cloudera/<src> /user/cloudera/<dest>
- Read a file from HDFS
- hadoop fs -cat /user/cloudera/<file_path>
- Leave safe mode (admin command)
- hadoop dfsadmin -safemode leave
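The templates above can be strung together into a short session; a minimal sketch, assuming the `hadoop` CLI is installed, a cluster is running, and `/user/cloudera/demo` is a hypothetical scratch path:

```shell
# Sketch of a basic HDFS session (requires the hadoop CLI and a running
# cluster; /user/cloudera/demo is a hypothetical example path).
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /user/cloudera/demo   # create a directory
  hadoop fs -ls /user/cloudera              # list its parent
  hadoop fs -rmdir /user/cloudera/demo      # remove it (must be empty)
else
  echo "hadoop CLI not found; commands shown for reference only"
fi
```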
✓ Command : ls
✓ -R : Recursively list the contents of directories.
✓ -C : Display the paths of files and directories only.
By default, ls sorts in ascending order by name.
✓ ls -r : Reverse the order of the sort.
✓ ls -S : Sort files by size.
✓ ls -t : Sort files by modification time (most recent first).
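The sort options above can be combined; a hedged sketch (requires a running cluster, and `/user/cloudera/practice` is an example path):

```shell
# Sketch: common ls option combinations for HDFS listings.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls -R /user/cloudera/practice    # recurse into subdirectories
  hadoop fs -ls -S /user/cloudera/practice    # sort by file size
  hadoop fs -ls -t -r /user/cloudera/practice # modification time, reversed (oldest first)
else
  echo "hadoop CLI not found; commands shown for reference only"
fi
```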
✓ Command : rmdir
• Remove a directory only if it is empty.
• --ignore-fail-on-non-empty : Suppress the error message if the directory you are trying to remove is not empty.
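A short sketch of the two behaviours (requires a cluster; `tmp_dir` and `some_dir` are hypothetical paths):

```shell
# Sketch: rmdir succeeds only on empty directories; the long flag
# suppresses the "not empty" error.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p tmp_dir
  hadoop fs -rmdir tmp_dir                               # OK: directory is empty
  hadoop fs -rmdir --ignore-fail-on-non-empty some_dir   # no error even if non-empty
else
  echo "hadoop CLI not found; commands shown for reference only"
fi
```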
✓ Command : rm
• rm removes files.
• With option -r, recursively deletes directories.
• With option -skipTrash, bypasses the trash, if enabled.
• With option -f, no error message is shown even if the file does not exist. Check with the Unix variable $?, which holds the
exit status of the last command: 0 means success and non-zero means failure.
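The $? behaviour is the same as with the local Unix rm, which can be tried without a cluster (the path below is a hypothetical missing file):

```shell
# With -f, removing a missing file is not an error, so the exit status stays 0.
rm -f /tmp/no_such_file_here            # -f: no error although the file does not exist
echo $?                                 # prints 0 (last command succeeded)
rm /tmp/no_such_file_here 2>/dev/null   # without -f the command fails
echo $?                                 # prints a non-zero status
```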
✓ Command : mkdir
• Create a directory.
• -p : Do not fail if the directory already exists; missing parent directories are created as well, so nested folders can be made in one command.
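A sketch of both -p behaviours (requires a cluster; the path is an example):

```shell
# Sketch: -p creates missing parent directories and is idempotent.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p practice/retail_db/orders   # creates practice and retail_db too
  hadoop fs -mkdir -p practice/retail_db/orders   # no error on the second run
else
  echo "hadoop CLI not found; commands shown for reference only"
fi
```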
HDFS to Local
###########
✓ Command - copyToLocal or get
hadoop fs -get practice/retail_db/orders .
✓ Error if the destination path already exists. To overwrite, use the -f flag.
hadoop fs -get practice/retail_db/orders .
hadoop fs -get -f practice/retail_db/orders .
✓ The -p flag preserves access and modification times, ownership, and the mode.
hadoop fs -get -p practice/retail_db/orders .
✓ To copy only the files without the enclosing folder, use a glob pattern.
hadoop fs -get practice/retail_db/orders/* .
✓ When copying multiple files, the destination must be a directory.
mkdir copyHere
hadoop fs -get practice/retail_db/orders/* practice/sample.txt copyHere
Local to HDFS
######
✓ Command - copyFromLocal or put
hadoop fs -mkdir -p practice/retail_db
hadoop fs -put dataFiles/* practice/retail_db/
hadoop fs -mkdir -p practice/retail_db1
hadoop fs -put dataFiles practice/retail_db1/ # Creates a subfolder dataFiles under retail_db1
✓ Error if the destination path already exists. To overwrite, use the -f flag.
hadoop fs -put -f dataFiles/* practice/retail_db/
✓ The -p flag preserves timestamps, ownership, and the mode.
hadoop fs -put -p dataFiles/* practice/retail_db/
✓ We can also copy multiple files.
hadoop fs -put -f dataFiles/* sample.txt practice/retail_db/
✓ First 10
hadoop fs -cat practice/retail_db/orders/part-00000 | head -10
✓ Last 10
hadoop fs -cat practice/retail_db/orders/part-00000 | tail -10
Statistics
#########
✓ Command – stat
• Print statistics related to any file/directory
✓ default or %y - Modification Time
hdfs dfs -stat %y <hdfs_file_name>
✓ %b - File Size in Bytes
hdfs dfs -stat %b <hdfs_file_name>
✓ %F - Type of object.
✓ %o - Block Size
✓ %r - Replication
✓ %u - User Name
✓ %a - File Permission in Octal
✓ %A - File Permission in Symbolic
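The specifiers above can be combined into a single format string; a hedged sketch (requires a cluster, and the path is an example):

```shell
# Sketch: combine stat format specifiers into one call.
if command -v hdfs >/dev/null 2>&1; then
  # type, owner, symbolic permissions, size, replication, modification time
  hdfs dfs -stat "%F %u %A %b %r %y" practice/retail_db/orders/part-00000
else
  echo "hdfs CLI not found; command shown for reference only"
fi
```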
Storage Details in HDFS
##############
✓ Command – df
• Shows the capacity, free and used space of the HDFS file system.
• hadoop fs -df
• -h → Human-readable format
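A small sketch contrasting the raw and human-readable outputs (requires a cluster):

```shell
# Sketch: df with and without -h.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -df       # capacity, used, and free space in bytes
  hadoop fs -df -h    # the same figures in human-readable units
else
  echo "hadoop CLI not found; commands shown for reference only"
fi
```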
✓ Command – du
• Show the amount of space, in bytes, used by the files that match the specified file pattern.
• hadoop fs -du practice/retail_db
• -h : Human-readable format
• -v : Display a header row
• -s : Show an aggregate summary of the total size
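The du flags can also be combined; a hedged sketch (requires a cluster, and the path is an example):

```shell
# Sketch: du flag combinations.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -du -h -v practice/retail_db   # per-file sizes with a header row
  hadoop fs -du -s -h practice/retail_db   # one summarized total for the directory
else
  echo "hadoop CLI not found; commands shown for reference only"
fi
```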
File Metadata
########
✓ Command – fsck
✓ Even if a file is smaller than 128 MB, it still occupies one block.
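The block count for a file is the ceiling of file size divided by block size; a sketch with plain shell arithmetic and hypothetical sizes:

```shell
# blocks = ceil(file_size / block_size), computed with integer arithmetic.
block_size=$((128 * 1024 * 1024))   # 128 MB default block size
file_size=$((300 * 1024 * 1024))    # hypothetical 300 MB file
echo $(( (file_size + block_size - 1) / block_size ))   # prints 3
small=$((1 * 1024 * 1024))          # hypothetical 1 MB file
echo $(( (small + block_size - 1) / block_size ))       # prints 1 - still one block
```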
✓ Help: hadoop fsck -help
### Print the fsck High Level Report
hadoop fsck practice/retail_db
### -files → Print a detailed file level report.
hadoop fsck practice/retail_db -files
### -files -blocks → Print a detailed file and block report.
hadoop fsck practice/retail_db -files -blocks
### -files -blocks -locations → Print out locations for every block
hadoop fsck practice/retail_db -files -blocks -locations
### -files -blocks -racks → Print out network topology for data-node locations
hadoop fsck practice/retail_db -files -blocks -racks
Properties update (-D)