Sep 13, 2021

bigdata, hdfs

  • HDFS commands
    • hadoop fs -ls <>
    • hadoop fs -mkdir <>
    • hadoop fs -rmdir <>
    • Copy file from Edge Node to HDFS
      • hadoop fs -put /home/cloudera/<src> /user/cloudera/<dest> 
    • Read file from HDFS
      • hadoop fs -cat /user/cloudera/<file_path>
    • Take the NameNode out of safe mode
      • hadoop dfsadmin -safemode leave
  ✓ Command : ls
    • -R : Recursively list the contents of directories.
    • -C : Display the paths of files and directories only.
    • By default, entries are sorted by name in ascending order.
    • -r : Reverse the order of the sort.
    • -S : Sort files by size.
    • -t : Sort files by modification time (most recent first).
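
The sort flags can be combined in a single call. The following is a minimal sketch, not a definitive recipe: the helper names hdfs_ls_newest_last and hdfs_ls_paths are hypothetical, and only the flags themselves come from the list above.

```shell
# hdfs_ls_newest_last is a hypothetical helper name, not a Hadoop command.
# -t sorts by modification time (most recent first); -r reverses that order,
# so the most recently modified entries end up at the bottom of the listing.
hdfs_ls_newest_last() {
  hadoop fs -ls -t -r "$1"
}

# hdfs_ls_paths is also hypothetical: -R walks the directory tree and
# -C prints the paths only, which is convenient for piping into other tools.
hdfs_ls_paths() {
  hadoop fs -ls -R -C "$1"
}
```

Example call, assuming the practice directory used elsewhere in these notes: hdfs_ls_newest_last practice/retail_db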

  ✓ Command : rmdir
    • Removes a directory only if it is empty.
    • --ignore-fail-on-non-empty : Suppress the error message if the directory you are trying to remove is not empty.
  ✓ Command : rm
    • Removes files.
    • -r : Recursively delete directories and their contents.
    • -skipTrash : Bypass the trash, if enabled, and delete immediately.
    • -f : Show no error message, and do not change the exit status, if the file does not exist. Check with the Unix command echo $?, which prints the exit status of the last command: 0 means success, any non-zero value means failure.
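
Checking $? by hand after every command gets repetitive, so the check can be wrapped in a small function. This is a sketch under assumptions: report_status is a hypothetical helper name, not part of Hadoop, and the path in the usage line is illustrative.

```shell
# report_status is a hypothetical helper, not part of Hadoop.
# It runs the command given as its arguments, captures the exit status
# right away (any later command would overwrite $?), and prints a
# readable result before passing the same status back to the caller.
report_status() {
  if "$@"; then
    rs_code=0
  else
    rs_code=$?
  fi
  if [ "$rs_code" -eq 0 ]; then
    echo "success (exit status 0)"
  else
    echo "failure (exit status $rs_code)"
  fi
  return "$rs_code"
}
```

Example call: report_status hadoop fs -rm -f practice/no_such_file.txt
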
  ✓ Command : mkdir
    • Creates a directory.
    • -p : Do not fail if the directory already exists. Also creates any missing parent directories along the path.

HDFS to Local
###########
  ✓ Command - copyToLocal or get
    hadoop fs -get practice/retail_db/orders .
  ✓ Error if the destination path already exists. To overwrite, use the -f flag.
    hadoop fs -get practice/retail_db/orders . # Fails if ./orders already exists
    hadoop fs -get -f practice/retail_db/orders . # Overwrites the existing copy
  ✓ The -p flag preserves access and modification times, ownership and the mode.
    hadoop fs -get -p practice/retail_db/orders .
  ✓ To copy only the files, without the folder, use a pattern.
    hadoop fs -get practice/retail_db/orders/* .
  ✓ When copying multiple files, the destination must be a directory.
    mkdir copyHere
    hadoop fs -get practice/retail_db/orders/* practice/sample.txt copyHere
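
The individual get options can be combined into one repeatable download step. A minimal sketch, assuming a hypothetical wrapper named hdfs_fetch; the flags are the ones described above, everything else is illustrative.

```shell
# hdfs_fetch is a hypothetical wrapper around "hadoop fs -get".
# Usage: hdfs_fetch <hdfs_src> <local_dir>
# It creates the local directory first (a directory destination is needed
# when several files match), then copies with -f to overwrite old copies
# and -p to keep timestamps, ownership and mode.
hdfs_fetch() {
  mkdir -p "$2" && hadoop fs -get -f -p "$1" "$2"
}
```

Example call: hdfs_fetch 'practice/retail_db/orders/*' copyHere
Quoting the pattern keeps the local shell from expanding it against local files, so Hadoop receives the pattern and matches it against HDFS paths.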

Local to HDFS
######
  ✓ Command - copyFromLocal or put
    hadoop fs -mkdir -p practice/retail_db
    hadoop fs -put dataFiles/* practice/retail_db/
    hadoop fs -mkdir -p practice/retail_db1
    hadoop fs -put dataFiles practice/retail_db1/ # Creates a subfolder dataFiles under retail_db1
  ✓ Error if the destination path already exists. To overwrite, use the -f flag.
    hadoop fs -put -f dataFiles/* practice/retail_db/
  ✓ The -p flag preserves timestamps, ownership and the mode.
    hadoop fs -put -p dataFiles/* practice/retail_db/
  ✓ We can also copy multiple sources in one command.
    hadoop fs -put -f dataFiles/* sample.txt practice/retail_db/
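
Creating the target directory and uploading into it usually go together, so the two commands can be chained. This is only a sketch: hdfs_upload is a hypothetical helper name and the argument order is an assumption made for illustration.

```shell
# hdfs_upload is a hypothetical helper, not a Hadoop command.
# Usage: hdfs_upload <hdfs_dir> <local_src>...
# mkdir -p creates the target directory (and any missing parents) and does
# not fail if it already exists; put -f then overwrites files already there.
hdfs_upload() {
  hu_dest=$1
  shift
  hadoop fs -mkdir -p "$hu_dest" && hadoop fs -put -f "$@" "$hu_dest/"
}
```

Example call: hdfs_upload practice/retail_db dataFiles/* sample.txt
Because of -p and -f, running the same call twice should not fail on an existing directory or on files uploaded earlier.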

  ✓ First 10 lines of a file
    hadoop fs -cat practice/retail_db/orders/part-00000 | head -10
  ✓ Last 10 lines of a file
    hadoop fs -cat practice/retail_db/orders/part-00000 | tail -10

Statistics
#########
  ✓ Command - stat
    • Prints statistics about a file or directory.
  ✓ default or %y - Modification time
    hdfs dfs -stat %y <hdfs_file_name>
  ✓ %b - File size in bytes
    hdfs dfs -stat %b <hdfs_file_name>
  ✓ %F - Type of object (for example, regular file or directory)
  ✓ %o - Block size
  ✓ %r - Replication factor
  ✓ %u - User name of the owner
  ✓ %a - File permissions in octal
  ✓ %A - File permissions in symbolic form
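
Several format specifiers can be passed in one quoted format string, which gives a one-line summary per path. A minimal sketch, assuming hypothetical names STAT_FORMAT and hdfs_stat_summary; which specifiers are available may depend on the Hadoop version in use.

```shell
# STAT_FORMAT and hdfs_stat_summary are hypothetical names.
# The format string combines the specifiers listed above, in this order:
# type, size in bytes, block size, replication, owner,
# octal permissions, symbolic permissions, modification time.
STAT_FORMAT='%F %b %o %r %u %a %A %y'

hdfs_stat_summary() {
  # The quotes keep the whole format string as a single argument.
  hdfs dfs -stat "$STAT_FORMAT" "$1"
}
```

Example call: hdfs_stat_summary <hdfs_file_name>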

Storage details in HDFS
##############
  ✓ Command - df
    • Shows the capacity, free space and used space of the HDFS file system.
    • hadoop fs -df
    • -h : Human-readable format
  ✓ Command - du
    • Shows the amount of space, in bytes, used by the files that match the specified file pattern.
    • hadoop fs -du practice/retail_db
    • -h : Human-readable format
    • -v : Display a header line
    • -s : Show a summary of the total size
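
A quick space check often needs both views: the file-system totals and the size of one directory. The sketch below chains the two commands; hdfs_space_report is a hypothetical helper name, and the flags are the ones listed above.

```shell
# hdfs_space_report is a hypothetical helper, not a Hadoop command.
# It prints the file-system-wide capacity first, then the summarized
# size of the given path, both in human-readable units.
hdfs_space_report() {
  hadoop fs -df -h &&
  hadoop fs -du -s -h "$1"
}
```

Example call: hdfs_space_report practice/retail_db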

File Metadata
########
  ✓ Command - fsck
  ✓ Even if a file is smaller than the block size (128 MB by default), it still occupies one block.
  ✓ Help: hadoop fsck -help

  ### Print the fsck high-level report
  hadoop fsck practice/retail_db

  ### -files → Print a detailed file-level report.
  hadoop fsck practice/retail_db -files

  ### -files -blocks → Print a detailed file and block report.
  hadoop fsck practice/retail_db -files -blocks

  ### -files -blocks -locations → Print out locations for every block
  hadoop fsck practice/retail_db -files -blocks -locations

  ### -files -blocks -racks → Print out network topology for data-node locations
  hadoop fsck practice/retail_db -files -blocks -racks


Properties update (-D)