May 13, 2023

Hive Interview Questions


  • How to choose no buckets?

    • Depends on data size of table

    • Data Size of  / 128 MB = 31

    • 2 ^ n > 31

  • See no of partitions

    • show partitions txnrecords_dp;

  • Drop/Remove a partition from metastore

    • show partitions txnrecords_dp;

    • alter table txnrecords_dp DROP IF EXISTS PARTITION(category="Games");

    • show partitions txnrecords_dp;

    • This will remove the partition from HDFS as well

    • Same applies for Internal/External table

  • Add partition

    • alter table txnrecords_dp ADD PARTITION(category="Games");

    • show partitions txnrecords_dp;

    • Add multiple in a single command

      • alter table txnrecords_dp ADD PARTITION(category="Games-1") PARTITION(category="Games-2");

    • When you add a new partition, it will automatically load data into Hive

  • Edit Partition

    • alter table txnrecords_dp 

  • Delete partition under Hdfs end but not in Metastore/Hive

    • hdfs dfs -rm -r /apps/hive/xyz/txnrecords_dp/category=Puzzles

    • Check if partition on hive

      • Show partitions txnrecords_dp;

      • Still it is available, metastore/hive will not recognize the HDFS changes

      • You have to make it recognize using msck repair command

    • How to sync from HDFS to MetaStore/Hive

      • hive> msck repair table txnrecords_dp;

  • How to create a new table from an existing partitioned table (new table should not contain partitions) ?

    • create table txnrecords_dp_bkp as select * from txnrecords_dp;

      • Map reduce job will run as it is copying

      • Only data & schema is copied, not partitions

      • show partitions txnrecords_dp_bkp;

        • No partitions are copied

  • Create only backup table only schema from existing table

    • create table txnrecords_dp_bkp_schema LIKE txnrecords_dp;

      • Only schema & partitions is copied (but no data)

  • Run HQL command

    • hive -f test.hql

    • or

    • hive -e test.hql

  • hive -e  (run hive queries)

    • hive -e "select * from batch5_hadoop.txnrecords_dp imit 10"

  • Hive - rename a table

    • hive> alter table txnrecords_dp_bkp_schema RENAME to  txnrecords_dp_bkp_schema1

  • show create table txnrecords_dp_bkp_schema;

  • Hive - case

    • select id, name, marks, 

    • CASE WHEN marks>65 THEN 'PASS'

    • CASE WHEN marks<65 THEN FAIL'

    • ELSE 'NO_MARKS'

    • END as result

    • from student_data

No comments:

Post a Comment