Friday, January 8, 2016

How to get Partitioning Information of Hive Tables in CDH


Following document has useful information for getting partition information of a hive table.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowPartitions

In order to query whole partitioning information on hive databases, hive metadata database can be used. It is in the MySQL server. Just login to MySQL server, switch to hive schema/database.

Following queries print partitioning summary and details

select db.name,t.owner,t.tbl_name,t.tbl_type, count(tp.part_name)
from TBLS t, DBS db, PARTITIONS tp where t.db_id=db.db_id
and t.tbl_id=tp.tbl_id
group by db.name,t.owner,t.tbl_name,t.tbl_type
order by 1,2,3;

select db.name,t.owner,t.tbl_name,t.tbl_type, tp.part_name
from TBLS t, DBS db, PARTITIONS tp where t.db_id=db.db_id
and t.tbl_id=tp.tbl_id
order by 1,2,3,5;

Following query prints bucketed table information

select d.name,t.owner, t.tbl_name,b.bucket_col_name,s.num_buckets
from BUCKETING_COLS b, SDS s, COLUMNS_V2 c, TBLS t, DBS d
where b.sd_id=s.sd_id and s.cd_id=c.cd_id and b.bucket_col_name=c.column_name and s.sd_id=t.sd_id
and t.db_id=d.db_id
order by 1,2,3,4;