[
https://issues.apache.org/jira/browse/IMPALA-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855778#comment-17855778
]
Quanlong Huang commented on IMPALA-13160:
-----------------------------------------
I can reproduce the issue now. The key is that the partitions must be created
by Hive. When running SHOW PARTITIONS in Hive, I can see duplicated partitions
(e.g. hour=00 duplicates hour=0, hour=01 duplicates hour=1):
{noformat}
+-----------------------+
| partition |
+-----------------------+
| day=20240613/hour=0 |
| day=20240613/hour=00 |
| day=20240613/hour=01 |
| day=20240613/hour=02 |
| day=20240613/hour=03 |
| day=20240613/hour=04 |
| day=20240613/hour=05 |
| day=20240613/hour=06 |
| day=20240613/hour=07 |
| day=20240613/hour=08 |
| day=20240613/hour=09 |
| day=20240613/hour=1 |
| day=20240613/hour=10 |
| day=20240613/hour=11 |
| day=20240613/hour=12 |
| day=20240613/hour=13 |
| day=20240613/hour=14 |
| day=20240613/hour=15 |
| day=20240613/hour=16 |
| day=20240613/hour=17 |
| day=20240613/hour=18 |
| day=20240613/hour=19 |
| day=20240613/hour=2 |
| day=20240613/hour=20 |
| day=20240613/hour=21 |
| day=20240613/hour=22 |
| day=20240613/hour=23 |
| day=20240613/hour=3 |
| day=20240613/hour=4 |
| day=20240613/hour=5 |
| day=20240613/hour=6 |
| day=20240613/hour=7 |
| day=20240613/hour=8 |
| day=20240613/hour=9 |
+-----------------------+
34 rows selected (0.103 seconds){noformat}
However, the partitions are not resolved as distinct in queries. E.g. inserting
a row into hour=00 actually inserts into hour=0:
{code:sql}
hive> insert into hive_partition.two_partition partition(day=20240613, hour=00)
select 1, 'name';
{code}
The file is created as
'hdfs://localhost:20500/test-warehouse/hive_partition.db/two_partition/day=20240613/hour=0/000000_0'
which is under the partition dir of hour=0.
Using local-catalog mode in Impala can fix the hanging issue. However, query
results could still be unexpected. This seems to be a gray area in both Hive
and Impala.
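To spot such colliding partition names before they reach the catalog, something like the following could be run over the SHOW PARTITIONS output. This is only a rough sketch, not Hive or Impala code: the helper name find_colliding_partitions is made up for illustration, and it assumes the partition columns day and hour are INT (as in the table definition), so values are compared after parsing with Python's int(), which may not match Hive's parsing in every corner case.

```python
from collections import defaultdict


def find_colliding_partitions(partition_names, int_keys=("day", "hour")):
    """Group partition names whose INT-typed key values parse to the same integers.

    Hypothetical helper: int_keys lists the partition columns assumed to be INT.
    """
    groups = defaultdict(list)
    for name in partition_names:
        # "day=20240613/hour=00" -> {"day": "20240613", "hour": "00"}
        parts = dict(kv.split("=", 1) for kv in name.split("/"))
        # Normalize INT-typed values so "00" and "0" map to the same key.
        normalized = tuple(
            (k, int(v)) if k in int_keys else (k, v) for k, v in parts.items()
        )
        groups[normalized].append(name)
    # Keep only the keys that more than one partition name maps to.
    return {key: names for key, names in groups.items() if len(names) > 1}


names = [
    "day=20240613/hour=0",
    "day=20240613/hour=00",
    "day=20240613/hour=01",
    "day=20240613/hour=1",
    "day=20240613/hour=10",
]
collisions = find_colliding_partitions(names)
for key, dupes in collisions.items():
    print(key, dupes)
```

Any group reported here corresponds to partition directories that differ only by leading zeros and would map to the same INT partition value.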
> Impala query stuck after query from special partition 'hour=0' and 'hour=00'
> which hour type is int
> ---------------------------------------------------------------------------------------------------
>
> Key: IMPALA-13160
> URL: https://issues.apache.org/jira/browse/IMPALA-13160
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog, fe
> Affects Versions: Impala 3.4.0, Impala 4.3.0
> Reporter: LiuYuan
> Priority: Critical
>
> 1. Create the table as below:
> {code:sql}
> CREATE TABLE hive_partition.two_partition (
> id INT,
> name STRING
> )
> PARTITIONED BY (
> day INT,
> hour INT
> )
> WITH SERDEPROPERTIES ('serialization.format'='1')
> STORED AS ORC
> LOCATION 'hdfs://ly-pfs/hive/hive_partition/two_partition'{code}
> 2. Then create the directories as below:
>
> {code:java}
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=0
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=00
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=01
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=02
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=03
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=04
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=05
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=06
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=07
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=08
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=09
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=1
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=10
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=11
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=12
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=13
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=14
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=15
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=16
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=17
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=18
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=19
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=2
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=20
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=21
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=22
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=23
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=3
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=4
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=5
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=6
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=7
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=8
> hdfs://ly-pfs/hive/hive_partition/two_partition/day=20240613/hour=9{code}
>
> 3. Execute REFRESH hive_partition.two_partition several times.
> On Impala 3.4.0, the total number of partitions grows after each refresh:
> from 34 to 74 after three refreshes.
>
> {code:java}
> I0617 17:01:36.244355 18605 CatalogServiceCatalog.java:2225] Refreshing table
> metadata: hive_partition.two_partition
> I0617 17:01:38.033699 18605 HdfsTable.java:995] Reloading metadata for table
> definition and all partition(s) of hive_partition.two_partition (REFRESH
> issued by root)
> I0617 17:01:39.245016 18605 ParallelFileMetadataLoader.java:147] Loading file
> and block metadata for 10 paths for table hive_partition.two_partition using
> a thread pool of size 10
> I0617 17:01:39.336242 18605 HdfsTable.java:690] Loaded file and block
> metadata for hive_partition.two_partition partitions: day=20240613/hour=0,
> day=20240613/hour=1, day=20240613/hour=2, and 7 others. Time taken: 91.234ms
> I0617 17:01:39.336658 18605 ParallelFileMetadataLoader.java:147] Refreshing
> file and block metadata for 34 paths for table hive_partition.two_partition
> using a thread pool of size 20
> I0617 17:01:39.435528 18605 HdfsTable.java:690] Loaded file and block
> metadata for hive_partition.two_partition partitions: day=20240613/hour=0,
> day=20240613/hour=0, day=20240613/hour=2, and 51 others. Time taken: 99.075ms
> I0617 17:01:39.435572 18605 HdfsTable.java:1026] Incrementally loaded table
> metadata for: hive_partition.two_partition
> I0617 17:01:39.450933 18605 CatalogServiceCatalog.java:2249] Refreshed table
> metadata: hive_partition.two_partition
> I0617 17:01:40.930032 18605 CatalogServiceCatalog.java:2225] Refreshing table
> metadata: hive_partition.two_partition
> I0617 17:01:40.942590 18605 HdfsTable.java:995] Reloading metadata for table
> definition and all partition(s) of hive_partition.two_partition (REFRESH
> issued by root)
> I0617 17:01:41.086898 18605 ParallelFileMetadataLoader.java:147] Loading file
> and block metadata for 10 paths for table hive_partition.two_partition using
> a thread pool of size 10
> I0617 17:01:41.360704 18605 HdfsTable.java:690] Loaded file and block
> metadata for hive_partition.two_partition partitions: day=20240613/hour=0,
> day=20240613/hour=1, day=20240613/hour=2, and 7 others. Time taken: 273.775ms
> I0617 17:01:41.361122 18605 ParallelFileMetadataLoader.java:147] Refreshing
> file and block metadata for 34 paths for table hive_partition.two_partition
> using a thread pool of size 20
> I0617 17:01:41.494148 18605 HdfsTable.java:690] Loaded file and block
> metadata for hive_partition.two_partition partitions: day=20240613/hour=0,
> day=20240613/hour=0, day=20240613/hour=2, and 61 others. Time taken: 133.243ms
> I0617 17:01:41.494203 18605 HdfsTable.java:1026] Incrementally loaded table
> metadata for: hive_partition.two_partition
> I0617 17:01:41.494494 18605 CatalogServiceCatalog.java:2249] Refreshed table
> metadata: hive_partition.two_partition
> I0617 17:01:42.312086 18605 CatalogServiceCatalog.java:2225] Refreshing table
> metadata: hive_partition.two_partition
> I0617 17:01:42.326767 18605 HdfsTable.java:995] Reloading metadata for table
> definition and all partition(s) of hive_partition.two_partition (REFRESH
> issued by root)
> I0617 17:01:42.516573 18605 ParallelFileMetadataLoader.java:147] Loading file
> and block metadata for 10 paths for table hive_partition.two_partition using
> a thread pool of size 10
> I0617 17:01:42.708683 18605 HdfsTable.java:690] Loaded file and block
> metadata for hive_partition.two_partition partitions: day=20240613/hour=0,
> day=20240613/hour=1, day=20240613/hour=2, and 7 others. Time taken: 192.084ms
> I0617 17:01:42.709103 18605 ParallelFileMetadataLoader.java:147] Refreshing
> file and block metadata for 34 paths for table hive_partition.two_partition
> using a thread pool of size 20
> I0617 17:01:42.828385 18605 HdfsTable.java:690] Loaded file and block
> metadata for hive_partition.two_partition partitions: day=20240613/hour=0,
> day=20240613/hour=0, day=20240613/hour=2, and 71 others. Time taken:
> 119.517ms{code}
>
> On Impala 4.3.0, a SELECT query falls into an endless loop of prioritized
> metadata loading:
>
> {code:java}
> E0617 17:07:01.926093 4158610 ImpaladCatalog.java:264] Error adding catalog
> object: Error applying incremental updates on table
> hive_partition.two_partition: missing partition ids true, stale partition ids
> false
> Java exception follows:
> org.apache.impala.catalog.TableLoadingException: Error applying incremental
> updates on table hive_partition.two_partition: missing partition ids true,
> stale partition ids false
> at
> org.apache.impala.catalog.HdfsTable.validatePartitions(HdfsTable.java:2066)
> at
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:545)
> at
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:560)
> at
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:186)
> I0617 17:07:01.927062 4158610 impala-server.cc:2176] Catalog topic update
> applied with version: 14 new min catalog object version: 1
> I0617 17:07:03.927403 1511568 StmtMetadataLoader.java:225]
> 9c4e7b92e71da380:855fd23400000000] Waiting for table metadata. Waited for 10
> catalog updates and 12182ms. Tables remaining: [hive_partition.two_partition]
> I0617 17:07:13.931190 1511568 StmtMetadataLoader.java:225]
> 9c4e7b92e71da380:855fd23400000000] Waiting for table metadata. Waited for 20
> catalog updates and 22186ms. Tables remaining: [hive_partition.two_partition]
>
> {code}
>
>