abstractdog commented on code in PR #6376:
URL: https://github.com/apache/hive/pull/6376#discussion_r2975444614
##########
ql/src/java/org/apache/hadoop/hive/ql/cache/results/QueryResultsCache.java:
##########
@@ -549,6 +578,39 @@ public boolean setEntryValid(CacheEntry cacheEntry,
FetchWork fetchWork) {
return false;
}
+ if (isSafeCacheWriteEnabled) {
Review Comment:
@ramitg254 : thanks for working on this so far
I'm not sure the approach fully addresses what was reported: as far as I
understand, there is a safe buffer directory where the files are placed first,
and this safe folder is on the same storage. But that's not the only issue:
this still doesn't prevent big files from actually landing on the filesystem
that holds the cache.
the original report showed something like this:
```
du -h -d 1 /efs/tmp/hive/_resultscache_/results-9d89cc59-c99d-46a5-9d93-2b5505765320
12.0K   ./66356edb-57a6-4f0a-90cd-7d14d9e2b739
...
1.1T    ./0fe343fb-6a89-4d28-b2fd-caed2f2e42f6
...
1.1T    .
```
There is one thing I missed double-checking before creating the jira: does the
"0fe343fb-6a89-4d28-b2fd-caed2f2e42f6" folder belong to a finished query
result? If so - and given that it clearly exceeded the configured 2G max cache
size - the query results cache should have taken care of it, so I think the
original problem/usecase should be investigated thoroughly first.
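To illustrate the concern (this is a hypothetical sketch, not the actual
`QueryResultsCache` API): if size enforcement only runs after the result files
are written, an oversized result still lands on the storage backing the cache.
An admission check that rejects the result before any copy happens would avoid
that. All names below (`CacheAdmissionSketch`, `tryAdmit`) are made up for
illustration:

```java
/**
 * Hypothetical sketch: admit a query result into a bounded cache only if it
 * fits, BEFORE any bytes are copied onto the cache filesystem. Oversized
 * results are rejected up front instead of being written and evicted later.
 */
public class CacheAdmissionSketch {
  private final long maxCacheSizeBytes;
  private long currentCacheSizeBytes = 0L;

  public CacheAdmissionSketch(long maxCacheSizeBytes) {
    this.maxCacheSizeBytes = maxCacheSizeBytes;
  }

  /**
   * Returns true only if the result can be admitted without exceeding the
   * configured cap; the caller copies files into the cache directory only
   * after a successful admission.
   */
  public synchronized boolean tryAdmit(long resultSizeBytes) {
    if (resultSizeBytes > maxCacheSizeBytes) {
      // Never accept a result larger than the whole cache: copying it first
      // and cleaning up later is exactly what fills the filesystem.
      return false;
    }
    if (currentCacheSizeBytes + resultSizeBytes > maxCacheSizeBytes) {
      // Real code would try evicting old entries here before giving up.
      return false;
    }
    currentCacheSizeBytes += resultSizeBytes;
    return true;
  }
}
```

In the reported scenario, a 1.1T result against a 2G cap would fail this check
immediately, so it would never reach the `/efs` volume at all.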
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]