Rushabh Shah created PHOENIX-7002:
-------------------------------------

             Summary: Insufficient logging in phoenix client when server throws StaleRegionBoundaryCacheException.
                 Key: PHOENIX-7002
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7002
             Project: Phoenix
          Issue Type: Bug
            Reporter: Rushabh Shah
            Assignee: Rushabh Shah


We saw an incident in a production cluster where Phoenix returned results 
outside of the range provided by the customer. hbck repair runs were in 
progress while the query was running. When the query started, there were 
region holes in the table (we have no way to confirm this). While the query 
was still running, we ran an hbck repair operation that caused region overlaps 
(this is confirmed, since the overlaps persisted after the query finished).
Unfortunately, there were no exceptions, errors, or stack traces on either 
the client or the server side.
After each query completes, we log its execution time and the number of 
exceptions it encountered in a single log line. That log line shows this 
query encountered a
[StaleRegionBoundaryCacheException|https://github.com/apache/phoenix/blob/4.16/phoenix-core/src/main/java/org/apache/phoenix/monitoring/MetricType.java#L57].

There is some logic in BaseResultIterators that adjusts the start and end 
keys of the scan range. See 
[here|https://github.com/apache/phoenix/blob/4.16/phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L688-L730].
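For context, here is a minimal sketch of the general idea behind bounding a per-region scan: intersect the query's [startKey, stopKey) range with a region's [regionStart, regionEnd) boundaries. An empty key is treated as unbounded, following the HBase convention. This is not the code at the linked lines of BaseResultIterators, and the names `ScanRangeClipper` and `clip` are hypothetical. The sketch only illustrates that the computed ranges depend entirely on the region boundaries the client has cached. If those boundaries are stale (holes or overlaps), the ranges are derived from the wrong boundaries.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Phoenix's BaseResultIterators logic: clips a scan's
 * [scanStart, scanStop) range to one region's [regionStart, regionEnd) range.
 * An empty byte array means "unbounded" (HBase convention); keys compare as
 * unsigned lexicographic byte strings, like HBase row keys.
 */
final class ScanRangeClipper {
    private ScanRangeClipper() {}

    /** Returns {clippedStart, clippedStop}, or null if the intersection is empty. */
    static byte[][] clip(byte[] scanStart, byte[] scanStop,
                         byte[] regionStart, byte[] regionEnd) {
        // Lower bound: the larger of the two starts. An empty start sorts first,
        // so it naturally behaves as "minus infinity".
        byte[] start = Arrays.compareUnsigned(scanStart, regionStart) >= 0
                ? scanStart : regionStart;

        // Upper bound: the smaller of the two stops. An empty stop means
        // "plus infinity", so it has to be handled explicitly.
        byte[] stop;
        if (scanStop.length == 0) {
            stop = regionEnd;
        } else if (regionEnd.length == 0) {
            stop = scanStop;
        } else {
            stop = Arrays.compareUnsigned(scanStop, regionEnd) <= 0 ? scanStop : regionEnd;
        }

        // With a bounded stop, start >= stop means the ranges do not overlap.
        if (stop.length != 0 && Arrays.compareUnsigned(start, stop) >= 0) {
            return null;
        }
        return new byte[][] {start, stop};
    }
}
```

For example, clipping a scan over ['b', 'f') to a region covering ['d', 'z') yields ['d', 'f'). A region that does not overlap the scan yields null.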

Without knowing the state of meta as known to the client, or which exceptions 
were encountered, it is very difficult to debug why this happened.

At the very least, we want to log all the exceptions on the Phoenix client 
side.
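Here is a minimal sketch of what such client-side logging could look like, assuming each per-region scan runs as a task on the client. `ScanFailureLogger`, `callAndLog`, and `describe` are hypothetical names, not existing Phoenix APIs. `StaleRegionBoundaryCacheException` is declared locally as a stand-in so the sketch compiles without Phoenix on the classpath. The log line records the scan range together with the region boundaries the client had cached, which is the state missing from the incident described above.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Local stand-in for Phoenix's exception so this sketch is self-contained. */
class StaleRegionBoundaryCacheException extends SQLException {
    StaleRegionBoundaryCacheException(String msg) { super(msg); }
}

/**
 * Hypothetical helper: runs one per-region scan task and, if it fails, logs the
 * exception with the scan range and the client's cached region boundaries,
 * then rethrows it unchanged.
 */
final class ScanFailureLogger {
    private static final Logger LOG = Logger.getLogger(ScanFailureLogger.class.getName());

    private ScanFailureLogger() {}

    /** Hex-encodes a row key; an empty key is shown as unbounded. */
    static String hex(byte[] key) {
        if (key.length == 0) return "<unbounded>";
        StringBuilder sb = new StringBuilder();
        for (byte b : key) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    /** Builds the single log line describing a failed scan. */
    static String describe(String queryId, byte[] scanStart, byte[] scanStop,
                           byte[] cachedRegionStart, byte[] cachedRegionEnd, Exception e) {
        return String.format("query=%s scan=[%s, %s) cachedRegion=[%s, %s) exception=%s: %s",
                queryId, hex(scanStart), hex(scanStop),
                hex(cachedRegionStart), hex(cachedRegionEnd),
                e.getClass().getSimpleName(), e.getMessage());
    }

    static <T> T callAndLog(String queryId, byte[] scanStart, byte[] scanStop,
                            byte[] cachedRegionStart, byte[] cachedRegionEnd,
                            Callable<T> task) throws Exception {
        try {
            return task.call();
        } catch (Exception e) {
            // Log every failure, not just fatal ones; stale-boundary errors are
            // often retried, so this may be their only trace in the client log.
            Level level = e instanceof StaleRegionBoundaryCacheException
                    ? Level.WARNING : Level.SEVERE;
            LOG.log(level, describe(queryId, scanStart, scanStop,
                    cachedRegionStart, cachedRegionEnd, e), e);
            throw e;
        }
    }
}
```

Because the exception is rethrown unchanged, any existing retry or error handling behaves as before. The only difference is that each failure now leaves a log line with enough context to reconstruct what the client believed about the region boundaries.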



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
