Hi,

I don't want to steal Kevin Minder's work, but one of our important
customers has run into serious performance issues due to this bug.

This ticket has been untouched for 7 years, so if you don't mind, I would
like to take it over.

Reminder about the bug: when we run a select on a table, Beeline has an
option to apply extra coloring and formatting to the console output if we
use the default output format. It wants to highlight the columns that are
part of the primary key. Unfortunately, it does so by querying the
metadata for each column of each row:
https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L92
This can lead to serious performance problems.

I believe the problem can be solved in two separate ways:
- Firstly, as a quick solution: coloring is disabled by default, so we can
skip those isPrimaryKey calls entirely when no coloring is required. I
started to work on a draft to do that:
https://github.com/apache/hive/blob/master/beeline/src/java/org/apache/hive/beeline/Rows.java#L92
(note it is not the final version, just the validation).
- Secondly, as a long-term solution, we can query the primary keys only
once during a select and reuse that information later, when the actual
coloring happens.
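
To make the two ideas above concrete, here is a minimal sketch. It is not
the actual Beeline code: the class names, the fetchPrimaryKeys helper, and
the call counter are all hypothetical stand-ins, and the real metadata
lookup is replaced by a fake one that always reports "id" as the key.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not the real org.apache.hive.beeline.Rows class.
class PrimaryKeyCacheSketch {

    // Counts how often the (simulated) expensive metadata lookup runs.
    static int metadataCalls = 0;

    // Stand-in for the expensive metadata query. In this sketch every
    // table is assumed to have a single primary key column named "id".
    static Set<String> fetchPrimaryKeys(String table) {
        metadataCalls++;
        Set<String> keys = new HashSet<>();
        keys.add("id");
        return keys;
    }

    static class RowFormatter {
        private final boolean colorEnabled;
        private final String table;
        // Filled on first use, then reused for every later row and column.
        private Set<String> primaryKeys;

        RowFormatter(boolean colorEnabled, String table) {
            this.colorEnabled = colorEnabled;
            this.table = table;
        }

        boolean isPrimaryKey(String column) {
            // Quick fix: with coloring off, never touch the metadata.
            if (!colorEnabled) {
                return false;
            }
            // Long-term fix: query the primary keys once per select.
            if (primaryKeys == null) {
                primaryKeys = fetchPrimaryKeys(table);
            }
            return primaryKeys.contains(column);
        }
    }
}
```

With coloring disabled, calling isPrimaryKey for any number of rows and
columns performs no metadata lookup in this sketch; with coloring enabled,
it performs exactly one lookup per RowFormatter instance.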

And one note: I can accept that some people prefer colored output, but in
my personal opinion it has no value. What do you think about removing this
option from Beeline altogether?

Note: attached is a screenshot from my test. That simple test had 4 calls:
[image: image.png]

Thanks,
Zsolt Miskolczi
