Hi all,

I was running an experiment that involved counting terms by day, so I was 
using pivot facets to get the counts. However, as the number of time and term 
values increased, the performance got pretty rubbish. So I knocked up a quick 
test, using a collection of 1 million documents with varying numbers of 
random values, to compare different ways of getting the counts.

1) Combined = combine the time and term into a single field and facet on that.
2) Facet = for each term, set the query to that term and then get the time facet.
3) Pivot = get the pivot facet on term,time.
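
For reference, here's roughly what the three query styles look like. This is
just a Python sketch; the core name "test" and the field names term_s, day_s
and term_day_s are placeholders, not my actual schema.

import requests

SOLR = "http://localhost:8983/solr/test/select"

def combined_counts():
    # 1) Combined: facet on a single field holding the term+day combination.
    params = {
        "q": "*:*", "rows": 0, "wt": "json",
        "facet": "true", "facet.field": "term_day_s", "facet.limit": -1,
    }
    resp = requests.get(SOLR, params=params).json()
    return resp["facet_counts"]["facet_fields"]["term_day_s"]

def facet_counts(terms):
    # 2) Facet: one request per term, restricting q to that term and
    #    faceting on the day field.
    results = {}
    for term in terms:
        params = {
            "q": 'term_s:"%s"' % term, "rows": 0, "wt": "json",
            "facet": "true", "facet.field": "day_s", "facet.limit": -1,
        }
        resp = requests.get(SOLR, params=params).json()
        results[term] = resp["facet_counts"]["facet_fields"]["day_s"]
    return results

def pivot_counts():
    # 3) Pivot: a single request asking Solr to nest day counts under each term.
    params = {
        "q": "*:*", "rows": 0, "wt": "json",
        "facet": "true", "facet.pivot": "term_s,day_s",
    }
    resp = requests.get(SOLR, params=params).json()
    return resp["facet_counts"]["facet_pivot"]["term_s,day_s"]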

The results show that, as the number of values (i.e. number of terms * number 
of times) increases, everything is fine until around 100,000 values, and then 
it goes pear-shaped for pivots, taking nearly 4 minutes for 1 million values. 
The facet-based approach gives much more consistent performance.

          |      Processing time in ms     |
Values    |  Combined|     Facet|     Pivot|
9         |       144|       391|        62|
100       |       170|       501|        52|
961       |       789|      1014|       153|
10000     |       907|      1966|       940|
99856     |      1647|      3832|      1960|
499849    |      5445|      7573|    136423|
999867    |      9926|      8690|    233725|


In the end I used the facet approach rather than pivots, but I'd like to know 
why pivot performance degrades so catastrophically. Is this expected 
behaviour of pivots, or am I doing something wrong?

N
