mcvsubbu opened a new issue #5346: URL: https://github.com/apache/incubator-pinot/issues/5346
Recently, a bug was introduced in PR #5132 that led to bad results when executing queries that involved indexed and non-indexed columns in a certain combination. We had a query in our integration test that covered this combination, but queries are picked at random from the query file (I think it is 100 queries out of 10k). So some Travis runs would fail, but a re-run could well pass. The bug was discovered in LinkedIn's test environment (where the `mvn test` command is run repeatedly, and multiple failures were noticed). The bug was fixed in PR #5328.

So we had a test for it, and still could not catch the bug.

I did a test on my desktop to see how much time it would take if we enabled all 10k queries to be run, and it came to about 8 hours! The chief time-consumers were as below:

```
testQueriesFromQueryFile(org.apache.pinot.integration.tests.HybridClusterIntegrationTest) Time elapsed: 2,470.777 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.HybridClusterIntegrationTest) Time elapsed: 1,942.148 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.FlakyConsumerRealtimeClusterIntegrationTest) Time elapsed: 2,536.809 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.FlakyConsumerRealtimeClusterIntegrationTest) Time elapsed: 1,992.969 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.ConvertToRawIndexMinionClusterIntegrationTest) Time elapsed: 2,509.986 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.ConvertToRawIndexMinionClusterIntegrationTest) Time elapsed: 1,972.634 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.MultiNodesOfflineClusterIntegrationTest) Time elapsed: 2,522.219 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.MultiNodesOfflineClusterIntegrationTest) Time elapsed: 1,957.898 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.LLCRealtimeClusterIntegrationTest) Time elapsed: 2,497.354 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.LLCRealtimeClusterIntegrationTest) Time elapsed: 1,968.099 sec
testQueriesFromQueryFile(org.apache.pinot.integration.tests.RealtimeClusterIntegrationTest) Time elapsed: 2,521.568 sec
testSqlQueriesFromQueryFile(org.apache.pinot.integration.tests.RealtimeClusterIntegrationTest) Time elapsed: 2,001.346 sec
```

We do disable some tests in Travis:

```
<excludes>
  <!-- Covered by FlakyConsumerRealtimeClusterIntegrationTest -->
  <exclude>**/RealtimeClusterIntegrationTest.java</exclude>
  <!-- Covered by ConvertToRawIndexMinionClusterIntegrationTest -->
  <exclude>**/HybridClusterIntegrationTest.java</exclude>
</excludes>
```

But the remaining tests still add a lot of time.

The goal is to be able to discover problems early, preferably before merging. Some randomness cannot be avoided (e.g. generation of data), so we will live with that.

One way to get there is to comb through the 10k queries and select a few hundred that are enough to detect all issues. We then add queries to this selected list as we find more bugs.

Another way is to increase the Travis time limit to however many hours are needed to run all 10k queries. This is probably not desirable, and will slow us down.

Another possibility is to set up a regular run of the full query set (say, once a day) so that we catch the issue the next day. But then we will still be left with the bad commit in master.

Thoughts?
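
To make the first option concrete, here is a minimal sketch of what "a curated list plus a random sample" could look like. It is not Pinot's test code: the function names `select_queries` and `detection_probability`, the parameters, and the idea of logging a seed are all assumptions made for illustration, and it is written in Python only to keep it short. Every query on the curated list runs on each build, and the rest of the query file is sampled with a seed so that a failing run can be repeated exactly.

```python
import random


def select_queries(all_queries, pinned_queries, sample_size, seed):
    """Return the pinned queries plus a seeded random sample of the rest.

    Hypothetical helper: 'pinned_queries' stands for the curated list that
    grows whenever a bug is found; it always runs in full.
    """
    pinned = list(dict.fromkeys(pinned_queries))  # de-duplicate, keep order
    pinned_set = set(pinned)
    remaining = [q for q in all_queries if q not in pinned_set]
    # A seed that is logged with the build makes a failing sample reproducible.
    rng = random.Random(seed)
    sampled = rng.sample(remaining, min(sample_size, len(remaining)))
    return pinned + sampled


def detection_probability(sample_size, total_queries, runs=1):
    """Chance that one specific query is sampled at least once.

    Assumes each run draws 'sample_size' distinct queries uniformly from
    'total_queries', independently of the other runs.
    """
    miss_once = 1 - sample_size / total_queries
    return 1 - miss_once ** runs
```

With the numbers quoted in this issue (100 out of 10k, which is itself a guess), `detection_probability(100, 10000)` comes to about 1% per run for any one query, which fits with the bug showing up only occasionally in Travis. Pinning the query that exposed the bug removes that dependence on luck for known regressions, while the seeded sample keeps some coverage of the rest of the file.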