siddharthteotia commented on issue #5490: URL: https://github.com/apache/incubator-pinot/issues/5490#issuecomment-638298974
One way to get datasets that closely resemble production data is to use the Data Anonymizer tool (part of pinot-tools). Given input data, the tool generates anonymized data with similar characteristics (cardinality, distribution, etc.). It also generates the corresponding queries. So folks with Pinot deployments can do the following:

- Take your production dataset and the corresponding queries.
- Provide them as input to the tool.
- The tool will generate anonymized data and the corresponding queries.

Publish (hopefully allowed) the anonymized dataset to the open source community so that it can be made part of the benchmark.
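To illustrate the idea behind the workflow in the comment above, here is a minimal Python sketch of value anonymization that preserves a column's cardinality and frequency distribution and rewrites query literals with the same mapping. It is not the Data Anonymizer tool's actual implementation or interface; the function names (`build_mapping`, `anonymize`, `rewrite_query`), the `city_` token prefix, and the sample table and column are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_mapping(values, prefix="v"):
    """Assign each distinct value a synthetic token, in first-seen order.

    Hypothetical helper: one token per distinct value, so the number of
    distinct values (cardinality) is unchanged.
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = f"{prefix}{len(mapping)}"
    return mapping


def anonymize(values, mapping):
    """Replace every value with its token; frequencies are preserved."""
    return [mapping[v] for v in values]


def rewrite_query(query, mapping):
    """Replace single-quoted literals in a query with their tokens.

    Simplistic string replacement, good enough for a sketch only.
    """
    for original, token in mapping.items():
        query = query.replace(f"'{original}'", f"'{token}'")
    return query


# Hypothetical sample column and query.
cities = ["Paris", "Oslo", "Paris", "Lima", "Paris", "Oslo"]
mapping = build_mapping(cities, prefix="city_")
anon = anonymize(cities, mapping)
query = rewrite_query("SELECT COUNT(*) FROM trips WHERE city = 'Paris'", mapping)

print(anon)
print(query)
print(Counter(anon))
```

Because the same mapping is applied to both the data and the query literals, the rewritten query selects the same rows in the anonymized data as the original query did in the original data, which is what makes the anonymized pair usable for benchmarking.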