siddharthteotia edited a comment on issue #5490:
URL: 
https://github.com/apache/incubator-pinot/issues/5490#issuecomment-638298974


   One way to get datasets that closely resemble production datasets is 
to use the Data Anonymizer tool (part of pinot-tools). The tool generates 
anonymized data whose characteristics (cardinality, distribution, etc.) are 
similar to those of the given input data. It also generates the corresponding 
anonymized queries.
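
   To illustrate what "similar characteristics" means here, below is a minimal 
sketch of one way to anonymize a column while keeping its cardinality and 
frequency distribution. It is not the Data Anonymizer's actual algorithm; the 
function name `anonymize_column` and the token format are hypothetical and 
chosen only for this example.

```python
from collections import Counter


def anonymize_column(values, prefix="v"):
    """Replace each distinct value with an opaque token, consistently.

    Hypothetical helper for illustration only; not part of pinot-tools.
    """
    mapping = {}
    anonymized = []
    for value in values:
        if value not in mapping:
            # First occurrence of this value: assign the next unused token.
            mapping[value] = f"{prefix}{len(mapping)}"
        anonymized.append(mapping[value])
    return anonymized, mapping


original = ["sf", "nyc", "sf", "la", "sf", "nyc"]
anonymized, mapping = anonymize_column(original, prefix="city_")

# Cardinality is unchanged: one token per distinct original value.
assert len(set(anonymized)) == len(set(original))
# The frequency distribution is unchanged as well.
assert sorted(Counter(anonymized).values()) == sorted(Counter(original).values())

# A literal in an equality predicate is rewritten with the same mapping,
# so the anonymized query selects the same rows as the original one.
anonymized_literal = mapping["sf"]
assert anonymized.count(anonymized_literal) == original.count("sf")
```

   Because the mapping is one-to-one, equality filters and group-bys behave the 
same on the anonymized column. This sketch does not preserve sort order, so 
range predicates would need a different scheme.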
   
   So folks with Pinot deployments can do the following:
   
   - Take your production dataset and the corresponding queries.
   - Provide them as input to the tool.
   - The tool will generate the anonymized data and the corresponding queries.
   
   Then publish (hopefully allowed) the anonymized dataset and queries to the 
open source community so that they can be made part of the benchmark. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



