[GitHub] [incubator-pinot] siddharthteotia commented on issue #5490: Performance benchmark framework

GitBox Wed, 03 Jun 2020 09:12:10 -0700


siddharthteotia commented on issue #5490:
URL: https://github.com/apache/incubator-pinot/issues/5490#issuecomment-638298974



   One way to get datasets that closely resemble production data is to use 
the Data Anonymizer tool (part of pinot-tools). Given input data, the tool 
generates anonymized data with similar characteristics (cardinality, 
distribution, etc.). It also generates the corresponding queries.
   
   So folks with Pinot deployments can do the following:
   
   - Take your production dataset and the corresponding queries.
   - Provide them as input to the tool.
   - The tool generates anonymized data and the corresponding queries.
   
   Then publish the anonymized dataset (if your organization allows it) to 
the open source community so that it can be made part of the benchmark.
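
To illustrate the idea of anonymizing data while keeping its cardinality and value distribution, here is a minimal Python sketch. It is not the Data Anonymizer's actual implementation or interface: the function names `anonymize_column` and `rewrite_query_literal`, the `user_` prefix, and the sample values are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def anonymize_column(values, prefix="v"):
    """Replace each distinct value with a synthetic token.

    The same original value always maps to the same token, so the
    number of distinct values (cardinality) and the frequency of each
    value (distribution) are unchanged.
    """
    mapping = {}
    anonymized = []
    for value in values:
        if value not in mapping:
            # Tokens are numbered in order of first appearance.
            mapping[value] = f"{prefix}{len(mapping)}"
        anonymized.append(mapping[value])
    return anonymized, mapping


def rewrite_query_literal(literal, mapping):
    """Rewrite a literal from a query filter using the same mapping,
    so the anonymized query matches the anonymized rows."""
    return mapping[literal]


# Hypothetical sample column; real input would be production data.
original = ["alice", "bob", "alice", "carol", "alice", "bob"]
anonymized, mapping = anonymize_column(original, prefix="user_")

# Cardinality and frequency distribution are preserved.
assert len(set(anonymized)) == len(set(original))
assert sorted(Counter(anonymized).values()) == sorted(Counter(original).values())
```

Reusing one mapping for both the data and the query literals is what keeps a rewritten query selecting the same rows in the anonymized dataset as the original query did in the production dataset.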


