richardstartin opened a new issue #8009: URL: https://github.com/apache/pinot/issues/8009
Run this query against hybrid quick start: ```sql explain plan for select count(*) from airlineStats where insubquery(OriginAirportID, 'select idset(DestAirportID) from airlineStats') = 1 ``` it prints: ```json { "resultTable": { "dataSchema": { "columnNames": [ "Operator", "Operator_Id", "Parent_Id" ], "columnDataTypes": [ "STRING", "INT", "INT" ] }, "rows": [ [ "BROKER_REDUCE(limit:10)", 0, -1 ], [ "COMBINE_AGGREGATE", 1, 0 ], [ "AGGREGATE(aggregations:count(*))", 2, 1 ], [ "TRANSFORM_PASSTHROUGH()", 3, 2 ], [ "PROJECT()", 4, 3 ], [ "FILTER_EXPRESSION(operator:EQ,predicate:inidset(OriginAirportID,'ATowAAABAAAAAAAZARAAAACXJ5gnnCeiJ6snrSe6J8kn4CcRKBwoJyg7KF0oeSiEKJ0oqCi3KL8oISk3KUEpVSlnKXwpgymHKZMpqim9KcUp2SnhKegp6ynsKfMp+ykCKhsqHSohKigqMCpFKmEqdCp6KqYqriqyKuQq7iryKvsqISsiKykrMSs6KzsrRCtZK2UrZytyK4Qriiu5K8Mr9Cv7KwMsCiwOLBwsIiwsLDMsSSyVLJ8sqSzPLNks7ywFLREtFC1TLVwtYS1iLWgtbi11LXYteS2ALbEtyS3OLf8tAi4vLlkuWy5sLoEukS6xLsUuyS7MLs4u0i7bLtwu4y7nLuwu8C4+L40vny+lL64vuS/oL+ov9i/4LyAwIzAvMDMwNzBlMGcwcjCZMKAwozC+MN8w6zDWMRMyVDJYMlkyWzJcMmAyczKRMpcymTKaMrYywDLgMuUyBTNHM2YzgDOOM5QzrjOwM7wzyDPQM90z6jPwM/czHjQgNDA0NzRBNEw0bjRwNHk0pDStNK40rzS3NL40CTXjNeQ1BjYbNi82MTZDNmo2azZtNow2kja2Nss26TYSNxQ3GzccNyE3KjdxN6w3rjewN7Y34zfxN3k4lziZOJw4uDi8OM846jjuOPA4KzlSOVc5WzldOWE5aDlqOXU5ijmbObM5vznKOd457DnvOfo5+zkVOi06OTo8Omg6cDqKOqg6sDqzOsE6yDreOvg6kTu/O8g7/DsKPBA8FDwdPCk8Mzw0PPc8CD3hPS8+Wj8=') = '1')", 5, 4 ] ] }, "exceptions": [], "numServersQueried": 1, "numServersResponded": 1, "numSegmentsQueried": 1, "numSegmentsProcessed": 0, "numSegmentsMatched": 0, "numConsumingSegmentsQueried": 0, "numDocsScanned": 0, "numEntriesScannedInFilter": 0, "numEntriesScannedPostFilter": 0, "numGroupsLimitReached": false, "totalDocs": 289, "timeUsedMs": 22, "offlineThreadCpuTimeNs": 0, "realtimeThreadCpuTimeNs": 0, "offlineSystemActivitiesCpuTimeNs": 0, "realtimeSystemActivitiesCpuTimeNs": 0, "offlineResponseSerializationCpuTimeNs": 0, "realtimeResponseSerializationCpuTimeNs": 0, "offlineTotalCpuTimeNs": 0, 
"realtimeTotalCpuTimeNs": 0, "segmentStatistics": [], "traceInfo": {}, "minConsumingFreshnessTimeMs": 0, "numRowsResultSet": 6 } ``` Printing function parameters leaks data when taking an explain plan. The base64 encoded idsets can be deserialised to reveal the values of an entire column, and anyone capable of reading the source code can decode these parameters: ```java public static void main(String... args) throws IOException { ByteBuffer idset = ByteBuffer.wrap(Base64.getDecoder().decode(args[0])).position(1).slice().order(ByteOrder.LITTLE_ENDIAN); RoaringBitmap bitmap = new RoaringBitmap(); bitmap.deserialize(idset); System.err.println(Arrays.toString(bitmap.toArray())); } ``` prints the airline ids, and the subquery could easily have been for social security numbers of users satisfying some condition: ``` [10135, 10136, 10140, 10146, 10155, 10157, 10170, 10185, 10208, 10257, 10268, 10279, 10299, 10333, 10361, 10372, 10397, 10408, 10423, 10431, 10529, 10551, 10561, 10581, 10599, 10620, 10627, 10631, 10643, 10666, 10685, 10693, 10713, 10721, 10728, 10731, 10732, 10739, 10747, 10754, 10779, 10781, 10785, 10792, 10800, 10821, 10849, 10868, 10874, 10918, 10926, 10930, 10980, 10990, 10994, 11003, 11041, 11042, 11049, 11057, 11066, 11067, 11076, 11097, 11109, 11111, 11122, 11140, 11146, 11193, 11203, 11252, 11259, 11267, 11274, 11278, 11292, 11298, 11308, 11315, 11337, 11413, 11423, 11433, 11471, 11481, 11503, 11525, 11537, 11540, 11603, 11612, 11617, 11618, 11624, 11630, 11637, 11638, 11641, 11648, 11697, 11721, 11726, 11775, 11778, 11823, 11865, 11867, 11884, 11905, 11921, 11953, 11973, 11977, 11980, 11982, 11986, 11995, 11996, 12003, 12007, 12012, 12016, 12094, 12173, 12191, 12197, 12206, 12217, 12264, 12266, 12278, 12280, 12320, 12323, 12335, 12339, 12343, 12389, 12391, 12402, 12441, 12448, 12451, 12478, 12511, 12523, 12758, 12819, 12884, 12888, 12889, 12891, 12892, 12896, 12915, 12945, 12951, 12953, 12954, 12982, 12992, 13024, 13029, 13061, 13127, 13158, 13184, 
13198, 13204, 13230, 13232, 13244, 13256, 13264, 13277, 13290, 13296, 13303, 13342, 13344, 13360, 13367, 13377, 13388, 13422, 13424, 13433, 13476, 13485, 13486, 13487, 13495, 13502, 13577, 13795, 13796, 13830, 13851, 13871, 13873, 13891, 13930, 13931, 13933, 13964, 13970, 14006, 14027, 14057, 14098, 14100, 14107, 14108, 14113, 14122, 14193, 14252, 14254, 14256, 14262, 14307, 14321, 14457, 14487, 14489, 14492, 14520, 14524, 14543, 14570, 14574, 14576, 14635, 14674, 14679, 14683, 14685, 14689, 14696, 14698, 14709, 14730, 14747, 14771, 14783, 14794, 14814, 14828, 14831, 14842, 14843, 14869, 14893, 14905, 14908, 14952, 14960, 14986, 15016, 15024, 15027, 15041, 15048, 15070, 15096, 15249, 15295, 15304, 15356, 15370, 15376, 15380, 15389, 15401, 15411, 15412, 15607, 15624, 15841, 15919, 16218] ```

This makes two things impossible. A business user cannot safely take an explain plan from a production database on behalf of an operator and share it to diagnose a performance problem. Nor is it possible to create a role, common in enterprises, that gives technical users the ability to run diagnostic commands but not to access production data, because by combining explain plans and idsets they can access essentially any data they like.
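
One possible mitigation, sketched here rather than taken from Pinot, is to mask string literals in the operator descriptions before the plan is returned. `ExplainPlanRedactor` and `redactLiterals` are hypothetical names, and the regex assumes literals are single-quoted with `''` as the escape. The sample literal is a placeholder, not a real idset.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pinot: masks literals in explain plan operator strings.
public class ExplainPlanRedactor {
  // Assumption: literals are single-quoted, with '' as the escaped quote.
  private static final Pattern STRING_LITERAL = Pattern.compile("'(?:[^']|'')*'");

  // Replaces every single-quoted literal with a fixed placeholder.
  public static String redactLiterals(String operatorDescription) {
    return STRING_LITERAL.matcher(operatorDescription).replaceAll("'<redacted>'");
  }

  public static void main(String[] args) {
    // Placeholder literal standing in for a serialized idset.
    String operator =
        "FILTER_EXPRESSION(operator:EQ,predicate:inidset(OriginAirportID,'c2VjcmV0') = '1')";
    // Prints: FILTER_EXPRESSION(operator:EQ,predicate:inidset(OriginAirportID,'<redacted>') = '<redacted>')
    System.out.println(redactLiterals(operator));
  }
}
```

Masking every literal also hides harmless values such as `'1'`. A narrower variant could redact only arguments of idset functions, at the cost of having to enumerate them.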