chenboat commented on a change in pull request #6567: URL: https://github.com/apache/incubator-pinot/pull/6567#discussion_r573984359
##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotSegmentUploadDownloadRestletResource.java
##########
@@ -245,21 +245,28 @@ private SuccessResponse uploadSegment(@Nullable String tableName, FormDataMultiP
         LOGGER.info("Uploading a segment {} to table: {}, push type {}, (Derived from segment metadata)", segmentName, tableName, uploadType);
       }
 
-      String offlineTableName = TableNameBuilder.OFFLINE.tableNameWithType(rawTableName);
+      String tableNameWithType;
+      if (_pinotHelixResourceManager.isRealtimeOnlyTable(rawTableName)) {
+        tableNameWithType = TableNameBuilder.REALTIME.tableNameWithType(rawTableName);
+      } else {
+        tableNameWithType = TableNameBuilder.OFFLINE.tableNameWithType(rawTableName);
+      }
 
       String clientAddress = InetAddress.getByName(request.getRemoteAddr()).getHostName();
       LOGGER.info("Processing upload request for segment: {} of table: {} from client: {}, ingestion descriptor: {}",
-          segmentName, offlineTableName, clientAddress, ingestionDescriptor);
+          segmentName, tableNameWithType, clientAddress, ingestionDescriptor);
 
-      // Skip segment validation if upload only segment metadata
-      if (uploadType != FileUploadDownloadClient.FileUploadType.METADATA) {
+      // Skip segment validation if upload only segment metadata or it is a realtime table segment.
+      // TODO Perform a validation check for realtime segments too.

Review comment:
For normal Pinot realtime tables, yes, the uploaded segment should be validated on the multiple aspects you mentioned; otherwise, problems such as duplicated records will arise. For upsert-enabled Pinot realtime tables (which are the focus of this work), many of these issues, duplication in particular, are already resolved by the upsertManager's bookkeeping. Note that the upsertManager performs conflict resolution among records with the same primary key. There is still some work to be done on segment validation.
The main one, I think, is making sure that an uploaded segment cannot change the stream's consumed offset. So overall, I think realtime table segment upload can be enabled for upsert tables first. Support for upload to normal realtime tables can be added once a more complete validation is ready.
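
To illustrate why duplicates from an uploaded segment are less of a concern for upsert tables, here is a minimal sketch of per-primary-key conflict resolution. It is not Pinot's actual upsert implementation: the class `UpsertSketch`, its methods, and the rule that the record with the larger (or equal, later-arriving) comparison value wins are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of upsert bookkeeping; names do not come from the Pinot codebase. */
class UpsertSketch {

  /** Where the currently valid record for a primary key lives. */
  static final class RecordLocation {
    final String segmentName;
    final long comparisonValue; // e.g. an event timestamp (assumed comparison column)

    RecordLocation(String segmentName, long comparisonValue) {
      this.segmentName = segmentName;
      this.comparisonValue = comparisonValue;
    }
  }

  private final Map<String, RecordLocation> latestByPrimaryKey = new HashMap<>();

  /**
   * Registers a record and resolves conflicts on the primary key.
   * Returns true if the incoming record becomes the valid one.
   * Assumption: on equal comparison values the later-arriving record wins.
   */
  boolean addRecord(String primaryKey, String segmentName, long comparisonValue) {
    RecordLocation current = latestByPrimaryKey.get(primaryKey);
    if (current == null || comparisonValue >= current.comparisonValue) {
      latestByPrimaryKey.put(primaryKey, new RecordLocation(segmentName, comparisonValue));
      return true;
    }
    // An older record (for example from an uploaded segment) never shadows a newer one.
    return false;
  }

  /** Returns the segment holding the valid record, or null if the key is unknown. */
  String validSegmentFor(String primaryKey) {
    RecordLocation location = latestByPrimaryKey.get(primaryKey);
    return location == null ? null : location.segmentName;
  }
}
```

Under these assumptions, a record in an uploaded segment that repeats an already-consumed primary key either replaces the old record or is ignored, so only one copy is ever queryable. The sketch does not address the consumed-offset concern raised above.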