xingyc15 opened a new issue, #8641: URL: https://github.com/apache/pinot/issues/8641
A Pinot segment creation failure occurred when I ran a standalone script for offline ingestion. The problem is that the failure did not raise an exception; the job only printed an error, and the task was marked as succeeded. We run this data ingestion as an Airflow task, and the missing exception significantly delayed our debugging. Our error log is:

> [2022-04-28 00:32:33,981] {pod_launcher.py:149} INFO - Start building IndexCreator!
> [2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - Failed to generate Pinot segment for file - s3://deepmap-anga-production/metrics/etl_staging/pinot_ingest/map_making_metrics/date=2022-02-22/part-0-0
> [2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - shaded.com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: was expecting closing quote for a string value
> [2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - at [Source: (String)"{"extra":"[13508528,13508529,13508604,13594110,13594112,13508467,13508479,13563695,13594105,13508475,13508489,13508494,13594107,13508483,13594109]","missed":"[6900744,6900745,6900746,6900747,6900748,6900748,6900804,6900804,6900804,6900805,6900806,6901088,6901088,6901089,6901090,6901470,6908481,6908886,6911028,6911030,7647532,7647592,8062355,8062356,8062357,8062358,8062359,8062360,8062364,8062365,8062366,8091813,8091819,8091821,8091822,8091823,8091825,8091827,8091828,8091829,8091830,8091838,80918"[truncated 1000 chars]; line: 1, column: 3001]
> [2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2051) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2038) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:293) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:267) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.databind.ObjectReader._bindAsTree(ObjectReader.java:1770) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.databind.ObjectReader._bindAndCloseAsTree(ObjectReader.java:1735) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at shaded.com.fasterxml.jackson.databind.ObjectReader.readTree(ObjectReader.java:1422) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at org.apache.pinot.spi.utils.JsonUtils.stringToJsonNode(JsonUtils.java:87) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at org.apache.pinot.segment.local.segment.creator.impl.inv.json.BaseJsonIndexCreator.add(BaseJsonIndexCreator.java:92) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(SegmentColumnarIndexCreator.java:402) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:243) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:111) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.8.0-shaded.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
> [2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
> [2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
> [2022-04-28 00:32:36,380] {pod_launcher.py:149} INFO - at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
> [2022-04-28 00:32:36,380] {pod_launcher.py:149} INFO - at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
> [2022-04-28 00:32:36,380] {pod_launcher.py:149} INFO - at java.lang.Thread.run(Unknown Source) [?:?]
> [2022-04-28 00:32:36,383] {pod_launcher.py:149} INFO - Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner
> [2022-04-28 00:32:36,383] {pod_launcher.py:149} INFO - Initializing PinotFS for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS

Here is what I found in the source code: [code](https://github.com/apache/pinot/blob/1e90f141282e40f819de806920cc2a836e0e35ba/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java#L284). This function does not raise the exception; it only prints an error. Can you fix this? I think it should raise an error and fail the process.
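A minimal sketch of the requested behavior, for illustration only. It is not Pinot's actual code: the class `SegmentJobSketch` and the methods `generateSegment` and `runAll` are hypothetical stand-ins. The `FutureTask` and `ThreadPoolExecutor` frames in the stack trace suggest the segment task runs on a thread pool, so the sketch assumes that setup and shows one way a task failure could reach the caller instead of only being logged: keep each `Future` and call `get()` on it, which rethrows a task's exception wrapped in an `ExecutionException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SegmentJobSketch {

  // Hypothetical stand-in for building one segment; a name containing "bad"
  // simulates malformed input such as the truncated JSON in the log.
  static void generateSegment(String inputFile) {
    if (inputFile.contains("bad")) {
      throw new IllegalStateException("Failed to generate segment for file - " + inputFile);
    }
  }

  // Submits one task per file, then waits on every Future. The first task
  // failure is rethrown to the caller rather than being swallowed.
  static void runAll(List<String> inputFiles, int parallelism) {
    ExecutorService executor = Executors.newFixedThreadPool(parallelism);
    List<Future<?>> futures = new ArrayList<>();
    try {
      for (String file : inputFiles) {
        futures.add(executor.submit(() -> generateSegment(file)));
      }
      for (Future<?> future : futures) {
        try {
          future.get(); // throws ExecutionException if the task threw
        } catch (ExecutionException e) {
          throw new RuntimeException("Segment generation failed", e.getCause());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new RuntimeException("Interrupted while waiting for segment tasks", e);
        }
      }
    } finally {
      // On success all tasks are already done; on failure this interrupts
      // any tasks still running so the pool threads can exit.
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // An exception escaping main makes the JVM exit with a non-zero status,
    // which a scheduler such as Airflow can treat as a failed task.
    runAll(Arrays.asList(args), 2);
  }
}
```

With this shape, running the class with a file name that fails would end the process with a stack trace and a non-zero exit code, while a run in which every file succeeds exits normally.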