jhyao commented on issue #11948:
URL: https://github.com/apache/pinot/issues/11948#issuecomment-1797769758

   After publishing the first 1M ids, the producer continued to send another 
2M upsert records with the same ids as the first 1M.
   The producer code looks like this:
   ```python
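   import json

   # get_time, generate_random_string, generate_random_number, CONTENT_LENGTH,
   # TOPIC and producer are defined elsewhere and are not shown here.

   def generate_record(id):
       # Build one JSON message: a key, a timestamp, a random payload
       # and ten random numeric columns JTD1..JTD10.
       record = {
           'UID': id,
           'UpdatedTime': get_time(),
           'Content': generate_random_string(CONTENT_LENGTH)
       }
       for i in range(1, 11):
           record[f'JTD{i}'] = generate_random_number()
       return json.dumps(record)
   
   # Three rounds over the same 1M ids: rounds 2 and 3 re-send existing ids.
   for i in range(3):
       for id in range(1_000_000):
           record = generate_record(id)
           producer.send(TOPIC, key=str(id).encode(), value=record.encode())
           if id % 10000 == 0:
               print(f'Published {i} round, {id} messages')
               producer.flush()
   
   # Flush whatever is still buffered after the last round.
   producer.flush()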
   ```
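
As a rough sketch of the outcome one would expect from this workload, the snippet below simulates upsert semantics in plain Python. It assumes that `UID` is the primary key and that the row with the latest `UpdatedTime` wins; neither is confirmed by the table config in this comment. `simulate_upsert` and `rounds` are hypothetical helpers, the timestamps are synthetic integers, and the id count is scaled down from 1M to keep it quick.

```python
from typing import Dict, Iterable, Iterator, Tuple


def simulate_upsert(messages: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Keep only the latest timestamp seen for each key (assumed upsert rule)."""
    latest: Dict[int, int] = {}
    for uid, updated_time in messages:
        current = latest.get(uid)
        # A newer (or equal) timestamp replaces the stored row for this key.
        if current is None or updated_time >= current:
            latest[uid] = updated_time
    return latest


def rounds(num_rounds: int, num_ids: int) -> Iterator[Tuple[int, int]]:
    """Yield (uid, timestamp) pairs, re-sending the same ids in every round."""
    for r in range(num_rounds):
        for uid in range(num_ids):
            # Synthetic, strictly increasing stand-in for UpdatedTime.
            yield uid, r * num_ids + uid


num_ids = 1_000  # scaled down from 1_000_000
table = simulate_upsert(rounds(3, num_ids))
total_sent = 3 * num_ids

print(f'sent {total_sent} messages, {len(table)} distinct ids remain')
```

Under these assumptions the number of visible rows should stay at the number of distinct ids, no matter how many rounds are sent, while a table without upsert would hold every message.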
   
   
   I tested again without upsert and the issue did not occur, so it only 
affects the upsert table.
   
![image](https://github.com/apache/pinot/assets/17529008/e7847f06-8073-4ffa-bd38-606fc4aa6e10)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

