deemoliu opened a new issue, #8579:
URL: https://github.com/apache/pinot/issues/8579
We lose some data after migrating table to another tenant and rebalancing
with downtime (no server restart). Is server restart a required step after
migrating table to another tenant?
We did the following operation to move a table to another tenant
1. update tenant server in table config.
2. rebalance table with downtime.
After rebalanced, the below queries doesn't return result as expected.
`select count(0), datestr from
rta.rta.panorama_city_minute_metrics
where datestr>=‘2022-03-28’
group by 2 order by 2`
the result is as follows
`
count(*) | datestr
1761336 | 2022-03-28
1438965 | 2022-03-30
1751511 | 2022-03-31
1789216 | 2022-04-01
1796171 | 2022-04-02
1772584 | 2022-04-03
1756764 | 2022-04-04
1756555 | 2022-04-05
`
We have rerun the hive to pinot pipelines multiple times (without restarting
server), the data on 03-29 is still not queryable.
Queries will return different number of records. sometime 03-30 data is also
missing. In external view each servers has three servers.
i looked into the servers holding segments that match the date 2022-03-29,
and it shows the following
`
qiaochu@pinotHost:~$ sudo cat
/var/upinot/stream-pinot-server/dataDir/rta_panorama_city_minute_metrics_OFFLINE/rta_panorama_city_minute_metrics_1648598399_1648598399_00000/v3/metadata.properties
|grep date
segment.dimension.column.names =
city,completed_avg_surge,completed_avg_surge_sub_0,completed_avg_surge_sub_1,completed_trips,datestr,demand_jobs_accepted_rate_sub_0,demand_jobs_count_driver_canceled,demand_jobs_requested,driver_acceptance_rate,driver_accepts,driver_cancellation_rate,driver_offered_dispatches,eyeball_eta_sum,geodriver_accepted,geodriver_arrived,geodriver_dispatched,geodriver_driving_client,geodriver_open,gross_bookings_usd,helix_sessions,helix_sessions_surged,mean_demand_jobs_ata,mean_demand_jobs_ata_sub_0,mean_demand_jobs_ata_sub_1,mean_eta_post_request,mean_eta_post_request_sub_0,mean_eta_post_request_sub_1,mean_zero_adjusted_eyeball_eta_per_shopping_state,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_0,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_1,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_2,minute,non_requesting_non_zeroed_ss,nonzero,pct_surged_demand_jobs,requesting_sessions,shopping_sessions_v1,summary_completed_sessions,summary_
driver_cancel,summary_non_stressed_nrs,summary_requesting_sessions,summary_rider_cancel,summary_stressed_nrs,summary_unfulfilled,supply_minutes,surged_demand_jobs_completed_avg_surge,surged_demand_jobs_completed_avg_surge_sub_0,surged_demand_jobs_completed_avg_surge_sub_1,surged_demand_jobs_count,surged_session_avg_surge_bb8,surged_session_avg_surge_bb8_sub_0,surged_session_avg_surge_bb8_sub_1,surged_unique_session_count_bb8_hp,time_bucket,unique_session_count_bb8_hp,zeroed_sessions,zeroed_shopping_sessions_in_sessions_stats,zeros
segment.datetime.column.names =
column.datestr.cardinality = 1
column.datestr.totalDocs = 1438965
column.datestr.dataType = STRING
column.datestr.bitsPerElement = 1
column.datestr.lengthOfEachEntry = 10
column.datestr.columnType = DIMENSION
column.datestr.isSorted = true
column.datestr.hasNullValue = false
column.datestr.hasDictionary = true
column.datestr.textIndexType = NONE
column.datestr.hasInvertedIndex = true
column.datestr.isSingleValues = true
column.datestr.maxNumberOfMultiValues = 0
column.datestr.totalNumberOfEntries = 1438965
column.datestr.isAutoGenerated = false
column.datestr.defaultNullValue = null
column.datestr.minValue = 2022-03-30
column.datestr.maxValue = 2022-03-30
`
the segment
rta_panorama_city_minute_metrics_OFFLINE/rta_panorama_city_minute_metrics_1648598399_1648598399_00000
should show datestr.minValue 03-29 but it shows as 03-30.
We rerun the hive to Pinot jobs. then we restart the host, and we get the
following
`
qiaochu@streampinot-prod12-dca1:~$ sudo ls -l
/var/upinot/stream-pinot-server/dataDir/rta_panorama_city_minute_metrics_OFFLINE/rta_panorama_city_minute_metrics_1648598399_1648598399_00000/v3/
total 143184
-rwx------ 1 upinot upinot 146537680 Apr 4 20:34 columns.psf
-rwx------ 1 upinot upinot 16 Apr 4 20:34 creation.meta
-rwx------ 1 upinot upinot 13571 Apr 4 20:34 index_map
-rwx------ 1 upinot upinot 60446 Apr 4 20:34 metadata.properties
qiaochu@streampinot-prod12-dca1:~$ sudo cat
/var/upinot/stream-pinot-server/dataDir/rta_panorama_city_minute_metrics_OFFLINE/rta_panorama_city_minute_metrics_1648598399_1648598399_00000/v3/metadata.properties
|grep date
segment.dimension.column.names =
city,completed_avg_surge,completed_avg_surge_sub_0,completed_avg_surge_sub_1,completed_trips,datestr,demand_jobs_accepted_rate_sub_0,demand_jobs_count_driver_canceled,demand_jobs_requested,driver_acceptance_rate,driver_accepts,driver_cancellation_rate,driver_offered_dispatches,eyeball_eta_sum,geodriver_accepted,geodriver_arrived,geodriver_dispatched,geodriver_driving_client,geodriver_open,gross_bookings_usd,helix_sessions,helix_sessions_surged,mean_demand_jobs_ata,mean_demand_jobs_ata_sub_0,mean_demand_jobs_ata_sub_1,mean_eta_post_request,mean_eta_post_request_sub_0,mean_eta_post_request_sub_1,mean_zero_adjusted_eyeball_eta_per_shopping_state,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_0,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_1,mean_zero_adjusted_eyeball_eta_per_shopping_state_sub_2,minute,non_requesting_non_zeroed_ss,nonzero,pct_surged_demand_jobs,requesting_sessions,shopping_sessions_v1,summary_completed_sessions,summary_
driver_cancel,summary_non_stressed_nrs,summary_requesting_sessions,summary_rider_cancel,summary_stressed_nrs,summary_unfulfilled,supply_minutes,surged_demand_jobs_completed_avg_surge,surged_demand_jobs_completed_avg_surge_sub_0,surged_demand_jobs_completed_avg_surge_sub_1,surged_demand_jobs_count,surged_session_avg_surge_bb8,surged_session_avg_surge_bb8_sub_0,surged_session_avg_surge_bb8_sub_1,surged_unique_session_count_bb8_hp,time_bucket,unique_session_count_bb8_hp,zeroed_sessions,zeroed_shopping_sessions_in_sessions_stats,zeros
segment.datetime.column.names =
column.datestr.cardinality = 1
column.datestr.totalDocs = 1744810
column.datestr.dataType = STRING
column.datestr.bitsPerElement = 1
column.datestr.lengthOfEachEntry = 10
column.datestr.columnType = DIMENSION
column.datestr.isSorted = true
column.datestr.hasNullValue = false
column.datestr.hasDictionary = true
column.datestr.textIndexType = NONE
column.datestr.hasInvertedIndex = true
column.datestr.isSingleValues = true
column.datestr.maxNumberOfMultiValues = 0
column.datestr.totalNumberOfEntries = 1744810
column.datestr.isAutoGenerated = false
column.datestr.defaultNullValue = null
column.datestr.minValue = 2022-03-29
column.datestr.maxValue = 2022-03-29
`
the query result correct result after restarting the host.
We have two questions
1. what's the root cause that 03-29 data are missing in the table.
2. Why segment with the 1648598399_1648598399 timestamp in the name, holding
the 03-30 day's data before we restart server?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]