zclllyybb commented on issue #63669: URL: https://github.com/apache/doris/issues/63669#issuecomment-4541814928
Initial triage for `doris-3.1.4-rc02-7f5ba43de6`:

This looks more like a BE clone/rowset-metadata handling bug than a user operation problem.

Why:

- The stack is in replica clone/snapshot conversion: `EngineCloneTask::_make_and_download_snapshots()` -> `SnapshotManager::convert_rowset_ids()` -> `SnapshotManager::_rename_rowset_id()` -> `BaseBetaRowsetWriter::add_rowset()` -> `BetaRowset::get_inverted_index_size()`.
- In 3.1.4-rc02, `BaseBetaRowsetWriter::add_rowset()` calls `get_inverted_index_size()` only when the existing rowset meta has an invalid `index_disk_size` (`index_size < 0 || index_size > total_size * 2`). So entering this function does not mean the table has a user-created inverted index.
- The posted failure is at `be/src/olap/rowset/beta_rowset.cpp:80`, before any inverted-index file is checked. That line only means `_rowset_meta->fs()` returned null.
- In this version, `RowsetMeta::fs()` can return null if it cannot resolve the tablet from `ExecEnv::get_tablet(tablet_id())` while building the filesystem wrapper. During clone rowset-id conversion, the downloaded source rowset is being rewritten in a temporary clone directory, so this is a plausible failure point when invalid index-size metadata forces the fallback size recalculation.

The empty `resource_id` in the message is normal for local rowsets and is not enough by itself to indicate a remote-storage resource problem.

Current judgment: the user's workaround of rewriting the affected table into a new table makes sense because it creates new rowsets with fresh metadata. However, the clone path should probably handle this case; a table without inverted indexes should not fail replacement just because old rowset meta has invalid index-size fields.

Information needed to confirm:

1. The BE log lines immediately before this stack, especially any `invalid index size:` and `get tablet failed:` warnings.
2. The affected `tablet_id`, `schema_hash`, and `rowset_id` from the clone task logs.
3. `SHOW CREATE TABLE` for an affected table.
4. The version history of the affected table/cluster, especially whether the rowsets were created before 2.1.5 or 3.0.0, and whether storage policy, remote storage, or table-level encryption is enabled.

Suggested maintainer next step:

- Reproduce with a local rowset snapshot whose rowset meta has invalid `index_disk_size`, no inverted indexes, and a source tablet id not registered on the destination BE during `SnapshotManager::_rename_rowset_id()`.
- The fix direction should be to make the invalid-index-size recovery path in clone use a filesystem/metadata source valid for the cloned rowset, or skip/fallback safely for no-inverted-index schemas, instead of failing at `_rowset_meta->fs()`.

Breakwater-GitHub-Analysis-Slot: slot_8b7cb4e30532

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
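
As a rough illustration of the triage comment's fix direction, the sketch below restates the quoted validity condition (`index_size < 0 || index_size > total_size * 2`) and shows one possible way a recovery path could decide what to do. It is not Doris code: `index_size_looks_invalid`, `IndexSizeAction`, and `choose_index_size_action` are hypothetical names, and treating the index size as zero for schemas without inverted indexes is an assumption about what "skip/fallback safely" might mean.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Doris function): restates the condition the
// comment quotes for an invalid `index_disk_size` in rowset meta.
bool index_size_looks_invalid(int64_t index_size, int64_t total_size) {
    assert(total_size >= 0);  // sketch-only precondition
    return index_size < 0 || index_size > total_size * 2;
}

// Hypothetical outcomes for the recovery path.
enum class IndexSizeAction {
    KeepRecorded,  // recorded value passes the check, use it as-is
    TreatAsZero,   // assumed safe fallback: schema has no inverted index
    Recalculate    // needs a filesystem lookup, which can fail during clone
};

// Hypothetical decision function: only fall through to the recalculation
// (the step that needs a valid filesystem handle) when the schema actually
// declares an inverted index.
IndexSizeAction choose_index_size_action(int64_t index_size,
                                         int64_t total_size,
                                         bool schema_has_inverted_index) {
    if (!index_size_looks_invalid(index_size, total_size)) {
        return IndexSizeAction::KeepRecorded;
    }
    if (!schema_has_inverted_index) {
        return IndexSizeAction::TreatAsZero;
    }
    return IndexSizeAction::Recalculate;
}
```

Under these assumptions, a rowset with a negative recorded index size and no inverted index in its schema would never reach the filesystem lookup, which is the step that fails in the reported stack.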
