zclllyybb commented on issue #63669:
URL: https://github.com/apache/doris/issues/63669#issuecomment-4541814928

   Initial triage for `doris-3.1.4-rc02-7f5ba43de6`:
   
   This looks more like a BE clone/rowset-metadata handling bug than a problem 
with how the user operated the cluster.
   
   Why:
   - The stack is in replica clone/snapshot conversion: 
`EngineCloneTask::_make_and_download_snapshots()` -> 
`SnapshotManager::convert_rowset_ids()` -> 
`SnapshotManager::_rename_rowset_id()` -> `BaseBetaRowsetWriter::add_rowset()` 
-> `BetaRowset::get_inverted_index_size()`.
   - In 3.1.4-rc02, `BaseBetaRowsetWriter::add_rowset()` calls 
`get_inverted_index_size()` only when the existing rowset meta has an invalid 
`index_disk_size` (`index_size < 0 || index_size > total_size * 2`). So 
entering this function does not mean the table has a user-created inverted 
index.
   - The posted failure is at `be/src/olap/rowset/beta_rowset.cpp:80`, before 
any inverted-index file is checked. That line only means `_rowset_meta->fs()` 
returned null.
   - In this version, `RowsetMeta::fs()` can return null if it cannot resolve 
the tablet from `ExecEnv::get_tablet(tablet_id())` while building the 
filesystem wrapper. During clone rowset-id conversion, the downloaded source 
rowset is being rewritten in a temporary clone directory, so this is a 
plausible failure point when invalid index-size metadata forces the fallback 
size recalculation. The empty `resource_id` in the message is normal for local 
rowsets and is not enough by itself to indicate a remote-storage resource 
problem.
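   
   As a minimal sketch of the validity condition quoted above (not the actual 
Doris code): the struct and function names here are hypothetical stand-ins, 
and only the comparison itself is taken from the condition in the second 
bullet.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two rowset-meta fields the check reads.
struct RowsetSizes {
    int64_t index_disk_size;  // recorded inverted-index bytes
    int64_t total_disk_size;  // recorded total rowset bytes
};

// Same shape as the quoted condition: the recorded index size is treated as
// invalid when it is negative or more than twice the recorded total size.
bool index_size_is_invalid(const RowsetSizes& s) {
    return s.index_disk_size < 0 ||
           s.index_disk_size > s.total_disk_size * 2;
}
```

   Note that a table with no inverted index and a sane recorded size of `0` 
passes this check, so only rowsets with corrupted or legacy size fields would 
reach the fallback recalculation.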
   
   Current judgment: the user's workaround of rewriting the affected table into 
a new table makes sense because it creates new rowsets with fresh metadata. 
However, the clone path should probably handle this case; a table without 
inverted indexes should not fail replacement just because old rowset meta has 
invalid index-size fields.
   
   Information needed to confirm:
   1. The BE log lines immediately before this stack, especially any `invalid 
index size:` and `get tablet failed:` warnings.
   2. The affected `tablet_id`, `schema_hash`, and `rowset_id` from the clone 
task logs.
   3. `SHOW CREATE TABLE` for an affected table.
   4. The version history of the affected table/cluster, especially whether the 
rowsets were created before 2.1.5 or 3.0.0, and whether storage policy, remote 
storage, or table-level encryption is enabled.
   
   Suggested maintainer next step:
   - Reproduce with a local rowset snapshot whose rowset meta has invalid 
`index_disk_size`, no inverted indexes, and a source tablet id not registered 
on the destination BE during `SnapshotManager::_rename_rowset_id()`.
   - The fix direction should be to make the invalid-index-size recovery path 
in clone use a filesystem/metadata source valid for the cloned rowset, or 
skip/fallback safely for no-inverted-index schemas, instead of failing at 
`_rowset_meta->fs()`.
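   
   One way to picture the fix direction above, as a hedged sketch only: every 
type, field, and function name below is hypothetical and chosen for 
illustration, and `fs_available` merely stands in for `_rowset_meta->fs()` 
returning non-null. It is not a proposal for the exact patch.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the state the recovery path depends on.
struct CloneRowsetView {
    int64_t index_disk_size;         // value recorded in rowset meta
    int64_t total_disk_size;         // value recorded in rowset meta
    bool schema_has_inverted_index;  // does the tablet schema define any?
    bool fs_available;               // stands in for fs() != nullptr
    int64_t size_on_disk;            // what a recalculation would report
};

enum class Action { KeepRecorded, ResetToZero, Recalculate, ReportError };

struct Decision {
    Action action;
    int64_t index_size;
};

// Decide how to obtain the index size without assuming the filesystem
// can always be resolved during clone.
Decision decide_index_size(const CloneRowsetView& v) {
    const bool invalid = v.index_disk_size < 0 ||
                         v.index_disk_size > v.total_disk_size * 2;
    if (!invalid) {
        return {Action::KeepRecorded, v.index_disk_size};
    }
    // No inverted index in the schema: nothing to measure, so no fs needed.
    if (!v.schema_has_inverted_index) {
        return {Action::ResetToZero, 0};
    }
    // Recalculation is required but the filesystem cannot be resolved:
    // surface an error to the caller instead of hitting a fatal check.
    if (!v.fs_available) {
        return {Action::ReportError, -1};
    }
    return {Action::Recalculate, v.size_on_disk};
}
```

   The ordering is the point of the sketch: the schema check comes before any 
filesystem lookup, so a rowset without inverted indexes never depends on the 
tablet being registered on the destination BE.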
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

