This is an automated email from the ASF dual-hosted git repository.

aswinshakil pushed a commit to branch HDDS-10239-container-reconciliation
in repository https://gitbox.apache.org/repos/asf/ozone.git

commit 7c591b41984744aeff8f60a437d7bdddf27b988e
Merge: 989746a0de a689ec850f
Author: Aswin Shakil Balasubramanian <[email protected]>
AuthorDate: Fri Apr 25 23:06:16 2025 +0530

    Merge branch 'master' of https://github.com/apache/ozone into HDDS-10239-container-reconciliation

Commits: 320

a689ec850f Revert "HDDS-12173. Follow RocksDB basic tuning guide (#8206)"
5c5db8e9bc HDDS-12897. Enable EmptyLineSeparator checkstyle rule (#8329)
68452d0573 HDDS-12893. addendum: cp: warning: behavior of -n is non-portable and may change in future; use --update=none instead (#8334)
6852e3eb93 HDDS-12861. Add AvoidStringBufferField PMD rule (#8312)
c21ec5b645 HDDS-12880. Move field declarations to start of class in hdds-server-framework (#8310)
56b5b02317 HDDS-12707. Recon - In-memory extraction of om tarball from network input stream (#8212)
64dbf44d3c HDDS-12893. cp: warning: behavior of -n is non-portable and may change in future; use --update=none instead (#8328)
250d3c1734 HDDS-12891. OMKeyAclRequestWithFSO is incorrectly setting full path as key name. (#8326)
f6597d1b2d HDDS-12888. Add negative test cases for FS operations on OBS buckets (#8323)
cdab4eb42d HDDS-12173. Follow RocksDB basic tuning guide (#8206)
6f25b439b1 HDDS-12347. Fix spotbugs warnings in integration tests (#8298)
e83569f80b HDDS-12816. Ozone debug replicas chunk-info has incorrect or None in FileLocations (#8289)
621f018134 HDDS-12624. Fix pipeline limit check to prevent incorrect pipeline creation (#8216)
dd8950bd96 HDDS-12340. Remove unnecessary findbugs exclusions (#8309)
0d88da0b71 HDDS-12723. Handle Volume Db failure in volume scanner (#8256)
434a5d539a HDDS-12823. SnapshotDiffReportOzone#fromProtobuf empty token handling (#8280)
7c23cbdf66 HDDS-12504. Replace calls to deprecated RandomStringUtils methods (#8306)
3ee57e6052 HDDS-12813. Replace calls to deprecated RandomUtils methods (#8302)
84b2ff6d66 HDDS-12876. Bump awssdk to 2.31.25 (#8304)
db9076e11e HDDS-12867. Replace hard-coded namespace URL with constant S3_XML_NAMESPACE (#8299)
97b35eeddc HDDS-12874. Bump picocli to 4.7.7 (#8303)
3eed6a6f37 HDDS-12847. Use DatanodeID instead of DatanodeDetails.getUuidString (#8293)
205cb2cc4d HDDS-12846. Log DatanodeDetails instead of DatanodeDetails.getUuidString (#8295)
213c2fe6a0 HDDS-11734. Bump maven-compiler-plugin to 3.14.0 (#8301)
889f3b7de8 HDDS-10284. Move GenericTestUtils#getTempPath to MiniOzoneCluster (#8300)
e613598768 HDDS-12113. Move HAProxy test to HA environment (#8271)
d8a391558f HDDS-12413. Move field declarations to start of class in hdds-container-service (#7968)
5c91b44ad1 HDDS-8802. Added pagination support for ListSnapshotDiff jobs (#8124)
e9e149336b HDDS-12704. Add missing audit logs for SCM client operations (#8251)
e8af186de1 HDDS-12152. Stop testing with Hadoop 3.1.2 (#7773)
f6a4a48b0b HDDS-12840. Avoid long containerId in KeyValueContainer (#8291)
3c9e1465e9 HDDS-12350. Reduce duplication between OmBucketReadWrite tests (#8241)
93613c1cae HDDS-12734. Enable native lib in CI checks (#8190)
3d4b5fdf55 HDDS-6631. Fix typos in output/exception messages (#8294)
978dd717f8 HDDS-12806. Replace commons-logging with jcl-over-slf4j (#8265)
ef8e0da2c1 HDDS-12524. Reuse TestDataUtil.createKey in more tests (#8270)
b91e8e732f HDDS-12518. Auto-compact tables which can tend to be large in size at intervals (#8260)
55f6924a1f HDDS-12845. Reuse TestDataUtil.createKey in OzoneRpcClientTests (#8274)
8a4deeb1cf HDDS-12827. Move out NodeStateMap.Entry and ContainerMap.Entry (#8273)
62df3069f8 HDDS-12493. Move container upgrade under repair (#8205)
b33ed23605 HDDS-12412. Make hdds-common compliant with FieldDeclarationsShouldBeAtStartOfClass (#7967)
a95d3389e0 HDDS-12740. Use DatanodeID in HddsTestUtils (#8281)
f26e2f06a3 HDDS-12771. xcompat fails if run in itself due to undefined OZONE_CURRENT_VERSION (#8236)
c7117dcc17 HDDS-11974. Split Container Safemode Rule into Ratis & EC Container Safemode Rules (#7951)
bb2c63c502 HDDS-12803. OmKeyInfo#isKeyInfoSame should handle object tags (#8259)
a27c1cf3a6 HDDS-12837. Bump vite to 4.5.13 (#8283)
9b48b6e8c1 HDDS-12838. Bump awssdk to 2.31.21 (#8267)
3d8644e172 HDDS-12830. Add RocksDatabaseException. (#8277)
9e4da5bac2 HDDS-12821. Update Build from Source user doc. (#8262)
5292ac57d1 HDDS-12836. Bump junit to 5.12.2 (#8269)
e6b9d5a123 HDDS-12825. ReconIncrementalContainerReportHandler is not synchronized on datanode. (#8272)
a92fe59ec4 HDDS-12581. Multi-threaded Log File Parsing with Batch Updates to DB (#8254)
eed5924251 HDDS-12770. Use ContainerID instead of Long in CONTAINER_IDS_TABLE. (#8247)
7dfd8c1a78 HDDS-12060. Replace System.currentTimeMillis() with Time.monotonicNow() for duration calculation (#8096)
a21e362b6a HDDS-12547. Container creation and import use the same VolumeChoosingPolicy (#8090)
22734a91f2 HDDS-12461. Bump Ranger to 2.6.0 (#8120)
0d6231c2b3 HDDS-12801. SCM should remove pipeline before DN. (#8261)
fdebdacc18 HDDS-12761. Add missing network configuration properties in ozone-default.xml (#8257)
7062685609 HDDS-12805. Use slf4j for HTTP request log (#8255)
7ff8ad7a13 HDDS-12755. Redundant declaration in TestHadoopNestedDirGenerator#spanCheck() (#8218)
fbb8706a7b HDDS-12145. Remove unused config hdds.container.ratis.replication.level (#8252)
691383526a HDDS-12519. Generate auto-complete script for Ozone commands (#8030)
25dda2d954 HDDS-12408. Create mixin for ContainerID list parameters (#7970)
f22f32e05d HDDS-12764. NodeDecommissionManager#monitorInterval should get with ms unit from config (#8250)
dc9952e446 HDDS-12746. Reduce visibility of Logger members (#8210)
102ae3fd26 HDDS-8660. Notify ReplicationManager when nodes go dead or out of service (#7997)
a1345f9fdd HDDS-12368. Seek to correct start key in KeyManagerImpl#getTableEntries (#7925)
1ebf2652b1 HDDS-12580. Set up Temporary DB for Storing Container Log Information (#8072)
189fbdbe4c HDDS-12463. Add perf metrics in SCM for allocateBlock and openPipeline (#8111)
fa8afa339c HDDS-12741. Recon UI should show full path from ContainerKeys API response (#8200)
a0387afc88 HDDS-12800. Bump aspectj-maven-plugin to 1.14.1 (#8230)
5733c5556e HDDS-11799. Remove config hdds.scm.safemode.pipeline-availability.check (#8095)
c962b98cef HDDS-12404. Grafana dashboard for snapshot metrics (#7954)
bf20540044 HDDS-12458. Refactor DataNodeSafeModeRule to use NodeManager (#7998)
25b7102e30 HDDS-12775. flaky-test-check builds the workflow branch (#8237)
2a4de14eb8 HDDS-12772. Configure initial heartbeat and first election time for quicker MiniOzoneCluster startup (#8235)
0f5b59089b HDDS-12691. Calculation of committed space in Datanode seems incorrect (#8228)
6d0a8306ab HDDS-12721. Log more details about volumes (#8181)
1bee369bb2 HDDS-12525. Replace some TestHelper#createKey usage with TestDataUtil#createKey (#8233)
5834fcf7c8 HDDS-10091. TestOzoneDebugShell fails with FILE_SYSTEM_OPTIMIZED bucket layout (#8227)
38229439fa HDDS-12768. Bump vite to 4.5.12 (#8234)
b6dac4ae33 HDDS-12766. Bump awssdk to 2.31.16 (#8232)
0ef1f0208e HDDS-12767. Bump jacoco to 0.8.13 (#8231)
2bbeaafe1f HDDS-12700. Upgrade test overwrites previous result (#8229)
0a558c15e0 HDDS-12758. Error in OmUtils.normalizeKey for key name starting with `//` (#8225)
fee8817193 HDDS-12759. Bump vitest to 1.6.1 (#7828)
1f716eae7d HDDS-12699. Bump vite to 4.5.11 (#8224)
abc1e02834 HDDS-12757. Duplicated declaration of dnsInterface in HddsUtils (#8222)
3fb57c1695 HDDS-12037. Removing unit from quota namespace (#8148)
ebcece4272 HDDS-11038. Add Helm Chart to the Ozone on Kubernetes doc (#8220)
bba8a67831 HDDS-12750. Move StorageTypeProto from ScmServerDatanodeHeartbeatProtocol.proto to hdds.proto (#8208)
bc19a4c18a HDDS-12711. Limit number of excluded SST files logged at info level (#8186)
6bcebe8a0c HDDS-4517. Remove leftover references to RaftServerImpl (#8223)
868a2376fe HDDS-12756. Speed up TestReconfigShell and TestOzoneDebugShell (#8219)
40d02b9165 HDDS-12639. Add info for TimeoutException (#8113)
5f74da38ff HDDS-12760. Mark TestContainerReplication#testImportedContainerIsClosed as flaky
ed0111582f HDDS-9241. Document S3 Gateway REST Secret Revoke command (#8221)
479f6d949f HDDS-12690. Remove om.fs.snapshot.max.limit from RDBStore and OmMetadataManagerImpl (#8195)
0f52a34d0c HDDS-12559. Implement Bulk Ozone Locks for taking locks on multiple snapshots (#8052)
de73c00a8d HDDS-12738. Refactor AbstractContainerReportHandler and its subclasses. (#8207)
30e4aa44d8 HDDS-12748. Remove unused config ozone.manager.db.checkpoint.transfer.bandwidthPerSec (#8213)
693548b037 HDDS-12676. Prefer minFreeSpace if minFreeSpacePercent is also defined (#8180)
092fe403c4 HDDS-12751. native build fails with CMake 4 (#8215)
1433d0c047 HDDS-1480. Prefer resolved datanode ip address over persisted ip address (#7495)
af5301e380 HDDS-12233. Atomically import a container (#7934)
2475949d1b HDDS-11107. Remove unnecessary run_test_scripts call in upgrade tests (#8201)
20a13da5ef HDDS-12615. Failure of any OM task during bootstrapping of Recon needs to be handled (#8098)
7fb92b0144 HDDS-12717. Combine nodeMap and nodeToContainer in NodeStateMap. (#8199)
fc6776db33 HDDS-12739. Replace the getNodeByUuid methods in NodeManager. (#8202)
0d40030469 HDDS-12698. Unused FailureService in MiniOzoneChaosCluster (#8197)
400835fd9e HDDS-12736. Bump hadoop-thirdparty to 1.4.0 (#8193)
a540684346 HDDS-12720. Use DatanodeID in SimpleMockNodeManager (#8198)
604576a0c9 HDDS-12735. Unused rocksDBConfiguration variable in `OmMetadataManagerImpl#start` (#8196)
64318f78a7 HDDS-12498. Allow limiting flaky-test-check to specific submodule (#8194)
3e7b5eb443 HDDS-12719. Remove the setContainers(..) method from NodeManager. (#8191)
78e2e73158 HDDS-12724. hdds-rocks-native build fails if JAVA_HOME not set (#8183)
fbc696cc44 HDDS-12594. Optimize replica checksum verifier (#8151)
4411e35d53 HDDS-12733. Bump axios to 0.30.0 (#8189)
31aabc1efd HDDS-12732. Bump awssdk to 2.31.11 (#8188)
d6e2509771 HDDS-12731. Bump restrict-imports-enforcer-rule to 2.6.1 (#8187)
4213307637 HDDS-12648. Fix checkLeaderStatus in removeOMFromRatisRing and addOMToRatisRing (#8142)
e4e95470d3 HDDS-12709. Intermittent failure in Balancer acceptance test (#8182)
c1d5b4f390 HDDS-12718. Use NodeStateMap in MockNodeManager instead of test-specific Node2ContainerMap (#8179)
65a9d6ce31 HDDS-12587. Detect test class module in flaky-test-check (#8162)
aac938306b HDDS-12529. Clean up code in AbstractContainerReportHandler. (#8033)
68913156a7 HDDS-12603. Fix ContainerTable keys re-rendering issue (#8163)
2a1a6bf124 HDDS-12670. Improve encapsulation of volume spare space check (#8167)
05b6eb3a61 HDDS-12696. Replace link to Hadoop with Ozone in httpfs site.xml (#8177)
7e87a8a890 HDDS-11879. MiniKMS fails with ClassNotFoundException: com.sun.jersey....ServletContainer (#8158)
841d297104 HDDS-12592. Remove find missing padding check in favor of of block metadata check (#8145)
ce003e898b HDDS-12702. Move checknative under ozone debug (#8170)
6ee9c2bbb3 HDDS-8007. Add more detailed stages for SnapDiff job progress tracking (#8010)
3ebe5e7cd8 HDDS-12660. Allow --verbose option of GenericCli at leaf subcommands (#8166)
c8b3ccb01b HDDS-12528. Create new module for S3 integration tests (#8168)
435dee9896 HDDS-12703. Close pipeline command should display error on failure (#8169)
deb8e7c472 HDDS-12705. Replace Whitebox with HddsWhiteboxTestUtils (#8172)
0d8aecc01e HDDS-12327. HDDS-12668. Fix HSync upgrade test failure in non-HA upgrade test (#8171)
bef4beed82 HDDS-12239. (addendum) Volume should not be marked as unhealthy when disk full - fix compile error
e3e47ea41e HDDS-12235. Reserve space on DN during container import operation. (#7981)
4ffae705c7 HDDS-12239. Volume should not be marked as unhealthy when disk full (#7830)
0781cbede3 HDDS-12694. Disable TestMiniChaosOzoneCluster after fixing init and shutdown (#8159)
92aa71debf HDDS-12687. Avoid ambiguity in URI descriptions (#8155)
536701649e HDDS-12679. Merge VolumeInfo into StorageVolume (#8147)
2cf6d59e1e HDDS-12553. ozone admin container list should output real JSON array (#8050)
63fcb271a7 HDDS-12465. Intermittent failure in TestOzoneFileSystemMetrics (#8130)
bab26a43fa HDDS-12684. Update NOTICE and LICENSE file (#8160)
857cb768b7 HDDS-12327. Restore non-HA (to HA) upgrade test (#7880)
587e9ff1be HDDS-12320. Collect OM performance metrics for FSO key delete (#7883)
560d017bc7 HDDS-12500. Do not skip JUnit tests in post-commit runs (#8024)
273a627692 HDDS-12686. Remove output of OzoneAddress in --verbose mode CLI (#8153)
4d3d834c60 HDDS-12650. Added logs to SnapshotDeletingService to indicate skipped snapshots. (#8123)
752473628d Revert "HDDS-12528. Create new module for S3 integration tests (#8152)"
d44ebf564c HDDS-12528. Create new module for S3 integration tests (#8152)
3739b0597b HDDS-12604. Reduce duplication in TestContainerStateMachine (#8104)
a1616ae8a2 HDDS-12485. Repair tool should only print user warning for offline commands (#8140)
de5c0a385e HDDS-12668. HSync upgrade test failure (#8137)
bee10f0731 HDDS-12671. Include .editorconfig and .run in source tarball (#8139)
8e7d370ebf HDDS-12666. Remove -SNAPSHOT from OZONE_CURRENT_VERSION in upgrade test (#8136)
4e0a76464c HDDS-12486. Warmup KMS encrypted keys when OM starts (#8081)
09109978ac HDDS-12636. Reduce code duplication for tarball creation (#8121)
cbafa02918 HDDS-12642. ACL test assertions depend on JSON element order (#8143)
8691adfc44 HDDS-12622. Refactor minFreeSpace calculation (#8119)
1b0e912f53 HDDS-12661. Standardize Maven module names (#8129)
ebf5cc662b HDDS-12662. Rename upgrade callback directory 1.5.0 to 2.0.0 (#8131)
828b2d116b HDDS-12667. Bump awssdk to 2.31.6 (#8134)
2596b3be08 HDDS-12665. Bump zstd-jni to 1.5.7-2 (#8133)
0e079832ae HDDS-12664. Bump copy-rename-maven-plugin to 1.0.1 (#8132)
31735012d1 HDDS-12462. Use exclude rules for defining shaded filesystem jar contents (#8008)
ee6201beb4 HDDS-12341. Share cluster in client tests (#8126)
bd579b355f HDDS-12646. Improve OM decommission check (#8122)
482024e6bc HDDS-12569. Extract MiniOzoneCluster to separate module (#8067)
7a3ad16568 HDDS-12641. Move Lease to hdds-server-framework (#8128)
e805c15c74 HDDS-12310. Online repair command to perform compaction on om.db (#7957)
7b9c15276a HDDS-12473. Trim duplicate space in proto message definition (#8005)
918bb9809c HDDS-12626. Move the compare method in NodeStatus to ECPipelineProvider. (#8116)
bf5a2f2209 HDDS-12644. Create factory method for OzoneAcl (#8115)
a646b85b82 HDDS-12557. Add progress indicator for checkpoint tarball in leader OM (#8085)
9e23d5989b HDDS-12627. NodeStateMap may handle opStateExpiryEpochSeconds incorrectly. (#8117)
d19f3bc992 HDDS-12640. Move GrpcMetrics to hdds-server-framework (#8114)
1c0b445a5d HDDS-12426. SCM replication should check double of container size. (#8080)
157cca451b HDDS-12358. Intermittent failure in compatibility acceptance test (#8012)
9b25d029b5 HDDS-12549. refactor ratis request to common place (#8059)
3349248d9d HDDS-12588. Recon Containers page shows number of blocks, not keys (#8074)
693d0f483e HDDS-12589. Fix Incorrect FSO Key Listing for Container-to-Key Mapping. (#8109)
17dbc4f134 HDDS-12585. Recon ContainerHealthTask ConstraintViolationException error handling. (#8070)
53c0a320a3 HDDS-12620. Fix OM Mismatch Deleted Container API (#8102)
45c900d5d9 HDDS-12619. Optimize Recon OM Container Mismatch API (#8101)
57fda0c327 HDDS-12633. KEY_NOT_FOUND in OzoneRpcClientTests for LEGACY bucket with enableFileSystemPaths=true (#8108)
23b0505d2e Revert "HDDS-12630. Enable GitHub Discussions in asf.yml (#8107)"
b402b7c1ad HDDS-12621. Change NodeStatus to value-based. (#8105)
20aeda57f3 HDDS-12601. Unknown tarball cleanup for Recon OM DB snapshot. (#8084)
93cd5aa442 HDDS-12630. Enable GitHub Discussions in asf.yml (#8107)
61a36f6312 HDDS-12373. Use File.getUsableSpace() instead of File.getFreeSpace() to calcuate usedSpace in DedicatedDiskSpaceUsage (#7927)
953e718872 HDDS-12565. Treat volumeFreeSpaceToSpare as reserved space (#8086)
8489cc86b5 HDDS-11735. Update ozone-default.xml for volume choosing policy (#8103)
13c5be850b HDDS-12577. [Ozone 2.0] Update master branch version number.
1a9d9f77f7 HDDS-12446. Add a Grafana dashboard for low level RocksDB operations. (#7992)
c9990cacaa HDDS-12576. [Ozone 2.0] Update proto.lock files (#8064)
b68b94cc08 HDDS-12617. Use DatanodeID as keys in NodeStateMap. (#8100)
b71d4086b1 Revert "HDDS-12589. Fix Incorrect FSO Key Listing for Container-to-Key Mapping. (#8078)"
68e3842616 HDDS-12589. Fix Incorrect FSO Key Listing for Container-to-Key Mapping. (#8078)
c8e77f8f06 HDDS-12602. Intermittent failure in TestContainerStateMachine.testWriteFailure (#8099)
08ac32dc23 HDDS-12608. Race condition in datanode version file creation (#8093)
e5ef35d815 HDDS-12582. TypedTable support using different codec (#8073)
dc47897721 HDDS-12611. Snapshot creation is removing extra keys from AOS's DB (#8094)
ff3ef5112e HDDS-12551. Replace dnsToUuidMap with dnsToDnIdMap in SCMNodeManager. (#8087)
28520a7e01 HDDS-12614. Configurable java version in flaky-test-check with default to 21 (#8097)
434d5bd248 HDDS-12616. Bump junit to 5.12.1 (#8088)
699ee88683 HDDS-12610. Bump awssdk to 2.31.1 (#8089)
0a85f9b3ce HDDS-12602. Mark TestContainerStateMachineLeader/Follower as flaky
03c80f8d9b HDDS-12591. Include ContainerInfo in ContainerAttribute. (#8083)
49a2c853d2 HDDS-12097. Enhance Container Key Mapper for Faster Processing. (#8002)
5b7f96c21d HDDS-12303. Move ozone.om.user.max.volume into OmConfig (#8082)
afc40430c0 HDDS-12566. Handle Over replication of Quasi Closed Stuck containers (#8061)
eb96ff4892 HDDS-12590. Used db name as the threadNamePrefix. (#8076)
cf5bad75cf HDDS-12539. Enable some String-related rules in PMD (#8047)
d26f711e27 HDDS-12533. Offline repair command for generic rocksDB compaction (#8039)
93563e99d6 HDDS-12573. Pipeline#toString should separate ReplicaIndex from next node UUID. (#8063)
6e40831f27 HDDS-12572. Remove the ContainerID parameter when it has ContainerReplica. (#8075)
786da39cf5 HDDS-12057. Implement command ozone debug replicas verify checksums (#7748)
66bc7eaa3d HDDS-12535. Intermittent failure in TestContainerReportHandling (#8060)
87a674c875 HDDS-12555. Combine containerMap and replicaMap in ContainerStateMap. (#8057)
f3689b692c HDDS-12458. Show safemode rules status irrespective of whether SCM is in safe mode in verbose mode. (#8049)
7d8c771aad HDDS-12469. Mark statemachine unhealthy for write operation timeout. (#8022)
9aa41fdbaf HDDS-12574. Add script to find modules by test classes (#8062)
5e28acb71d HDDS-12537. Selective checks: skip tests for PMD ruleset change (#8040)
9a3433b0d2 HDDS-12520. Move auditparser under debug (#8041)
0335385e5e HDDS-12420. Move FinalizeUpgradeCommandUtil to hdds-common. (#8023)
75bbba30a8 HDDS-12552. Fix raw use of generic class SCMCommand (#8048)
d71aadf0f1 HDDS-12527. Separate S3 Gateway from MiniOzoneCluster (#8058)
de8cf168ad HDDS-12483. Quasi Closed Stuck should have 2 replicas of each origin (#8014)
57a139e692 HDDS-11576. Create a separate S3 client factory (#8051)
e8a4668074 HDDS-12383. Fix spotbugs warnings in hdds-common and httpfsgateway (#8046)
0dab553d70 HDDS-12541. Change ContainerID to value-based (#8044)
1202f6df15 HDDS-12532. Support only Enum in ContainerAttribute. (#8036)
a2ad1e334d HDDS-12536. Move InMemoryTestTable to test (#8043)
7164c76ff3 HDDS-12488. S3G should handle the signature calculation with trailers (#8020)
c8c6d0e5a7 HDDS-12534. Remove drop_column_family command from ozone debug ldb (#8038)
9a8321ef78 HDDS-12535. Mark TestContainerReportHandling as flaky
ddd89fb2f3 HDDS-12543. Remove duplicate license information (#8045)
08e2c0a018 HDDS-12531. Use AtomicFileOutputStream to write YAML files. (#8035)
1bd8d8f56f HDDS-11813. Reduce duplication in CI workflow (#7497)
250bd5f317 HDDS-12450. Enable SimplifiableTestAssertion check in PMD (#8032)
96273ae699 HDDS-12489. Intermittent timeout in TestSCMContainerManagerMetrics.testReportProcessingMetrics (#8021)
231592705c HDDS-12476. Add TestDataUtil#createKey variant with small random content (#8028)
d95ca4c44e HDDS-12421. ContainerReportHandler should not make the call to delete replicas (#7976)
26c859cc3a HDDS-12204. Improve failover logging (#7867)
efbf79caf9 HDDS-12236. ContainerStateMachine should not apply or write future transactions in the event of failure (#7862)
3f88dbee51 HDDS-12377. Improve error handling of OM background tasks processing in case of abrupt crash of Recon. (#7960)
72da3a6aa3 HDDS-12477. Do not force RATIS/ONE replication in TestDataUtil#createKey (#8017)
a428b15acb HDDS-12496. Use TextFormat#shortDebugString to flatten proto message in SCMDatanodeProtocolServer. (#8019)
ecd2de095e HDDS-12409. Log an error before increasing the sequence id of a CLOSED container in SCM (#7964)
978e4a7eb6 HDDS-12168. Create new Grafana panel to display cluster growth rate (#7978)
9ab7c70380 HDDS-12456. Avoid FileInputStream and FileOutputStream (#8015)
9d41cd78a5 HDDS-12474. Add latency metrics of deletion services to grafana dashboard (#8007)
2b48e8c6ec HDDS-12354. Move Storage and UpgradeFinalizer to hdds-server-framework (#7973)
ed737b3a38 HDDS-12428. Avoid force closing OPEN/CLOSING replica of a CLOSED Container (#7985)
d1e8b90cbb HDDS-12210. Use correct BootstrapStateHandler.Lock in SnapshotDeletingService (#7991)
b769a26481 HDDS-12295. Allow updating OM default replication config for tests (#7974)
dd74eee7fc HDDS-12430. Document in ozone-default.xml the config keys moved from DFSConfigKeysLegacy (#7987)
34041caede HDDS-12466. Set default commit message to PR title (#8013)
1ea073560f HDDS-12467. Enable new asf.yaml parser (#8011)
e87b8dbb2f HDDS-12417. Reduce duplication of createKey variants in TestDataUtil (#7999)
86d2027e32 HDDS-12193. Provide option to disable RDBStoreMetrics for Snapshotted DB (#7982)
7c1d201b2e HDDS-12451. Create factory for MultiTenantAccessController (#7996)
c2a934ceb0 HDDS-12470. Revert workaround added by HDDS-8715 to preserve thread name. (#8004)
b8c93ccf46 HDDS-12376. (addendum: fix findbugs) Remove scmRatisEnabled from ScmInfo. (#7931)
4005a104db HDDS-12376. (addendum: fix pmd) Remove scmRatisEnabled from ScmInfo. (#7931)
abfa3becfa HDDS-12442. Add latency metrics for OM deletion services (#7986)
14db15cd67 HDDS-12376. Remove scmRatisEnabled from ScmInfo. (#7931)
7d31d9e522 HDDS-12410. Add detailed block info for ALLOCATE_BLOCK audit log (#7965)
a8f0ff3d7b HDDS-12460. (addendum) Move hdds-test-utils code to src/test
4c28c7f62d HDDS-12460. Move hdds-test-utils code to src/test (#8000)
bb16f66e22 HDDS-12448. Avoid using Jackson1 (#7994)
be0e1e6a86 HDDS-12445. Remove unused code from ContainerStateMap. (#7990)
a3c9c0e040 HDDS-12443. Intermittent failure in TestContainerBalancerSubCommand (#7989)
83fd8d7a31 HDDS-12449. Enable UseCollectionIsEmpty check in PMD (#7995)
d18da13da4 HDDS-12452. Bump slf4j to 2.0.17 (#7993)
6a9e8b148f HDDS-12348. Reuse `TestDataUtil.createKey` method (#7971)
384d774254 HDDS-12416. Enable UnusedPrivateField check in PMD (#7975)
39d7da3713 HDDS-12424. Allow config key to include config group prefix (#7979)
ba0939bc67 HDDS-12156. Add container health task metrics in Recon. (#7786)
d34aee40c5 HDDS-12172. Rename Java constants of DFSConfigKeysLegacy keys (#7922)
59aaa5cdb0 HDDS-12418. Remove healthyReplicaCountAdapter from RatisContainerReplicaCount (#7972)
232e780902 HDDS-12150. Abnormal container states should not crash the SCM ContainerReportHandler thread (#7882)
a708ea4fdd HDDS-12351. Move SCMHAUtils and ServerUtils to hdds-server-framework (#7961)
052bd2deef HDDS-12198. Exclude Recon generated code in coverage (#7962)
ba44e12da3 HDDS-12382. Fix other spotbugs warnings (#7969)
e2bf5998c1 HDDS-12345. Share cluster in filesystem tests (#7959)
f9f1c80dc6 HDDS-12411. Make hdds-client compliant with FieldDeclarationsShouldBeAtStartOfClass (#7966)
46f4986482 HDDS-11512. Create Grafana dashboard for tracking system wide deletion (#7813)
f7e9aedf49 HDDS-12396. Enable UnusedPrivateMethod check in PMD (#7956)
619f524517 HDDS-1234. Fix spotbugs warnings in ozone-manager (#7963)
1d64b37c64 HDDS-12380. Fix spotbugs warnings in hdds-container-service (#7958)
5c2b8f649f HDDS-12065. Checkpoint directory should be cleared on startup (#7681)
a31755a085 HDDS-12315. Speed up some Freon integration tests (#7870)
d8f3149211 HDDS-12399. Enable PMD.ForLoopCanBeForeach rule (#7952)
393211a6e8 HDDS-12062. Recon - Error handling in NSSummaryTask to avoid data inconsistencies. (#7723)
ecc240330b HDDS-9792. Add tests for Pipelines page (#7859)
e4de75e4b3 HDDS-12388. Key rewrite tests should be skipped if feature is disabled (#7953)
7b0fe618b1 HDDS-12288. (addendum) fix checkstyle
d5df7e43c7 HDDS-12288. Improve bootstrap logging to indicate progress of snapshot download. (#7861)
58a04536f7 HDDS-12381. Fix spotbugs warnings in TestHddsUtils (#7955)
1a0c2238fa HDDS-11768. Extract SCM failover proxy provider logic (#7950)
b45de0f18b HDDS-12403. Bump zstd-jni to 1.5.7-1 (#7949)
a57809e5ad HDDS-12402. Bump sqlite-jdbc to 3.49.1.0 (#7948)
221d53dc3a HDDS-12400. Bump junit to 5.12.0 (#7947)
f17abae511 HDDS-12398. Enable PMD checks for tests (#7946)
540f67b12f HDDS-12353. Move SpaceUsage implementations to hdds-server-framework (#7926)
70b93dc6a6 HDDS-10764.Tarball creation failing on leader OM node. (#7941)
6b20afcd13 HDDS-12387. Cleanup TestContainerOperations (#7940)
417ae7c62a HDDS-12365. Provide editor settings and IDEA run config in standard location (#7924)
a2c5c8eaf0 HDDS-12293. Make ozone.om.server.list.max.size reconfigurable (#7938)
e0bd2cc716 HDDS-12185. Enhance FileSizeCountTask for Faster Processing. (#7796)
589eeef871 HDDS-12371. Duplicated key scanning on multipartInfo table when listing multipart uploads (#7937)
6e766bffee HDDS-12367. Change ignorePipeline log level to DEBUG in OmKeyInfo (#7939)
c87caa577d HDDS-12226. TestSecureOzoneRpcClient tests not run due to UnknownHostException (#7827)
1513c34e2f HDDS-11883. SCM HA: Move proxy object creation code to SCMRatisServer (#7914)
8c42c026ef HDDS-12349. Speed up some HDDS integration tests (#7932)
a00787c9d1 HDDS-11532. Sort multipart uploads on ListMultipartUploads response (#7929)

Conflicts:
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerDataYaml.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Container.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Handler.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
    hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/ContainerTestUtils.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/TestSchemaTwoBackwardsCompatibility.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/impl/TestHddsDispatcher.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/interfaces/TestHandler.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/replication/TestReplicationSupervisor.java
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java
    hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/ozone/audit/SCMAction.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestContainerReportHandler.java
    hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/TestIncrementalContainerReportHandler.java
    hadoop-hdds/tools/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ContainerCommands.java
    hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
    hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestECKeyOutputStream.java
    hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/metrics/TestContainerMetrics.java
    hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOzoneContainerWithTLS.java
    hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/server/TestContainerServer.java
    hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/server/TestSecureContainerServer.java
    hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthTask.java
    hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/fsck/TestContainerHealthTask.java

 .../ozone/container/checksum/ContainerChecksumTreeManager.java | 5 +++++
 .../apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java | 2 --
 .../src/main/java/org/apache/hadoop/hdds/utils/db/DBProfile.java | 5 -----
 .../hadoop/hdds/utils/db/managed/ManagedBlockBasedTableConfig.java | 1 -
 hadoop-ozone/dist/src/main/smoketest/admincli/container.robot | 6 +++---
 .../test/java/org/apache/hadoop/hdds/scm/TestCloseContainer.java | 5 ++---
 .../ozone/dn/checksum/TestContainerCommandReconciliation.java | 2 +-
 pom.xml | 1 -
 8 files changed, 11 insertions(+), 16 deletions(-)

diff --cc hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
index 9511c87635,0000000000..5f0111bc3d
mode 100644,000000..100644
--- a/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
+++ b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
@@@ -1,409 -1,0 +1,414 @@@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.container.checksum;
+
+import static java.nio.file.StandardCopyOption.ATOMIC_MOVE;
+import static org.apache.hadoop.ozone.util.MetricUtil.captureLatencyNs;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.util.concurrent.Striped;
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.file.Files;
++import java.nio.file.NoSuchFileException;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos;
+import org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException;
+import org.apache.hadoop.hdds.utils.SimpleStriped;
+import org.apache.hadoop.ozone.container.common.impl.ContainerData;
+import org.apache.hadoop.ozone.container.common.interfaces.Container;
+import org.apache.hadoop.ozone.container.common.statemachine.DatanodeConfiguration;
+import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData;
+import org.apache.ratis.thirdparty.com.google.protobuf.ByteString;
+import org.apache.ratis.util.Preconditions;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This class coordinates reading and writing Container checksum information for all containers.
+ */
+public class ContainerChecksumTreeManager {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ContainerChecksumTreeManager.class);
+
+  // Used to coordinate writes to each container's checksum file.
+  // Each container ID is mapped to a stripe.
+  // The file is atomically renamed into place, so readers do not need coordination.
+  private final Striped<Lock> fileLock;
+  private final ContainerMerkleTreeMetrics metrics;
+
+  /**
+   * Creates one instance that should be used to coordinate all container checksum info within a datanode.
+   */
+  public ContainerChecksumTreeManager(ConfigurationSource conf) {
+    fileLock = SimpleStriped.custom(conf.getObject(DatanodeConfiguration.class).getContainerChecksumLockStripes(),
+        () -> new ReentrantLock(true));
+    metrics = ContainerMerkleTreeMetrics.create();
+  }
+
+  public void stop() {
+    ContainerMerkleTreeMetrics.unregister();
+  }
+
+  /**
+   * Writes the specified container merkle tree to the specified container's checksum file.
+   * The data merkle tree within the file is replaced with the {@code tree} parameter, but all other content of the
+   * file remains unchanged.
+   * Concurrent writes to the same file are coordinated internally.
+   */
+  public ContainerProtos.ContainerChecksumInfo writeContainerDataTree(ContainerData data,
+      ContainerMerkleTreeWriter tree)
+      throws IOException {
+    long containerID = data.getContainerID();
+    Lock writeLock = getLock(containerID);
+    writeLock.lock();
+    try {
+      ContainerProtos.ContainerChecksumInfo.Builder checksumInfoBuilder = null;
+      try {
+        // If the file is not present, we will create the data for the first time. This happens under a write lock.
+        checksumInfoBuilder = readBuilder(data)
+            .orElse(ContainerProtos.ContainerChecksumInfo.newBuilder());
+      } catch (IOException ex) {
+        LOG.error("Failed to read container checksum tree file for container {}. Overwriting it with a new instance.",
+            containerID, ex);
+        checksumInfoBuilder = ContainerProtos.ContainerChecksumInfo.newBuilder();
+      }
+
+      ContainerProtos.ContainerChecksumInfo checksumInfo = checksumInfoBuilder
+          .setContainerID(containerID)
+          .setContainerMerkleTree(captureLatencyNs(metrics.getCreateMerkleTreeLatencyNS(), tree::toProto))
+          .build();
+      write(data, checksumInfo);
+      LOG.debug("Data merkle tree for container {} updated", containerID);
+      return checksumInfo;
+    } finally {
+      writeLock.unlock();
+    }
+  }
+
+  /**
+   * Adds the specified blocks to the list of deleted blocks specified in the container's checksum file.
+   * All other content of the file remains unchanged.
+   * Concurrent writes to the same file are coordinated internally.
+   */
+  public void markBlocksAsDeleted(KeyValueContainerData data, Collection<Long> deletedBlockIDs) throws IOException {
+    long containerID = data.getContainerID();
+    Lock writeLock = getLock(containerID);
+    writeLock.lock();
+    try {
+      ContainerProtos.ContainerChecksumInfo.Builder checksumInfoBuilder = null;
+      try {
+        // If the file is not present, we will create the data for the first time. This happens under a write lock.
+        checksumInfoBuilder = readBuilder(data)
+            .orElse(ContainerProtos.ContainerChecksumInfo.newBuilder());
+      } catch (IOException ex) {
+        LOG.error("Failed to read container checksum tree file for container {}. Overwriting it with a new instance.",
+            data.getContainerID(), ex);
+        checksumInfoBuilder = ContainerProtos.ContainerChecksumInfo.newBuilder();
+      }
+
+      // Although the persisted block list should already be sorted, we will sort it here to make sure.
+      // This will automatically fix any bugs in the persisted order that may show up.
+      SortedSet<Long> sortedDeletedBlockIDs = new TreeSet<>(checksumInfoBuilder.getDeletedBlocksList());
+      sortedDeletedBlockIDs.addAll(deletedBlockIDs);
+
+      checksumInfoBuilder
+          .setContainerID(containerID)
+          .clearDeletedBlocks()
+          .addAllDeletedBlocks(sortedDeletedBlockIDs);
+      write(data, checksumInfoBuilder.build());
+      LOG.debug("Deleted block list for container {} updated with {} new blocks", data.getContainerID(),
+          sortedDeletedBlockIDs.size());
+    } finally {
+      writeLock.unlock();
+    }
+  }
+
+  /**
+   * Compares the checksum info of the container with the peer's checksum info and returns a report of the differences.
+   * @param thisChecksumInfo The checksum info of the container on this datanode.
+   * @param peerChecksumInfo The checksum info of the container on the peer datanode.
+   */
+  public ContainerDiffReport diff(ContainerProtos.ContainerChecksumInfo thisChecksumInfo,
+      ContainerProtos.ContainerChecksumInfo peerChecksumInfo) throws
+      StorageContainerException {
+
+    ContainerDiffReport report = new ContainerDiffReport();
+    try {
+      captureLatencyNs(metrics.getMerkleTreeDiffLatencyNS(), () -> {
+        Preconditions.assertNotNull(thisChecksumInfo, "Datanode's checksum info is null.");
+        Preconditions.assertNotNull(peerChecksumInfo, "Peer checksum info is null.");
+        if (thisChecksumInfo.getContainerID() != peerChecksumInfo.getContainerID()) {
+          throw new StorageContainerException("Container ID does not match. Local container ID "
+              + thisChecksumInfo.getContainerID() + " , Peer container ID " + peerChecksumInfo.getContainerID(),
+              ContainerProtos.Result.CONTAINER_ID_MISMATCH);
+        }
+
+        compareContainerMerkleTree(thisChecksumInfo, peerChecksumInfo, report);
+      });
+    } catch (IOException ex) {
+      metrics.incrementMerkleTreeDiffFailures();
+      throw new StorageContainerException("Container Diff failed for container #" + thisChecksumInfo.getContainerID(),
+          ex, ContainerProtos.Result.IO_EXCEPTION);
+    }
+
+    // Update Container Diff metrics based on the diff report.
+    if (report.needsRepair()) {
+      metrics.incrementRepairContainerDiffs();
+      return report;
+    }
+    metrics.incrementNoRepairContainerDiffs();
+    return report;
+  }
+
+  private void compareContainerMerkleTree(ContainerProtos.ContainerChecksumInfo thisChecksumInfo,
+      ContainerProtos.ContainerChecksumInfo peerChecksumInfo,
+      ContainerDiffReport report) {
+    ContainerProtos.ContainerMerkleTree thisMerkleTree = thisChecksumInfo.getContainerMerkleTree();
+    ContainerProtos.ContainerMerkleTree peerMerkleTree = peerChecksumInfo.getContainerMerkleTree();
+    Set<Long> thisDeletedBlockSet = new HashSet<>(thisChecksumInfo.getDeletedBlocksList());
+    Set<Long> peerDeletedBlockSet = new HashSet<>(peerChecksumInfo.getDeletedBlocksList());
+
+    if (thisMerkleTree.getDataChecksum() == peerMerkleTree.getDataChecksum()) {
+      return;
+    }
+
+    List<ContainerProtos.BlockMerkleTree> thisBlockMerkleTreeList = thisMerkleTree.getBlockMerkleTreeList();
+    List<ContainerProtos.BlockMerkleTree> peerBlockMerkleTreeList = peerMerkleTree.getBlockMerkleTreeList();
+    int thisIdx = 0, peerIdx = 0;
+
+    // Step 1: Process both lists while elements are present in both
+    while (thisIdx < thisBlockMerkleTreeList.size() && peerIdx < peerBlockMerkleTreeList.size()) {
+      ContainerProtos.BlockMerkleTree thisBlockMerkleTree = thisBlockMerkleTreeList.get(thisIdx);
+      ContainerProtos.BlockMerkleTree peerBlockMerkleTree = peerBlockMerkleTreeList.get(peerIdx);
+
+      if (thisBlockMerkleTree.getBlockID() == peerBlockMerkleTree.getBlockID()) {
+        // Matching block ID; check if the block is deleted and handle the cases;
+        // 1) If the block is deleted in both the block merkle tree, We can ignore comparing them.
+        // 2) If the block is only deleted in our merkle tree, The BG service should have deleted our
+        //    block and the peer's BG service hasn't run yet. We can ignore comparing them.
+        // 3) If the block is only deleted in peer merkle tree, we can't reconcile for this block. It might be
+        //    deleted by peer's BG service. We can ignore comparing them.
+        // TODO: HDDS-11765 - Handle missed block deletions from the deleted block ids.
+        if (!thisDeletedBlockSet.contains(thisBlockMerkleTree.getBlockID()) &&
+            !peerDeletedBlockSet.contains(thisBlockMerkleTree.getBlockID()) &&
+            thisBlockMerkleTree.getDataChecksum() != peerBlockMerkleTree.getDataChecksum()) {
+          compareBlockMerkleTree(thisBlockMerkleTree, peerBlockMerkleTree, report);
+        }
+        thisIdx++;
+        peerIdx++;
+      } else if (thisBlockMerkleTree.getBlockID() < peerBlockMerkleTree.getBlockID()) {
+        // this block merkle tree's block id is smaller. Which means our merkle tree has some blocks which the peer
+        // doesn't have. We can skip these, the peer will pick up these block when it reconciles with our merkle tree.
+        thisIdx++;
+      } else {
+        // Peer block's ID is smaller; record missing block if peerDeletedBlockSet doesn't contain the blockId
+        // and advance peerIdx
+        if (!peerDeletedBlockSet.contains(peerBlockMerkleTree.getBlockID())) {
+          report.addMissingBlock(peerBlockMerkleTree);
+        }
+        peerIdx++;
+      }
+    }
+
+    // Step 2: Process remaining blocks in the peer list
+    while (peerIdx < peerBlockMerkleTreeList.size()) {
+      ContainerProtos.BlockMerkleTree peerBlockMerkleTree = peerBlockMerkleTreeList.get(peerIdx);
+      if (!peerDeletedBlockSet.contains(peerBlockMerkleTree.getBlockID())) {
+        report.addMissingBlock(peerBlockMerkleTree);
+      }
+      peerIdx++;
+    }
+
+    // If we have remaining block in thisMerkleTree, we can skip these blocks. The peers will pick this block from
+    // us when they reconcile.
+  }
+
+  private void compareBlockMerkleTree(ContainerProtos.BlockMerkleTree thisBlockMerkleTree,
+      ContainerProtos.BlockMerkleTree peerBlockMerkleTree,
+      ContainerDiffReport report) {
+
+    List<ContainerProtos.ChunkMerkleTree> thisChunkMerkleTreeList = thisBlockMerkleTree.getChunkMerkleTreeList();
+    List<ContainerProtos.ChunkMerkleTree> peerChunkMerkleTreeList = peerBlockMerkleTree.getChunkMerkleTreeList();
+    int thisIdx = 0, peerIdx = 0;
+
+    // Step 1: Process both lists while elements are present in both
+    while (thisIdx < thisChunkMerkleTreeList.size() && peerIdx < peerChunkMerkleTreeList.size()) {
+      ContainerProtos.ChunkMerkleTree thisChunkMerkleTree = thisChunkMerkleTreeList.get(thisIdx);
+      ContainerProtos.ChunkMerkleTree peerChunkMerkleTree = peerChunkMerkleTreeList.get(peerIdx);
+
+      if (thisChunkMerkleTree.getOffset() == peerChunkMerkleTree.getOffset()) {
+        // Possible state when this Checksum != peer Checksum:
+        // thisTree = Healthy, peerTree = Healthy -> Both are healthy, No repair needed. Skip.
+        // thisTree = Unhealthy, peerTree = Healthy -> Add to corrupt chunk.
+        // thisTree = Healthy, peerTree = unhealthy -> Do nothing as thisTree is healthy.
+        // thisTree = Unhealthy, peerTree = Unhealthy -> Do Nothing as both are corrupt.
+        if (thisChunkMerkleTree.getDataChecksum() != peerChunkMerkleTree.getDataChecksum() &&
+            !thisChunkMerkleTree.getIsHealthy() && peerChunkMerkleTree.getIsHealthy()) {
+          report.addCorruptChunk(peerBlockMerkleTree.getBlockID(), peerChunkMerkleTree);
+        }
+        thisIdx++;
+        peerIdx++;
+      } else if (thisChunkMerkleTree.getOffset() < peerChunkMerkleTree.getOffset()) {
+        // this chunk merkle tree's offset is smaller. Which means our merkle tree has some chunks which the peer
+        // doesn't have. We can skip these, the peer will pick up these chunks when it reconciles with our merkle tree.
+        thisIdx++;
+      } else {
+        // Peer chunk's offset is smaller; record missing chunk and advance peerIdx
+        report.addMissingChunk(peerBlockMerkleTree.getBlockID(), peerChunkMerkleTree);
+        peerIdx++;
+      }
+    }
+
+    // Step 2: Process remaining chunks in the peer list
+    while (peerIdx < peerChunkMerkleTreeList.size()) {
+      report.addMissingChunk(peerBlockMerkleTree.getBlockID(), peerChunkMerkleTreeList.get(peerIdx));
+      peerIdx++;
+    }
+
+    // If we have remaining chunks in thisBlockMerkleTree, we can skip these chunks. The peers will pick these
+    // chunks from us when they reconcile.
+  }
+
+  /**
+   * Returns the container checksum tree file for the specified container without deserializing it.
+   */
+  @VisibleForTesting
+  public static File getContainerChecksumFile(ContainerData data) {
+    return new File(data.getMetadataPath(), data.getContainerID() + ".tree");
+  }
+
+  @VisibleForTesting
+  public static File getTmpContainerChecksumFile(ContainerData data) {
+    return new File(data.getMetadataPath(), data.getContainerID() + ".tree.tmp");
+  }
+
+  private Lock getLock(long containerID) {
+    return fileLock.get(containerID);
+  }
+
+  /**
+   * Callers are not required to hold a lock while calling this since writes are done to a tmp file and atomically
+   * swapped into place.
+   */
+  public Optional<ContainerProtos.ContainerChecksumInfo> read(ContainerData data) throws IOException {
+    try {
+      return captureLatencyNs(metrics.getReadContainerMerkleTreeLatencyNS(), () -> readChecksumInfo(data));
+    } catch (IOException ex) {
+      metrics.incrementMerkleTreeReadFailures();
+      throw new IOException(ex);
+    }
+  }
+
+  private Optional<ContainerProtos.ContainerChecksumInfo.Builder> readBuilder(ContainerData data) throws IOException {
+    Optional<ContainerProtos.ContainerChecksumInfo> checksumInfo = read(data);
+    return checksumInfo.map(ContainerProtos.ContainerChecksumInfo::toBuilder);
+  }
+
+  /**
+   * Callers should have acquired the write lock before calling this method.
+   */
+  private void write(ContainerData data, ContainerProtos.ContainerChecksumInfo checksumInfo) throws IOException {
+    // Make sure callers filled in required fields before writing.
+    Preconditions.assertTrue(checksumInfo.hasContainerID());
+
+    File checksumFile = getContainerChecksumFile(data);
+    File tmpChecksumFile = getTmpContainerChecksumFile(data);
+
+    try (OutputStream tmpOutputStream = Files.newOutputStream(tmpChecksumFile.toPath())) {
+      // Write to a tmp file and rename it into place.
+      captureLatencyNs(metrics.getWriteContainerMerkleTreeLatencyNS(), () -> {
+        checksumInfo.writeTo(tmpOutputStream);
+        Files.move(tmpChecksumFile.toPath(), checksumFile.toPath(), ATOMIC_MOVE);
+      });
+    } catch (IOException ex) {
+      // If the move failed and left behind the tmp file, the tmp file will be overwritten on the next successful write.
+      // Nothing reads directly from the tmp file.
+      metrics.incrementMerkleTreeWriteFailures();
+      throw new IOException("Error occurred when writing container merkle tree for containerID "
+          + data.getContainerID(), ex);
+    }
+    // Set in-memory data checksum.
+    data.setDataChecksum(checksumInfo.getContainerMerkleTree().getDataChecksum());
+  }
+
+  /**
+   * Reads the container checksum info file from the disk as bytes.
+   * Callers are not required to hold a lock while calling this since writes are done to a tmp file and atomically
+   * swapped into place.
+   *
+   * @throws FileNotFoundException When the file does not exist. It may not have been generated yet for this container.
+   * @throws IOException On error reading the file.
+   */
+  public ByteString getContainerChecksumInfo(KeyValueContainerData data) throws IOException {
+    File checksumFile = getContainerChecksumFile(data);
++    if (!checksumFile.exists()) {
++      throw new NoSuchFileException("Checksum file does not exist for container #" + data.getContainerID());
++    }
++
+    try (InputStream inStream = Files.newInputStream(checksumFile.toPath())) {
+      return ByteString.readFrom(inStream);
+    }
+  }
+
+  /**
+   * Reads the container checksum info file (containerID.tree) from the disk.
+   * Callers are not required to hold a lock while calling this since writes are done to a tmp file and atomically
+   * swapped into place.
+   */
+  public static Optional<ContainerProtos.ContainerChecksumInfo> readChecksumInfo(ContainerData data)
+      throws IOException {
+    long containerID = data.getContainerID();
+    File checksumFile = getContainerChecksumFile(data);
+    try {
+      if (!checksumFile.exists()) {
+        LOG.debug("No checksum file currently exists for container {} at the path {}", containerID, checksumFile);
+        return Optional.empty();
+      }
+      try (InputStream inStream = Files.newInputStream(checksumFile.toPath())) {
+        return Optional.of(ContainerProtos.ContainerChecksumInfo.parseFrom(inStream));
+      }
+    } catch (IOException ex) {
+      throw new IOException("Error occurred when reading container merkle tree for containerID "
+          + data.getContainerID() + " at path " + checksumFile, ex);
+    }
+  }
+
+  @VisibleForTesting
+  public ContainerMerkleTreeMetrics getMetrics() {
+    return this.metrics;
+  }
+
+  public static boolean checksumFileExist(Container container) {
+    File checksumFile = getContainerChecksumFile(container.getContainerData());
+    return checksumFile.exists();
+  }
+
+}
diff --cc hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
index 918511e1f6,76e696e91d..86817ef258
--- a/hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
+++ b/hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
@@@ -361,8 -274,9 +361,6 @@@ public class TestKeyValueHandler
          null, StorageVolume.VolumeType.DATA_VOLUME, null);
      try {
        ContainerSet cset = newContainerSet();
--      int[] interval = new int[1];
--      interval[0] = 2;
-       ContainerMetrics metrics = new ContainerMetrics(interval);
        DatanodeDetails datanodeDetails = mock(DatanodeDetails.class);
        StateContext context = ContainerTestUtils.getMockContext(
            datanodeDetails, conf);
diff --cc hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
index 83c2212978,8533c1938e..425b5df636
--- a/hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
+++ b/hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
@@@ -169,37 -169,5 +169,37 @@@ Cannot close container without admin pr
 Cannot create container without admin privilege
     Requires admin privilege    ozone admin container create
 
+Cannot reconcile container without admin privilege
+    Requires admin privilege    ozone admin container reconcile "${CONTAINER}"
+
 Reset user
     Run Keyword if    '${SECURITY_ENABLED}' == 'true'    Kinit test user    testuser    testuser.keytab
+
+Cannot reconcile open container
+    # At this point we should have an open Ratis Three container.
-     ${container} =    Execute    ozone admin container list --state OPEN | jq -r 'select(.replicationConfig.replicationFactor == "THREE") | .containerID' | head -n1
++    ${container} =    Execute    ozone admin container list --state OPEN | jq -r '.[] | select(.replicationConfig.replicationFactor == "THREE") | .containerID' | head -n1
+    Execute and check rc    ozone admin container reconcile "${container}"    255
+    # The container should not yet have any replica checksums.
+    # TODO When the scanner is computing checksums automatically, this test may need to be updated.
+    ${data_checksum} =    Execute    ozone admin container info "${container}" --json | jq -r '.replicas[].dataChecksum' | head -n1
+    # 0 is the hex value of an empty checksum.
+    Should Be Equal As Strings    0    ${data_checksum}
+
+Close container
-     ${container} =    Execute    ozone admin container list --state OPEN | jq -r 'select(.replicationConfig.replicationFactor == "THREE") | .containerID' | head -1
++    ${container} =    Execute    ozone admin container list --state OPEN | jq -r '.[] | select(.replicationConfig.replicationFactor == "THREE") | .containerID' | head -1
+    Execute    ozone admin container close "${container}"
+    # The container may either be in CLOSED or CLOSING state at this point. Once we have verified this, we will wait
+    # for it to progress to CLOSED.
+    ${output} =    Execute    ozone admin container info "${container}"
+    Should contain    ${output}    CLOS
+    Wait until keyword succeeds    1min    10sec    Container is closed    ${container}
+
+Reconcile closed container
+    # Check that info does not show replica checksums, since manual reconciliation has not yet been triggered.
-     ${container} =    Execute    ozone admin container list --state CLOSED | jq -r 'select(.replicationConfig.replicationFactor == "THREE") | .containerID' | head -1
++    ${container} =    Execute    ozone admin container list --state CLOSED | jq -r '.[] | select(.replicationConfig.replicationFactor == "THREE") | .containerID' | head -1
+    ${data_checksum} =    Execute    ozone admin container info "${container}" --json | jq -r '.replicas[].dataChecksum' | head -n1
+    # 0 is the hex value of an empty checksum. After container close the data checksum should not be 0.
+    Should Not Be Equal As Strings    0    ${data_checksum}
+    # When reconciliation finishes, replica checksums should be shown.
+    Execute    ozone admin container reconcile ${container}
+    Wait until keyword succeeds    1min    5sec    Reconciliation complete    ${container}
diff --cc hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestCloseContainer.java
index 8093c634b6,381c86d403..3062b6c07c
--- a/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestCloseContainer.java
+++ b/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestCloseContainer.java
@@@ -123,8 -106,8 +123,7 @@@ public class TestCloseContainer
       throws Exception {
      // Create some keys to write data into the open containers
      for (int i = 0; i < 10; i++) {
--      TestDataUtil.createKey(bucket, "key" + i,
-           "this is the content".getBytes(UTF_8));
-           "this is the content".getBytes(StandardCharsets.UTF_8));
++      TestDataUtil.createKey(bucket, "key" + i, "this is the content".getBytes(UTF_8));
      }
      StorageContainerManager scm = cluster.getStorageContainerManager();
@@@ -211,82 -164,4 +210,82 @@@
          "Container " + container.getContainerID() + " already closed");
    }
 
+  @Test
+  public void testContainerChecksumForClosedContainer() throws Exception {
+    // Create some keys to write data into the open containers
+    ReplicationConfig repConfig = RatisReplicationConfig.getInstance(HddsProtos.ReplicationFactor.THREE);
+    TestDataUtil.createKey(bucket, "key1", repConfig, "this is the content".getBytes(UTF_8));
+    StorageContainerManager scm = cluster.getStorageContainerManager();
+
+    ContainerInfo containerInfo1 = scm.getContainerManager().getContainers().get(0);
+    // Checksum file doesn't exist before container close
+    List<HddsDatanodeService> hddsDatanodes = cluster.getHddsDatanodes();
+    for (HddsDatanodeService hddsDatanode : hddsDatanodes) {
+      assertFalse(containerChecksumFileExists(hddsDatanode, containerInfo1));
+    }
+    // Close container.
+    OzoneTestUtils.closeContainer(scm, containerInfo1);
+    ContainerProtos.ContainerChecksumInfo prevExpectedChecksumInfo1 = null;
+    // Checksum file exists after container close and matches the expected container
+    // merkle tree for all the datanodes
+    for (HddsDatanodeService hddsDatanode : hddsDatanodes) {
+      GenericTestUtils.waitFor(() -> checkContainerCloseInDatanode(hddsDatanode, containerInfo1), 100, 5000);
+      assertTrue(containerChecksumFileExists(hddsDatanode, containerInfo1));
+      OzoneContainer ozoneContainer = hddsDatanode.getDatanodeStateMachine().getContainer();
+      Container<?> container1 = ozoneContainer.getController().getContainer(containerInfo1.getContainerID());
+      ContainerProtos.ContainerChecksumInfo containerChecksumInfo = ContainerMerkleTreeTestUtils.readChecksumFile(
+          container1.getContainerData());
+      assertNotNull(containerChecksumInfo);
+      if (prevExpectedChecksumInfo1 != null) {
+        ContainerMerkleTreeTestUtils.assertTreesSortedAndMatch(prevExpectedChecksumInfo1.getContainerMerkleTree(),
+            containerChecksumInfo.getContainerMerkleTree());
+      }
+      prevExpectedChecksumInfo1 = containerChecksumInfo;
+    }
+
+    // Create 2nd container and check the checksum doesn't match with 1st container
-     TestDataUtil.createKey(bucket, "key2", repConfig, "this is the different content".getBytes());
++    TestDataUtil.createKey(bucket, "key2", repConfig, "this is the different content".getBytes(UTF_8));
+    ContainerInfo containerInfo2 = scm.getContainerManager().getContainers().get(1);
+    for (HddsDatanodeService hddsDatanode : hddsDatanodes) {
+      assertFalse(containerChecksumFileExists(hddsDatanode, containerInfo2));
+    }
+
+    // Close container.
+    OzoneTestUtils.closeContainer(scm, containerInfo2);
+    ContainerProtos.ContainerChecksumInfo prevExpectedChecksumInfo2 = null;
+    // Checksum file exists after container close and matches the expected container
+    // merkle tree for all the datanodes
+    for (HddsDatanodeService hddsDatanode : hddsDatanodes) {
+      GenericTestUtils.waitFor(() -> checkContainerCloseInDatanode(hddsDatanode, containerInfo2), 100, 5000);
+      assertTrue(containerChecksumFileExists(hddsDatanode, containerInfo2));
+      OzoneContainer ozoneContainer = hddsDatanode.getDatanodeStateMachine().getContainer();
+      Container<?> container2 = ozoneContainer.getController().getContainer(containerInfo2.getContainerID());
+      ContainerProtos.ContainerChecksumInfo containerChecksumInfo = ContainerMerkleTreeTestUtils.readChecksumFile(
+          container2.getContainerData());
+      assertNotNull(containerChecksumInfo);
+      if (prevExpectedChecksumInfo2 != null) {
+        ContainerMerkleTreeTestUtils.assertTreesSortedAndMatch(prevExpectedChecksumInfo2.getContainerMerkleTree(),
+            containerChecksumInfo.getContainerMerkleTree());
+      }
+      prevExpectedChecksumInfo2 = containerChecksumInfo;
+    }
+
+    // Container merkle tree for different container should not match.
+ assertNotEquals(prevExpectedChecksumInfo1.getContainerID(), prevExpectedChecksumInfo2.getContainerID()); + assertNotEquals(prevExpectedChecksumInfo1.getContainerMerkleTree().getDataChecksum(), + prevExpectedChecksumInfo2.getContainerMerkleTree().getDataChecksum()); + for (ContainerReplica replica : getContainerReplicas(containerInfo1)) { + assertNotEquals(0, replica.getDataChecksum()); + } + for (ContainerReplica replica : getContainerReplicas(containerInfo2)) { + assertNotEquals(0, replica.getDataChecksum()); + } + } + + private boolean checkContainerCloseInDatanode(HddsDatanodeService hddsDatanode, + ContainerInfo containerInfo) { + Container container = hddsDatanode.getDatanodeStateMachine().getContainer().getController() + .getContainer(containerInfo.getContainerID()); + return container.getContainerState() == ContainerProtos.ContainerDataProto.State.CLOSED; + } } diff --cc hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java index d7dfa0bbfe,0000000000..c4d9f55049 mode 100644,000000..100644 --- a/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java +++ b/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java @@@ -1,728 -1,0 +1,728 @@@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.ozone.dn.checksum; + +import static java.nio.charset.StandardCharsets.UTF_8; +import static org.apache.commons.lang3.RandomStringUtils.randomAlphabetic; +import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION; +import static org.apache.hadoop.hdds.HddsConfigKeys.HDDS_BLOCK_TOKEN_ENABLED; +import static org.apache.hadoop.hdds.HddsConfigKeys.HDDS_CONTAINER_TOKEN_ENABLED; +import static org.apache.hadoop.hdds.HddsConfigKeys.HDDS_DATANODE_KERBEROS_KEYTAB_FILE_KEY; +import static org.apache.hadoop.hdds.HddsConfigKeys.HDDS_DATANODE_KERBEROS_PRINCIPAL_KEY; +import static org.apache.hadoop.hdds.HddsConfigKeys.HDDS_SECRET_KEY_EXPIRY_DURATION; +import static org.apache.hadoop.hdds.HddsConfigKeys.HDDS_SECRET_KEY_ROTATE_CHECK_DURATION; +import static org.apache.hadoop.hdds.HddsConfigKeys.HDDS_SECRET_KEY_ROTATE_DURATION; +import static org.apache.hadoop.hdds.HddsConfigKeys.OZONE_METADATA_DIRS; +import static org.apache.hadoop.hdds.client.ReplicationFactor.THREE; +import static org.apache.hadoop.hdds.client.ReplicationType.RATIS; +import static org.apache.hadoop.hdds.scm.ScmConfig.ConfigStrings.HDDS_SCM_KERBEROS_KEYTAB_FILE_KEY; +import static org.apache.hadoop.hdds.scm.ScmConfig.ConfigStrings.HDDS_SCM_KERBEROS_PRINCIPAL_KEY; +import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_CHUNK_SIZE_KEY; +import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_CLIENT_ADDRESS_KEY; +import static org.apache.hadoop.hdds.scm.server.SCMHTTPServerConfig.ConfigStrings.HDDS_SCM_HTTP_KERBEROS_KEYTAB_FILE_KEY; +import static org.apache.hadoop.hdds.scm.server.SCMHTTPServerConfig.ConfigStrings.HDDS_SCM_HTTP_KERBEROS_PRINCIPAL_KEY; +import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_ADMINISTRATORS; +import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_SCM_BLOCK_SIZE; +import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_SECURITY_ENABLED_KEY; +import static org.apache.hadoop.ozone.container.checksum.ContainerChecksumTreeManager.getContainerChecksumFile; +import static org.apache.hadoop.ozone.container.checksum.ContainerMerkleTreeTestUtils.assertTreesSortedAndMatch; +import static org.apache.hadoop.ozone.container.checksum.ContainerMerkleTreeTestUtils.buildTestTree; +import static org.apache.hadoop.ozone.container.checksum.ContainerMerkleTreeTestUtils.readChecksumFile; +import static org.apache.hadoop.ozone.container.checksum.ContainerMerkleTreeTestUtils.writeContainerDataTreeProto; +import static org.apache.hadoop.ozone.om.OMConfigKeys.OZONE_OM_HTTP_KERBEROS_KEYTAB_FILE; +import static org.apache.hadoop.ozone.om.OMConfigKeys.OZONE_OM_HTTP_KERBEROS_PRINCIPAL_KEY; +import static org.apache.hadoop.ozone.om.OMConfigKeys.OZONE_OM_KERBEROS_KEYTAB_FILE_KEY; +import static org.apache.hadoop.ozone.om.OMConfigKeys.OZONE_OM_KERBEROS_PRINCIPAL_KEY; +import static org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod.KERBEROS; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNotEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.assertTrue; + +import java.io.File; +import java.io.IOException; +import java.net.InetAddress; +import java.nio.file.Files; +import java.nio.file.Paths; +import java.nio.file.StandardOpenOption; +import java.util.List; +import java.util.Properties; +import java.util.Random; +import java.util.Set; +import java.util.UUID; +import 
java.util.stream.Collectors; +import org.apache.commons.io.IOUtils; +import org.apache.commons.lang3.tuple.Pair; +import org.apache.hadoop.hdds.conf.OzoneConfiguration; +import org.apache.hadoop.hdds.conf.StorageUnit; +import org.apache.hadoop.hdds.protocol.DatanodeDetails; +import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos; +import org.apache.hadoop.hdds.protocol.proto.HddsProtos; +import org.apache.hadoop.hdds.scm.ScmConfig; +import org.apache.hadoop.hdds.scm.container.ContainerID; +import org.apache.hadoop.hdds.scm.container.ContainerReplica; +import org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException; +import org.apache.hadoop.hdds.scm.protocolPB.StorageContainerLocationProtocolClientSideTranslatorPB; +import org.apache.hadoop.hdds.scm.server.SCMHTTPServerConfig; +import org.apache.hadoop.hdds.scm.server.StorageContainerManager; +import org.apache.hadoop.hdds.security.symmetric.SecretKeyClient; +import org.apache.hadoop.hdds.security.x509.certificate.client.CertificateClient; +import org.apache.hadoop.hdds.utils.db.BatchOperation; +import org.apache.hadoop.minikdc.MiniKdc; +import org.apache.hadoop.ozone.ClientVersion; +import org.apache.hadoop.ozone.HddsDatanodeService; +import org.apache.hadoop.ozone.MiniOzoneCluster; +import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl; +import org.apache.hadoop.ozone.client.ObjectStore; +import org.apache.hadoop.ozone.client.OzoneBucket; +import org.apache.hadoop.ozone.client.OzoneClient; +import org.apache.hadoop.ozone.client.OzoneClientFactory; +import org.apache.hadoop.ozone.client.OzoneVolume; +import org.apache.hadoop.ozone.client.io.OzoneOutputStream; +import org.apache.hadoop.ozone.container.TestHelper; +import org.apache.hadoop.ozone.container.checksum.ContainerMerkleTreeWriter; +import org.apache.hadoop.ozone.container.checksum.DNContainerOperationClient; +import org.apache.hadoop.ozone.container.common.helpers.BlockData; +import org.apache.hadoop.ozone.container.common.interfaces.Container; +import org.apache.hadoop.ozone.container.common.interfaces.DBHandle; +import org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine; +import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer; +import org.apache.hadoop.ozone.container.keyvalue.KeyValueContainerData; +import org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler; +import org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils; +import org.apache.hadoop.ozone.container.keyvalue.interfaces.BlockManager; +import org.apache.hadoop.ozone.container.ozoneimpl.ContainerScannerConfiguration; +import org.apache.hadoop.ozone.container.ozoneimpl.MetadataScanResult; +import org.apache.hadoop.ozone.om.OzoneManager; +import org.apache.hadoop.security.UserGroupInformation; +import org.apache.ozone.test.GenericTestUtils; +import org.apache.ratis.thirdparty.com.google.protobuf.ByteString; +import org.apache.ratis.thirdparty.com.google.protobuf.InvalidProtocolBufferException; +import org.junit.jupiter.api.AfterAll; +import org.junit.jupiter.api.BeforeAll; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; + +/** + * This class tests container commands for reconciliation. 
+ */
+public class TestContainerCommandReconciliation {
+
+  private static MiniOzoneHAClusterImpl cluster;
+  private static OzoneClient rpcClient;
+  private static ObjectStore store;
+  private static OzoneConfiguration conf;
+  private static DNContainerOperationClient dnClient;
+  private static final String KEY_NAME = "testkey";
+
+  @TempDir
+  private static File testDir;
+  @TempDir
+  private static File workDir;
+  private static MiniKdc miniKdc;
+  private static File ozoneKeytab;
+  private static File spnegoKeytab;
+  private static File testUserKeytab;
+  private static String testUserPrincipal;
+  private static String host;
+
+  @BeforeAll
+  public static void init() throws Exception {
+    conf = new OzoneConfiguration();
+    conf.set(OZONE_SCM_CLIENT_ADDRESS_KEY, "localhost");
+    conf.set(OZONE_METADATA_DIRS, testDir.getAbsolutePath());
+    conf.setStorageSize(OZONE_SCM_CHUNK_SIZE_KEY, 128 * 1024, StorageUnit.BYTES);
+    conf.setStorageSize(OZONE_SCM_BLOCK_SIZE, 512 * 1024, StorageUnit.BYTES);
+    // Disable the container scanner so it does not create merkle tree files that interfere with this test.
+    // TODO: Currently container scrub sets the checksum to 0; revert this after HDDS-10374 is merged.
+    conf.getObject(ContainerScannerConfiguration.class).setEnabled(false);
+    conf.setBoolean("hdds.container.scrub.enabled", false);
+
+    startMiniKdc();
+    setSecureConfig();
+    createCredentialsInKDC();
+    setSecretKeysConfig();
+    startCluster();
+  }
+
+  @AfterAll
+  public static void stop() throws IOException {
+    if (rpcClient != null) {
+      rpcClient.close();
+    }
+
+    if (dnClient != null) {
+      dnClient.close();
+    }
+
+    if (miniKdc != null) {
+      miniKdc.stop();
+    }
+
+    if (cluster != null) {
+      cluster.stop();
+    }
+  }
+
+  /**
+   * Container checksum trees are only generated for non-open containers.
+   * Calling the API on an open container should fail.
+   */
+  @Test
+  public void testGetChecksumInfoOpenReplica() throws Exception {
+    String volume = UUID.randomUUID().toString();
+    String bucket = UUID.randomUUID().toString();
+    long containerID = writeDataAndGetContainer(false, volume, bucket);
+    HddsDatanodeService targetDN = cluster.getHddsDatanodes().get(0);
+    StorageContainerException ex = assertThrows(StorageContainerException.class,
+        () -> dnClient.getContainerChecksumInfo(containerID, targetDN.getDatanodeDetails()));
+    assertEquals(ContainerProtos.Result.UNCLOSED_CONTAINER_IO, ex.getResult());
+  }
+
+  /**
+   * Tests reading the container checksum info file from a datanode that does not have a replica for the requested
+   * container.
+   */
+  @Test
+  public void testGetChecksumInfoNonexistentReplica() {
+    HddsDatanodeService targetDN = cluster.getHddsDatanodes().get(0);
+
+    // Find a container ID that does not exist in the cluster. For a small test this should be a good starting
+    // point, but modify it just in case.
+    long badIDCheck = 1_000_000;
+    while (cluster.getStorageContainerManager().getContainerManager()
+        .containerExist(ContainerID.valueOf(badIDCheck))) {
+      badIDCheck++;
+    }
+
+    final long nonexistentContainerID = badIDCheck;
+    StorageContainerException ex = assertThrows(StorageContainerException.class,
+        () -> dnClient.getContainerChecksumInfo(nonexistentContainerID, targetDN.getDatanodeDetails()));
+    assertEquals(ContainerProtos.Result.CONTAINER_NOT_FOUND, ex.getResult());
+  }
+
+  /**
+   * Tests reading the container checksum info file from a datanode where the container exists, but the file has not
+   * yet been created.
+ */ + @Test + public void testGetChecksumInfoNonexistentFile() throws Exception { + String volume = UUID.randomUUID().toString(); + String bucket = UUID.randomUUID().toString(); + long containerID = writeDataAndGetContainer(true, volume, bucket); + // Pick a datanode and remove its checksum file. + HddsDatanodeService targetDN = cluster.getHddsDatanodes().get(0); + Container<?> container = targetDN.getDatanodeStateMachine().getContainer() + .getContainerSet().getContainer(containerID); + File treeFile = getContainerChecksumFile(container.getContainerData()); + // Closing the container should have generated the tree file. + assertTrue(treeFile.exists()); + assertTrue(treeFile.delete()); + + StorageContainerException ex = assertThrows(StorageContainerException.class, () -> + dnClient.getContainerChecksumInfo(containerID, targetDN.getDatanodeDetails())); + assertEquals(ContainerProtos.Result.IO_EXCEPTION, ex.getResult()); - assertTrue(ex.getMessage().contains("(No such file or directory"), ex.getMessage() + ++ assertTrue(ex.getMessage().contains("Checksum file does not exist"), ex.getMessage() + + " did not contain the expected string"); + } + + /** + * Tests reading the container checksum info file from a datanode where the datanode fails to read the file from + * the disk. + */ + @Test + public void testGetChecksumInfoServerIOError() throws Exception { + String volume = UUID.randomUUID().toString(); + String bucket = UUID.randomUUID().toString(); + long containerID = writeDataAndGetContainer(true, volume, bucket); + // Pick a datanode and remove its checksum file. + HddsDatanodeService targetDN = cluster.getHddsDatanodes().get(0); + Container<?> container = targetDN.getDatanodeStateMachine().getContainer() + .getContainerSet().getContainer(containerID); + File treeFile = getContainerChecksumFile(container.getContainerData()); + assertTrue(treeFile.exists()); + // Make the server unable to read the file. + assertTrue(treeFile.setReadable(false)); + + StorageContainerException ex = assertThrows(StorageContainerException.class, () -> + dnClient.getContainerChecksumInfo(containerID, targetDN.getDatanodeDetails())); + assertEquals(ContainerProtos.Result.IO_EXCEPTION, ex.getResult()); + } + + /** + * Tests reading the container checksum info file from a datanode where the file is corrupt. + * The datanode does not deserialize the file before sending it, so there should be no error on the server side + * when sending the file. The client should raise an error trying to deserialize it. + */ + @Test + public void testGetCorruptChecksumInfo() throws Exception { + String volume = UUID.randomUUID().toString(); + String bucket = UUID.randomUUID().toString(); + long containerID = writeDataAndGetContainer(true, volume, bucket); + + // Pick a datanode and corrupt its checksum file. + HddsDatanodeService targetDN = cluster.getHddsDatanodes().get(0); + Container<?> container = targetDN.getDatanodeStateMachine().getContainer() + .getContainerSet().getContainer(containerID); + File treeFile = getContainerChecksumFile(container.getContainerData()); + Files.write(treeFile.toPath(), new byte[]{1, 2, 3}, + StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.SYNC); + + // Reading the file from the replica should fail when the client tries to deserialize it. 
+ assertThrows(InvalidProtocolBufferException.class, () -> dnClient.getContainerChecksumInfo(containerID, + targetDN.getDatanodeDetails())); + } + + @Test + public void testGetEmptyChecksumInfo() throws Exception { + String volume = UUID.randomUUID().toString(); + String bucket = UUID.randomUUID().toString(); + long containerID = writeDataAndGetContainer(true, volume, bucket); + + // Pick a datanode and truncate its checksum file to zero length. + HddsDatanodeService targetDN = cluster.getHddsDatanodes().get(0); + Container<?> container = targetDN.getDatanodeStateMachine().getContainer() + .getContainerSet().getContainer(containerID); + File treeFile = getContainerChecksumFile(container.getContainerData()); + // TODO After HDDS-10379 the file will already exist and need to be overwritten. + assertTrue(treeFile.exists()); + Files.write(treeFile.toPath(), new byte[]{}, + StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.SYNC); + assertEquals(0, treeFile.length()); + + // The client will get an empty byte string back. It should raise this as an error instead of returning a default + // protobuf object. + StorageContainerException ex = assertThrows(StorageContainerException.class, () -> + dnClient.getContainerChecksumInfo(containerID, targetDN.getDatanodeDetails())); + assertEquals(ContainerProtos.Result.IO_EXCEPTION, ex.getResult()); + } + + @Test + public void testGetChecksumInfoSuccess() throws Exception { + String volume = UUID.randomUUID().toString(); + String bucket = UUID.randomUUID().toString(); + long containerID = writeDataAndGetContainer(true, volume, bucket); + // Overwrite the existing tree with a custom one for testing. We will check that it is returned properly from the + // API. + ContainerMerkleTreeWriter tree = buildTestTree(conf); + writeChecksumFileToDatanodes(containerID, tree); + + // Verify trees match on all replicas. + // This test is expecting Ratis 3 data written on a 3 node cluster, so every node has a replica. + assertEquals(3, cluster.getHddsDatanodes().size()); + List<DatanodeDetails> datanodeDetails = cluster.getHddsDatanodes().stream() + .map(HddsDatanodeService::getDatanodeDetails).collect(Collectors.toList()); + for (DatanodeDetails dn: datanodeDetails) { + ContainerProtos.ContainerChecksumInfo containerChecksumInfo = + dnClient.getContainerChecksumInfo(containerID, dn); + assertTreesSortedAndMatch(tree.toProto(), containerChecksumInfo.getContainerMerkleTree()); + } + } + + @Test + public void testContainerChecksumWithBlockMissing() throws Exception { + // 1. Write data to a container. + // Read the key back and check its hash. + String volume = UUID.randomUUID().toString(); + String bucket = UUID.randomUUID().toString(); + Pair<Long, byte[]> containerAndData = getDataAndContainer(true, 20 * 1024 * 1024, volume, bucket); + long containerID = containerAndData.getLeft(); + byte[] data = containerAndData.getRight(); + // Get the datanodes where the container replicas are stored. 
+ List<DatanodeDetails> dataNodeDetails = cluster.getStorageContainerManager().getContainerManager() + .getContainerReplicas(ContainerID.valueOf(containerID)) + .stream().map(ContainerReplica::getDatanodeDetails) + .collect(Collectors.toList()); + assertEquals(3, dataNodeDetails.size()); + HddsDatanodeService hddsDatanodeService = cluster.getHddsDatanode(dataNodeDetails.get(0)); + DatanodeStateMachine datanodeStateMachine = hddsDatanodeService.getDatanodeStateMachine(); + Container<?> container = datanodeStateMachine.getContainer().getContainerSet().getContainer(containerID); + KeyValueContainerData containerData = (KeyValueContainerData) container.getContainerData(); + ContainerProtos.ContainerChecksumInfo oldContainerChecksumInfo = readChecksumFile(container.getContainerData()); + KeyValueHandler kvHandler = (KeyValueHandler) datanodeStateMachine.getContainer().getDispatcher() + .getHandler(ContainerProtos.ContainerType.KeyValueContainer); + + BlockManager blockManager = kvHandler.getBlockManager(); + List<BlockData> blockDataList = blockManager.listBlock(container, -1, 100); + String chunksPath = container.getContainerData().getChunksPath(); + long oldDataChecksum = oldContainerChecksumInfo.getContainerMerkleTree().getDataChecksum(); + + // 2. Delete some blocks to simulate missing blocks. + try (DBHandle db = BlockUtils.getDB(containerData, conf); + BatchOperation op = db.getStore().getBatchHandler().initBatchOperation()) { + for (int i = 0; i < blockDataList.size(); i += 2) { + BlockData blockData = blockDataList.get(i); + // Delete the block metadata from the container db + db.getStore().getBlockDataTable().deleteWithBatch(op, containerData.getBlockKey(blockData.getLocalID())); + // Delete the block file. + Files.deleteIfExists(Paths.get(chunksPath + "/" + blockData.getBlockID().getLocalID() + ".block")); + } + db.getStore().getBatchHandler().commitBatchOperation(op); + db.getStore().flushDB(); + } + + // TODO: Use On-demand container scanner to build the new container merkle tree. (HDDS-10374) + Files.deleteIfExists(getContainerChecksumFile(container.getContainerData()).toPath()); + kvHandler.createContainerMerkleTree(container); + ContainerProtos.ContainerChecksumInfo containerChecksumAfterBlockDelete = + readChecksumFile(container.getContainerData()); + long dataChecksumAfterBlockDelete = containerChecksumAfterBlockDelete.getContainerMerkleTree().getDataChecksum(); + // Checksum should have changed after block delete. + assertNotEquals(oldDataChecksum, dataChecksumAfterBlockDelete); + + // Since the container is already closed, we have manually updated the container checksum file. + // This doesn't update the checksum reported to SCM, and we need to trigger an ICR. + // Marking a container unhealthy will send an ICR. + kvHandler.markContainerUnhealthy(container, MetadataScanResult.deleted()); + waitForDataChecksumsAtSCM(containerID, 2); + + // 3. Reconcile the container. + cluster.getStorageContainerLocationClient().reconcileContainer(containerID); + // Compare and check if dataChecksum is same on all replicas. + waitForDataChecksumsAtSCM(containerID, 1); + ContainerProtos.ContainerChecksumInfo newContainerChecksumInfo = readChecksumFile(container.getContainerData()); + assertTreesSortedAndMatch(oldContainerChecksumInfo.getContainerMerkleTree(), + newContainerChecksumInfo.getContainerMerkleTree()); + TestHelper.validateData(KEY_NAME, data, store, volume, bucket); + } + + @Test + public void testContainerChecksumChunkCorruption() throws Exception { + // 1. 
Write data to a container.
+    String volume = UUID.randomUUID().toString();
+    String bucket = UUID.randomUUID().toString();
+    Pair<Long, byte[]> containerAndData = getDataAndContainer(true, 20 * 1024 * 1024, volume, bucket);
+    long containerID = containerAndData.getLeft();
+    byte[] data = containerAndData.getRight();
+    // Get the datanodes where the container replicas are stored.
+    List<DatanodeDetails> dataNodeDetails = cluster.getStorageContainerManager().getContainerManager()
+        .getContainerReplicas(ContainerID.valueOf(containerID))
+        .stream().map(ContainerReplica::getDatanodeDetails)
+        .collect(Collectors.toList());
+    assertEquals(3, dataNodeDetails.size());
+    HddsDatanodeService hddsDatanodeService = cluster.getHddsDatanode(dataNodeDetails.get(0));
+    DatanodeStateMachine datanodeStateMachine = hddsDatanodeService.getDatanodeStateMachine();
+    Container<?> container = datanodeStateMachine.getContainer().getContainerSet().getContainer(containerID);
+    KeyValueContainerData containerData = (KeyValueContainerData) container.getContainerData();
+    ContainerProtos.ContainerChecksumInfo oldContainerChecksumInfo = readChecksumFile(container.getContainerData());
+    KeyValueHandler kvHandler = (KeyValueHandler) datanodeStateMachine.getContainer().getDispatcher()
+        .getHandler(ContainerProtos.ContainerType.KeyValueContainer);
+
+    BlockManager blockManager = kvHandler.getBlockManager();
+    List<BlockData> blockDataList = blockManager.listBlock(container, -1, 100);
+    long oldDataChecksum = oldContainerChecksumInfo.getContainerMerkleTree().getDataChecksum();
+
+    // 2. Corrupt the first chunk of every block.
+    try (DBHandle db = BlockUtils.getDB(containerData, conf);
+         BatchOperation op = db.getStore().getBatchHandler().initBatchOperation()) {
+      for (BlockData blockData : blockDataList) {
+        // Modify the block metadata to simulate chunk corruption.
+        ContainerProtos.BlockData.Builder blockDataBuilder = blockData.getProtoBufMessage().toBuilder();
+        blockDataBuilder.clearChunks();
+
+        ContainerProtos.ChunkInfo chunkInfo = blockData.getChunks().get(0);
+        ContainerProtos.ChecksumData.Builder checksumDataBuilder = ContainerProtos.ChecksumData.newBuilder()
+            .setBytesPerChecksum(chunkInfo.getChecksumData().getBytesPerChecksum())
+            .setType(chunkInfo.getChecksumData().getType());
+
+        for (ByteString checksum : chunkInfo.getChecksumData().getChecksumsList()) {
+          byte[] checksumBytes = checksum.toByteArray();
+          // Modify the checksum bytes to simulate corruption.
+          checksumBytes[0] = (byte) (checksumBytes[0] - 1);
+          checksumDataBuilder.addChecksums(ByteString.copyFrom(checksumBytes));
+        }
+        chunkInfo = chunkInfo.toBuilder().setChecksumData(checksumDataBuilder.build()).build();
+        blockDataBuilder.addChunks(chunkInfo);
+        for (int i = 1; i < blockData.getChunks().size(); i++) {
+          blockDataBuilder.addChunks(blockData.getChunks().get(i));
+        }
+
+        // Overwrite the block metadata in the container db to persist the simulated corruption.
+        db.getStore().getBlockDataTable().putWithBatch(op, containerData.getBlockKey(blockData.getLocalID()),
+            BlockData.getFromProtoBuf(blockDataBuilder.build()));
+      }
+      db.getStore().getBatchHandler().commitBatchOperation(op);
+      db.getStore().flushDB();
+    }
+
+    Files.deleteIfExists(getContainerChecksumFile(container.getContainerData()).toPath());
+    kvHandler.createContainerMerkleTree(container);
+    // The corrupted chunks are marked unhealthy manually in step 3 below.
+    ContainerProtos.ContainerChecksumInfo containerChecksumAfterChunkCorruption =
+        readChecksumFile(container.getContainerData());
+    long dataChecksumAfterChunkCorruption = containerChecksumAfterChunkCorruption
+        .getContainerMerkleTree().getDataChecksum();
+    // Checksum should have changed after chunk corruption.
+    assertNotEquals(oldDataChecksum, dataChecksumAfterChunkCorruption);
+
+    // 3. Set unhealthy for the first chunk of all blocks. This should be done by the scanner; until then this is a
+    // manual step.
+    // TODO: Use On-demand container scanner to build the new container merkle tree (HDDS-10374)
+    Random random = new Random();
+    ContainerProtos.ContainerChecksumInfo.Builder builder = containerChecksumAfterChunkCorruption.toBuilder();
+    List<ContainerProtos.BlockMerkleTree> blockMerkleTreeList = builder.getContainerMerkleTree()
+        .getBlockMerkleTreeList();
+    builder.getContainerMerkleTreeBuilder().clearBlockMerkleTree();
+    for (ContainerProtos.BlockMerkleTree blockMerkleTree : blockMerkleTreeList) {
+      ContainerProtos.BlockMerkleTree.Builder blockMerkleTreeBuilder = blockMerkleTree.toBuilder();
+      List<ContainerProtos.ChunkMerkleTree.Builder> chunkMerkleTreeBuilderList =
+          blockMerkleTreeBuilder.getChunkMerkleTreeBuilderList();
+      chunkMerkleTreeBuilderList.get(0).setIsHealthy(false).setDataChecksum(random.nextLong());
+      blockMerkleTreeBuilder.setDataChecksum(random.nextLong());
+      builder.getContainerMerkleTreeBuilder().addBlockMerkleTree(blockMerkleTreeBuilder.build());
+    }
+    builder.getContainerMerkleTreeBuilder().setDataChecksum(random.nextLong());
+    Files.deleteIfExists(getContainerChecksumFile(container.getContainerData()).toPath());
+    writeContainerDataTreeProto(container.getContainerData(), builder.getContainerMerkleTree());
+
+    // Since the container is already closed, we have manually updated the container checksum file.
+    // This doesn't update the checksum reported to SCM, and we need to trigger an ICR.
+    // Marking a container unhealthy will send an ICR.
+    kvHandler.markContainerUnhealthy(container, MetadataScanResult.deleted());
+    waitForDataChecksumsAtSCM(containerID, 2);
+
+    // 4. Reconcile the container.
+    cluster.getStorageContainerLocationClient().reconcileContainer(containerID);
+    // Compare and check that the dataChecksum is the same on all replicas.
+    waitForDataChecksumsAtSCM(containerID, 1);
+    ContainerProtos.ContainerChecksumInfo newContainerChecksumInfo = readChecksumFile(container.getContainerData());
+    assertTreesSortedAndMatch(oldContainerChecksumInfo.getContainerMerkleTree(),
+        newContainerChecksumInfo.getContainerMerkleTree());
+    assertEquals(oldDataChecksum, newContainerChecksumInfo.getContainerMerkleTree().getDataChecksum());
+    TestHelper.validateData(KEY_NAME, data, store, volume, bucket);
+  }
+
+  @Test
+  public void testDataChecksumReportedAtSCM() throws Exception {
+    // 1. Write data to a container.
+    // Read the key back and check its hash.
+    String volume = UUID.randomUUID().toString();
+    String bucket = UUID.randomUUID().toString();
+    Pair<Long, byte[]> containerAndData = getDataAndContainer(true, 20 * 1024 * 1024, volume, bucket);
+    long containerID = containerAndData.getLeft();
+    byte[] data = containerAndData.getRight();
+    // Get the datanodes where the container replicas are stored.
+ List<DatanodeDetails> dataNodeDetails = cluster.getStorageContainerManager().getContainerManager() + .getContainerReplicas(ContainerID.valueOf(containerID)) + .stream().map(ContainerReplica::getDatanodeDetails) + .collect(Collectors.toList()); + assertEquals(3, dataNodeDetails.size()); + HddsDatanodeService hddsDatanodeService = cluster.getHddsDatanode(dataNodeDetails.get(0)); + DatanodeStateMachine datanodeStateMachine = hddsDatanodeService.getDatanodeStateMachine(); + Container<?> container = datanodeStateMachine.getContainer().getContainerSet().getContainer(containerID); + KeyValueContainerData containerData = (KeyValueContainerData) container.getContainerData(); + ContainerProtos.ContainerChecksumInfo oldContainerChecksumInfo = readChecksumFile(container.getContainerData()); + KeyValueHandler kvHandler = (KeyValueHandler) datanodeStateMachine.getContainer().getDispatcher() + .getHandler(ContainerProtos.ContainerType.KeyValueContainer); + + long oldDataChecksum = oldContainerChecksumInfo.getContainerMerkleTree().getDataChecksum(); + // Check non-zero checksum after container close + StorageContainerLocationProtocolClientSideTranslatorPB scmClient = cluster.getStorageContainerLocationClient(); + List<HddsProtos.SCMContainerReplicaProto> containerReplicas = scmClient.getContainerReplicas(containerID, + ClientVersion.CURRENT_VERSION); + assertEquals(3, containerReplicas.size()); + for (HddsProtos.SCMContainerReplicaProto containerReplica: containerReplicas) { + assertNotEquals(0, containerReplica.getDataChecksum()); + } + + // 2. Delete some blocks to simulate missing blocks. + BlockManager blockManager = kvHandler.getBlockManager(); + List<BlockData> blockDataList = blockManager.listBlock(container, -1, 100); + String chunksPath = container.getContainerData().getChunksPath(); + try (DBHandle db = BlockUtils.getDB(containerData, conf); + BatchOperation op = db.getStore().getBatchHandler().initBatchOperation()) { + for (int i = 0; i < blockDataList.size(); i += 2) { + BlockData blockData = blockDataList.get(i); + // Delete the block metadata from the container db + db.getStore().getBlockDataTable().deleteWithBatch(op, containerData.getBlockKey(blockData.getLocalID())); + // Delete the block file. + Files.deleteIfExists(Paths.get(chunksPath + "/" + blockData.getBlockID().getLocalID() + ".block")); + } + db.getStore().getBatchHandler().commitBatchOperation(op); + db.getStore().flushDB(); + } + + // TODO: Use On-demand container scanner to build the new container merkle tree. (HDDS-10374) + Files.deleteIfExists(getContainerChecksumFile(container.getContainerData()).toPath()); + kvHandler.createContainerMerkleTree(container); + ContainerProtos.ContainerChecksumInfo containerChecksumAfterBlockDelete = + readChecksumFile(container.getContainerData()); + long dataChecksumAfterBlockDelete = containerChecksumAfterBlockDelete.getContainerMerkleTree().getDataChecksum(); + // Checksum should have changed after block delete. + assertNotEquals(oldDataChecksum, dataChecksumAfterBlockDelete); + + // Since the container is already closed, we have manually updated the container checksum file. + // This doesn't update the checksum reported to SCM, and we need to trigger an ICR. + // Marking a container unhealthy will send an ICR. 
+    kvHandler.markContainerUnhealthy(container, MetadataScanResult.deleted());
+    waitForDataChecksumsAtSCM(containerID, 2);
+    scmClient.reconcileContainer(containerID);
+
+    waitForDataChecksumsAtSCM(containerID, 1);
+    // Check non-zero checksum after container reconciliation.
+    containerReplicas = scmClient.getContainerReplicas(containerID, ClientVersion.CURRENT_VERSION);
+    assertEquals(3, containerReplicas.size());
+    for (HddsProtos.SCMContainerReplicaProto containerReplica : containerReplicas) {
+      assertNotEquals(0, containerReplica.getDataChecksum());
+    }
+
+    // Check non-zero checksum after datanode restart.
+    // Restarting all the nodes takes more time in a mini Ozone cluster, so restart only one node.
+    cluster.restartHddsDatanode(0, true);
+    for (StorageContainerManager scm : cluster.getStorageContainerManagers()) {
+      cluster.restartStorageContainerManager(scm, false);
+    }
+    cluster.waitForClusterToBeReady();
+    waitForDataChecksumsAtSCM(containerID, 1);
+    containerReplicas = scmClient.getContainerReplicas(containerID, ClientVersion.CURRENT_VERSION);
+    assertEquals(3, containerReplicas.size());
+    for (HddsProtos.SCMContainerReplicaProto containerReplica : containerReplicas) {
+      assertNotEquals(0, containerReplica.getDataChecksum());
+    }
+    TestHelper.validateData(KEY_NAME, data, store, volume, bucket);
+  }
+
+  private void waitForDataChecksumsAtSCM(long containerID, int expectedSize) throws Exception {
+    GenericTestUtils.waitFor(() -> {
+      try {
+        Set<Long> dataChecksums = cluster.getStorageContainerLocationClient().getContainerReplicas(containerID,
+            ClientVersion.CURRENT_VERSION).stream()
+            .map(HddsProtos.SCMContainerReplicaProto::getDataChecksum)
+            .collect(Collectors.toSet());
+        return dataChecksums.size() == expectedSize;
+      } catch (Exception ex) {
+        return false;
+      }
+    }, 500, 20000);
+  }
+
+  private Pair<Long, byte[]> getDataAndContainer(boolean close, int dataLen, String volumeName, String bucketName)
+      throws Exception {
+    store.createVolume(volumeName);
+    OzoneVolume volume = store.getVolume(volumeName);
+    volume.createBucket(bucketName);
+    OzoneBucket bucket = volume.getBucket(bucketName);
+
+    byte[] data = randomAlphabetic(dataLen).getBytes(UTF_8);
+    // Write key.
+    try (OzoneOutputStream os = TestHelper.createKey(KEY_NAME, RATIS, THREE, dataLen, store, volumeName, bucketName)) {
+      IOUtils.write(data, os);
+    }
+
+    long containerID = bucket.getKey(KEY_NAME).getOzoneKeyLocations().stream()
+        .findFirst().get().getContainerID();
+    if (close) {
+      TestHelper.waitForContainerClose(cluster, containerID);
+      TestHelper.waitForScmContainerState(cluster, containerID, HddsProtos.LifeCycleState.CLOSED);
+    }
+    return Pair.of(containerID, data);
+  }
+
+  private long writeDataAndGetContainer(boolean close, String volume, String bucket) throws Exception {
+    return getDataAndContainer(close, 5, volume, bucket).getLeft();
+  }
+
+  public static void writeChecksumFileToDatanodes(long containerID, ContainerMerkleTreeWriter tree) throws Exception {
+    // Write the container merkle tree on every datanode that has a replica.
+    for (HddsDatanodeService dn : cluster.getHddsDatanodes()) {
+      KeyValueHandler keyValueHandler =
+          (KeyValueHandler) dn.getDatanodeStateMachine().getContainer().getDispatcher()
+              .getHandler(ContainerProtos.ContainerType.KeyValueContainer);
+      KeyValueContainer keyValueContainer =
+          (KeyValueContainer) dn.getDatanodeStateMachine().getContainer().getController()
+              .getContainer(containerID);
+      if (keyValueContainer != null) {
+        keyValueHandler.getChecksumManager().writeContainerDataTree(keyValueContainer.getContainerData(), tree);
+      }
+    }
+  }
+
+  private static void setSecretKeysConfig() {
+    // Secret key lifecycle configs.
+    conf.set(HDDS_SECRET_KEY_ROTATE_CHECK_DURATION, "500s");
+    conf.set(HDDS_SECRET_KEY_ROTATE_DURATION, "500s");
+    conf.set(HDDS_SECRET_KEY_EXPIRY_DURATION, "500s");
+
+    // enable tokens
+    conf.setBoolean(HDDS_BLOCK_TOKEN_ENABLED, true);
+    conf.setBoolean(HDDS_CONTAINER_TOKEN_ENABLED, true);
+  }
+
+  private static void createCredentialsInKDC() throws Exception {
+    ScmConfig scmConfig = conf.getObject(ScmConfig.class);
+    SCMHTTPServerConfig httpServerConfig =
+        conf.getObject(SCMHTTPServerConfig.class);
+    createPrincipal(ozoneKeytab, scmConfig.getKerberosPrincipal());
+    createPrincipal(spnegoKeytab, httpServerConfig.getKerberosPrincipal());
+    createPrincipal(testUserKeytab, testUserPrincipal);
+  }
+
+  private static void createPrincipal(File keytab, String... principal)
+      throws Exception {
+    miniKdc.createPrincipal(keytab, principal);
+  }
+
+  private static void startMiniKdc() throws Exception {
+    Properties securityProperties = MiniKdc.createConf();
+    miniKdc = new MiniKdc(securityProperties, workDir);
+    miniKdc.start();
+  }
+
+  private static void setSecureConfig() throws IOException {
+    conf.setBoolean(OZONE_SECURITY_ENABLED_KEY, true);
+    host = InetAddress.getLocalHost().getCanonicalHostName()
+        .toLowerCase();
+    conf.set(HADOOP_SECURITY_AUTHENTICATION, KERBEROS.name());
+    String curUser = UserGroupInformation.getCurrentUser().getUserName();
+    conf.set(OZONE_ADMINISTRATORS, curUser);
+    String realm = miniKdc.getRealm();
+    String hostAndRealm = host + "@" + realm;
+    conf.set(HDDS_SCM_KERBEROS_PRINCIPAL_KEY, "scm/" + hostAndRealm);
+    conf.set(HDDS_SCM_HTTP_KERBEROS_PRINCIPAL_KEY, "HTTP_SCM/" + hostAndRealm);
+    conf.set(OZONE_OM_KERBEROS_PRINCIPAL_KEY, "scm/" + hostAndRealm);
+    conf.set(OZONE_OM_HTTP_KERBEROS_PRINCIPAL_KEY, "HTTP_OM/" + hostAndRealm);
+    conf.set(HDDS_DATANODE_KERBEROS_PRINCIPAL_KEY, "scm/" + hostAndRealm);
+
+    ozoneKeytab = new File(workDir, "scm.keytab");
+    spnegoKeytab = new File(workDir, "http.keytab");
+    testUserKeytab = new File(workDir, "testuser.keytab");
+    testUserPrincipal = "test@" + realm;
+
+    conf.set(HDDS_SCM_KERBEROS_KEYTAB_FILE_KEY, ozoneKeytab.getAbsolutePath());
+    conf.set(HDDS_SCM_HTTP_KERBEROS_KEYTAB_FILE_KEY, spnegoKeytab.getAbsolutePath());
+    conf.set(OZONE_OM_KERBEROS_KEYTAB_FILE_KEY, ozoneKeytab.getAbsolutePath());
+    conf.set(OZONE_OM_HTTP_KERBEROS_KEYTAB_FILE, spnegoKeytab.getAbsolutePath());
+    conf.set(HDDS_DATANODE_KERBEROS_KEYTAB_FILE_KEY, ozoneKeytab.getAbsolutePath());
+  }
+
+  private static void startCluster() throws Exception {
+    OzoneManager.setTestSecureOmFlag(true);
+    cluster = MiniOzoneCluster.newHABuilder(conf)
+        .setSCMServiceId("SecureSCM")
+        .setNumOfStorageContainerManagers(3)
+        .setNumOfOzoneManagers(1)
+        .build();
+    cluster.waitForClusterToBeReady();
+    rpcClient = OzoneClientFactory.getRpcClient(conf);
+    store = rpcClient.getObjectStore();
+    SecretKeyClient secretKeyClient = cluster.getStorageContainerManager().getSecretKeyManager();
+    CertificateClient certClient = cluster.getStorageContainerManager().getScmCertificateClient();
+    dnClient = new DNContainerOperationClient(conf, certClient, secretKeyClient);
+  }
+}
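
The reconciliation flow these tests exercise reduces to two SCM client calls: reconcileContainer() to schedule the work on the datanodes, and getContainerReplicas() to observe the per-replica data checksums converging. The following is a minimal sketch of that flow, illustrative only and not part of this commit; it assumes a running MiniOzoneHAClusterImpl named `cluster` as built in startCluster() above, and reuses only the client APIs and imports already shown in TestContainerCommandReconciliation.

  // Sketch (not part of this commit): trigger reconciliation and poll SCM
  // until every replica of the container reports the same non-zero checksum.
  private static void reconcileAndAwaitConvergence(MiniOzoneHAClusterImpl cluster, long containerID)
      throws Exception {
    StorageContainerLocationProtocolClientSideTranslatorPB scmClient =
        cluster.getStorageContainerLocationClient();
    // Ask SCM to schedule reconciliation of this container's replicas.
    scmClient.reconcileContainer(containerID);
    // Wait until all replicas report one identical, non-zero data checksum.
    GenericTestUtils.waitFor(() -> {
      try {
        Set<Long> checksums = scmClient.getContainerReplicas(containerID, ClientVersion.CURRENT_VERSION)
            .stream()
            .map(HddsProtos.SCMContainerReplicaProto::getDataChecksum)
            .collect(Collectors.toSet());
        return checksums.size() == 1 && !checksums.contains(0L);
      } catch (Exception e) {
        return false;
      }
    }, 500, 20000);
  }

Compared with waitForDataChecksumsAtSCM above, this sketch waits for full convergence to a single non-zero checksum, whereas the test helper takes the expected number of distinct checksums as a parameter so it can also wait for divergence (expectedSize 2) before reconciliation is triggered.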
