steveloughran commented on issue #1576: Hadoop 16520 dynamodb ms version race refactor. URL: https://github.com/apache/hadoop/pull/1576#issuecomment-537610443 I tested out on the command line while doing things to the DDB tables " Version marker in the dynamo table was null" How about "No version marker found in the DynamoDB table " + table name? Issue: we always patch the table, even on fsck, destroy and bucket-info calls. Should we? I'm happy with it as is, as trying to be clever complicates life. Issue: What do we think makes a good retry interval? I think it is too big, as a bucket info is now sleeping for 3 minutes before giving up. (that's on a table with no version Tag BTW) . ``` > bin/hadoop s3guard prune -tombstone -seconds 0 s3a://hwdev-steve-ireland-new ... 2019-10-02 17:37:49,476 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 629 ms before next retry 2019-10-02 17:37:50,127 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 1048 ms before next retry 2019-10-02 17:37:51,200 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 1149 ms before next retry 2019-10-02 17:37:52,375 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 3756 ms before next retry 2019-10-02 17:37:56,155 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 8933 ms before next retry 2019-10-02 17:38:05,115 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 10687 ms before next retry 2019-10-02 17:38:15,829 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 17912 ms before next retry 2019-10-02 17:38:33,766 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 55424 ms before next retry 2019-10-02 17:39:29,212 [main] WARN s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:getVersionMarkerItem(564)) - Version marker in the dynamo table was null. Sleeping 175876 ms before next retry 2019-10-02 17:42:25,409 [main] DEBUG s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:verifyVersionCompatibility(368)) - versionMarkerItem: null; versionMarkerFromTag: { Item: {parent=../VERSION, child=../VERSION, table_version=100, table_created=0} } 2019-10-02 17:42:25,409 [main] INFO s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:verifyVersionCompatibility(392)) - Table hwdev-steve-ireland-new contains no version marker ITEM but contains compatible version marker TAG. Restoring the version marker item from tag. 2019-10-02 17:42:25,409 [main] DEBUG s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:putItem(599)) - Putting item { Item: {parent=../VERSION, child=../VERSION, table_version=100, table_created=1570034545409} } 2019-10-02 17:42:25,470 [main] DEBUG s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:initTable(187)) - Using existing DynamoDB table hwdev-steve-ireland-new in region eu-west-1 created Wed Oct 02 17:42:25 BST 2019 ``` I verified that incompatible item or tag values raised errors * tag version = 150 * tag valid, item = 150 * item lacks a numeric version field at all In those situations, I think the error could include the full details which are logged at debug. I can see that being useful. ``` 2019-10-02 18:08:57,286 [main] DEBUG s3guard.DynamoDBMetadataStoreTableHandler (DynamoDBMetadataStoreTableHandler.java:verifyVersionCompatibility(368)) versionMarkerItem: { Item: {parent=../VERSION, table_created=1570034545409, table_version=120, child=../VERSION} }; versionMarkerFromTag: { Item: {parent=../VERSION, child=../VERSION, table_version=100, table_created=0} } ``` CLI run on an inconsistent version marker (even if there's a valid file entry) is noisy: it logs at error with stack and then throws. I'd prefer just the log without the stack here, ``` 2019-10-02 18:06:00,690 [main] ERROR s3guard.S3Guard (S3Guard.java:getMetadataStore(120)) - Failed to instantiate metadata store org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore defined in fs.s3a.metadatastore.impl: java.io.IOException: Database table is from an incompatible S3Guard version based on table TAG. Table hwdev-steve-ireland-new Expected version: 100 actual tag version: 150 java.io.IOException: Database table is from an incompatible S3Guard version based on table TAG. Table hwdev-steve-ireland-new Expected version: 100 actual tag version: 150 at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableHandler.throwExceptionOnVersionMismatch(DynamoDBMetadataStoreTableHandler.java:440) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableHandler.verifyVersionCompatibility(DynamoDBMetadataStoreTableHandler.java:418) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableHandler.initTable(DynamoDBMetadataStoreTableHandler.java:184) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initialize(DynamoDBMetadataStore.java:426) at org.apache.hadoop.fs.s3a.s3guard.S3Guard.getMetadataStore(S3Guard.java:109) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:413) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3391) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:137) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3440) at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3414) at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:556) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$BucketInfo.run(S3GuardTool.java:1208) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:427) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1797) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.main(S3GuardTool.java:1806) java.io.IOException: Database table is from an incompatible S3Guard version based on table TAG. Table hwdev-steve-ireland-new Expected version: 100 actual tag version: 150 at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableHandler.throwExceptionOnVersionMismatch(DynamoDBMetadataStoreTableHandler.java:440) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableHandler.verifyVersionCompatibility(DynamoDBMetadataStoreTableHandler.java:418) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStoreTableHandler.initTable(DynamoDBMetadataStoreTableHandler.java:184) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.initialize(DynamoDBMetadataStore.java:426) at org.apache.hadoop.fs.s3a.s3guard.S3Guard.getMetadataStore(S3Guard.java:109) at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:413) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3391) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:137) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3440) at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3414) at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:556) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool$BucketInfo.run(S3GuardTool.java:1208) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:427) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:1797) at org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.main(S3GuardTool.java:1806) 2019-10-02 18:06:00,693 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status -1: java.io.IOException: Database table is from an incompatible S3Guard version based on table TAG. Table hwdev-steve-ireland-new Exp ``` Side issue a prune with seconds 0 didn't delete tombstones from the DDB table. ``` > bin/hadoop s3guard prune -tombstone -seconds 0 s3a://hwdev-steve-ireland-new/ ... 2019-10-02 17:56:25,011 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(324)) - Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=hwdev-steve-ireland-new, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/hwdev-steve-ireland-new} is initialized. 2019-10-02 17:56:25,027 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:prune(1519)) - Prune tombstones under /hwdev-steve-ireland-new with age 1569948985027 2019-10-02 17:56:25,033 [main] INFO s3guard.DynamoDBMetadataStore (DurationInfo.java:<init>(72)) - Starting: Pruning DynamoDB Store 2019-10-02 17:56:25,066 [main] INFO s3guard.DynamoDBMetadataStore (DurationInfo.java:close(87)) - Pruning DynamoDB Store: duration 0:00.032s 2019-10-02 17:56:25,066 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:close(2267)) - Auditing #(Prune-0001) 2019-10-02 17:56:25,066 [main] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1596)) - Finished pruning 0 items in batches of 25 2019-10-02 17:56:25,068 [shutdown-hook-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:close(3048)) - Filesystem s3a://hwdev-steve-ireland-new is closed 2019-10-02 17:56:25,070 [shutdown-hook-0] DEBUG s3a.S3AFileSystem (S3AUtils.java:closeAll(1628)) - Closing NullMetadataStore 2019-10-02 17:56:25,070 [shutdown-hook-0] DEBUG s3a.S3AFileSystem (S3AUtils.java:closeAll(1628)) - Closing org.apache.hadoop.fs.s3a.S3AInstrumentation@27968269 2019-10-02 17:56:25,071 [shutdown-hook-0] DEBUG s3a.S3AFileSystem (S3AUtils.java:closeAutocloseables(1651)) - Closing AWSCredentialProviderList[refcount= 2: [TemporaryAWSCredentialsProvider, SimpleAWSCredentialsProvider, EnvironmentVariableCredentialsProvider, org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider@63a12c68] last provider: SimpleAWSCredentialsProvider 2019-10-02 17:56:25,071 [shutdown-hook-0] DEBUG s3a.AWSCredentialProviderList (AWSCredentialProviderList.java:close(311)) - Not closing AWSCredentialProviderList[refcount= 1: [TemporaryAWSCredentialsProvider, SimpleAWSCredentialsProvider, EnvironmentVariableCredentialsProvider, org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider@63a12c68] last provider: SimpleAWSCredentialsProvider ``` But when I said -seconds 1, it did ``` > bin/hadoop s3guard prune -tombstone -seconds 1 s3a://hwdev-steve-ireland-new/ ... 2019-10-02 17:58:55,815 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:prune(1519)) - Prune tombstones under /hwdev-steve-ireland-new with age 1570035534815 2019-10-02 17:58:55,821 [main] INFO s3guard.DynamoDBMetadataStore (DurationInfo.java:<init>(72)) - Starting: Pruning DynamoDB Store 2019-10-02 17:58:55,882 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1548)) - Prune entry s3a://hwdev-steve-ireland-new/terasort-directory/validate/_SUCCESS 2019-10-02 17:58:55,883 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1548)) - Prune entry s3a://hwdev-steve-ireland-new/terasort-directory/validate/part-r-00000 2019-10-02 17:58:55,883 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1548)) - Prune entry s3a://hwdev-steve-ireland-new/terasort-magic/validate/__magic/app-attempt-0001/tasks/attempt_1570028934823_0006_m_000000_0/__base 2019-10-02 17:58:55,883 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1548)) - Prune entry s3a://hwdev-steve-ireland-new/user/stevel/target/test/data/Nhh2EnrZqv 2019-10-02 17:58:55,884 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1548)) - Prune entry s3a://hwdev-steve-ireland-new/user/stevel/target/test/data/PEzrLmEa92 2019-10-02 17:58:55,884 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1548)) - Prune entry s3a://hwdev-steve-ireland-new/user/stevel/target/test/data/rogKpvLQQU 2019-10-02 17:58:55,884 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1548)) - Prune entry s3a://hwdev-steve-ireland-new/terasort-magic/sortin/__magic/app-attempt-0001/tasks/attempt_1570019599148_0002_ ... 2019-10-02 17:58:57,212 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:get(711)) - result of get s3a://hwdev-steve-ireland-new/test/prune-cli-keep is: null 2019-10-02 17:58:57,212 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:lambda$removeAuthoritativeDirFlag$10(1640)) - No parent s3a://hwdev-steve-ireland-new/test/prune-cli-keep; skipping 2019-10-02 17:58:57,212 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:removeAuthoritativeDirFlag(1666)) - innerPut on metas: [] 2019-10-02 17:58:57,213 [main] INFO s3guard.DynamoDBMetadataStore (DurationInfo.java:close(87)) - Pruning DynamoDB Store: duration 0:01.391s 2019-10-02 17:58:57,213 [main] DEBUG s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:close(2267)) - Auditing #(Prune-0001) 2019-10-02 17:58:57,213 [main] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:innerPrune(1596)) - Finished pruning 104 items in batches of 25 ``` I don't think it is at all related -but have a try on your system and see what happens.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
