steveloughran commented on code in PR #7723:
URL: https://github.com/apache/hadoop/pull/7723#discussion_r2363269891
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java:
##########
@@ -109,6 +110,12 @@ public void testConnectorFrameWorkIntegration() throws
Throwable {
verifyStatisticCounterValue(ioStats, STREAM_READ_ANALYTICS_OPENED, 1);
fs.close();
verifyStatisticCounterValue(fs.getIOStatistics(),
ANALYTICS_STREAM_FACTORY_CLOSED, 1);
+
+ // Expect 4 audited requests. One HEAD, and 3 GETs. The 3 GETs are because
the read policy is WHOLE_FILE,
+ // in which case, AAL will start prefetching till EoF on file open in 8MB
chunks. The file read here
+ // s3://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz, has a size of
~21MB, resulting in 3 GETS:
+ // [0-8388607, 8388608-16777215, 16777216-21511173].
Review Comment:
slick. These are parallel GETs aren't they?
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -69,6 +70,8 @@
import static org.apache.hadoop.fs.s3a.commit.CommitUtils.extractJobID;
import static org.apache.hadoop.fs.s3a.impl.HeaderProcessing.HEADER_REFERRER;
import static
org.apache.hadoop.fs.s3a.statistics.impl.StatisticsFromAwsSdkImpl.mapErrorStatusCodeToStatisticName;
+import static
software.amazon.s3.analyticsaccelerator.request.Constants.OPERATION_NAME;
Review Comment:
again, need a copy in `org.apache.hadoop.fs.audit.AuditConstants`
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java:
##########
Review Comment:
what about assert in testMultiRowGroupParquet where you can assert that the
read took place after the open?
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -49,6 +50,7 @@
import org.apache.hadoop.fs.store.LogExactlyOnce;
import org.apache.hadoop.fs.store.audit.HttpReferrerAuditHeader;
import org.apache.hadoop.security.UserGroupInformation;
+import software.amazon.s3.analyticsaccelerator.request.RequestAttributes;
Review Comment:
even if it is a bug they fix, we can't bump the SDK right now. Is there some
clever way to use reflection to do this binding, through our DynMethods code?
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AWSRequestAnalyzer.java:
##########
@@ -50,6 +51,8 @@
import static
org.apache.hadoop.fs.statistics.StoreStatisticNames.OBJECT_LIST_REQUEST;
import static
org.apache.hadoop.fs.statistics.StoreStatisticNames.OBJECT_PUT_REQUEST;
import static
org.apache.hadoop.fs.statistics.StoreStatisticNames.STORE_EXISTS_PROBE;
+import static
software.amazon.s3.analyticsaccelerator.request.Constants.OPERATION_NAME;
Review Comment:
need to copy this so if AAL isn't on the classpath things don't break
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -267,8 +269,9 @@ HttpReferrerAuditHeader getReferrer(AuditSpanS3A span) {
*/
private class LoggingAuditSpan extends AbstractAuditSpanImpl {
- private final HttpReferrerAuditHeader referrer;
+ private HttpReferrerAuditHeader referrer;
+ private final HttpReferrerAuditHeader.Builder headerBuilder;
Review Comment:
add a newline after and a javadoc before to explain what it is (an
incomplete header) and why it is there
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java:
##########
@@ -384,12 +388,33 @@ public SdkHttpRequest
modifyHttpRequest(Context.ModifyHttpRequest context,
SdkHttpRequest httpRequest = context.httpRequest();
SdkRequest sdkRequest = context.request();
+ // If spanId and operationName are set in execution attributes, then use
these values,
+ // instead of the ones in the current span. This is useful when requests
are happening in dependencies such as
+ // the analytics accelerator library (AAL), where they cannot be
attached to the correct span. In which case, AAL
+ // will attach the current spanId and operationName via execution
attributes during it's request creation. These
+ // can then used to update the values in the logger and referrer header.
Without this overwriting, the operation
+ // name and corresponding span will be whichever is active on the thread
the request is getting executed on.
+ boolean isRequestAuditedOutsideCurrentSpan =
isRequestAuditedOutsideOfCurrentSpan(executionAttributes);
+
+ String spanId = isRequestAuditedOutsideCurrentSpan ?
+ executionAttributes.getAttribute(SPAN_ID) : getSpanId();
+
+ String operationName = isRequestAuditedOutsideCurrentSpan ?
+ executionAttributes.getAttribute(OPERATION_NAME) :
getOperationName();
+
+ if (isRequestAuditedOutsideCurrentSpan) {
+ this.headerBuilder.withSpanId(spanId);
+ this.headerBuilder.withOperationName(operationName);
+ this.referrer = this.headerBuilder.build();
+ }
+
// attach range for GetObject requests
attachRangeFromRequest(httpRequest, executionAttributes);
// for delete op, attach the number of files to delete
attachDeleteKeySizeAttribute(sdkRequest);
+
Review Comment:
cut
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]