[
https://issues.apache.org/jira/browse/HADOOP-15566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562025#comment-16562025
]
BOGDAN DRUTU commented on HADOOP-15566:
---------------------------------------
Hello all,
First, sorry for jumping into this issue. I will try to be short (edited
after I finished the comment: I was wrong) and as project-independent as
possible. For the record, I am one of the main contributors to OpenCensus,
and in my previous life I debugged a lot of BigTable issues using the same
technology as OpenCensus.
Some comments about other comments in this issue:
[~bensigelman] - FYI: OpenCensus does not enforce any wire format. The format
is configurable and we are adding support for the w3c standard.
[~elek] - About OT vs OC: in my personal opinion the difference is the
philosophy behind these projects. OT was designed with the mindset of being an
open-source API for vendors to implement, and because of this certain tradeoffs
were made to help some vendors (as [~michaelsembwever] mentioned). OC was
designed to be a fully implemented library that supports multiple backends
(Zipkin, Jaeger, Stackdriver, AppInsight, etc.) as well as in-process debugging
capabilities.
For example, one of the key features that I used a lot when I debugged BigTable
issues is what OpenCensus calls z-pages (in-process handlers to track active
requests, in-memory latency based sampled spans, stats, etc.). You can take a
look here [https://opencensus.io/core-concepts/z-pages/#1].
Based on my limited experience, there are 3 components that are critical in the
instrumentation of a service:
# Wire propagation (I saw a previous discussion about this).
[https://github.com/w3c/distributed-tracing] - it is a W3C standard proposed by
a couple of APM vendors and cloud providers. Even though the format is mostly
focused on HTTP requests, HBase can define its own format if needed; the only
requirement is the ability to propagate all fields defined in the format
(trace-id, span-id, trace-options and tracestate). This part is critical when
HBase is used as a service (e.g. something like Google Bigtable, which works
with the HBase client): having standard fields that are propagated allows
service owners to correlate incoming requests from a customer with the internal
trace. A similar issue may occur when only HDFS is used as a service.
# APIs to start/end a span, record tracing events, etc. There are multiple
open source APIs, including OpenCensus, OpenTracing, Zipkin, etc.
# In-process propagation. This can be implemented in two ways: explicitly, by
passing the current "Span" between function calls, runnables, callables, etc.,
or implicitly, usually using a thread-local mechanism. Regarding a previous
comment from [~stack] about keeping this working: my personal experience is
that you can achieve this with the "implicit" mechanism by having a clean
context API (as an example of a context API that works well I can recommend
[https://grpc.io/grpc-java/javadoc/io/grpc/Context.html]) and ensuring that all
async calls are wrapped accordingly (e.g. wrapping all Executors). The
"explicit" mechanism may be very hard to maintain and, in my experience,
annoying for developers. This part is very important when instrumenting the
HBase client (which I think should be instrumented in order to debug more
complex issues) because the client is used as a library, and a standard way to
propagate the current Span is needed in order to continue the same trace
between the client application and the Bigtable client.
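To make the "implicit" mechanism from item 3 concrete, here is a minimal sketch using only the JDK. It is not the gRPC Context API or the OpenCensus implementation; the names TraceContext, runWith, wrapRunnable, wrapExecutor and ImplicitPropagationDemo are hypothetical and exist only for illustration. The idea is that the current span lives in a thread-local, and a wrapped Executor captures it at submission time and re-attaches it on the worker thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical, minimal implicit context: the current span id lives in a thread-local. */
final class TraceContext {
    private static final ThreadLocal<String> CURRENT_SPAN = new ThreadLocal<>();

    private TraceContext() {}

    static String currentSpan() {
        return CURRENT_SPAN.get();
    }

    /** Runs the task with the given span attached, then restores the previous value. */
    static void runWith(String span, Runnable task) {
        String previous = CURRENT_SPAN.get();
        CURRENT_SPAN.set(span);
        try {
            task.run();
        } finally {
            if (previous == null) {
                CURRENT_SPAN.remove();
            } else {
                CURRENT_SPAN.set(previous);
            }
        }
    }

    /** Captures the submitter's span now and re-attaches it wherever the task later runs. */
    static Runnable wrapRunnable(Runnable task) {
        String captured = currentSpan();
        return () -> runWith(captured, task);
    }

    /** Wraps an Executor so that every submitted task carries the submitter's span. */
    static Executor wrapExecutor(Executor delegate) {
        return task -> delegate.execute(wrapRunnable(task));
    }
}

final class ImplicitPropagationDemo {
    /** Submits a task from a thread that has `span` attached; returns what the worker saw. */
    static String observeOnWorker(String span, boolean wrapped) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Executor executor = wrapped ? TraceContext.wrapExecutor(pool) : pool;
        CompletableFuture<String> seen = new CompletableFuture<>();
        TraceContext.runWith(span,
            () -> executor.execute(() -> seen.complete(TraceContext.currentSpan())));
        try {
            return seen.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(observeOnWorker("span-1", true));   // worker sees the caller's span
        System.out.println(observeOnWorker("span-1", false));  // unwrapped pool loses it (null)
    }
}
```

With the wrapped executor the worker thread sees the caller's span; with the bare pool it sees nothing, which is why every Executor on an async path would need wrapping for this approach to hold up.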
When OpenCensus was designed I thought it was very important that the library
ensures all 3 of these components are covered. Some may say that 1) is not
important when deployed internally, but with the new cloud providers running it
as a service becomes more common; others may say that 3) is not important, but
when instrumenting client libraries (like the HBase client) it becomes very
important in my opinion. FYI, there are other libraries that solve these issues
as well, like Zipkin, etc., but I am not here to suggest one particular
library, just to explain the concepts, the issues, and what is important to
think about.
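As an illustration of component 1, the sketch below parses and re-emits the trace-id, span-id and trace-options fields in the dash-separated, lowercase-hex layout that the W3C proposal uses for its HTTP "traceparent" header (version 00). The class TraceParent and its methods are hypothetical names, the proposal was still a draft and may differ in detail, and tracestate is left out because it would simply be forwarded as an opaque string. A non-HTTP protocol such as an HBase RPC could carry the same fields in its own encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical value class for the propagated fields (version 00 layout only). */
final class TraceParent {
    // version "00", 32 hex chars of trace-id, 16 hex chars of span-id, 2 hex chars of options
    private static final Pattern FORMAT =
        Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    final String traceId;
    final String spanId;
    final int traceOptions;

    TraceParent(String traceId, String spanId, int traceOptions) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.traceOptions = traceOptions;
    }

    /** Parses an incoming header value; returns null if it is malformed. */
    static TraceParent parse(String header) {
        if (header == null) {
            return null;
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // All-zero ids are treated as invalid.
        if (traceId.matches("0+") || spanId.matches("0+")) {
            return null;
        }
        return new TraceParent(traceId, spanId, Integer.parseInt(m.group(3), 16));
    }

    /** The lowest options bit marks the request as sampled. */
    boolean isSampled() {
        return (traceOptions & 0x01) != 0;
    }

    /** Same trace, new span id: what a service would send on its own outgoing calls. */
    TraceParent withSpanId(String childSpanId) {
        return new TraceParent(traceId, childSpanId, traceOptions);
    }

    /** Serializes the fields back into the header layout. */
    String toHeader() {
        return String.format("00-%s-%s-%02x", traceId, spanId, traceOptions);
    }
}
```

Because the trace-id survives every hop unchanged, a service owner can look up the customer's incoming trace-id in the internal trace, which is the correlation described in item 1.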
In my personal opinion OpenTracing does not deal very well with 1 and 3
(probably on purpose), but I am not an expert in OpenTracing or one of its
owners or authors, so I cannot comment on what is good or what is bad in
their design choices.
These are my thoughts about what you should consider when you pick one library
over another. Regarding OpenCensus, we are happy to help if you have any
questions about our design choices, or about stats/metrics support in
OpenCensus and why we think that these are very important as well.
PS: I hope the comment makes sense; it became longer than expected, but I tried
to give an overview of the whole instrumentation issue.
> Remove HTrace support
> ---------------------
>
> Key: HADOOP-15566
> URL: https://issues.apache.org/jira/browse/HADOOP-15566
> Project: Hadoop Common
> Issue Type: Improvement
> Components: metrics
> Affects Versions: 3.1.0
> Reporter: Todd Lipcon
> Priority: Major
> Labels: security
> Attachments: Screen Shot 2018-06-29 at 11.59.16 AM.png,
> ss-trace-s3a.png
>
>
> The HTrace incubator project has voted to retire itself and won't be making
> further releases. The Hadoop project currently has various hooks with HTrace.
> It seems in some cases (eg HDFS-13702) these hooks have had measurable
> performance overhead. Given these two factors, I think we should consider
> removing the HTrace integration. If there is someone willing to do the work,
> replacing it with OpenTracing might be a better choice since there is an
> active community.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)