[
https://issues.apache.org/jira/browse/HADOOP-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939367#comment-15939367
]
Jonathan Eagles commented on HADOOP-14216:
------------------------------------------
Client-Side Performance Tests:
Setup: Run normal user commands and measure the performance gain when only the
client-side hadoop-common.jar is replaced with a patched version.
*Eyeball test*:
1. _hadoop fs -ls_
{code}
# baseline - ran dozens of times, this is a typical result
$ time hadoop fs -ls /
real 0m2.694s
user 0m6.633s
sys 0m0.303s
# patched version - ran dozens of times, this is a typical result
$ time HADOOP_USER_CLASSPATH_FIRST=true \
    HADOOP_CLASSPATH="./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar" \
    hadoop fs -ls /
real 0m2.335s
user 0m4.963s
sys 0m0.292s
{code}
===========================
Result on a real cluster: roughly 300 ms real and 1700 ms user faster per
hadoop fs -ls command
2. _yarn application -list_
{code}
$ time yarn application -list
real 0m1.867s
user 0m5.178s
sys 0m0.288s
$ time YARN_USER_CLASSPATH="./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar" \
    YARN_USER_CLASSPATH_FIRST=true yarn application -list
real 0m1.607s
user 0m3.911s
sys 0m0.225s
{code}
===========================
Result on a real cluster: roughly 250 ms real and 1200 ms user faster per
yarn application -list command
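The numbers above are single "typical" runs picked by eye. A small harness could make "typical" a bit more precise by running a command several times and reporting the median wall-clock time. The following is only a sketch under assumptions: the class name MedianTimer and its methods are hypothetical, it measures wall-clock time only (not user or sys time), and it requires Java 9+ for Redirect.DISCARD. Environment variables such as HADOOP_CLASSPATH are inherited from the calling shell.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the patch: runs a command N times
// and prints the median elapsed wall-clock time in milliseconds.
public class MedianTimer {

    /** Median of the samples; averages the two middle values for even counts. */
    public static double median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Runs the command once, discarding its output, and returns elapsed ms. */
    public static long timeOnce(List<String> command) throws Exception {
        long start = System.nanoTime();
        Process p = new ProcessBuilder(command)
            .redirectErrorStream(true)
            .redirectOutput(ProcessBuilder.Redirect.DISCARD)
            .start();
        p.waitFor();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: MedianTimer <runs> <command...>");
            return;
        }
        int runs = Integer.parseInt(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = timeOnce(command);
        }
        System.out.println("median ms: " + median(samples));
    }
}
```

Usage would look something like `java MedianTimer 30 hadoop fs -ls /`, run once with the stock classpath and once with the patched jars exported in the environment.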
*Performance Numbers at scale*
{code:title=ConfPerf.java}
import org.apache.hadoop.conf.Configuration;

public class ConfPerf {
  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    long count = 0;
    // Register the standard default and site resources.
    Configuration.addDefaultResource("core-default.xml");
    Configuration.addDefaultResource("core-site.xml");
    Configuration.addDefaultResource("yarn-default.xml");
    Configuration.addDefaultResource("yarn-site.xml");
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
    for (int i = 0; i < 3000; i++) {
      Configuration conf = new Configuration();
      // Resources are loaded lazily; any lookup forces the XML to be parsed.
      conf.get("trigger.loading");
      count += conf.size();
    }
    long end = System.currentTimeMillis();
    System.out.println("duration: " + (end - start) + " count: " + count);
  }
}
{code}
{code}
# setup performance tests
$ javac -cp ./:`hadoop classpath` ConfPerf.java
# baseline performance numbers
$ time java -cp ./:`hadoop classpath` ConfPerf
real 0m52.456s
user 1m2.209s
sys 0m3.601s
# performance numbers with patch
$ time java -cp ./:./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar:`hadoop classpath` ConfPerf
real 0m23.108s
user 0m27.434s
sys 0m1.816s
{code}
===========================
Result on a real cluster: roughly 29300 ms real and 34800 ms user faster
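To put the totals above in per-object terms, the arithmetic can be sketched as below. The class name ConfPerfMath is hypothetical; the inputs are the "real" times reported above and the 3000 iterations in ConfPerf. Because "real" time also includes JVM startup, the per-Configuration figures are upper bounds rather than exact parse costs. With these inputs the sketch works out to about 17.5 ms per Configuration at baseline, about 7.7 ms patched, and roughly a 2.27x speedup.

```java
import java.util.Locale;

// Hypothetical helper: derives per-iteration cost and speedup
// from the wall-clock ("real") times reported by `time`.
public class ConfPerfMath {

    /** Average milliseconds per loop iteration, given total seconds. */
    public static double msPerIteration(double totalSeconds, int iterations) {
        return totalSeconds * 1000.0 / iterations;
    }

    /** Ratio of baseline time to patched time. */
    public static double speedup(double baselineSeconds, double patchedSeconds) {
        return baselineSeconds / patchedSeconds;
    }

    public static void main(String[] args) {
        double baseline = 52.456;  // baseline "real" time in seconds
        double patched = 23.108;   // patched "real" time in seconds
        int iterations = 3000;     // loop count in ConfPerf

        System.out.printf(Locale.ROOT, "baseline: %.1f ms per Configuration%n",
            msPerIteration(baseline, iterations));
        System.out.printf(Locale.ROOT, "patched:  %.1f ms per Configuration%n",
            msPerIteration(patched, iterations));
        System.out.printf(Locale.ROOT, "speedup:  %.2fx%n",
            speedup(baseline, patched));
    }
}
```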
*Equality Test*
{code:title=ConfEquality.java}
import org.apache.hadoop.conf.Configuration;

public class ConfEquality {
  public static void main(String[] args) throws Exception {
    // Register the same resources as ConfPerf.
    Configuration.addDefaultResource("core-default.xml");
    Configuration.addDefaultResource("core-site.xml");
    Configuration.addDefaultResource("yarn-default.xml");
    Configuration.addDefaultResource("yarn-site.xml");
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
    Configuration conf = new Configuration();
    // Force the lazy load, then dump the fully resolved configuration.
    conf.get("trigger.loading");
    conf.writeXml(System.out);
  }
}
{code}
{code}
# prepare the equality test
$ javac -cp ./:`hadoop classpath` ConfEquality.java
# run the equality test
$ diff <(java -cp ./:`hadoop classpath` ConfEquality) \
    <(java -cp ./:./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar:`hadoop classpath` ConfEquality)
{code}
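The diff above compares the two XML dumps as text. If ordering or whitespace ever differed between parsers, a key/value comparison might be more forgiving. The following is a rough sketch of that idea, not code from the patch: the class name ConfXmlCompare is hypothetical, it uses the JDK's built-in StAX API (javax.xml.stream) rather than the Aalto parser, and it assumes the dump has the usual configuration/property/name/value layout, ignoring any other child elements of a property.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: compares two configuration XML dumps by key/value.
public class ConfXmlCompare {

    /** Parses property name/value pairs into a sorted map using StAX. */
    public static Map<String, String> parse(String xml) {
        Map<String, String> props = new TreeMap<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(xml));
            String name = null;
            String value = null;
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        text.setLength(0);  // start collecting this element's text
                        if ("property".equals(reader.getLocalName())) {
                            name = null;
                            value = null;
                        }
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA: {
                        text.append(reader.getText());
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        String tag = reader.getLocalName();
                        if ("name".equals(tag)) {
                            name = text.toString().trim();
                        } else if ("value".equals(tag)) {
                            value = text.toString();
                        } else if ("property".equals(tag) && name != null) {
                            props.put(name, value == null ? "" : value);
                        }
                        break;
                    }
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed configuration XML", e);
        }
        return props;
    }

    /** Keys that are missing on one side or carry different values. */
    public static Set<String> differingKeys(Map<String, String> a,
                                            Map<String, String> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        keys.removeIf(k -> Objects.equals(a.get(k), b.get(k)));
        return keys;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ConfXmlCompare <baseline.xml> <patched.xml>");
            return;
        }
        String left = new String(
            Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String right = new String(
            Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println("differing keys: "
            + differingKeys(parse(left), parse(right)));
    }
}
```

To use it, the output of each ConfEquality run would first be redirected to a file, and the two files passed as arguments. An empty set means both parsers produced the same properties.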
> Improve Configuration XML Parsing Performance
> ---------------------------------------------
>
> Key: HADOOP-14216
> URL: https://issues.apache.org/jira/browse/HADOOP-14216
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: HADOOP-14216.1.patch
>
>
> JIRA is to improve XML parsing performance through reuse and a change in XML
> parser (STAX)