[ 
https://issues.apache.org/jira/browse/HADOOP-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939367#comment-15939367
 ] 

Jonathan Eagles commented on HADOOP-14216:
------------------------------------------

Client-Side Performance Tests:

Setup: Run ordinary user commands and measure the performance gain with only 
the client-side hadoop-common.jar replaced by the patched version

*Eyeball test*:
1. _hadoop fs -ls_
{code}
# baseline - ran dozens of times, this is a typical result
$ time hadoop fs -ls /
real    0m2.694s
user    0m6.633s
sys     0m0.303s

# patched version - ran dozens of times, this is a typical result
$ time HADOOP_USER_CLASSPATH_FIRST=true HADOOP_CLASSPATH="./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar" hadoop fs -ls /
real    0m2.335s
user    0m4.963s
sys     0m0.292s
{code}
===========================
Result on a real cluster is roughly 300 ms real and 1700 ms user faster per 
_hadoop fs -ls_ command

2. _yarn application -list_
{code}
$ time yarn application -list
real    0m1.867s
user    0m5.178s
sys     0m0.288s

$ time YARN_USER_CLASSPATH="./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar" YARN_USER_CLASSPATH_FIRST=true yarn application -list
real    0m1.607s
user    0m3.911s
sys     0m0.225s
{code}

===========================
Result on a real cluster is roughly 250 ms real and 1200 ms user faster per 
_yarn application -list_ command

*Performance Numbers at scale*
{code:title=ConfPerf.java}
import org.apache.hadoop.conf.Configuration;

public class ConfPerf {
  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    long count = 0;
    Configuration.addDefaultResource("core-default.xml");
    Configuration.addDefaultResource("core-site.xml");
    Configuration.addDefaultResource("yarn-default.xml");
    Configuration.addDefaultResource("yarn-site.xml");
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
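    // Each iteration builds a fresh Configuration and forces its resources to load.
    for (int i = 0; i < 3000; i++) {
      Configuration conf = new Configuration();
      // Resources are loaded lazily; reading any key triggers the XML parsing.
      conf.get("trigger.loading");
      count += conf.size();
    }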
    long end = System.currentTimeMillis();
    System.out.println("duration: " + (end - start) + " count: " + count);
  }
}
{code}

{code}
# setup performance tests
$ javac -cp ./:`hadoop classpath` ConfPerf.java

# baseline performance numbers
$ time java -cp ./:`hadoop classpath` ConfPerf
real    0m52.456s
user    1m2.209s
sys     0m3.601s

# performance numbers with patch
$ time java -cp ./:./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar:`hadoop classpath` ConfPerf
real    0m23.108s
user    0m27.434s
sys     0m1.816s
{code}

===========================
Result on a real cluster is roughly 29300 ms real and 34800 ms user faster 
across the 3000 Configuration loads
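
For a rough per-load figure, the wall-clock numbers above can be divided by 
the 3000 iterations in ConfPerf. The class below is only an illustrative 
sketch of that arithmetic: TimingDelta and its method names are made up for 
this example and are not part of the patch or of Hadoop.

```java
import java.util.Locale;

// Hypothetical helper: works out the saving implied by the ConfPerf timings.
class TimingDelta {
  // Total time saved, in milliseconds.
  static long savedMillis(long baselineMillis, long patchedMillis) {
    return baselineMillis - patchedMillis;
  }

  // Average saving per Configuration load, in milliseconds.
  static double perLoadMillis(long savedMillis, int loads) {
    return (double) savedMillis / loads;
  }

  // Ratio of baseline time to patched time.
  static double speedup(long baselineMillis, long patchedMillis) {
    return (double) baselineMillis / patchedMillis;
  }

  public static void main(String[] args) {
    long baselineReal = 52_456; // real 0m52.456s
    long patchedReal = 23_108;  // real 0m23.108s
    int loads = 3000;           // loop count in ConfPerf

    long saved = savedMillis(baselineReal, patchedReal);
    System.out.printf(Locale.ROOT,
        "saved: %d ms total, %.2f ms per load, %.2fx speedup%n",
        saved, perLoadMillis(saved, loads), speedup(baselineReal, patchedReal));
  }
}
```

With the numbers from this run that works out to a little under 10 ms saved 
per Configuration load, which is in the same range as the per-command savings 
seen in the eyeball tests if a command loads several Configuration objects.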

*Equality Test*
{code:title=ConfEquality.java}
import org.apache.hadoop.conf.Configuration;

public class ConfEquality {
  public static void main(String[] args) throws Exception {
    Configuration.addDefaultResource("core-default.xml");
    Configuration.addDefaultResource("core-site.xml");
    Configuration.addDefaultResource("yarn-default.xml");
    Configuration.addDefaultResource("yarn-site.xml");
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
    Configuration conf = new Configuration();
    conf.get("trigger.loading");
    conf.writeXml(System.out);
  }
}
{code}
{code}
# prepare the equality test
$ javac -cp ./:`hadoop classpath` ConfEquality.java
# run the equality test
$ diff <(java -cp ./:`hadoop classpath` ConfEquality) <(java -cp ./:./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar:`hadoop classpath` ConfEquality)
{code}

> Improve Configuration XML Parsing Performance
> ---------------------------------------------
>
>                 Key: HADOOP-14216
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14216
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: HADOOP-14216.1.patch
>
>
> JIRA is to improve XML parsing performance through reuse and a change in XML 
> parser (STAX)
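
To illustrate the general idea of streaming (StAX) parsing with a reused 
factory, here is a minimal sketch that reads Hadoop-style property XML using 
only the JDK's javax.xml.stream API. It is not the code from the patch: the 
class name StaxConfSketch is hypothetical, the choice to reuse the 
XMLInputFactory is my assumption about what "reuse" could look like, and it 
ignores features such as includes, final properties and variable expansion.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, single-threaded use only.
class StaxConfSketch {
  // Assumption: create the factory once and reuse it for every parse.
  private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

  // Collects <property><name>..</name><value>..</value></property> pairs.
  static Map<String, String> parse(String xml) throws XMLStreamException {
    Map<String, String> props = new LinkedHashMap<>();
    XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
    StringBuilder text = new StringBuilder();
    String name = null;
    String value = null;
    try {
      while (reader.hasNext()) {
        switch (reader.next()) {
          case XMLStreamConstants.START_ELEMENT:
            text.setLength(0);
            if ("property".equals(reader.getLocalName())) {
              name = null;
              value = null;
            }
            break;
          case XMLStreamConstants.CHARACTERS:
          case XMLStreamConstants.CDATA:
            text.append(reader.getText());
            break;
          case XMLStreamConstants.END_ELEMENT:
            String element = reader.getLocalName();
            if ("name".equals(element)) {
              name = text.toString().trim();
            } else if ("value".equals(element)) {
              value = text.toString();
            } else if ("property".equals(element) && name != null && value != null) {
              props.put(name, value);
            }
            text.setLength(0);
            break;
          default:
            break;
        }
      }
    } finally {
      reader.close();
    }
    return props;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<configuration>"
        + "<property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>"
        + "<property><name>io.file.buffer.size</name><value>4096</value></property>"
        + "</configuration>";
    System.out.println(parse(xml));
  }
}
```

A streaming reader like this visits each element once and never builds a 
document tree in memory, which is the usual argument for StAX over DOM when 
the same files are parsed many times.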



