spark git commit: [Docs] Update monitoring.md to accurately describe the history server

andrewor14 Thu, 31 Mar 2016 12:06:50 -0700

Repository: spark
Updated Branches:
  refs/heads/master 8a333d2da -> 4d93b653f



[Docs] Update monitoring.md to accurately describe the history server

It looks like the docs were recently updated to reflect the History Server's 
support for incomplete applications, but they still had wording that suggested 
only completed applications were viewable.  This fixes that.

My editor also removed trailing whitespace in several places, which I hope is 
OK, since text files shouldn't have trailing whitespace.  To verify that those 
changes are purely whitespace, add `&w=1` to the URL in your browser's address 
bar.  If this isn't acceptable, let me know and I'll update the PR.
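
As a rough illustration of what "purely whitespace changes" means here, the sketch below compares two versions of a text line by line after stripping trailing whitespace. The helper name `differs_only_in_trailing_whitespace` is hypothetical; it is not part of Spark or of this PR.

```python
def differs_only_in_trailing_whitespace(old_text: str, new_text: str) -> bool:
    """Return True if both texts are identical once trailing whitespace
    is stripped from every line."""
    old_lines = [line.rstrip() for line in old_text.splitlines()]
    new_lines = [line.rstrip() for line in new_text.splitlines()]
    return old_lines == new_lines


# One of the changed lines from docs/monitoring.md, before and after.
old = "Every SparkContext launches a web UI, by default on port 4040, that \n"
new = "Every SparkContext launches a web UI, by default on port 4040, that\n"
print(differs_only_in_trailing_whitespace(old, new))
```

On the command line, `git diff -w` gives a similar but broader view, since it ignores all whitespace differences rather than only trailing ones.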

I also didn't think this required a JIRA.  Let me know if I should create one.

Not tested

Author: Michael Gummelt <[email protected]>

Closes #12045 from mgummelt/update-history-docs.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4d93b653
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4d93b653
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4d93b653

Branch: refs/heads/master
Commit: 4d93b653f7294698526674950d3dc303691260f8
Parents: 8a333d2
Author: Michael Gummelt <[email protected]>
Authored: Thu Mar 31 12:06:16 2016 -0700
Committer: Andrew Or <[email protected]>
Committed: Thu Mar 31 12:06:21 2016 -0700

----------------------------------------------------------------------
 docs/monitoring.md | 58 ++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/4d93b653/docs/monitoring.md
----------------------------------------------------------------------
diff --git a/docs/monitoring.md b/docs/monitoring.md
index c139e1c..32d2e02 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -8,7 +8,7 @@ There are several ways to monitor Spark applications: web UIs, metrics, and exte
 
 # Web Interfaces
 
-Every SparkContext launches a web UI, by default on port 4040, that 
+Every SparkContext launches a web UI, by default on port 4040, that
 displays useful information about the application. This includes:
 
 * A list of scheduler stages and tasks
@@ -32,19 +32,19 @@ Spark's Standalone Mode cluster manager also has its own
 the course of its lifetime, then the Standalone master's web UI will automatically re-render the
 application's UI after the application has finished.
 
-If Spark is run on Mesos or YARN, it is still possible to reconstruct the UI of a finished
+If Spark is run on Mesos or YARN, it is still possible to construct the UI of an
 application through Spark's history server, provided that the application's event logs exist.
 You can start the history server by executing:
 
     ./sbin/start-history-server.sh
 
 This creates a web interface at `http://<server-url>:18080` by default, listing incomplete
-and completed applications and attempts, and allowing them to be viewed
+and completed applications and attempts.
 
 When using the file-system provider class (see `spark.history.provider` below), the base logging
 directory must be supplied in the `spark.history.fs.logDirectory` configuration option,
 and should contain sub-directories that each represents an application's event logs.
- 
+
 The spark jobs themselves must be configured to log events, and to log them to the same shared,
 writeable directory. For example, if the server was configured with a log directory of
 `hdfs://namenode/shared/spark-logs`, then the client-side options would be:
@@ -53,7 +53,7 @@ writeable directory. For example, if the server was configured with a log direct
 spark.eventLog.enabled true
 spark.eventLog.dir hdfs://namenode/shared/spark-logs
 ```
- 
+
 The history server can be configured as follows:
 
 ### Environment Variables
@@ -135,9 +135,9 @@ The history server can be configured as follows:
     <td>false</td>
     <td>
       Indicates whether the history server should use kerberos to login. This is required
-      if the history server is accessing HDFS files on a secure Hadoop cluster. If this is 
+      if the history server is accessing HDFS files on a secure Hadoop cluster. If this is
       true, it uses the configs <code>spark.history.kerberos.principal</code> and
-      <code>spark.history.kerberos.keytab</code>. 
+      <code>spark.history.kerberos.keytab</code>.
     </td>
   </tr>
   <tr>
@@ -159,12 +159,12 @@ The history server can be configured as follows:
     <td>false</td>
     <td>
       Specifies whether acls should be checked to authorize users viewing the applications.
-      If enabled, access control checks are made regardless of what the individual application had 
+      If enabled, access control checks are made regardless of what the individual application had
       set for <code>spark.ui.acls.enable</code> when the application was run. The application owner
-      will always have authorization to view their own application and any users specified via 
+      will always have authorization to view their own application and any users specified via
       <code>spark.ui.view.acls</code> when the application was run will also have authorization
-      to view that application. 
-      If disabled, no access control checks are made. 
+      to view that application.
+      If disabled, no access control checks are made.
     </td>
   </tr>
   <tr>
@@ -298,14 +298,14 @@ keep the paths consistent in both modes.
 
 # Metrics
 
-Spark has a configurable metrics system based on the 
-[Coda Hale Metrics Library](http://metrics.codahale.com/). 
-This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV 
-files. The metrics system is configured via a configuration file that Spark expects to be present 
-at `$SPARK_HOME/conf/metrics.properties`. A custom file location can be specified via the 
+Spark has a configurable metrics system based on the
+[Coda Hale Metrics Library](http://metrics.codahale.com/).
+This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV
+files. The metrics system is configured via a configuration file that Spark expects to be present
+at `$SPARK_HOME/conf/metrics.properties`. A custom file location can be specified via the
 `spark.metrics.conf` [configuration property](configuration.html#spark-properties).
-Spark's metrics are decoupled into different 
-_instances_ corresponding to Spark components. Within each instance, you can configure a 
+Spark's metrics are decoupled into different
+_instances_ corresponding to Spark components. Within each instance, you can configure a
 set of sinks to which metrics are reported. The following instances are currently supported:
 
 * `master`: The Spark standalone master process.
@@ -330,26 +330,26 @@ licensing restrictions:
 * `GangliaSink`: Sends metrics to a Ganglia node or multicast group.
 
 To install the `GangliaSink` you'll need to perform a custom build of Spark. _**Note that
-by embedding this library you will include [LGPL](http://www.gnu.org/copyleft/lesser.html)-licensed 
-code in your Spark package**_. For sbt users, set the 
-`SPARK_GANGLIA_LGPL` environment variable before building. For Maven users, enable 
+by embedding this library you will include [LGPL](http://www.gnu.org/copyleft/lesser.html)-licensed
+code in your Spark package**_. For sbt users, set the
+`SPARK_GANGLIA_LGPL` environment variable before building. For Maven users, enable
 the `-Pspark-ganglia-lgpl` profile. In addition to modifying the cluster's Spark build
 user applications will need to link to the `spark-ganglia-lgpl` artifact.
 
-The syntax of the metrics configuration file is defined in an example configuration file, 
+The syntax of the metrics configuration file is defined in an example configuration file,
 `$SPARK_HOME/conf/metrics.properties.template`.
 
 # Advanced Instrumentation
 
 Several external tools can be used to help profile the performance of Spark jobs:
 
-* Cluster-wide monitoring tools, such as [Ganglia](http://ganglia.sourceforge.net/), can provide 
-insight into overall cluster utilization and resource bottlenecks. For instance, a Ganglia 
-dashboard can quickly reveal whether a particular workload is disk bound, network bound, or 
+* Cluster-wide monitoring tools, such as [Ganglia](http://ganglia.sourceforge.net/), can provide
+insight into overall cluster utilization and resource bottlenecks. For instance, a Ganglia
+dashboard can quickly reveal whether a particular workload is disk bound, network bound, or
 CPU bound.
-* OS profiling tools such as [dstat](http://dag.wieers.com/home-made/dstat/), 
-[iostat](http://linux.die.net/man/1/iostat), and [iotop](http://linux.die.net/man/1/iotop) 
+* OS profiling tools such as [dstat](http://dag.wieers.com/home-made/dstat/),
+[iostat](http://linux.die.net/man/1/iostat), and [iotop](http://linux.die.net/man/1/iotop)
 can provide fine-grained profiling on individual nodes.
-* JVM utilities such as `jstack` for providing stack traces, `jmap` for creating heap-dumps, 
-`jstat` for reporting time-series statistics and `jconsole` for visually exploring various JVM 
+* JVM utilities such as `jstack` for providing stack traces, `jmap` for creating heap-dumps,
+`jstat` for reporting time-series statistics and `jconsole` for visually exploring various JVM
 properties are useful for those comfortable with JVM internals.
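
The client-side event log options shown in the diff above can also be passed per job on the command line. The following is a minimal sketch, assuming jobs are launched with `spark-submit` and its `--conf key=value` flag. The helper name `event_log_conf_args` is hypothetical and not part of Spark; the log directory is the example value from the documentation text.

```python
from typing import List


def event_log_conf_args(log_dir: str) -> List[str]:
    """Build spark-submit --conf arguments that enable event logging
    and point it at log_dir (hypothetical helper, for illustration only)."""
    settings = {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,
    }
    args: List[str] = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Example directory taken from the monitoring.md text; it must match the
# directory the history server reads from (spark.history.fs.logDirectory).
command = ["spark-submit", *event_log_conf_args("hdfs://namenode/shared/spark-logs")]
print(" ".join(command))
```

The application jar or script and its own arguments would follow these flags in a real invocation.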

