Author: buildbot
Date: Thu Dec 17 15:19:32 2015
New Revision: 975802
Log: Production update by buildbot for camel
Modified:
    websites/production/camel/content/apache-spark.html
    websites/production/camel/content/cache/main.pageCache

Modified: websites/production/camel/content/apache-spark.html
==============================================================================
--- websites/production/camel/content/apache-spark.html (original)
+++ websites/production/camel/content/apache-spark.html Thu Dec 17 15:19:32 2015
@@ -84,13 +84,13 @@
 <tbody>
 <tr>
 <td valign="top" width="100%">
-<div class="wiki-content maincontent"><h2 id="ApacheSpark-ApacheSparkcomponent">Apache Spark component</h2><div class="confluence-information-macro confluence-information-macro-information"><span class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span><div class="confluence-information-macro-body"><p>The Apache Spark component is available starting from Camel <strong>2.17</strong>.</p></div></div><p> </p><p><span style="line-height: 1.5625;font-size: 16.0px;">This documentation page covers the <a shape="rect" class="external-link" href="http://spark.apache.org/">Apache Spark</a> component for Apache Camel. The main purpose of the Spark integration with Camel is to provide a bridge between Camel connectors and Spark tasks. In particular, the Camel connector provides a way to route messages from various transports, dynamically choose a task to execute, use the incoming message as input data for that task and finally deliver the results of the execution back to the Camel pipeline.</span></p><h3 id="ApacheSpark-Supportedarchitecturalstyles"><span>Supported architectural styles</span></h3><p><span style="line-height: 1.5625;font-size: 16.0px;">The Spark component can be used as a driver application deployed into an application server (or executed as a fat jar).</span></p><p><span style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_driver.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_driver.png?version=2&modificationDate=1449478362000&api=v2" data-unresolved-comment-count="0" data-linked-resource-id="61331563" data-linked-resource-version="2" data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_driver.png" data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png" data-linked-resource-container-id="61331559" data-linked-resource-container-version="15"></span><br clear="none"></span></p><p><span style="line-height: 1.5625;font-size: 16.0px;">The Spark component can also be submitted as a job directly to the Spark cluster.</span></p><p><span style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_cluster.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_cluster.png?version=1&modificationDate=1449478393000&api=v2" data-unresolved-comment-count="0" data-linked-resource-id="61331565" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_cluster.png" data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png" data-linked-resource-container-id="61331559" data-linked-resource-container-version="15"></span><br clear="none"></span></p><p><span style="line-height: 1.5625;font-size: 16.0px;">
16.0px;">While Spark component is primary designed to work as a <em>long running job</em> serving as an bridge between Spark cluster and the other endpoints, you can also use it as a <em>fire-once</em> short job.  </span> </p><h3 id="ApacheSpark-RunningSparkinOSGiservers"><span>Running Spark in OSGi servers</span></h3><p>Currently the Spark component doesn't support execution in the OSGi container. Spark has been designed to be executed as a fat jar, usually submitted as a job to a cluster. For those reasons running Spark in an OSGi server is at least challenging and is not support by Camel as well.</p><h3 id="ApacheSpark-URIformat">URI format</h3><p>Currently the Spark component supports only producers - it it intended to invoke a Spark job and return results. You can call RDD, data frame or Hive SQL job.</p><div><p> </p><div class="code panel pdl" s tyle="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark URI format</b></div><div class="codeContent panelContent pdl"> +<div class="wiki-content maincontent"><h2 id="ApacheSpark-ApacheSparkcomponent">Apache Spark component</h2><div class="confluence-information-macro confluence-information-macro-information"><span class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span><div class="confluence-information-macro-body"><p> Apache Spark component is available starting from Camel <strong>2.17</strong>.</p></div></div><p> </p><p><span style="line-height: 1.5625;font-size: 16.0px;">This documentation page covers the <a shape="rect" class="external-link" href="http://spark.apache.org/">Apache Spark</a> component for the Apache Camel. The main purpose of the Spark integration with Camel is to provide a bridge between Camel connectors and Spark tasks. 
In particular, the Camel connector provides a way to route messages from various transports, dynamically choose a task to execute, use the incoming message as input data for that task and finally deliver the results of the execution back to the Camel pipeline.</span></p><h3 id="ApacheSpark-Supportedarchitecturalstyles"><span>Supported architectural styles</span></h3><p><span style="line-height: 1.5625;font-size: 16.0px;">The Spark component can be used as a driver application deployed into an application server (or executed as a fat jar).</span></p><p><span style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_driver.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_driver.png?version=2&modificationDate=1449478362000&api=v2" data-unresolved-comment-count="0" data-linked-resource-id="61331563" data-linked-resource-version="2" data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_driver.png" data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png" data-linked-resource-container-id="61331559" data-linked-resource-container-version="16"></span><br clear="none"></span></p><p><span style="line-height: 1.5625;font-size: 16.0px;">The Spark component can also be submitted as a job directly to the Spark cluster.</span></p><p><span style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_cluster.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_cluster.png?version=1&modificationDate=1449478393000&api=v2" data-unresolved-comment-count="0" data-linked-resource-id="61331565" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_cluster.png" data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png" data-linked-resource-container-id="61331559" data-linked-resource-container-version="16"></span><br clear="none"></span></p><p><span style="line-height: 1.5625;font-size: 16.0px;">While the Spark component is primarily designed to work as a <em>long running job</em> serving as a bridge between the Spark cluster and the other endpoints, you can also use it as a <em>fire-once</em> short job.</span></p><h3 id="ApacheSpark-RunningSparkinOSGiservers"><span>Running Spark in OSGi servers</span></h3><p>Currently the Spark component doesn't support execution in the OSGi container. Spark has been designed to be executed as a fat jar, usually submitted as a job to a cluster. For those reasons running Spark in an OSGi server is at least challenging and is not supported by Camel either.</p><h3 id="ApacheSpark-URIformat">URI format</h3><p>Currently the Spark component supports only producers - it is intended to invoke a Spark job and return results. You can call RDD, data frame or Hive SQL jobs.</p><div><p> </p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark URI format</b></div><div class="codeContent panelContent pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[spark:{rdd|dataframe|hive}]]></script>
 </div></div><p> </p></div>
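<p>For example, a minimal route sketch invoking the RDD producer (assuming a <code>direct:</code> trigger endpoint and registry beans named <code>testFileRdd</code> and <code>countLinesContaining</code>; these names and the class below are illustrative only) could look as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark producer route (sketch)</b></div><div class="codeContent panelContent pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[import org.apache.camel.builder.RouteBuilder;

// Illustrative sketch: the direct: endpoint and the bean names are assumptions,
// only the spark:rdd URI options come from this page.
public class CountMatchingLinesRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("direct:countMatchingLines")
            .to("spark:rdd?rdd=#testFileRdd&rddCallback=#countLinesContaining");
    }
}]]></script>
 </div></div><p>The body of the message sent to <code>direct:countMatchingLines</code> is passed to the RDD callback as a payload, and the result of the callback becomes the body of the outgoing exchange.</p>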
<h3 id="ApacheSpark-RDDjobs">RDD jobs</h3><p> </p><div>To invoke an RDD job, use the following URI:</div><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD producer</b></div><div class="codeContent panelContent pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[spark:rdd?rdd=#testFileRdd&rddCallback=#transformation]]></script>
-</div></div><p>Where the <code>rdd</code> option refers to the name of an RDD instance (subclass of <code>org.apache.spark.api.java.AbstractJavaRDDLike</code>) from a Camel registry, while <code>rddCallback</code> refers to an implementation of the <code>org.apache.camel.component.spark.RddCallback</code> interface (also from a registry). The RDD callback provides a single method used to apply incoming messages against the given RDD. Results of callback computations are saved as the body of an exchange.</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD callback</b></div><div class="codeContent panelContent pdl">
+</div></div><p>Where the <code>rdd</code> option refers to the name of an RDD instance (subclass of <code>org.apache.spark.api.java.JavaRDDLike</code>) from a Camel registry, while <code>rddCallback</code> refers to an implementation of the <code>org.apache.camel.component.spark.RddCallback</code> interface (also from a registry). The RDD callback provides a single method used to apply incoming messages against the given RDD. Results of callback computations are saved as the body of an exchange.</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD callback</b></div><div class="codeContent panelContent pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[public interface RddCallback<T> {
-    T onRdd(AbstractJavaRDDLike rdd, Object... payloads);
+    T onRdd(JavaRDDLike rdd, Object... payloads);
 }]]></script>
 </div></div><p>The following snippet demonstrates how to send a message as an input to the job and return results:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Calling spark job</b></div><div class="codeContent panelContent pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[String pattern = "job input";
@@ -99,7 +99,7 @@ long linesCount = producerTemplate.reque
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[@Bean
 RddCallback<Long> countLinesContaining() {
     return new RddCallback<Long>() {
-        Long onRdd(AbstractJavaRDDLike rdd, Object... payloads) {
+        Long onRdd(JavaRDDLike rdd, Object... payloads) {
            String pattern = (String) payloads[0];
            return rdd.filter({line -> line.contains(pattern)}).count();
        }
@@ -107,15 +107,15 @@ RddCallback<Long> countLinesContai
 }]]></script>
 </div></div><p>The RDD definition in Spring could look as follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD definition</b></div><div class="codeContent panelContent pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[@Bean
-AbstractJavaRDDLike myRdd(JavaSparkContext sparkContext) {
+JavaRDDLike myRdd(JavaSparkContext sparkContext) {
     return sparkContext.textFile("testrdd.txt");
 }]]></script>
-</div></div><p> </p><h4 id="ApacheSpark-RDDjobsoptions">RDD jobs options</h4><div class="table-wrap"><table class="confluenceTable"><tbody><tr><th colspan="1" rowspan="1" class="confluenceTh">Option</th><th colspan="1" rowspan="1" class="confluenceTh">Description</th><th colspan="1" rowspan="1" class="confluenceTh">Default value</th></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><code>rdd</code></td><td colspan="1" rowspan="1" class="confluenceTd">RDD instance (subclass of <code>org.apache.spark.api.java.AbstractJavaRDDLike</code>).</td><td colspan="1" rowspan="1" class="confluenceTd"><code>null</code></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><code>rddCallback</code></td><td colspan="1" rowspan="1" class="confluenceTd">Instance of the <code>org.apache.camel.component.spark.RddCallback</code> interface.</td><td colspan="1" rowspan="1" class="confluenceTd"><code><span style="color: rgb(0,51,102);">null</span></code></td></tr></tbody></table></div><h4 id="ApacheSpark-VoidRDDcallbacks">Void RDD callbacks</h4><p>If your RDD callback doesn't return any value back to a Camel pipeline, you can either return a <code>null</code> value or use the <code>VoidRddCallback</code> base class:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Void RDD callback</b></div><div class="codeContent panelContent pdl">
+</div></div><p> </p><h4 id="ApacheSpark-RDDjobsoptions">RDD jobs options</h4><div class="table-wrap"><table class="confluenceTable"><tbody><tr><th colspan="1" rowspan="1" class="confluenceTh">Option</th><th colspan="1" rowspan="1" class="confluenceTh">Description</th><th colspan="1" rowspan="1" class="confluenceTh">Default value</th></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><code>rdd</code></td><td colspan="1" rowspan="1" class="confluenceTd">RDD instance (subclass of <code>org.apache.spark.api.java.JavaRDDLike</code>).</td><td colspan="1" rowspan="1" class="confluenceTd"><code>null</code></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><code>rddCallback</code></td><td colspan="1" rowspan="1" class="confluenceTd">Instance of the <code>org.apache.camel.component.spark.RddCallback</code> interface.</td><td colspan="1" rowspan="1" class="confluenceTd"><code><span style="color: rgb(0,51,102);">null</span></code></td></tr></tbody></table></div>
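<p>If you are not using Spring, the same beans can also be bound programmatically. A minimal sketch (the bean names, the <code>SimpleRegistry</code>-based setup and the local Spark master are illustrative assumptions):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Camel registry setup (sketch)</b></div><div class="codeContent panelContent pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[import org.apache.camel.CamelContext;
import org.apache.camel.component.spark.RddCallback;
import org.apache.camel.impl.DefaultCamelContext;
import org.apache.camel.impl.SimpleRegistry;
import org.apache.spark.api.java.JavaRDDLike;
import org.apache.spark.api.java.JavaSparkContext;

// Illustrative sketch: binds the rdd and rddCallback beans referenced by the
// options above into a plain Camel registry (names and setup are assumptions).
public class CamelSparkSetup {
    public static void main(String[] args) throws Exception {
        JavaSparkContext sparkContext = new JavaSparkContext("local[*]", "camel-spark-sketch");
        SimpleRegistry registry = new SimpleRegistry();
        registry.put("testFileRdd", sparkContext.textFile("testrdd.txt"));
        registry.put("countLinesContaining", new RddCallback<Long>() {
            @Override
            public Long onRdd(JavaRDDLike rdd, Object... payloads) {
                // a real callback would typically use the payloads to parametrize the job
                return rdd.count();
            }
        });
        CamelContext camelContext = new DefaultCamelContext(registry);
        camelContext.start();
    }
}]]></script>
 </div></div>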
class="codeContent panelContent pdl"> <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[@Bean RddCallback<Void> rddCallback() { return new VoidRddCallback() { @Override - public void doOnRdd(AbstractJavaRDDLike rdd, Object... payloads) { + public void doOnRdd(JavaRDDLike rdd, Object... payloads) { rdd.saveAsTextFile(output.getAbsolutePath()); } }; @@ -125,7 +125,7 @@ RddCallback<Void> rddCallback() { RddCallback<Long> rddCallback(CamelContext context) { return new ConvertingRddCallback<Long>(context, int.class, int.class) { @Override - public Long doOnRdd(AbstractJavaRDDLike rdd, Object... payloads) { + public Long doOnRdd(JavaRDDLike rdd, Object... payloads) { return rdd.count() * (int) payloads[0] * (int) payloads[1]; } }; Modified: websites/production/camel/content/cache/main.pageCache ============================================================================== Binary files - no diff available.