Modified: websites/production/camel/content/hdfs2.html
==============================================================================
--- websites/production/camel/content/hdfs2.html (original)
+++ websites/production/camel/content/hdfs2.html Wed Dec 17 07:19:49 2014
@@ -96,9 +96,13 @@
</div></div><h3 id="HDFS2-URIformat">URI format</h3><div class="code panel
pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<script class="theme: Default; brush: java; gutter: false"
type="syntaxhighlighter"><![CDATA[hdfs2://hostname[:port][/path][?options]
]]></script>
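As a concrete reading of this format, the pieces of such a URI can be pulled apart with the standard java.net.URI accessors; the host, port, path, and options below are purely illustrative, not component defaults:

```java
import java.net.URI;

public class Hdfs2UriExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: every component here is made up for the example.
        URI uri = new URI("hdfs2://localhost:8020/tmp/messages?overwrite=true&fileType=SEQUENCE_FILE");
        System.out.println(uri.getScheme()); // hdfs2
        System.out.println(uri.getHost());   // localhost
        System.out.println(uri.getPort());   // 8020
        System.out.println(uri.getPath());   // /tmp/messages
        System.out.println(uri.getQuery());  // overwrite=true&fileType=SEQUENCE_FILE
    }
}
```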
-</div></div><p>You can append query options to the URI in the following
format, <code>?option=value&option=value&...</code><br clear="none">
The path is treated in the following way:</p><ol><li>as a consumer, if it's a
file, it just reads the file, otherwise if it represents a directory it scans
all the file under the path satisfying the configured pattern. All the files
under that directory must be of the same type.</li><li>as a producer, if at
least one split strategy is defined, the path is considered a directory and
under that directory the producer creates a different file per split named
using the configured <a shape="rect"
href="uuidgenerator.html">UuidGenerator</a>.</li></ol><h3
id="HDFS2-Options">Options</h3><div class="confluenceTableSmall">
-<div class="table-wrap"><table class="confluenceTable"><tbody><tr><th
colspan="1" rowspan="1" class="confluenceTh"><p> Name </p></th><th colspan="1"
rowspan="1" class="confluenceTh"><p> Default Value </p></th><th colspan="1"
rowspan="1" class="confluenceTh"><p> Description </p></th></tr><tr><td
colspan="1" rowspan="1" class="confluenceTd"><p> <code>overwrite</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> <code>true</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> The file can be
overwritten </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>append</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>false</code> </p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p> Append to existing file.
Notice that not all HDFS file systems support the append option.
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>bufferSize</code> </p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> <code>4096</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The buffer size used by HDFS </p></td></tr><tr><td
colspan="1" rowspan="1" class="confluenceTd"><p> <code>replication</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> <code>3</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> The HDFS
replication factor </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>blockSize</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>67108864</code> </p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p> The size of the HDFS blocks
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>fileType</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>NORMAL_FILE</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> It can be SEQUENCE_FILE, MAP_FILE,
ARRAY_FILE, or BLOOMMAP_FILE, see Hadoop </p></td></tr>
<tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>fileSystemType</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>HDFS</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> It can be LOCAL for local filesystem
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>keyType</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>NULL</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The type for the key in case of sequence or map files.
See below. </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>valueType</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>TEXT</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> The type for the key in case of sequence
or map files. See below. </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>splitStrategy</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> A string describing the strategy on how to split the
file based on different criteria. See below. </p></td></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>openedSuffix</code> </p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p> <code>opened</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> When a file is
opened for reading/writing the file is renamed with this suffix to avoid to
read it during the writing phase. </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>readSuffix</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>read</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> Once the file has been read is renamed
with this suffix to avoid to read it again. </p></td></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>initialDelay</code> </p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> <code>0</code> </p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p> For the consumer, how much to
wait (milliseconds) before to start scanning the directory.
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>delay</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>0</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The interval (milliseconds) between the directory
scans. </p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>pattern</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>*</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The pattern used for scanning the directory
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>chunkSize</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>4096</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> When reading a normal file, this is split into chunks
producing a message per chunk. </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>connectOnStartup</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>true</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <strong>Camel 2.9.3/2.10.1:</strong>
Whether to connect to the HDFS file system on starting the producer/consumer.
If <code>false</code> then the connection is created on-demand. Notice that
HDFS may take up till 15 minutes to establish a connection, as it has hardcoded
45 x 20 sec redelivery. By setting this option to <code>false</code> allows
your application to startup, and not block for up till 15 minutes.
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>owner</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The file owner must
match this owner for the consumer to pickup the file. Otherwise the file is
skipped. </p></td></tr></tbody></table></div>
-</div><h4 id="HDFS2-KeyTypeandValueType">KeyType and
ValueType</h4><ul><li>NULL it means that the key or the value is
absent</li><li>BYTE for writing a byte, the java Byte class is mapped into a
BYTE</li><li>BYTES for writing a sequence of bytes. It maps the java ByteBuffer
class</li><li>INT for writing java integer</li><li>FLOAT for writing java
float</li><li>LONG for writing java long</li><li>DOUBLE for writing java
double</li><li>TEXT for writing java strings</li></ul><p>BYTES is also used
with everything else, for example, in Camel a file is sent around as an
InputStream, int this case is written in a sequence file or a map file as a
sequence of bytes.</p><h3 id="HDFS2-SplittingStrategy">Splitting
Strategy</h3><p>In the current version of Hadoop opening a file in append mode
is disabled since it's not very reliable. So, for the moment, it's only
possible to create new files. The Camel HDFS endpoint tries to solve this
problem in this way:</p><ul><li>If the split strategy option
has been defined, the hdfs path will be used as a directory and files will be
created using the configured <a shape="rect"
href="uuidgenerator.html">UuidGenerator</a></li><li>Every time a splitting
condition is met, a new file is created.<br clear="none"> The splitStrategy
option is defined as a string with the following syntax:<br clear="none">
splitStrategy=<ST>:<value>,<ST>:<value>,*</li></ul><p>where
<ST> can be:</p><ul><li>BYTES a new file is created, and the old is
closed when the number of written bytes is more than
<value></li><li>MESSAGES a new file is created, and the old is closed
when the number of written messages is more than <value></li><li>IDLE a
new file is created, and the old is closed when no writing happened in the last
<value> milliseconds</li></ul> <div class="aui-message warning
shadowed information-macro">
+</div></div><p>You can append query options to the URI in the following
format, <code>?option=value&option=value&...</code><br clear="none">
The path is treated in the following way:</p><ol><li>as a consumer, if it's a
file, it just reads the file; otherwise, if it represents a directory, it scans all the files under the path satisfying the configured pattern. All the files
under that directory must be of the same type.</li><li>as a producer, if at
least one split strategy is defined, the path is considered a directory and
under that directory the producer creates a different file per split named
using the configured <a shape="rect"
href="uuidgenerator.html">UuidGenerator</a>.</li></ol><p> </p> <div
class="aui-message warning shadowed information-macro">
+ <span class="aui-icon icon-warning">Icon</span>
+ <div class="message-content">
+          <p>When consuming from hdfs2 in normal mode, a file is split into chunks, producing a message per chunk. You can configure the size of the chunks using the chunkSize option. If you want to read from HDFS and write to a regular file using the file component, you can use fileMode=Append to append the chunks together.</p>
+ </div>
+ </div>
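The chunking arithmetic behind this can be sketched without any Camel or Hadoop dependency; the 4096-byte default comes from the chunkSize option below, and the method itself is only a reading aid:

```java
public class ChunkingSketch {
    // Number of messages a normal-mode consumer emits for a file of the given
    // size when it is split into chunkSize-byte chunks (ceiling division).
    static int messageCount(long fileSize, int chunkSize) {
        if (fileSize == 0) {
            return 0;
        }
        return (int) ((fileSize + chunkSize - 1) / chunkSize);
    }

    public static void main(String[] args) {
        // A 10,000-byte file with the default chunkSize of 4096:
        System.out.println(messageCount(10_000, 4096)); // 3 chunks: 4096 + 4096 + 1808 bytes
    }
}
```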
+<h3 id="HDFS2-Options">Options</h3><div class="confluenceTableSmall"><div
class="table-wrap"><table class="confluenceTable"><tbody><tr><th colspan="1"
rowspan="1" class="confluenceTh"><p>Name</p></th><th colspan="1" rowspan="1"
class="confluenceTh"><p>Default Value</p></th><th colspan="1" rowspan="1"
class="confluenceTh"><p>Description</p></th></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>overwrite</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>true</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>The file can be
overwritten</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>append</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>false</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p>Append to existing file. Notice that not all HDFS file
systems support the append option.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>bufferSize</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>4096</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p>The buffer size used by HDFS</p></td></tr><tr><td
colspan="1" rowspan="1"
class="confluenceTd"><p><code>replication</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>3</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>The HDFS replication
factor</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>blockSize</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>67108864</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>The size of the HDFS
blocks</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>fileType</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>NORMAL_FILE</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>It can be SEQUENCE_FILE,
MAP_FILE, ARRAY_FILE, or BLOOMMAP_FILE,
see Hadoop</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>fileSystemType</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>HDFS</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>It can be LOCAL for local
filesystem</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>keyType</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>NULL</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>The type for the key in case of sequence or
map files. See below.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>valueType</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>TEXT</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>The type for the value in case of sequence or
map files. See below.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>splitStrategy</code></p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p>A string describing the strategy on how to split the
file based on different criteria. See below.</p></td></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>openedSuffix</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>opened</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>When a file is opened for
reading/writing, the file is renamed with this suffix to avoid reading it during
the writing phase.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>readSuffix</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>read</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>Once the file has been read, it is renamed with this suffix to avoid reading it again.</p></td></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>initialDelay</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>0</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>For the consumer, how long to wait (in milliseconds) before starting to scan the directory.</p></td></tr><tr><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>delay</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>0</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>The interval (milliseconds)
between the directory scans.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>pattern</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>*</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>The pattern used for scanning the
directory</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>chunkSize</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>4096</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>When reading a normal file, this is split into chunks producing a message per
chunk.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>connectOnStartup</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>true</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><strong>Camel 2.9.3/2.10.1:</strong>
Whether to connect to the HDFS file system on starting the producer/consumer.
If <code>false</code>, then the connection is created on demand. Notice that HDFS may take up to 15 minutes to establish a connection, as it has a hardcoded 45 x 20-second retry. Setting this option to <code>false</code> allows your application to start up without blocking for up to 15 minutes.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>owner</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p>The file owner must match this owner for the consumer to pick up the file; otherwise the file is skipped.</p></td></tr></tbody></table></div></div><h4
id="HDFS2-KeyTypeandValueType">KeyType and ValueType</h4><ul><li>NULL means that the key or the value is absent</li><li>BYTE for writing a byte; the Java Byte class is mapped to a BYTE</li><li>BYTES for writing a sequence of bytes; it maps to the Java ByteBuffer class</li><li>INT for writing a Java integer</li><li>FLOAT for writing a Java float</li><li>LONG for writing a Java long</li><li>DOUBLE for writing a Java double</li><li>TEXT for writing Java strings</li></ul><p>BYTES is also used with everything else; for example, in Camel a file is sent around as an InputStream, and in this case it is written to a sequence file or a map file as a sequence of bytes.</p><h3
id="HDFS2-SplittingStrategy">Splitting Strategy</h3><p>In the current version of Hadoop, opening a file in append mode is disabled since it's not very reliable. So, for the moment, it's only possible to create new files. The Camel HDFS endpoint tries to solve this problem in this way:</p><ul><li>If the split strategy option has been
defined, the hdfs path will be used as a directory and files will be created
using the configured <a shape="rect"
href="uuidgenerator.html">UuidGenerator</a></li><li>Every time a splitting
condition is met, a new file is created.<br clear="none"> The splitStrategy
option is defined as a string with the following syntax:<br clear="none">
splitStrategy=<ST>:<value>,<ST>:<value>,*</li></ul><p>where
<ST> can be:</p><ul><li>BYTES a new file is created, and the old is
closed when the number of written bytes is more than
<value></li><li>MESSAGES a new file is created, and the old is closed
when the number of written messages is more than <value></li><li>IDLE a
new file is created, and the old is closed when no writing happened in the last
<value> milliseconds</li></ul> <div class="aui-message warning
shadowed information-macro">
<span class="aui-icon icon-warning">Icon</span>
<div class="message-content">
          <p>Note that this strategy currently requires either setting an IDLE value or setting the HdfsConstants.HDFS_CLOSE header to false to use the BYTES/MESSAGES configuration; otherwise, the file will be closed with each message.</p>
@@ -107,9 +111,7 @@
<p>for example:</p><div class="code panel pdl" style="border-width: 1px;"><div
class="codeContent panelContent pdl">
<script class="theme: Default; brush: java; gutter: false"
type="syntaxhighlighter"><![CDATA[hdfs2://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5
]]></script>
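The strategy string in this example can be decomposed mechanically; a minimal plain-Java sketch following the documented syntax (the component's real parser lives inside camel-hdfs2, this is only a reading aid):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitStrategySketch {
    // Decompose a splitStrategy string such as "IDLE:1000,BYTES:5" into its
    // (criterion, threshold) pairs, per the <ST>:<value>,<ST>:<value> syntax.
    static Map<String, Long> parse(String strategy) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String part : strategy.split(",")) {
            String[] kv = part.split(":");
            result.put(kv[0], Long.parseLong(kv[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        // A new file starts when the stream was idle for more than 1000 ms
        // OR after more than 5 bytes were written, whichever happens first.
        System.out.println(parse("IDLE:1000,BYTES:5")); // {IDLE=1000, BYTES=5}
    }
}
```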
-</div></div><p>it means: a new file is created either when it has been idle
for more than 1 second or if more than 5 bytes have been written. So, running
<code>hadoop fs -ls /tmp/simple-file</code> you'll see that multiple files have
been created.</p><h3 id="HDFS2-MessageHeaders">Message Headers</h3><p>The
following headers are supported by this component:</p><h4
id="HDFS2-Produceronly">Producer only</h4><div class="confluenceTableSmall">
-<div class="table-wrap"><table class="confluenceTable"><tbody><tr><th
colspan="1" rowspan="1" class="confluenceTh"><p> Header </p></th><th
colspan="1" rowspan="1" class="confluenceTh"><p> Description
</p></th></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>CamelFileName</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <strong>Camel 2.13:</strong> Specifies the name of the
file to write (relative to the endpoint path). The name can be a
<code>String</code> or an <a shape="rect" href="expression.html">Expression</a>
object. Only relevant when not using a split strategy.
</p></td></tr></tbody></table></div>
-</div><h3 id="HDFS2-Controllingtoclosefilestream">Controlling to close file
stream</h3><p>When using the <a shape="rect" href="hdfs2.html">HDFS2</a>
producer <strong>without</strong> a split strategy, then the file output stream
is by default closed after the write. However you may want to keep the stream
open, and only explicitly close the stream later. For that you can use the
header <code>HdfsConstants.HDFS_CLOSE</code> (value =
<code>"CamelHdfsClose"</code>) to control this. Setting this value to a boolean
allows you to explicit control whether the stream should be closed or
not.</p><p>Notice this does not apply if you use a split strategy, as there are
various strategies that can control when the stream is closed.</p><h3
id="HDFS2-UsingthiscomponentinOSGi">Using this component in OSGi</h3><p>There
are some quirks when running this component in an OSGi environment related to
the mechanism Hadoop 2.x uses to discover different
<code>org.apache.hadoop.fs.FileSystem</code> implementations. Hadoop 2.x uses <code>java.util.ServiceLoader</code> which looks for
<code>/META-INF/services/org.apache.hadoop.fs.FileSystem</code> files defining
available filesystem types and implementations. These resources are not
available when running inside OSGi.</p><p>As with <code>camel-hdfs</code>
component, the default configuration files need to be visible from the bundle
class loader. A typical way to deal with it is to keep a copy
of <code>core-default.xml</code> (and e.g., <code>hdfs-default.xml</code>)
in your bundle root.</p><h4
id="HDFS2-Usingthiscomponentwithmanuallydefinedroutes">Using this component
with manually defined routes</h4><p>There are two options:</p><ol><li>Package
<code>/META-INF/services/org.apache.hadoop.fs.FileSystem</code> resource with
bundle that defines the routes. This resource should list all the required
Hadoop 2.x filesystem implementations.</li><li>Provide boilerplate
initialization code which populates internal, static cache inside <code>org.apache.hadoop.fs.FileSystem</code> class:</li></ol><div class="code
panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+</div></div><p>This means a new file is created either when the file has been idle for more than 1 second or when more than 5 bytes have been written. So, running <code>hadoop fs -ls /tmp/simple-file</code>, you'll see that multiple files have
been created.</p><h3 id="HDFS2-MessageHeaders">Message Headers</h3><p>The
following headers are supported by this component:</p><h4
id="HDFS2-Produceronly">Producer only</h4><div
class="confluenceTableSmall"><div class="table-wrap"><table
class="confluenceTable"><tbody><tr><th colspan="1" rowspan="1"
class="confluenceTh"><p>Header</p></th><th colspan="1" rowspan="1"
class="confluenceTh"><p>Description</p></th></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>CamelFileName</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><strong>Camel 2.13:</strong>
Specifies the name of the file to write (relative to the endpoint path). The
name can be a <code>String</code> or an <a shape="rect"
href="expression.html">Expression</a> object. Only relevant when not using a split
strategy.</p></td></tr></tbody></table></div></div><h3
id="HDFS2-Controllingtoclosefilestream">Controlling to close file
stream</h3><p>When using the <a shape="rect" href="hdfs2.html">HDFS2</a>
producer <strong>without</strong> a split strategy, the file output stream is by default closed after the write. However, you may want to keep the stream open and only explicitly close it later. For that you can use the header <code>HdfsConstants.HDFS_CLOSE</code> (value = <code>"CamelHdfsClose"</code>) to control this. Setting this value to a boolean allows you to explicitly control whether the stream should be closed or
not.</p><p>Notice this does not apply if you use a split strategy, as there are
various strategies that can control when the stream is closed.</p><h3
id="HDFS2-UsingthiscomponentinOSGi">Using this component in OSGi</h3><p>There
are some quirks when running this component in an OSGi environment related to
the mechanism Hadoop 2.x uses to discover different
<code>org.apache.hadoop.fs.FileSystem</code> implementations. Hadoop 2.x uses
<code>java.util.ServiceLoader</code> which looks for
<code>/META-INF/services/org.apache.hadoop.fs.FileSystem</code> files defining
available filesystem types and implementations. These resources are not
available when running inside OSGi.</p><p>As with <code>camel-hdfs</code>
component, the default configuration files need to be visible from the bundle
class loader. A typical way to deal with it is to keep a copy
of <code>core-default.xml</code> (and e.g., <code>hdfs-default.xml</code>)
in your bundle root.</p><h4
id="HDFS2-Usingthiscomponentwithmanuallydefinedroutes">Using this component
with manually defined routes</h4><p>There are two options:</p><ol><li>Package
<code>/META-INF/services/org.apache.hadoop.fs.FileSystem</code> resource with
the bundle that defines the routes. This resource should list all the required
Hadoop 2.x filesystem implementations.</li><li>Provide boilerplate initialization code which populates internal, static cache inside <code>org.apache.hadoop.fs.FileSystem</code> class:</li></ol><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<script class="theme: Default; brush: java; gutter: false"
type="syntaxhighlighter"><![CDATA[org.apache.hadoop.conf.Configuration conf =
new org.apache.hadoop.conf.Configuration();
conf.setClass("fs.file.impl",
org.apache.hadoop.fs.LocalFileSystem.class, FileSystem.class);
conf.setClass("fs.hdfs.impl",
org.apache.hadoop.hdfs.DistributedFileSystem.class, FileSystem.class);