Modified: websites/production/camel/content/hdfs2.html
==============================================================================
--- websites/production/camel/content/hdfs2.html (original)
+++ websites/production/camel/content/hdfs2.html Wed Dec 17 07:19:49 2014
@@ -96,9 +96,13 @@
</div></div><h3 id="HDFS2-URIformat">URI format</h3><div class="code panel
pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<script class="theme: Default; brush: java; gutter: false"
type="syntaxhighlighter"><![CDATA[hdfs2://hostname[:port][/path][?options]
]]></script>
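As a concrete reading of this format, the pieces of such a URI can be pulled apart with the standard java.net.URI accessors; the host, port, path, and options below are purely illustrative, not component defaults:

```java
import java.net.URI;

public class Hdfs2UriExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: every component here is made up for the example.
        URI uri = new URI("hdfs2://localhost:8020/tmp/messages?overwrite=true&fileType=SEQUENCE_FILE");
        System.out.println(uri.getScheme()); // hdfs2
        System.out.println(uri.getHost());   // localhost
        System.out.println(uri.getPort());   // 8020
        System.out.println(uri.getPath());   // /tmp/messages
        System.out.println(uri.getQuery());  // overwrite=true&fileType=SEQUENCE_FILE
    }
}
```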
-</div></div><p>You can append query options to the URI in the following
format, <code>?option=value&option=value&...</code><br clear="none">
The path is treated in the following way:</p><ol><li>as a consumer, if it's a
file, it just reads the file, otherwise if it represents a directory it scans
all the file under the path satisfying the configured pattern. All the files
under that directory must be of the same type.</li><li>as a producer, if at
least one split strategy is defined, the path is considered a directory and
under that directory the producer creates a different file per split named
using the configured <a shape="rect"
href="uuidgenerator.html">UuidGenerator</a>.</li></ol><h3
id="HDFS2-Options">Options</h3><div class="confluenceTableSmall">
-<div class="table-wrap"><table class="confluenceTable"><tbody><tr><th
colspan="1" rowspan="1" class="confluenceTh"><p> Name </p></th><th colspan="1"
rowspan="1" class="confluenceTh"><p> Default Value </p></th><th colspan="1"
rowspan="1" class="confluenceTh"><p> Description </p></th></tr><tr><td
colspan="1" rowspan="1" class="confluenceTd"><p> <code>overwrite</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> <code>true</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> The file can be
overwritten </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>append</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>false</code> </p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p> Append to existing file.
Notice that not all HDFS file systems support the append option.
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>bufferSize</code> </p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> <code>4096</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The buffer size used by HDFS </p></td></tr><tr><td
colspan="1" rowspan="1" class="confluenceTd"><p> <code>replication</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> <code>3</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> The HDFS
replication factor </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>blockSize</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>67108864</code> </p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p> The size of the HDFS blocks
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>fileType</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>NORMAL_FILE</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> It can be SEQUENCE_FILE, MAP_FILE,
ARRAY_FILE, or BLOOMMAP_FILE, see Hadoop </p></td></tr>
<tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>fileSystemType</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>HDFS</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> It can be LOCAL for local filesystem
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>keyType</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>NULL</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The type for the key in case of sequence or map files.
See below. </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>valueType</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>TEXT</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> The type for the key in case of sequence
or map files. See below. </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>splitStrategy</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> A string describing the strategy on how to split the
file based on different criteria. See below. </p></td></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>openedSuffix</code> </p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p> <code>opened</code>
</p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> When a file is
opened for reading/writing the file is renamed with this suffix to avoid to
read it during the writing phase. </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>readSuffix</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>read</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> Once the file has been read is renamed
with this suffix to avoid to read it again. </p></td></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>initialDelay</code> </p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> <code>0</code> </p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p> For the consumer, how much to
wait (milliseconds) before to start scanning the directory.
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>delay</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>0</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The interval (milliseconds) between the directory
scans. </p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>pattern</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>*</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The pattern used for scanning the directory
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>chunkSize</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>4096</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> When reading a normal file, this is split into chunks
producing a message per chunk. </p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p> <code>connectOnStartup</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <code>true</code> </p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p> <strong>Camel 2.9.3/2.10.1:</strong>
Whether to connect to the HDFS file system on starting the producer/consumer.
If <code>false</code> then the connection is created on-demand. Notice that
HDFS may take up till 15 minutes to establish a connection, as it has hardcoded
45 x 20 sec redelivery. By setting this option to <code>false</code> allows
your application to startup, and not block for up till 15 minutes.
</p></td></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>owner</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> The file owner must
match this owner for the consumer to pickup the file. Otherwise the file is
skipped. </p></td></tr></tbody></table></div>
-</div><h4 id="HDFS2-KeyTypeandValueType">KeyType and
ValueType</h4><ul><li>NULL it means that the key or the value is
absent</li><li>BYTE for writing a byte, the java Byte class is mapped into a
BYTE</li><li>BYTES for writing a sequence of bytes. It maps the java ByteBuffer
class</li><li>INT for writing java integer</li><li>FLOAT for writing java
float</li><li>LONG for writing java long</li><li>DOUBLE for writing java
double</li><li>TEXT for writing java strings</li></ul><p>BYTES is also used
with everything else, for example, in Camel a file is sent around as an
InputStream, int this case is written in a sequence file or a map file as a
sequence of bytes.</p><h3 id="HDFS2-SplittingStrategy">Splitting
Strategy</h3><p>In the current version of Hadoop opening a file in append mode
is disabled since it's not very reliable. So, for the moment, it's only
possible to create new files. The Camel HDFS endpoint tries to solve this
problem in this way:</p><ul><li>If the split strategy option
has been defined, the hdfs path will be used as a directory and files will be
created using the configured <a shape="rect"
href="uuidgenerator.html">UuidGenerator</a></li><li>Every time a splitting
condition is met, a new file is created.<br clear="none"> The splitStrategy
option is defined as a string with the following syntax:<br clear="none">
splitStrategy=<ST>:<value>,<ST>:<value>,*</li></ul><p>where
<ST> can be:</p><ul><li>BYTES a new file is created, and the old is
closed when the number of written bytes is more than
<value></li><li>MESSAGES a new file is created, and the old is closed
when the number of written messages is more than <value></li><li>IDLE a
new file is created, and the old is closed when no writing happened in the last
<value> milliseconds</li></ul> <div class="aui-message warning
shadowed information-macro">
+</div></div><p>You can append query options to the URI in the following
format, <code>?option=value&option=value&...</code><br clear="none">
The path is treated in the following way:</p><ol><li>as a consumer, if it's a
file, it just reads the file; otherwise, if it represents a directory, it scans all the files under the path satisfying the configured pattern. All the files
under that directory must be of the same type.</li><li>as a producer, if at
least one split strategy is defined, the path is considered a directory and
under that directory the producer creates a different file per split named
using the configured <a shape="rect"
href="uuidgenerator.html">UuidGenerator</a>.</li></ol><p> </p> <div
class="aui-message warning shadowed information-macro">
+ <span class="aui-icon icon-warning">Icon</span>
+ <div class="message-content">
+          <p>When consuming from hdfs2 in normal mode, a file is split into chunks, producing a message per chunk. You can configure the size of the chunks using the chunkSize option. If you want to read from HDFS and write to a regular file using the file component, you can use fileMode=Append to append the chunks together.</p>
+ </div>
+ </div>
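The chunking arithmetic behind this can be sketched without any Camel or Hadoop dependency; the 4096-byte default comes from the chunkSize option below, and the method itself is only a reading aid:

```java
public class ChunkingSketch {
    // Number of messages a normal-mode consumer emits for a file of the given
    // size when it is split into chunkSize-byte chunks (ceiling division).
    static int messageCount(long fileSize, int chunkSize) {
        if (fileSize == 0) {
            return 0;
        }
        return (int) ((fileSize + chunkSize - 1) / chunkSize);
    }

    public static void main(String[] args) {
        // A 10,000-byte file with the default chunkSize of 4096:
        System.out.println(messageCount(10_000, 4096)); // 3 chunks: 4096 + 4096 + 1808 bytes
    }
}
```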
+<h3 id="HDFS2-Options">Options</h3><div class="confluenceTableSmall"><div
class="table-wrap"><table class="confluenceTable"><tbody><tr><th colspan="1"
rowspan="1" class="confluenceTh"><p>Name</p></th><th colspan="1" rowspan="1"
class="confluenceTh"><p>Default Value</p></th><th colspan="1" rowspan="1"
class="confluenceTh"><p>Description</p></th></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>overwrite</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>true</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>The file can be
overwritten</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>append</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>false</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p>Append to existing file. Notice that not all HDFS file
systems support the append option.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>bufferSize</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>4096</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p>The buffer size used by HDFS</p></td></tr><tr><td
colspan="1" rowspan="1"
class="confluenceTd"><p><code>replication</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>3</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>The HDFS replication
factor</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>blockSize</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>67108864</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>The size of the HDFS
blocks</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>fileType</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>NORMAL_FILE</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>It can be SEQUENCE_FILE,
MAP_FILE, ARRAY_FILE, or BLOOMMAP_FILE,
see Hadoop</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>fileSystemType</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>HDFS</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>It can be LOCAL for local
filesystem</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>keyType</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>NULL</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>The type for the key in case of sequence or
map files. See below.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>valueType</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>TEXT</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>The type for the value in case of sequence or
map files. See below.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>splitStrategy</code></p></td><td colspan="1" rowspan="1" class="confluenceTd"><p> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p>A string describing the strategy on how to split the
file based on different criteria. See below.</p></td></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>openedSuffix</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>opened</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>When a file is opened for
reading/writing, the file is renamed with this suffix to avoid reading it during
the writing phase.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>readSuffix</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>read</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>Once the file has been read, it is renamed with this suffix to avoid reading it again.</p></td></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>initialDelay</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>0</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>For the consumer, how long to wait (in milliseconds) before starting to scan the directory.</p></td></tr><tr><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>delay</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><code>0</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p>The interval (milliseconds)
between the directory scans.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>pattern</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>*</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>The pattern used for scanning the
directory</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>chunkSize</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>4096</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p>When reading a normal file, this is split into chunks producing a message per
chunk.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>connectOnStartup</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>true</code></p></td><td colspan="1"
rowspan="1" class="confluenceTd"><p><strong>Camel 2.9.3/2.10.1:</strong>
Whether to connect to the HDFS file system on starting the producer/consumer.
If <code>false</code>, then the connection is created on demand. Notice that HDFS may take up to 15 minutes to establish a connection, as it has a hardcoded 45 x 20-second retry. Setting this option to <code>false</code> allows your application to start up without blocking for up to 15 minutes.</p></td></tr><tr><td colspan="1" rowspan="1"
class="confluenceTd"><p><code>owner</code></p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p>The file owner must match this owner for the consumer to pick up the file; otherwise the file is skipped.</p></td></tr></tbody></table></div></div><h4
id="HDFS2-KeyTypeandValueType">KeyType and ValueType</h4><ul><li>NULL means that the key or the value is absent</li><li>BYTE for writing a byte; the Java Byte class is mapped to a BYTE</li><li>BYTES for writing a sequence of bytes; it maps to the Java ByteBuffer class</li><li>INT for writing a Java integer</li><li>FLOAT for writing a Java float</li><li>LONG for writing a Java long</li><li>DOUBLE for writing a Java double</li><li>TEXT for writing Java strings</li></ul><p>BYTES is also used with everything else; for example, in Camel a file is sent around as an InputStream, and in this case it is written to a sequence file or a map file as a sequence of bytes.</p><h3
id="HDFS2-SplittingStrategy">Splitting Strategy</h3><p>In the current version of Hadoop, opening a file in append mode is disabled since it's not very reliable. So, for the moment, it's only possible to create new files. The Camel HDFS endpoint tries to solve this problem in this way:</p><ul><li>If the split strategy option has been
defined, the hdfs path will be used as a directory and files will be created
using the configured <a shape="rect"
href="uuidgenerator.html">UuidGenerator</a></li><li>Every time a splitting
condition is met, a new file is created.<br clear="none"> The splitStrategy
option is defined as a string with the following syntax:<br clear="none">
splitStrategy=<ST>:<value>,<ST>:<value>,*</li></ul><p>where
<ST> can be:</p><ul><li>BYTES a new file is created, and the old is
closed when the number of written bytes is more than
<value></li><li>MESSAGES a new file is created, and the old is closed
when the number of written messages is more than <value></li><li>IDLE a
new file is created, and the old is closed when no writing happened in the last
<value> milliseconds</li></ul> <div class="aui-message warning
shadowed information-macro">
<span class="aui-icon icon-warning">Icon</span>
<div class="message-content">
          <p>Note that this strategy currently requires either setting an IDLE value or setting the HdfsConstants.HDFS_CLOSE header to false to use the BYTES/MESSAGES configuration; otherwise, the file will be closed with each message.</p>
@@ -107,9 +111,7 @@
<p>for example:</p><div class="code panel pdl" style="border-width: 1px;"><div
class="codeContent panelContent pdl">
<script class="theme: Default; brush: java; gutter: false"
type="syntaxhighlighter"><![CDATA[hdfs2://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5
]]></script>
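The strategy string in this example can be decomposed mechanically; a minimal plain-Java sketch following the documented syntax (the component's real parser lives inside camel-hdfs2, this is only a reading aid):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitStrategySketch {
    // Decompose a splitStrategy string such as "IDLE:1000,BYTES:5" into its
    // (criterion, threshold) pairs, per the <ST>:<value>,<ST>:<value> syntax.
    static Map<String, Long> parse(String strategy) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String part : strategy.split(",")) {
            String[] kv = part.split(":");
            result.put(kv[0], Long.parseLong(kv[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        // A new file starts when the stream was idle for more than 1000 ms
        // OR after more than 5 bytes were written, whichever happens first.
        System.out.println(parse("IDLE:1000,BYTES:5")); // {IDLE=1000, BYTES=5}
    }
}
```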
-</div></div><p>it means: a new file is created either when it has been idle
for more than 1 second or if more than 5 bytes have been written. So, running
<code>hadoop fs -ls /tmp/simple-file</code> you'll see that multiple files have
been created.</p><h3 id="HDFS2-MessageHeaders">Message Headers</h3><p>The
following headers are supported by this component:</p><h4
id="HDFS2-Produceronly">Producer only</h4><div class="confluenceTableSmall">
-<div class="table-wrap"><table class="confluenceTable"><tbody><tr><th
colspan="1" rowspan="1" class="confluenceTh"><p> Header </p></th><th
colspan="1" rowspan="1" class="confluenceTh"><p> Description
</p></th></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><p>
<code>CamelFileName</code> </p></td><td colspan="1" rowspan="1"
class="confluenceTd"><p> <strong>Camel 2.13:</strong> Specifies the name of the
file to write (relative to the endpoint path). The name can be a
<code>String</code> or an <a shape="rect" href="expression.html">Expression</a>
object. Only relevant when not using a split strategy.
</p></td></tr></tbody></table></div>
-</div><h3 id="HDFS2-Controllingtoclosefilestream">Controlling to close file
stream</h3><p>When using the <a shape="rect" href="hdfs2.html">HDFS2</a>
producer <strong>without</strong> a split strategy, then the file output stream
is by default closed after the write. However you may want to keep the stream
open, and only explicitly close the stream later. For that you can use the
header <code>HdfsConstants.HDFS_CLOSE</code> (value =
<code>"CamelHdfsClose"</code>) to control this. Setting this value to a boolean
allows you to explicit control whether the stream should be closed or
not.</p><p>Notice this does not apply if you use a split strategy, as there are
various strategies that can control when the stream is closed.</p><h3
id="HDFS2-UsingthiscomponentinOSGi">Using this component in OSGi</h3><p>There
are some quirks when running this component in an OSGi environment related to
the mechanism Hadoop 2.x uses to discover different
<code>org.apache.hadoop.fs.FileSystem</code> implementations. Hadoop 2.x uses <code>java.util.ServiceLoader</code> which looks for
<code>/META-INF/services/org.apache.hadoop.fs.FileSystem</code> files defining
available filesystem types and implementations. These resources are not
available when running inside OSGi.</p><p>As with <code>camel-hdfs</code>
component, the default configuration files need to be visible from the bundle
class loader. A typical way to deal with it is to keep a copy
of <code>core-default.xml</code> (and e.g., <code>hdfs-default.xml</code>)
in your bundle root.</p><h4
id="HDFS2-Usingthiscomponentwithmanuallydefinedroutes">Using this component
with manually defined routes</h4><p>There are two options:</p><ol><li>Package
<code>/META-INF/services/org.apache.hadoop.fs.FileSystem</code> resource with
bundle that defines the routes. This resource should list all the required
Hadoop 2.x filesystem implementations.</li><li>Provide boilerplate
initialization code which populates internal, static cache inside <code>org.apache.hadoop.fs.FileSystem</code> class:</li></ol><div class="code
panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
+</div></div><p>This means a new file is created either when the file has been idle for more than 1 second or when more than 5 bytes have been written. So, running <code>hadoop fs -ls /tmp/simple-file</code>, you'll see that multiple files have
been created.</p><h3 id="HDFS2-MessageHeaders">Message Headers</h3><p>The
following headers are supported by this component:</p><h4
id="HDFS2-Produceronly">Producer only</h4><div
class="confluenceTableSmall"><div class="table-wrap"><table
class="confluenceTable"><tbody><tr><th colspan="1" rowspan="1"
class="confluenceTh"><p>Header</p></th><th colspan="1" rowspan="1"
class="confluenceTh"><p>Description</p></th></tr><tr><td colspan="1"
rowspan="1" class="confluenceTd"><p><code>CamelFileName</code></p></td><td
colspan="1" rowspan="1" class="confluenceTd"><p><strong>Camel 2.13:</strong>
Specifies the name of the file to write (relative to the endpoint path). The
name can be a <code>String</code> or an <a shape="rect"
href="expression.html">Expression</a> object. Only relevant when not using a split
strategy.</p></td></tr></tbody></table></div></div><h3
id="HDFS2-Controllingtoclosefilestream">Controlling to close file
stream</h3><p>When using the <a shape="rect" href="hdfs2.html">HDFS2</a>
producer <strong>without</strong> a split strategy, the file output stream is by default closed after the write. However, you may want to keep the stream open and only explicitly close it later. For that you can use the header <code>HdfsConstants.HDFS_CLOSE</code> (value = <code>"CamelHdfsClose"</code>) to control this. Setting this value to a boolean allows you to explicitly control whether the stream should be closed or
not.</p><p>Notice this does not apply if you use a split strategy, as there are
various strategies that can control when the stream is closed.</p><h3
id="HDFS2-UsingthiscomponentinOSGi">Using this component in OSGi</h3><p>There
are some quirks when running this component in an OSGi environment related to
the mechanism Hadoop 2.x uses to discover different
<code>org.apache.hadoop.fs.FileSystem</code> implementations. Hadoop 2.x uses
<code>java.util.ServiceLoader</code> which looks for
<code>/META-INF/services/org.apache.hadoop.fs.FileSystem</code> files defining
available filesystem types and implementations. These resources are not
available when running inside OSGi.</p><p>As with <code>camel-hdfs</code>
component, the default configuration files need to be visible from the bundle
class loader. A typical way to deal with it is to keep a copy
of <code>core-default.xml</code> (and e.g., <code>hdfs-default.xml</code>)
in your bundle root.</p><h4
id="HDFS2-Usingthiscomponentwithmanuallydefinedroutes">Using this component
with manually defined routes</h4><p>There are two options:</p><ol><li>Package
<code>/META-INF/services/org.apache.hadoop.fs.FileSystem</code> resource with
the bundle that defines the routes. This resource should list all the required
Hadoop 2.x filesystem implementations.</li><li>Provide boilerplate initialization code which populates internal, static cache inside <code>org.apache.hadoop.fs.FileSystem</code> class:</li></ol><div class="code panel pdl" style="border-width: 1px;"><div class="codeContent panelContent pdl">
<script class="theme: Default; brush: java; gutter: false"
type="syntaxhighlighter"><![CDATA[org.apache.hadoop.conf.Configuration conf =
new org.apache.hadoop.conf.Configuration();
conf.setClass("fs.file.impl",
org.apache.hadoop.fs.LocalFileSystem.class, FileSystem.class);
conf.setClass("fs.hdfs.impl",
org.apache.hadoop.hdfs.DistributedFileSystem.class, FileSystem.class);