[14/15] git commit: Merge branch '1.5.2-SNAPSHOT' into 1.6.1-SNAPSHOT

elserj Tue, 22 Jul 2014 22:33:13 -0700

Merge branch '1.5.2-SNAPSHOT' into 1.6.1-SNAPSHOT

Conflicts:
        docs/src/main/resources/examples/README.mapred
        docs/src/main/resources/examples/README.maxmutation



Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/d7c1125d
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/d7c1125d
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/d7c1125d

Branch: refs/heads/1.6.1-SNAPSHOT
Commit: d7c1125d3d8a8101e121f112303762a65c30f7da
Parents: e8916f1 9f3cbb3
Author: Josh Elser <els...@apache.org>
Authored: Wed Jul 23 01:08:37 2014 -0400
Committer: Josh Elser <els...@apache.org>
Committed: Wed Jul 23 01:08:37 2014 -0400

----------------------------------------------------------------------
 docs/src/main/resources/examples/README.batch       |  2 +-
 docs/src/main/resources/examples/README.bloom       |  8 ++++----
 docs/src/main/resources/examples/README.maxmutation | 12 +++++++-----
 docs/src/main/resources/examples/README.regex       |  3 +--
 4 files changed, 13 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/d7c1125d/docs/src/main/resources/examples/README.batch
----------------------------------------------------------------------
diff --cc docs/src/main/resources/examples/README.batch
index 05f2304,0000000..463481b
mode 100644,000000..100644
--- a/docs/src/main/resources/examples/README.batch
+++ b/docs/src/main/resources/examples/README.batch
@@@ -1,55 -1,0 +1,55 @@@
 +Title: Apache Accumulo Batch Writing and Scanning Example
 +Notice:    Licensed to the Apache Software Foundation (ASF) under one
 +           or more contributor license agreements.  See the NOTICE file
 +           distributed with this work for additional information
 +           regarding copyright ownership.  The ASF licenses this file
 +           to you under the Apache License, Version 2.0 (the
 +           "License"); you may not use this file except in compliance
 +           with the License.  You may obtain a copy of the License at
 +           .
 +             http://www.apache.org/licenses/LICENSE-2.0
 +           .
 +           Unless required by applicable law or agreed to in writing,
 +           software distributed under the License is distributed on an
 +           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 +           KIND, either express or implied.  See the License for the
 +           specific language governing permissions and limitations
 +           under the License.
 +
 +This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.client in the examples-simple module:
 +
 + * SequentialBatchWriter.java - writes mutations with sequential rows and random values
 + * RandomBatchWriter.java - used by SequentialBatchWriter to generate random values
 + * RandomBatchScanner.java - reads random rows and verifies their values
 +
 +This is an example of how to use the batch writer and batch scanner. To compile
 +the example, run maven and copy the produced jar into the accumulo lib dir.
 +This is already done in the tar distribution.
 +
 +Below are commands that add 10000 entries to accumulo and then do 100 random
 +queries. The write command generates random 50 byte values.
 +
 +Be sure to use the name of your instance (given as instance here) and the appropriate
 +list of zookeeper nodes (given as zookeepers here).
 +
 +Before you run this, you must ensure that the user you are running has the
 +"exampleVis" authorization. (you can set this in the shell with "setauths -u username -s exampleVis")
 +
 +    $ ./bin/accumulo shell -u root -e "setauths -u username -s exampleVis"
 +
 +You must also create the table, batchtest1, ahead of time. (In the shell, use "createtable batchtest1")
 +
 +    $ ./bin/accumulo shell -u username -e "createtable batchtest1"
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.SequentialBatchWriter -i instance -z zookeepers -u username -p password -t batchtest1 --start 0 --num 10000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20 --vis exampleVis
-     $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner -i instance -z zookeepers -u username -p password -t batchtest1 --num 100 --min 0 --max 10000 --size 50 --scanThreads 20 --vis exampleVis
++    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner -i instance -z zookeepers -u username -p password -t batchtest1 --num 100 --min 0 --max 10000 --size 50 --scanThreads 20 --auths exampleVis
 +    07 11:33:11,103 [client.CountingVerifyingReceiver] INFO : Generating 100 random queries...
 +    07 11:33:11,112 [client.CountingVerifyingReceiver] INFO : finished
 +    07 11:33:11,260 [client.CountingVerifyingReceiver] INFO : 694.44 lookups/sec   0.14 secs
 +
 +    07 11:33:11,260 [client.CountingVerifyingReceiver] INFO : num results : 100
 +
 +    07 11:33:11,364 [client.CountingVerifyingReceiver] INFO : Generating 100 random queries...
 +    07 11:33:11,370 [client.CountingVerifyingReceiver] INFO : finished
 +    07 11:33:11,416 [client.CountingVerifyingReceiver] INFO : 2173.91 lookups/sec   0.05 secs
 +
 +    07 11:33:11,416 [client.CountingVerifyingReceiver] INFO : num results : 100

http://git-wip-us.apache.org/repos/asf/accumulo/blob/d7c1125d/docs/src/main/resources/examples/README.bloom
----------------------------------------------------------------------
diff --cc docs/src/main/resources/examples/README.bloom
index 6fe4602,0000000..555f06d
mode 100644,000000..100644
--- a/docs/src/main/resources/examples/README.bloom
+++ b/docs/src/main/resources/examples/README.bloom
@@@ -1,219 -1,0 +1,219 @@@
 +Title: Apache Accumulo Bloom Filter Example
 +Notice:    Licensed to the Apache Software Foundation (ASF) under one
 +           or more contributor license agreements.  See the NOTICE file
 +           distributed with this work for additional information
 +           regarding copyright ownership.  The ASF licenses this file
 +           to you under the Apache License, Version 2.0 (the
 +           "License"); you may not use this file except in compliance
 +           with the License.  You may obtain a copy of the License at
 +           .
 +             http://www.apache.org/licenses/LICENSE-2.0
 +           .
 +           Unless required by applicable law or agreed to in writing,
 +           software distributed under the License is distributed on an
 +           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 +           KIND, either express or implied.  See the License for the
 +           specific language governing permissions and limitations
 +           under the License.
 +
 +This example shows how to create a table with bloom filters enabled.  It also
 +shows how bloom filters increase query performance when looking for values that
 +do not exist in a table.
 +
 +Below, a table named bloom_test is created and bloom filters are enabled.
 +
 +    $ ./bin/accumulo shell -u username -p password
 +    Shell - Apache Accumulo Interactive Shell
 +    - version: 1.5.0
 +    - instance name: instance
 +    - instance id: 00000000-0000-0000-0000-000000000000
 +    -
 +    - type 'help' for a list of available commands
 +    -
 +    username@instance> setauths -u username -s exampleVis
 +    username@instance> createtable bloom_test
 +    username@instance bloom_test> config -t bloom_test -s table.bloom.enabled=true
 +    username@instance bloom_test> exit
 +
 +Below 1 million random values are inserted into accumulo. The randomly
 +generated rows range between 0 and 1 billion. The random number generator is
 +initialized with the seed 7.
 +
-     $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test --num 1000000 -min 0 -max 1000000000 -valueSize 50 -batchMemory 2M -batchLatency 60s -batchThreads 3 --vis exampleVis
++    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis
 +
 +Below the table is flushed:
 +
 +    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test -w'
 +    05 10:40:06,069 [shell.Shell] INFO : Flush of table bloom_test completed.
 +
 +After the flush completes, 500 random queries are done against the table. The
 +same seed is used to generate the queries, therefore everything is found in the
 +table.
 +
-     $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test --num 500 --min 0 --max 1000000000 --size 50 -batchThreads 20 --vis exampleVis
++    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
 +    Generating 500 random queries...finished
 +    96.19 lookups/sec   5.20 secs
 +    num results : 500
 +    Generating 500 random queries...finished
 +    102.35 lookups/sec   4.89 secs
 +    num results : 500
 +
 +Below another 500 queries are performed, using a different seed which results
 +in nothing being found. In this case the lookups are much faster because of
 +the bloom filters.
 +
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 8 -i instance -z zookeepers -u username -p password -t bloom_test --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
 +    Generating 500 random queries...finished
 +    2212.39 lookups/sec   0.23 secs
 +    num results : 0
 +    Did not find 500 rows
 +    Generating 500 random queries...finished
 +    4464.29 lookups/sec   0.11 secs
 +    num results : 0
 +    Did not find 500 rows
 +
 +********************************************************************************
 +
 +Bloom filters can also speed up lookups for entries that exist. In accumulo
 +data is divided into tablets and each tablet has multiple map files. Every
 +lookup in accumulo goes to a specific tablet where a lookup is done on each
 +map file in the tablet. So if a tablet has three map files, lookup performance
 +can be three times slower than a tablet with one map file. However if the map
 +files contain unique sets of data, then bloom filters can help eliminate map
 +files that do not contain the row being looked up. To illustrate this two
 +identical tables were created using the following process. One table had bloom
 +filters, the other did not. Also the major compaction ratio was increased to
 +prevent the files from being compacted into one file.
 +
 + * Insert 1 million entries using  RandomBatchWriter with a seed of 7
 + * Flush the table using the shell
 + * Insert 1 million entries using  RandomBatchWriter with a seed of 8
 + * Flush the table using the shell
 + * Insert 1 million entries using  RandomBatchWriter with a seed of 9
 + * Flush the table using the shell
 +
 +After following the above steps, each table will have a tablet with three map
 +files. Flushing the table after each batch of inserts will create a map file.
 +Each map file will contain 1 million entries generated with a different seed.
 +This is assuming that Accumulo is configured with enough memory to hold 1
 +million inserts. If not, then more map files will be created.
 +
 +The commands for creating the first table without bloom filters are below.
 +
 +    $ ./bin/accumulo shell -u username -p password
 +    Shell - Apache Accumulo Interactive Shell
 +    - version: 1.5.0
 +    - instance name: instance
 +    - instance id: 00000000-0000-0000-0000-000000000000
 +    -
 +    - type 'help' for a list of available commands
 +    -
 +    username@instance> setauths -u username -s exampleVis
 +    username@instance> createtable bloom_test1
 +    username@instance bloom_test1> config -t bloom_test1 -s table.compaction.major.ratio=7
 +    username@instance bloom_test1> exit
 +
-     $ ARGS="-i instance -z zookeepers -u username -p password -t bloom_test1 --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --auths exampleVis"
++    $ ARGS="-i instance -z zookeepers -u username -p password -t bloom_test1 --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis"
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 $ARGS
 +    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 8 $ARGS
 +    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 9 $ARGS
 +    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
 +
 +The commands for creating the second table with bloom filters are below.
 +
 +    $ ./bin/accumulo shell -u username -p password
 +    Shell - Apache Accumulo Interactive Shell
 +    - version: 1.5.0
 +    - instance name: instance
 +    - instance id: 00000000-0000-0000-0000-000000000000
 +    -
 +    - type 'help' for a list of available commands
 +    -
 +    username@instance> setauths -u username -s exampleVis
 +    username@instance> createtable bloom_test2
 +    username@instance bloom_test2> config -t bloom_test2 -s table.compaction.major.ratio=7
 +    username@instance bloom_test2> config -t bloom_test2 -s table.bloom.enabled=true
 +    username@instance bloom_test2> exit
 +
-     $ ARGS="-i instance -z zookeepers -u username -p password -t bloom_test2 --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --auths exampleVis"
++    $ ARGS="-i instance -z zookeepers -u username -p password -t bloom_test2 --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis"
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 $ARGS
 +    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 8 $ARGS
 +    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 9 $ARGS
 +    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
 +
 +Below 500 lookups are done against the table without bloom filters using random
 +NG seed 7. Even though only one map file will likely contain entries for this
 +seed, all map files will be interrogated.
 +
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test1 --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
 +    Generating 500 random queries...finished
 +    35.09 lookups/sec  14.25 secs
 +    num results : 500
 +    Generating 500 random queries...finished
 +    35.33 lookups/sec  14.15 secs
 +    num results : 500
 +
 +Below the same lookups are done against the table with bloom filters. The
 +lookups were 2.86 times faster because only one map file was used, even though three
 +map files existed.
 +
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test2 --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
 +    Generating 500 random queries...finished
 +    99.03 lookups/sec   5.05 secs
 +    num results : 500
 +    Generating 500 random queries...finished
 +    101.15 lookups/sec   4.94 secs
 +    num results : 500
 +
 +You can verify the table has three files by looking in HDFS. To look in HDFS
 +you will need the table ID, because this is used in HDFS instead of the table
 +name. The following command will show table ids.
 +
 +    $ ./bin/accumulo shell -u username -p password -e 'tables -l'
 +    accumulo.metadata    =>        !0
 +    accumulo.root        =>        +r
 +    bloom_test1          =>        o7
 +    bloom_test2          =>        o8
 +    trace                =>         1
 +
 +So the table id for bloom_test2 is o8. The command below shows what files this
 +table has in HDFS. This assumes Accumulo is at the default location in HDFS.
 +
 +    $ hadoop fs -lsr /accumulo/tables/o8
 +    drwxr-xr-x   - username supergroup          0 2012-01-10 14:02 /accumulo/tables/o8/default_tablet
 +    -rw-r--r--   3 username supergroup   52672650 2012-01-10 14:01 /accumulo/tables/o8/default_tablet/F00000dj.rf
 +    -rw-r--r--   3 username supergroup   52436176 2012-01-10 14:01 /accumulo/tables/o8/default_tablet/F00000dk.rf
 +    -rw-r--r--   3 username supergroup   52850173 2012-01-10 14:02 /accumulo/tables/o8/default_tablet/F00000dl.rf
 +
 +Running the rfile-info command shows that one of the files has a bloom filter
 +and that it is 1.5MB.
 +
 +    $ ./bin/accumulo rfile-info /accumulo/tables/o8/default_tablet/F00000dj.rf
 +    Locality group         : <DEFAULT>
 +      Start block          : 0
 +      Num   blocks         : 752
 +      Index level 0        : 43,598 bytes  1 blocks
 +      First key            : row_0000001169 foo:1 [exampleVis] 1326222052539 false
 +      Last key             : row_0999999421 foo:1 [exampleVis] 1326222052058 false
 +      Num entries          : 999,536
 +      Column families      : [foo]
 +
 +    Meta block     : BCFile.index
 +      Raw size             : 4 bytes
 +      Compressed size      : 12 bytes
 +      Compression type     : gz
 +
 +    Meta block     : RFile.index
 +      Raw size             : 43,696 bytes
 +      Compressed size      : 15,592 bytes
 +      Compression type     : gz
 +
 +    Meta block     : acu_bloom
 +      Raw size             : 1,540,292 bytes
 +      Compressed size      : 1,433,115 bytes
 +      Compression type     : gz
 +
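
Note on README.bloom: the file-skipping behaviour it describes can be illustrated with a small Python sketch. This is not Accumulo code. The BloomFilter and MapFile classes, the bit-array size and the hash scheme are all assumptions chosen for illustration; the only point is that a filter answering "definitely absent" lets a lookup skip a file without searching it.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k hash probes into a bit array of m bits."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive one bit position per probe by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class MapFile:
    """Hypothetical stand-in for one map file: a set of rows plus a filter."""

    def __init__(self, rows):
        self.rows = set(rows)
        self.bloom = BloomFilter()
        for row in self.rows:
            self.bloom.add(row)


def lookup(files, row, use_bloom):
    """Return (found, number of files actually searched)."""
    searched = 0
    found = False
    for f in files:
        if use_bloom and not f.bloom.might_contain(row):
            continue  # the filter rules this file out, so skip the search
        searched += 1
        if row in f.rows:
            found = True
    return found, searched


# Three files with disjoint rows, loosely mirroring the three flushes.
files = [MapFile(f"row_{n:010d}" for n in range(start, start + 100))
         for start in (0, 1000, 2000)]

print(lookup(files, "row_0000000042", use_bloom=False))
print(lookup(files, "row_0000000042", use_bloom=True))
```

Without the filter every file is searched. With it, the lookup usually touches only the file that holds the row, although a false positive can still cause an extra file to be searched.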

http://git-wip-us.apache.org/repos/asf/accumulo/blob/d7c1125d/docs/src/main/resources/examples/README.maxmutation
----------------------------------------------------------------------
diff --cc docs/src/main/resources/examples/README.maxmutation
index 7fb3e08,0000000..45b80d4
mode 100644,000000..100644
--- a/docs/src/main/resources/examples/README.maxmutation
+++ b/docs/src/main/resources/examples/README.maxmutation
@@@ -1,47 -1,0 +1,49 @@@
 +Title: Apache Accumulo MaxMutation Constraints Example
 +Notice:    Licensed to the Apache Software Foundation (ASF) under one
 +           or more contributor license agreements.  See the NOTICE file
 +           distributed with this work for additional information
 +           regarding copyright ownership.  The ASF licenses this file
 +           to you under the Apache License, Version 2.0 (the
 +           "License"); you may not use this file except in compliance
 +           with the License.  You may obtain a copy of the License at
 +           .
 +             http://www.apache.org/licenses/LICENSE-2.0
 +           .
 +           Unless required by applicable law or agreed to in writing,
 +           software distributed under the License is distributed on an
 +           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 +           KIND, either express or implied.  See the License for the
 +           specific language governing permissions and limitations
 +           under the License.
 +
 +This is an example of how to limit the size of mutations that will be accepted into
 +a table. Under the default configuration, accumulo does not provide a limitation
 +on the size of mutations that can be ingested. Poorly behaved writers might
 +inadvertently create mutations so large that they cause the tablet servers to
 +run out of memory. A simple constraint can be added to a table to reject very
 +large mutations.
 +
 +    $ ./bin/accumulo shell -u username -p password
 +
 +    Shell - Apache Accumulo Interactive Shell
 +    -
 +    - version: 1.5.0
 +    - instance name: instance
 +    - instance id: 00000000-0000-0000-0000-000000000000
 +    -
 +    - type 'help' for a list of available commands
 +    -
 +    username@instance> createtable test_ingest
 +    username@instance test_ingest> config -t test_ingest -s table.constraint.1=org.apache.accumulo.examples.simple.constraints.MaxMutationSize
 +    username@instance test_ingest>
 +
 +
- Now the table will reject any mutation that is larger than 1/256th of the
- working memory of the tablet server. The following command attempts to ingest
- a single row with 10000 columns, which exceeds the memory limit:
++Now the table will reject any mutation that is larger than 1/256th of the
++working memory of the tablet server.  The following command attempts to ingest
++a single row with 10000 columns, which exceeds the memory limit. Depending on the
++amount of Java heap your tserver(s) are given, you may have to increase the number
++of columns provided to see the failure.
 +
-     $ ./bin/accumulo org.apache.accumulo.test.TestIngest -i instance -z zookeepers -u username -p password --rows 1 --cols 10000
- ERROR : Constraint violates : ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.MaxMutationSize, violationCode:0, violationDescription:mutation exceeded maximum size of 188160, numberOfViolatingMutations:1)
++    $ ./bin/accumulo org.apache.accumulo.test.TestIngest -i instance -z zookeepers -u username -p password --rows 1 --cols 10000
++    ERROR : Constraint violates : ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.MaxMutationSize, violationCode:0, violationDescription:mutation exceeded maximum size of 188160, numberOfViolatingMutations:1)
 +
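
Note on README.maxmutation: the 1/256th rule can be checked against the limit printed in the error message with a short Python sketch. The function names are hypothetical, "working memory" is assumed to mean the maximum JVM heap of the tablet server, and the 50 bytes per column figure is an illustrative guess, not a measured value.

```python
def max_mutation_bytes(working_memory_bytes):
    # 1/256th of working memory, per the README text (integer division).
    return working_memory_bytes // 256


def accepts(mutation_bytes, working_memory_bytes):
    # Assumption: a mutation passes only when strictly below the limit.
    return mutation_bytes < max_mutation_bytes(working_memory_bytes)


def min_columns_to_fail(working_memory_bytes, bytes_per_column):
    # Smallest column count whose total size reaches the limit
    # (ceiling division), under the strict comparison assumed above.
    limit = max_mutation_bytes(working_memory_bytes)
    return -(-limit // bytes_per_column)


# The error message reports a limit of 188160 bytes, which corresponds
# to a working memory of 188160 * 256 bytes.
heap = 188160 * 256
print(heap)                           # 48168960
print(max_mutation_bytes(heap))       # 188160

# Hypothetical 50 bytes per column, purely for illustration.
print(min_columns_to_fail(heap, 50))  # 3764
print(accepts(10000 * 50, heap))      # False
```

A tablet server with a larger heap raises the limit proportionally, which is why the README warns that more columns may be needed to trigger the failure.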

http://git-wip-us.apache.org/repos/asf/accumulo/blob/d7c1125d/docs/src/main/resources/examples/README.regex
----------------------------------------------------------------------
diff --cc docs/src/main/resources/examples/README.regex
index a5cc854,0000000..ea9f208
mode 100644,000000..100644
--- a/docs/src/main/resources/examples/README.regex
+++ b/docs/src/main/resources/examples/README.regex
@@@ -1,58 -1,0 +1,57 @@@
 +Title: Apache Accumulo Regex Example
 +Notice:    Licensed to the Apache Software Foundation (ASF) under one
 +           or more contributor license agreements.  See the NOTICE file
 +           distributed with this work for additional information
 +           regarding copyright ownership.  The ASF licenses this file
 +           to you under the Apache License, Version 2.0 (the
 +           "License"); you may not use this file except in compliance
 +           with the License.  You may obtain a copy of the License at
 +           .
 +             http://www.apache.org/licenses/LICENSE-2.0
 +           .
 +           Unless required by applicable law or agreed to in writing,
 +           software distributed under the License is distributed on an
 +           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 +           KIND, either express or implied.  See the License for the
 +           specific language governing permissions and limitations
 +           under the License.
 +
 +This example uses mapreduce and accumulo to find items using regular expressions.
 +This is accomplished using a map-only mapreduce job and a scan-time iterator.
 +
 +To run this example you will need some data in a table. The following will
 +put a trivial amount of data into accumulo using the accumulo shell:
 +
 +    $ ./bin/accumulo shell -u username -p password
 +    Shell - Apache Accumulo Interactive Shell
 +    - version: 1.5.0
 +    - instance name: instance
 +    - instance id: 00000000-0000-0000-0000-000000000000
 +    -
 +    - type 'help' for a list of available commands
 +    -
 +    username@instance> createtable input
 +    username@instance> insert dogrow dogcf dogcq dogvalue
 +    username@instance> insert catrow catcf catcq catvalue
 +    username@instance> quit
 +
 +The RegexExample class sets an iterator on the scanner. This does pattern matching
 +against each key/value in accumulo, and only returns matching items. It will do this
 +in parallel and will store the results in files in hdfs.
 +
 +The following will search for any rows in the input table that start with "dog":
 +
 +    $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.RegexExample -u user -p passwd -i instance -t input --rowRegex 'dog.*' --output /tmp/output
 +
 +    $ hadoop fs -ls /tmp/output
 +    Found 3 items
 +    -rw-r--r--   1 username supergroup          0 2013-01-10 14:11 /tmp/output/_SUCCESS
 +    drwxr-xr-x   - username supergroup          0 2013-01-10 14:10 /tmp/output/_logs
 +    -rw-r--r--   1 username supergroup         51 2013-01-10 14:10 /tmp/output/part-m-00000
 +
 +We can see the output of our little map-reduce job:
 +
-     $ hadoop fs -text /tmp/output/output/part-m-00000
++    $ hadoop fs -text /tmp/output/part-m-00000
 +    dogrow dogcf:dogcq [] 1357844987994 false dogvalue
-     $
 +
 +
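
Note on README.regex: the row filtering it describes can be sketched in plain Python. This does not use the Accumulo or MapReduce APIs. The function name and the tuple layout are assumptions, and the sketch assumes the regular expression must match the whole row, which is why the pattern is 'dog.*' rather than 'dog'.

```python
import re


def scan_matching_rows(entries, row_regex):
    """Keep only entries whose row matches the regex in full."""
    pattern = re.compile(row_regex)
    return [entry for entry in entries if pattern.fullmatch(entry[0])]


# The two entries inserted through the shell in the README:
# (row, column family, column qualifier, value).
entries = [
    ("catrow", "catcf", "catcq", "catvalue"),
    ("dogrow", "dogcf", "dogcq", "dogvalue"),
]

for row, cf, cq, value in scan_matching_rows(entries, "dog.*"):
    print(f"{row} {cf}:{cq} [] {value}")
```

Only the dogrow entry passes the filter, matching the single line shown in the job output, minus the timestamp and delete flag that this sketch does not model.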
