[09/16] accumulo-website git commit: ACCUMULO-4630 Refactored documentation for website

mwalch Mon, 22 May 2017 10:30:09 -0700

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/design.md
----------------------------------------------------------------------
diff --git a/docs/master/design.md b/docs/master/design.md
deleted file mode 100644
index 6c77cb6..0000000
--- a/docs/master/design.md
+++ /dev/null
@@ -1,180 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Accumulo Design
-
-=== Data Model
-
-Accumulo provides a richer data model than simple key-value stores, but is not 
a
-fully relational database. Data is represented as key-value pairs, where the 
key and
-value are comprised of the following elements:
-
-[width="75%",cols="^,^,^,^,^,^"]
-|===========================================================================
-    5+|Key                                                      .3+^.^|Value
-.2+^.^|Row ID 3+|Column                        .2+^.^|Timestamp
-                |Family |Qualifier |Visibility
-|===========================================================================
-
-All elements of the Key and the Value are represented as byte arrays except for
-Timestamp, which is a Long. Accumulo sorts keys by element and 
lexicographically
-in ascending order. Timestamps are sorted in descending order so that later
-versions of the same Key appear first in a sequential scan. Tables consist of 
a set of
-sorted key-value pairs.
-
-=== Architecture
-
-Accumulo is a distributed data storage and retrieval system and as such 
consists of
-several architectural components, some of which run on many individual servers.
-Much of the work Accumulo does involves maintaining certain properties of the
-data, such as organization, availability, and integrity, across many 
commodity-class
-machines.
-
-=== Components
-
-An instance of Accumulo includes many TabletServers, one Garbage Collector 
process,
-one Master server and many Clients.
-
-==== Tablet Server
-
-The TabletServer manages some subset of all the tablets (partitions of 
tables). This includes receiving writes from clients, persisting writes to a
-write-ahead log, sorting new key-value pairs in memory, periodically
-flushing sorted key-value pairs to new files in HDFS, and responding
-to reads from clients, forming a merge-sorted view of all keys and
-values from all the files it has created and the sorted in-memory
-store.
-
-TabletServers also perform recovery of a tablet
-that was previously on a server that failed, reapplying any writes
-found in the write-ahead log to the tablet.
-
-==== Garbage Collector
-
-Accumulo processes will share files stored in HDFS. Periodically, the Garbage
-Collector will identify files that are no longer needed by any process, and
-delete them. Multiple garbage collectors can be run to provide hot-standby 
support.
-They will perform leader election among themselves to choose a single active 
instance.
-
-==== Master
-
-The Accumulo Master is responsible for detecting and responding to TabletServer
-failure. It tries to balance the load across TabletServer by assigning tablets 
carefully
-and instructing TabletServers to unload tablets when necessary. The Master 
ensures all
-tablets are assigned to one TabletServer each, and handles table creation, 
alteration,
-and deletion requests from clients. The Master also coordinates startup, 
graceful
-shutdown and recovery of changes in write-ahead logs when Tablet servers fail.
-
-Multiple masters may be run. The masters will choose among themselves a single 
master,
-and the others will become backups if the master should fail.
-
-==== Tracer
-
-The Accumulo Tracer process supports the distributed timing API provided by 
Accumulo.
-One to many of these processes can be run on a cluster which will write the 
timing
-information to a given Accumulo table for future reference. Seeing the section 
on
-Tracing for more information on this support.
-
-==== Monitor
-
-The Accumulo Monitor is a web application that provides a wealth of 
information about
-the state of an instance. The Monitor shows graphs and tables which contain 
information
-about read/write rates, cache hit/miss rates, and Accumulo table information 
such as scan
-rate and active/queued compactions. Additionally, the Monitor should always be 
the first
-point of entry when attempting to debug an Accumulo problem as it will show 
high-level problems
-in addition to aggregated errors from all nodes in the cluster. See the 
section on <<monitoring>>
-for more information.
-
-Multiple Monitors can be run to provide hot-standby support in the face of 
failure. Due to the
-forwarding of logs from remote hosts to the Monitor, only one Monitor process 
should be active
-at one time. Leader election will be performed internally to choose the active 
Monitor.
-
-==== Client
-
-Accumulo includes a client library that is linked to every application. The 
client
-library contains logic for finding servers managing a particular tablet, and
-communicating with TabletServers to write and retrieve key-value pairs.
-
-=== Data Management
-
-Accumulo stores data in tables, which are partitioned into tablets. Tablets are
-partitioned on row boundaries so that all of the columns and values for a 
particular
-row are found together within the same tablet. The Master assigns Tablets to 
one
-TabletServer at a time. This enables row-level transactions to take place 
without
-using distributed locking or some other complicated synchronization mechanism. 
As
-clients insert and query data, and as machines are added and removed from the
-cluster, the Master migrates tablets to ensure they remain available and that 
the
-ingest and query load is balanced across the cluster.
-
-image::data_distribution.png[width=500]
-
-=== Tablet Service
-
-
-When a write arrives at a TabletServer it is written to a Write-Ahead Log and
-then inserted into a sorted data structure in memory called a MemTable. When 
the
-MemTable reaches a certain size, the TabletServer writes out the sorted
-key-value pairs to a file in HDFS called a Relative Key File (RFile), which is 
a
-kind of Indexed Sequential Access Method (ISAM) file. This process is called a
-minor compaction. A new MemTable is then created and the fact of the compaction
-is recorded in the Write-Ahead Log.
-
-When a request to read data arrives at a TabletServer, the TabletServer does a
-binary search across the MemTable as well as the in-memory indexes associated
-with each RFile to find the relevant values. If clients are performing a scan,
-several key-value pairs are returned to the client in order from the MemTable
-and the set of RFiles by performing a merge-sort as they are read.
-
-=== Compactions
-
-In order to manage the number of files per tablet, periodically the 
TabletServer
-performs Major Compactions of files within a tablet, in which some set of 
RFiles
-are combined into one file. The previous files will eventually be removed by 
the
-Garbage Collector. This also provides an opportunity to permanently remove
-deleted key-value pairs by omitting key-value pairs suppressed by a delete 
entry
-when the new file is created.
-
-=== Splitting
-
-When a table is created it has one tablet. As the table grows its initial
-tablet eventually splits into two tablets. Its likely that one of these
-tablets will migrate to another tablet server. As the table continues to grow,
-its tablets will continue to split and be migrated. The decision to
-automatically split a tablet is based on the size of a tablets files. The
-size threshold at which a tablet splits is configurable per table. In addition
-to automatic splitting, a user can manually add split points to a table to
-create new tablets. Manually splitting a new table can parallelize reads and
-writes giving better initial performance without waiting for automatic
-splitting.
-
-As data is deleted from a table, tablets may shrink. Over time this can lead
-to small or empty tablets. To deal with this, merging of tablets was
-introduced in Accumulo 1.4. This is discussed in more detail later.
-
-=== Fault-Tolerance
-
-If a TabletServer fails, the Master detects it and automatically reassigns the 
tablets
-assigned from the failed server to other servers. Any key-value pairs that 
were in
-memory at the time the TabletServer fails are automatically reapplied from the 
Write-Ahead
-Log(WAL) to prevent any loss of data.
-
-Tablet servers write their WALs directly to HDFS so the logs are available to 
all tablet
-servers for recovery. To make the recovery process efficient, the updates 
within a log are
-grouped by tablet.  TabletServers can quickly apply the mutations from the 
sorted logs
-that are destined for the tablets they have now been assigned.
-
-TabletServer failures are noted on the Master's monitor page, accessible via
-+http://master-address:9995/monitor+.
-
-image::failure_handling.png[width=500]


http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/development_clients.md
----------------------------------------------------------------------
diff --git a/docs/master/development_clients.md 
b/docs/master/development_clients.md
deleted file mode 100644
index 18821e3..0000000
--- a/docs/master/development_clients.md
+++ /dev/null
@@ -1,107 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Development Clients
-
-Normally, Accumulo consists of lots of moving parts. Even a stand-alone 
version of
-Accumulo requires Hadoop, Zookeeper, the Accumulo master, a tablet server, 
etc. If
-you want to write a unit test that uses Accumulo, you need a lot of 
infrastructure
-in place before your test can run.
-
-=== Mock Accumulo
-
-Mock Accumulo supplies mock implementations for much of the client API. It 
presently
-does not enforce users, logins, permissions, etc. It does support Iterators 
and Combiners.
-Note that MockAccumulo holds all data in memory, and will not retain any data 
or
-settings between runs.
-
-While normal interaction with the Accumulo client looks like this:
-
-[source,java]
-Instance instance = new ZooKeeperInstance(...);
-Connector conn = instance.getConnector(user, passwordToken);
-
-To interact with the MockAccumulo, just replace the ZooKeeperInstance with 
MockInstance:
-
-[source,java]
-Instance instance = new MockInstance();
-
-In fact, you can use the +--fake+ option to the Accumulo shell and interact 
with
-MockAccumulo:
-
-----
-$ accumulo shell --fake -u root -p ''
-
-Shell - Apache Accumulo Interactive Shell
--
-- version: 2.x.x
-- instance name: fake
-- instance id: mock-instance-id
--
-- type 'help' for a list of available commands
--
-
-root@fake> createtable test
-
-root@fake test> insert row1 cf cq value
-root@fake test> insert row2 cf cq value2
-root@fake test> insert row3 cf cq value3
-
-root@fake test> scan
-row1 cf:cq []    value
-row2 cf:cq []    value2
-row3 cf:cq []    value3
-
-root@fake test> scan -b row2 -e row2
-row2 cf:cq []    value2
-
-root@fake test>
-----
-
-When testing Map Reduce jobs, you can also set the Mock Accumulo on the 
AccumuloInputFormat
-and AccumuloOutputFormat classes:
-
-[source,java]
-// ... set up job configuration
-AccumuloInputFormat.setMockInstance(job, "mockInstance");
-AccumuloOutputFormat.setMockInstance(job, "mockInstance");
-
-=== Mini Accumulo Cluster
-
-While the Mock Accumulo provides a lightweight implementation of the client 
API for unit
-testing, it is often necessary to write more realistic end-to-end integration 
tests that
-take advantage of the entire ecosystem. The Mini Accumulo Cluster makes this 
possible by
-configuring and starting Zookeeper, initializing Accumulo, and starting the 
Master as well
-as some Tablet Servers. It runs against the local filesystem instead of having 
to start
-up HDFS.
-
-To start it up, you will need to supply an empty directory and a root password 
as arguments:
-
-[source,java]
-File tempDirectory = // JUnit and Guava supply mechanisms for creating temp 
directories
-MiniAccumuloCluster accumulo = new MiniAccumuloCluster(tempDirectory, 
"password");
-accumulo.start();
-
-Once we have our mini cluster running, we will want to interact with the 
Accumulo client API:
-
-[source,java]
-Instance instance = new ZooKeeperInstance(accumulo.getInstanceName(), 
accumulo.getZooKeepers());
-Connector conn = instance.getConnector("root", new PasswordToken("password"));
-
-Upon completion of our development code, we will want to shutdown our 
MiniAccumuloCluster:
-
-[source,java]
-accumulo.stop();
-// delete your temporary folder

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/high_speed_ingest.md
----------------------------------------------------------------------
diff --git a/docs/master/high_speed_ingest.md b/docs/master/high_speed_ingest.md
deleted file mode 100644
index 2a7a702..0000000
--- a/docs/master/high_speed_ingest.md
+++ /dev/null
@@ -1,124 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== High-Speed Ingest
-
-Accumulo is often used as part of a larger data processing and storage system. 
To
-maximize the performance of a parallel system involving Accumulo, the ingestion
-and query components should be designed to provide enough parallelism and
-concurrency to avoid creating bottlenecks for users and other systems writing 
to
-and reading from Accumulo. There are several ways to achieve high ingest
-performance.
-
-=== Pre-Splitting New Tables
-
-New tables consist of a single tablet by default. As mutations are applied, 
the table
-grows and splits into multiple tablets which are balanced by the Master across
-TabletServers. This implies that the aggregate ingest rate will be limited to 
fewer
-servers than are available within the cluster until the table has reached the 
point
-where there are tablets on every TabletServer.
-
-Pre-splitting a table ensures that there are as many tablets as desired 
available
-before ingest begins to take advantage of all the parallelism possible with 
the cluster
-hardware. Tables can be split at any time by using the shell:
-
-  user@myinstance mytable> addsplits -sf /local_splitfile -t mytable
-
-For the purposes of providing parallelism to ingest it is not necessary to 
create more
-tablets than there are physical machines within the cluster as the aggregate 
ingest
-rate is a function of the number of physical machines. Note that the aggregate 
ingest
-rate is still subject to the number of machines running ingest clients, and the
-distribution of rowIDs across the table. The aggregation ingest rate will be
-suboptimal if there are many inserts into a small number of rowIDs.
-
-=== Multiple Ingester Clients
-
-Accumulo is capable of scaling to very high rates of ingest, which is 
dependent upon
-not just the number of TabletServers in operation but also the number of ingest
-clients. This is because a single client, while capable of batching mutations 
and
-sending them to all TabletServers, is ultimately limited by the amount of data 
that
-can be processed on a single machine. The aggregate ingest rate will scale 
linearly
-with the number of clients up to the point at which either the aggregate I/O of
-TabletServers or total network bandwidth capacity is reached.
-
-In operational settings where high rates of ingest are paramount, clusters are 
often
-configured to dedicate some number of machines solely to running Ingester 
Clients.
-The exact ratio of clients to TabletServers necessary for optimum ingestion 
rates
-will vary according to the distribution of resources per machine and by data 
type.
-
-=== Bulk Ingest
-
-Accumulo supports the ability to import files produced by an external process 
such
-as MapReduce into an existing table. In some cases it may be faster to load 
data this
-way rather than via ingesting through clients using BatchWriters. This allows 
a large
-number of machines to format data the way Accumulo expects. The new files can
-then simply be introduced to Accumulo via a shell command.
-
-To configure MapReduce to format data in preparation for bulk loading, the job
-should be set to use a range partitioner instead of the default hash 
partitioner. The
-range partitioner uses the split points of the Accumulo table that will 
receive the
-data. The split points can be obtained from the shell and used by the MapReduce
-RangePartitioner. Note that this is only useful if the existing table is 
already split
-into multiple tablets.
-
-  user@myinstance mytable> getsplits
-  aa
-  ab
-  ac
-  ...
-  zx
-  zy
-  zz
-
-Run the MapReduce job, using the AccumuloFileOutputFormat to create the files 
to
-be introduced to Accumulo. Once this is complete, the files can be added to
-Accumulo via the shell:
-
-  user@myinstance mytable> importdirectory /files_dir /failures
-
-Note that the paths referenced are directories within the same HDFS instance 
over
-which Accumulo is running. Accumulo places any files that failed to be added 
to the
-second directory specified.
-
-See the 
https://github.com/apache/accumulo-examples/blob/master/docs/bulkIngest.md[Bulk 
Ingest example]
-for a complete example.
-
-=== Logical Time for Bulk Ingest
-
-Logical time is important for bulk imported data, for which the client code may
-be choosing a timestamp. At bulk import time, the user can choose to enable
-logical time for the set of files being imported. When its enabled, Accumulo
-uses a specialized system iterator to lazily set times in a bulk imported file.
-This mechanism guarantees that times set by unsynchronized multi-node
-applications (such as those running on MapReduce) will maintain some semblance
-of causal ordering. This mitigates the problem of the time being wrong on the
-system that created the file for bulk import. These times are not set when the
-file is imported, but whenever it is read by scans or compactions. At import, a
-time is obtained and always used by the specialized system iterator to set that
-time.
-
-The timestamp assigned by Accumulo will be the same for every key in the file.
-This could cause problems if the file contains multiple keys that are identical
-except for the timestamp. In this case, the sort order of the keys will be
-undefined. This could occur if an insert and an update were in the same bulk
-import file.
-
-=== MapReduce Ingest
-
-It is possible to efficiently write many mutations to Accumulo in parallel via 
a
-MapReduce job. In this scenario the MapReduce is written to process data that 
lives
-in HDFS and write mutations to Accumulo using the AccumuloOutputFormat. See
-the MapReduce section under Analytics for details. The 
https://github.com/apache/accumulo-examples/blob/master/docs/mapred.md[MapReduce
 example]
-is also a good reference for example code.

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/implementation.md
----------------------------------------------------------------------
diff --git a/docs/master/implementation.md b/docs/master/implementation.md
deleted file mode 100644
index 520f538..0000000
--- a/docs/master/implementation.md
+++ /dev/null
@@ -1,86 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements. See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License. You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Implementation Details
-
-=== Fault-Tolerant Executor (FATE)
-
-Accumulo must implement a number of distributed, multi-step operations to 
support
-the client API. Creating a new table is a simple example of an atomic client 
call
-which requires multiple steps in the implementation: get a unique table ID, 
configure
-default table permissions, populate information in ZooKeeper to record the 
table's
-existence, create directories in HDFS for the table's data, etc. Implementing 
these
-steps in a way that is tolerant to node failure and other concurrent 
operations is
-very difficult to achieve. Accumulo includes a Fault-Tolerant Executor (FATE) 
which
-is widely used server-side to implement the client API safely and correctly.
-
-FATE is the implementation detail which ensures that tables in creation when 
the
-Master dies will be successfully created when another Master process is 
started.
-This alleviates the need for any external tools to correct some bad state -- 
Accumulo can
-undo the failure and self-heal without any external intervention.
-
-=== Overview
-
-FATE consists of two primary components: a repeatable, persisted operation 
(REPO), a storage
-layer for REPOs and an execution system to run REPOs. Accumulo uses ZooKeeper 
as the storage
-layer for FATE and the Accumulo Master acts as the execution system to run 
REPOs.
-
-The important characteristic of REPOs are that they implemented in a way that 
is idempotent:
-every operation must be able to undo or replay a partial execution of itself. 
Requiring the
-implementation of the operation to support this functional greatly simplifies 
the execution
-of these operations. This property is also what guarantees safety in light of 
failure conditions.
-
-=== Administration
-
-Sometimes, it is useful to inspect the current FATE operations, both pending 
and executing.
-For example, a command that is not completing could be blocked on the 
execution of another
-operation. Accumulo provides an Accumulo shell command to interact with fate.
-
-The +fate+ shell command accepts a number of arguments for different 
functionality:
-+list+/+print+, +fail+, +delete+, +dump+.
-
-==== List/Print
-
-Without any additional arguments, this command will print all operations that 
still exist in
-the FATE store (ZooKeeper). This will include active, pending, and completed 
operations (completed
-operations are lazily removed from the store). Each operation includes a 
unique "transaction ID", the
-state of the operation (e.g. +NEW+, +IN_PROGRESS+, +FAILED+), any locks the
-transaction actively holds and any locks it is waiting to acquire.
-
-This option can also accept transaction IDs which will restrict the list of 
transactions shown.
-
-==== Fail
-
-This command can be used to manually fail a FATE transaction and requires a 
transaction ID
-as an argument. Failing an operation is not a normal procedure and should only 
be performed
-by an administrator who understands the implications of why they are failing 
the operation.
-
-==== Delete
-
-This command requires a transaction ID and will delete any locks that the 
transaction
-holds. Like the fail command, this command should only be used in extreme 
circumstances
-by an administrator that understands the implications of the command they are 
about to
-invoke. It is not normal to invoke this command.
-
-==== Dump
-
-This command accepts zero more transaction IDs.  If given no transaction IDs,
-it will dump all active transactions.  A FATE operations is compromised as a
-sequence of REPOs.  In order to start a FATE transaction, a REPO is pushed onto
-a per transaction REPO stack.  The top of the stack always contains the next
-REPO the FATE transaction should execute.  When a REPO is successful it may
-return another REPO which is pushed on the stack.  The +dump+ command will
-print all of the REPOs on each transactions stack.  The REPOs are serialized to
-JSON in order to make them human readable.

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/introduction.md
----------------------------------------------------------------------
diff --git a/docs/master/introduction.md b/docs/master/introduction.md
deleted file mode 100644
index 1b964b4..0000000
--- a/docs/master/introduction.md
+++ /dev/null
@@ -1,25 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements.  See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License.  You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Introduction
-Apache Accumulo is a highly scalable structured store based on Google's 
BigTable.
-Accumulo is written in Java and operates over the Hadoop Distributed File 
System
-(HDFS), which is part of the popular Apache Hadoop project. Accumulo supports
-efficient storage and retrieval of structured data, including queries for 
ranges, and
-provides support for using Accumulo tables as input and output for MapReduce
-jobs.
-
-Accumulo features automatic load-balancing and partitioning, data compression
-and fine-grained security labels.

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/iterator_design.md
----------------------------------------------------------------------
diff --git a/docs/master/iterator_design.md b/docs/master/iterator_design.md
deleted file mode 100644
index 4beaeb0..0000000
--- a/docs/master/iterator_design.md
+++ /dev/null
@@ -1,401 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements. See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License. You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Iterator Design
-
-Accumulo SortedKeyValueIterators, commonly referred to as Iterators for short, 
are server-side programming constructs
-that allow users to implement custom retrieval or computational purpose within 
Accumulo TabletServers.  The name rightly
-brings forward similarities to the Java Iterator interface; however, Accumulo 
Iterators are more complex than Java
-Iterators. Notably, in addition to the expected methods to retrieve the 
current element and advance to the next element
-in the iteration, Accumulo Iterators must also support the ability to "move" 
(`seek`) to an specified point in the
-iteration (the Accumulo table). Accumulo Iterators are designed to be 
concatenated together, similar to applying a
-series of transformations to a list of elements. Accumulo Iterators can 
duplicate their underlying source to create
-multiple "pointers" over the same underlying data (which is extremely powerful 
since each stream is sorted) or they can
-merge multiple Iterators into a single view. In this sense, a collection of 
Iterators operating in tandem is close to
-a tree-structure than a list, but there is always a sense of a flow of 
Key-Value pairs through some Iterators. Iterators
-are not designed to act as triggers nor are they designed to operate outside 
of the purview of a single table.
-
-Understanding how TabletServers invoke the methods on a SortedKeyValueIterator 
can be obtuse as the actual code is
-buried within the implementation of the TabletServer; however, it is generally 
unnecessary to have a strong
-understanding of this as the interface provides clear definitions about what 
each action each method should take. This
-chapter aims to provide a more detailed description of how Iterators are 
invoked, some best practices and some common
-pitfalls.
-
-=== Instantiation
-
-To invoke an Accumulo Iterator inside of the TabletServer, the Iterator class 
must be on the classpath of every
-TabletServer. For production environments, it is common to place a JAR file 
which contains the Iterator in
-`lib/`.  In development environments, it is convenient to instead place the 
JAR file in `lib/ext/` as JAR files
-in this directory are dynamically reloaded by the TabletServers alleviating 
the need to restart Accumulo while
-testing an Iterator. Advanced classloader features which enable other types of 
filesystems and per-table classpath
-configurations (as opposed to process-wide classpaths). These features are not 
covered here, but elsewhere in the user
-manual.
-
-Accumulo references the Iterator class by name and uses Java reflection to 
instantiate the Iterator. This means that
-Iterators must have a public no-args constructor.
-
-=== Interface
-
-A normal implementation of the SortedKeyValueIterator defines functionality 
for the following methods:
-
-[source,java]
-----
-void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> 
options, IteratorEnvironment env) throws IOException;
-
-boolean hasTop();
-
-void next() throws IOException;
-
-void seek(Range range, Collection<ByteSequence> columnFamilies, boolean 
inclusive) throws IOException;
-
-Key getTopKey();
-
-Value getTopValue();
-
-SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env);
-----
-
-==== `init`
-
-The `init` method is called by the TabletServer after it constructs an 
instance of the Iterator.  This method should
-clear/reset any internal state in the Iterator and prepare it to process data. 
 The first argument, the `source`, is the
-Iterator "below" this Iterator (where the client is at "top" and the Iterator 
for files in HDFS are at the "bottom").
-The "source" Iterator provides the Key-Value pairs which this Iterator will 
operate upon.
-
-The second argument, a Map of options, is made up of options provided by the 
user, options set in the table's
-configuration, and/or options set in the containing namespace's configuration.
-These options allow for Iterators to dynamically configure themselves on the 
fly. If no options are used in the current context
-(a Scan or Compaction), the Map will be empty. An example of a configuration 
item for an Iterator could be a pattern used to filter
-Key-Value pairs in a regular expression Iterator.
-
-The third argument, the `IteratorEnvironment`, is a special object which 
provides information to this Iterator about the
-context in which it was invoked. Commonly, this information is not necessary 
to inspect. For example, if an Iterator
-knows that it is running in the context of a full-major compaction (reading 
all of the data) as opposed to a user scan
-(which may strongly limit the number of columns), the Iterator might make 
different algorithmic decisions in an attempt to
-optimize itself.
-
-==== `seek`
-
-The `seek` method is likely the most confusing method on the Iterator 
interface. The purpose of this method is to
-advance the stream of Key-Value pairs to a certain point in the iteration (the 
Accumulo table). It is common that before
-the implementation of this method returns some additional processing is 
performed which may further advance the current
-position past the `startKey` of the `Range`. This, however, is dependent on 
the functionality the iterator provides. For
-example, a filtering iterator would consume a number Key-Value pairs which do 
not meets its criteria before `seek`
-returns. The important condition for `seek` to meet is that this Iterator 
should be ready to return the first Key-Value
-pair, or none if no such pair is available, when the method returns. The 
Key-Value pair would be returned by `getTopKey`
-and `getTopValue`, respectively, and `hasTop` should return a boolean denoting 
whether or not there is
-a Key-Value pair to return.
-
-The arguments passed to seek are as follows:
-
-The TabletServer first provides a `Range`, an object which defines some 
collection of Accumulo `Key`s, which defines the
-Key-Value pairs that this Iterator should return. Each `Range` has a 
`startKey` and `endKey` with an inclusive flag for
-both. While this Range is often similar to the Range(s) set by the client on a 
Scanner or BatchScanner, it is not
-guaranteed to be a Range that the client set. Accumulo will split up larger 
ranges and group them together based on
-Tablet boundaries per TabletServer. Iterators should not attempt to implement 
any custom logic based on the Range(s)
-provided to `seek` and Iterators should not return any Keys that fall outside 
of the provided Range.
-
-The second argument, a `Collection<ByteSequence>`, is the set of column 
families which should be retained or
-excluded by this Iterator. The third argument, a boolean, defines whether the 
collection of column families
-should be treated as an inclusion collection (true) or an exclusion collection 
(false).
-
-It is likely that all implementations of `seek` will first make a call to the 
`seek` method on the
-"source" Iterator that was provided in the `init` method. The collection of 
column families and
-the boolean `include` argument should be passed down as well as the `Range`. 
Somewhat commonly, the Iterator will
-also implement some sort of additional logic to find or compute the first 
Key-Value pair in the provided
-Range. For example, a regular expression Iterator would consume all records 
which do not match the given
-pattern before returning from `seek`.
-
-It is important to retain the original Range passed to this method to know 
when this Iterator should stop
-reading more Key-Value pairs. Ignoring this typically does not affect scans 
from a Scanner, but it
-will result in duplicate keys emitting from a BatchScan if the scanned table 
has more than one tablet.
-Best practice is to never emit entries outside the seek range.
-
-==== `next`
-
-The `next` method is analogous to the `next` method on a Java Iterator: this 
method should advance
-the Iterator to the next Key-Value pair. For implementations that perform some 
filtering or complex
-logic, this may result in more than one Key-Value pair being inspected. This 
method alters
-some internal state that is exposed via the `hasTop`, `getTopKey`, and 
`getTopValue` methods.
-
-The result of this method is commonly caching a Key-Value pair which 
`getTopKey` and `getTopValue`
-can later return. While there is another Key-Value pair to return, `hasTop` 
should return true.
-If there are no more Key-Value pairs to return from this Iterator since the 
last call to
-`seek`, `hasTop` should return false.
-
-==== `hasTop`
-
-The `hasTop` method is similar to the `hasNext` method on a Java Iterator in 
that it informs
-the caller if there is a Key-Value pair to be returned. If there is no pair to 
return, this method
-should return false. Like a Java Iterator, multiple calls to `hasTop` (without 
calling `next`) should not
-alter the internal state of the Iterator.
-
-==== `getTopKey` and `getTopValue`
-
-These methods simply return the current Key-Value pair for this iterator. If 
`hasTop` returns true,
-both of these methods should return non-null objects. If `hasTop` returns 
false, it is undefined
-what these methods should return. Like `hasTop`, multiple calls to these 
methods should not alter
-the state of the Iterator.
-
-Users should take caution when either
-
-1. caching the Key/Value from `getTopKey`/`getTopValue`, for use after calling 
`next` on the source iterator.
-In this case, the cached Key/Value object is aliased to the reference returned 
by the source iterator.
-Iterators may reuse the same Key/Value object in a `next` call for performance 
reasons, changing the data
-that the cached Key/Value object references and resulting in a logic bug.
-2. modifying the Key/Value from `getTopKey`/`getTopValue`. If the source 
iterator reuses data stored in the Key/Value,
-then the source iterator may use the modified data that the Key/Value 
references. This may/may not result in a logic bug.
-
-In both cases, copying the Key/Value's data into a new object ensures iterator 
correctness. If neither case applies,
-it is safe to not copy the Key/Value.  The general guideline is to be aware of 
who else may use Key/Value objects
-returned from `getTopKey`/`getTopValue`.
-
-==== `deepCopy`
-
-The `deepCopy` method is similar to the `clone` method from the Java 
`Cloneable` interface.
-Implementations of this method should return a new object of the same type as 
the Accumulo Iterator
-instance it was called on. Any internal state from the instance `deepCopy` was 
called
-on should be carried over to the returned copy. The returned copy should be 
ready to have
-`seek` called on it. The SortedKeyValueIterator interface guarantees that 
`init` will be called on
-an iterator before `deepCopy` and that `init` will not be called on the 
iterator returned by
-`deepCopy`.
-
-Typically, implementations of `deepCopy` call a copy-constructor which will 
initialize
-internal data structures. As with `seek`, it is common for the 
`IteratorEnvironment`
-argument to be ignored as most Iterator implementations can be written without 
the explicit
-information the environment provides.
-
-In the analogy of a series of Iterators representing a tree, `deepCopy` can be 
thought of as
-early programming assignments which implement their own tree data structures. 
`deepCopy` calls
-copy on its sources (the children), copies itself, attaches the copies of the 
children, and
-then returns itself.
-
-=== TabletServer invocation of Iterators
-
-The following code is a general outline for how TabletServers invoke Iterators.
-
-[source,java]
-----
- List<KeyValue> batch;
- Range range = getRangeFromClient();
- while(!overSizeLimit(batch)){
-   SortedKeyValueIterator source = getSystemIterator();
-
-   for(String clzName : getUserIterators()){
-    Class<?> clz = Class.forName(clzName);
-    SortedKeyValueIterator iter = (SortedKeyValueIterator) clz.newInstance();
-    iter.init(source, opts, env);
-    source = iter;
-   }
-
-   // read a batch of data to return to client
-   // the last iterator, the "top"
-   SortedKeyValueIterator topIter = source;
-   topIter.seek(getRangeFromUser(), ...)
-
-   while(topIter.hasTop() && !overSizeLimit(batch)){
-     key = topIter.getTopKey()
-     val = topIter.getTopValue()
-     batch.add(new KeyValue(key, val)
-     if(systemDataSourcesChanged()){
-       // code does not show isolation case, which will
-       // keep using same data sources until a row boundry is hit 
-       range = new Range(key, false, range.endKey(), range.endKeyInclusive());
-       break;
-     }
-   }
- }
- //return batch of key values to client
-----
-
-Additionally, the obtuse "re-seek" case can be outlined as the following:
-
-[source,java]
-----
-  // Given the above
-  List<KeyValue> batch = getNextBatch();
-
-  // Store off lastKeyReturned for this client
-  lastKeyReturned = batch.get(batch.size() - 1).getKey();
-
-  // thread goes away (client stops asking for the next batch).
-
-  // Eventually client comes back
-  // Setup as before...
-
-  Range userRange = getRangeFromUser();
-  Range actualRange = new Range(lastKeyReturned, false
-      userRange.getEndKey(), userRange.isEndKeyInclusive());
-
-  // Use the actualRange, not the user provided one
-  topIter.seek(actualRange);
-----
-
-
-=== Isolation
-
-Accumulo provides a feature which clients can enable to prevent the viewing of 
partially
-applied mutations within the context of rows. If a client is submitting 
multiple column
-updates to rows at a time, isolation would ensure that a client would either 
see all of
-updates made to that row or none of the updates (until they are all applied).
-
-When using Isolation, there are additional concerns in iterator design. A scan 
time iterator in accumulo
-reads from a set of data sources. While an iterator is reading data it has an 
isolated view. However, after it returns a
-key/value it is possible that accumulo may switch data sources and re-seek the 
iterator. This is done so that resources
-may be reclaimed. When the user does not request isolation this can occur 
after any key is returned. When a user enables
-Isolation, this will only occur after a new row is returned, in which case it 
will re-seek to the very beginning of the
-next possible row.
-
-=== Abstract Iterators
-
-A number of Abstract implementations of Iterators are provided to allow for 
faster creation
-of common patterns. The most commonly used abstract implementations are the 
`Filter` and
-`Combiner` classes. When possible these classes should be used instead as they 
have been
-thoroughly tested inside Accumulo itself.
-
-==== Filter
-
-The `Filter` abstract Iterator provides a very simple implementation which 
allows implementations
-to define whether or not a Key-Value pair should be returned via an 
`accept(Key, Value)` method.
-
-Filters are extremely simple to implement; however, when the implementation is 
filtering a
-large percentage of Key-Value pairs with respect to the total number of pairs 
examined,
-it can be very inefficient. For example, if a Filter implementation can 
determine after examining
-part of the row that no other pairs in this row will be accepted, there is no 
mechanism to
-efficiently skip the remaining Key-Value pairs. Concretely, take a row which 
is comprised of
-1000 Key-Value pairs. After examining the first 10 Key-Value pairs, it is 
determined
-that no other Key-Value pairs in this row will be accepted. The Filter must 
still examine each
-remaining 990 Key-Value pairs in this row. Another way to express this 
deficiency is that
-Filters have no means to leverage the `seek` method to efficiently skip large 
portions
-of Key-Value pairs.
-
-As such, the `Filter` class functions well for filtering small amounts of 
data, but is
-inefficient for filtering large amounts of data. The decision to use a 
`Filter` strongly
-depends on the use case and distribution of data being filtered.
-
-==== Combiner
-
-The `Combiner` class is another common abstract Iterator. Similar to the 
`Combiner` interface
-define in Hadoop's MapReduce framework, implementations of this abstract class 
reduce
-multiple Values for different versions of a Key (Keys which only differ by 
timestamps) into one Key-Value pair.
-Combiners provide a simple way to implement common operations like summation 
and
-aggregation without the need to implement the entire Accumulo Iterator 
interface.
-
-One important consideration when choosing to design a Combiner is that the 
"reduction" operation
-is often best represented when it is associative and commutative. Operations 
which do not meet
-these criteria can be implemented; however, the implementation can be 
difficult.
-
-A second consideration is that a Combiner is not guaranteed to see every 
Key-Value pair
-which differ only by timestamp every time it is invoked. For example, if there 
are 5 Key-Value
-pairs in a table which only differ by the timestamps 1, 2, 3, 4, and 5, it is 
not guaranteed that
-every invocation of the Combiner will see 5 timestamps. One invocation might 
see the Values for
-Keys with timestamp 1 and 4, while another invocation might see the Values for 
Keys with the
-timestamps 1, 2, 4 and 5.
-
-Finally, when configuring an Accumulo table to use a Combiner, be sure to 
disable the Versioning Iterator or set the
-Combiner at a priority less than the Combiner (the Versioning Iterator is 
added at a priority of 20 by default). The
-Versioning Iterator will filter out multiple Key-Value pairs that differ only 
by timestamp and return only the Key-Value
-pair that has the largest timestamp.
-
-=== Best practices
-
-Because of the flexibility that the `SortedKeyValueInterface` provides, it 
doesn't directly disallow
-many implementations which are poor design decisions. The following are some 
common recommendations to
-follow and pitfalls to avoid in Iterator implementations.
-
-==== Avoid special logic encoded in Ranges
-
-Commonly, granular Ranges that a client passes to an Iterator from a `Scanner` 
or `BatchScanner` are unmodified.
-If a `Range` falls within the boundaries of a Tablet, an Iterator will often 
see that same Range in the
-`seek` method. However, there is no guarantee that the `Range` will remain 
unaltered from client to server. As such, Iterators
-should *never* make assumptions about the current state/context based on the 
`Range`.
-
-The common failure condition is referred to as a "re-seek". In the context of 
a Scan, TabletServers construct the
-"stack" of Iterators and batch up Key-Value pairs to send back to the client. 
When a sufficient number of Key-Value
-pairs are collected, it is common for the Iterators to be "torn down" until 
the client asks for the next batch of
-Key-Value pairs. This is done by the TabletServer to add fairness in ensuring 
one Scan does not monopolize the available
-resources. When the client asks for the next batch, the implementation 
modifies the original Range so that servers know
-the point to resume the iteration (to avoid returning duplicate Key-Value 
pairs). Specifically, the new Range is created
-from the original but is shortened by setting the startKey of the original 
Range to the Key last returned by the Scan,
-non-inclusive.
-
-==== `seek`'ing backwards
-
-The ability for an Iterator to "skip over" large blocks of Key-Value pairs is 
a major tenet behind Iterators.
-By `seek`'ing when it is known that there is a collection of Key-Value pairs 
which can be ignored can
-greatly increase the speed of a scan as many Key-Value pairs do not have to be 
deserialized and processed.
-
-While the `seek` method provides the `Range` that should be used to `seek` the 
underlying source Iterator,
-there is no guarantee that the implementing Iterator uses that `Range` to 
perform the `seek` on its
-"source" Iterator. As such, it is possible to seek to any `Range` and the 
interface has no assertions
-to prevent this from happening.
-
-Since Iterators are allowed to `seek` to arbitrary Keys, it also allows 
Iterators to create infinite loops
-inside Scans that will repeatedly read the same data without end. If an 
arbitrary Range is constructed, it should
-construct a completely new Range as it allows for bugs to be introduced which 
will break Accumulo.
-
-Thus, `seek`'s should always be thought of as making "forward progress" in the 
view of the total iteration. The
-`startKey` of a `Range` should always be greater than the current Key seen by 
the Iterator while the `endKey` of the
-`Range` should always retain the original `endKey` (and `endKey` inclusivity) 
of the last `Range` seen by your
-Iterator's implementation of seek.
-
-==== Take caution in constructing new data in an Iterator
-
-Implementations of Iterator might be tempted to open BatchWriters inside of an 
Iterator as a means
-to implement triggers for writing additional data outside of their client 
application. The lifecycle of an Iterator
-is *not* managed in such a way that guarantees that this is safe nor 
efficient. Specifically, there
-is no way to guarantee that the internal ThreadPool inside of the BatchWriter 
is closed (and the thread(s)
-are reaped) without calling the close() method. `close`'ing and recreating a 
`BatchWriter` after every
-Key-Value pair is also prohibitively performance limiting to be considered an 
option.
-
-The only safe way to generate additional data in an Iterator is to alter the 
current Key-Value pair.
-For example, the `WholeRowIterator` serializes the all of the Key-Values pairs 
that fall within each
-row. A safe way to generate more data in an Iterator would be to construct an 
Iterator that is
-"higher" (at a larger priority) than the `WholeRowIterator`, that is, the 
Iterator receives the Key-Value pairs which are
-a serialization of many Key-Value pairs. The custom Iterator could deserialize 
the pairs, compute
-some function, and add a new Key-Value pair to the original collection, 
re-serializing the collection
-of Key-Value pairs back into a single Key-Value pair.
-
-Any other situation is likely not guaranteed to ensure that the caller (a Scan 
or a Compaction) will
-always see all intended data that is generated.
-
-=== Final things to remember
-
-Some simple recommendations/points to keep in mind:
-
-==== Method call order
-
-On an instance of an Iterator: `init` is always called before `seek`, `seek` 
is always called before `hasTop`,
-`getTopKey` and `getTopValue` will not be called if `hasTop` returns false.
-
-==== Teardown
-
-As mentioned, instance of Iterators may be torn down inside of the server 
transparently. When a complex
-collection of iterators is performing some advanced functionality, they will 
not be torn down until a Key-Value
-pair is returned out of the "stack" of Iterators (and added into the batch of 
Key-Values to be returned
-to the caller). Being torn-down is equivalent to a new instance of the 
Iterator being creating and `deepCopy`
-being called on the new instance with the old instance provided as the 
argument to `deepCopy`. References
-to the old instance are removed and the object is lazily garbage collected by 
the JVM.
-
-=== Compaction-time Iterators
-
-When Iterators are configured to run during compactions, at the `minc` or 
`majc` scope, these Iterators sometimes need
-to make different assertions than those who only operate at scan time. 
Iterators won't see the delete entries; however,
-Iterators will not necessarily see all of the Key-Value pairs in ever 
invocation. Because compactions often do not rewrite
-all files (only a subset of them), it is possible that the logic take this 
into consideration.
-
-For example, a Combiner that runs over data at during compactions, might not 
see all of the values for a given Key. The
-Combiner must recognize this and not perform any function that would be 
incorrect due
-to the missing values.

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/7cc70b2e/docs/master/iterator_test_harness.md
----------------------------------------------------------------------
diff --git a/docs/master/iterator_test_harness.md 
b/docs/master/iterator_test_harness.md
deleted file mode 100644
index 91ae53a..0000000
--- a/docs/master/iterator_test_harness.md
+++ /dev/null
@@ -1,110 +0,0 @@
-// Licensed to the Apache Software Foundation (ASF) under one or more
-// contributor license agreements. See the NOTICE file distributed with
-// this work for additional information regarding copyright ownership.
-// The ASF licenses this file to You under the Apache License, Version 2.0
-// (the "License"); you may not use this file except in compliance with
-// the License. You may obtain a copy of the License at
-//
-//     http://www.apache.org/licenses/LICENSE-2.0
-//
-// Unless required by applicable law or agreed to in writing, software
-// distributed under the License is distributed on an "AS IS" BASIS,
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-// See the License for the specific language governing permissions and
-// limitations under the License.
-
-== Iterator Testing
-
-Iterators, while extremely powerful, are notoriously difficult to test. While 
the API defines
-the methods an Iterator must implement and each method's functionality, the 
actual invocation
-of these methods by Accumulo TabletServers can be surprisingly difficult to 
mimic in unit tests.
-
-The Apache Accumulo "Iterator Test Harness" is designed to provide a 
generalized testing framework
-for all Accumulo Iterators to leverage to identify common pitfalls in 
user-created Iterators.
-
-=== Framework Use
-
-The harness provides an abstract class for use with JUnit4. Users must define 
the following for this
-abstract class:
-
-  * A `SortedMap` of input data (`Key`-`Value` pairs)
-  * A `Range` to use in tests
-  * A `Map` of options (`String` to `String` pairs)
-  * A `SortedMap` of output data (`Key`-`Value` pairs)
-  * A list of `IteratorTestCase`s (these can be automatically discovered)
-
-The majority of effort a user must make is in creating the input dataset and 
the expected
-output dataset for the iterator being tested.
-
-=== Normal Test Outline
-
-Most iterator tests will follow the given outline:
-
-[source,java]
-----
-import java.util.List;
-import java.util.SortedMap;
-
-import org.apache.accumulo.core.data.Key;
-import org.apache.accumulo.core.data.Range;
-import org.apache.accumulo.core.data.Value;
-import org.apache.accumulo.iteratortest.IteratorTestCaseFinder;
-import org.apache.accumulo.iteratortest.IteratorTestInput;
-import org.apache.accumulo.iteratortest.IteratorTestOutput;
-import org.apache.accumulo.iteratortest.junit4.BaseJUnit4IteratorTest;
-import org.apache.accumulo.iteratortest.testcases.IteratorTestCase;
-import org.junit.runners.Parameterized.Parameters;
-
-public class MyIteratorTest extends BaseJUnit4IteratorTest {
-
-  @Parameters
-  public static Object[][] parameters() {
-    final IteratorTestInput input = createIteratorInput();
-    final IteratorTestOutput output = createIteratorOutput();
-    final List<IteratorTestCase> testCases = 
IteratorTestCaseFinder.findAllTestCases();
-    return BaseJUnit4IteratorTest.createParameters(input, output, tests);
-  }
-
-  private static SortedMap<Key,Value> INPUT_DATA = createInputData();
-  private static SortedMap<Key,Value> OUTPUT_DATA = createOutputData();
-
-  private static SortedMap<Key,Value> createInputData() {
-    // TODO -- implement this method
-  }
-
-  private static SortedMap<Key,Value> createOutputData() {
-    // TODO -- implement this method
-  }
-
-  private static IteratorTestInput createIteratorInput() {
-    final Map<String,String> options = createIteratorOptions(); 
-    final Range range = createRange();
-    return new IteratorTestInput(MyIterator.class, options, range, INPUT_DATA);
-  }
-
-  private static Map<String,String> createIteratorOptions() {
-    // TODO -- implement this method
-    // Tip: Use INPUT_DATA if helpful in generating output
-  }
-
-  private static Range createRange() {
-    // TODO -- implement this method
-  }
-
-  private static IteratorTestOutput createIteratorOutput() {
-    return new IteratorTestOutput(OUTPUT_DATA);
-  }
-
-}
-----
-
-=== Limitations
-
-While the provided `IteratorTestCase`s should exercise common edge-cases in 
user iterators,
-there are still many limitations to the existing test harness. Some of them 
are:
-
-  * Can only specify a single iterator, not many (a "stack")
-  * No control over provided IteratorEnvironment for tests
-  * Exercising delete keys (especially with major compactions that do not 
include all files)
-
-These are left as future improvements to the harness.

[09/16] accumulo-website git commit: ACCUMULO-4630 Refactored documentation for website

Reply via email to