[ 
https://issues.apache.org/jira/browse/ACCUMULO-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308704#comment-16308704
 ] 

Keith Turner commented on ACCUMULO-4749:
----------------------------------------

Below is a possible design for this issue.

Generator processes do the following in a loop:
  * Create 1 million random numbers that point to the previous 1 million (if 
there is a previous set).
  * Write the 1 million to a file in HDFS in directory A.
  * Check the size of the HDFS directory that the files are written to.  If it's 
too large, wait.
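
A minimal sketch of the generator loop, only to make the steps above concrete. It assumes a local directory standing in for HDFS directory A, a plain tab-separated text format, and a finite number of batches so it can be run as a test. The names make_batch, write_batch, dir_size, run_generator and the max_dir_bytes threshold are all hypothetical and not part of any Accumulo API.

```python
import os
import random
import tempfile
import time


def make_batch(size, previous=None, rng=random):
    """Return `size` (node, prev) pairs; prev is None when there is no previous batch."""
    nodes = [rng.getrandbits(63) for _ in range(size)]
    if previous is None:
        return [(n, None) for n in nodes]
    # Entry i of this batch points at entry i of the previous batch.
    return [(n, p) for n, (p, _) in zip(nodes, previous)]


def write_batch(batch, out_dir, seq):
    """Write one batch as 'node<TAB>prev' lines (hypothetical format); return the path."""
    path = os.path.join(out_dir, f"batch-{seq:06d}.txt")
    with open(path, "w") as f:
        for node, prev in batch:
            f.write(f"{node}\t{'' if prev is None else prev}\n")
    return path


def dir_size(path):
    """Total size in bytes of the regular files directly inside `path`."""
    return sum(e.stat().st_size for e in os.scandir(path) if e.is_file())


def run_generator(out_dir, batches, batch_size=1_000_000,
                  max_dir_bytes=10 * 2**30, poll_seconds=5.0):
    """Generate `batches` linked batches, pausing while the output dir is too large."""
    previous = None
    for seq in range(batches):
        batch = make_batch(batch_size, previous)
        write_batch(batch, out_dir, seq)
        previous = batch
        # Back off until the ingest side has drained the directory.
        while dir_size(out_dir) > max_dir_bytes:
            time.sleep(poll_seconds)
```

For example, run_generator(tempfile.mkdtemp(), batches=3, batch_size=5) writes three small files.  A real generator would loop indefinitely and write to HDFS rather than to local disk.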

Ingest processes do the following in a loop:
  * Move all files from dir A to dir B.
  * Run a MapReduce job to sort the files in B into dir C.  This step creates the 
rfiles needed for bulk import.
  * Bulk import the files in dir C into Accumulo.
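
One pass of the ingest loop could look roughly like the sketch below. It makes several assumptions: local directories stand in for the HDFS dirs A, B and C, an in-memory lexicographic sort stands in for the MapReduce job (no rfiles are produced), and the bulk import is a caller-supplied callback instead of a real Accumulo call. The function names and the cleanup of B and C after the import are hypothetical.

```python
import os
import shutil
import tempfile


def move_all(src_dir, dst_dir):
    """Move every regular file from src_dir to dst_dir; return the moved names."""
    os.makedirs(dst_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.move(src, os.path.join(dst_dir, name))
            moved.append(name)
    return moved


def sort_into(src_dir, dst_dir, out_name="sorted-00000.txt"):
    """Stand-in for the MapReduce sort: merge all lines from src_dir, sorted, into one file."""
    os.makedirs(dst_dir, exist_ok=True)
    lines = []
    for name in os.listdir(src_dir):
        with open(os.path.join(src_dir, name)) as f:
            lines.extend(line.rstrip("\n") for line in f if line.strip())
    lines.sort()
    out_path = os.path.join(dst_dir, out_name)
    with open(out_path, "w") as f:
        f.writelines(line + "\n" for line in lines)
    return out_path


def ingest_once(dir_a, dir_b, dir_c, bulk_import):
    """Run one pass of the ingest loop; return False if dir A held nothing to ingest."""
    if not move_all(dir_a, dir_b):
        return False
    sort_into(dir_b, dir_c)
    bulk_import(dir_c)  # hypothetical callback standing in for the Accumulo bulk import
    # Assumption: clear B and C so the next pass starts from empty directories.
    for d in (dir_b, dir_c):
        shutil.rmtree(d)
    return True
```

Moving the files to B before sorting means generators can keep writing into A while a pass is in progress, so each pass works on a fixed set of files.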
 




> Need a bulk loading test equivalent to continuous ingest
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4749
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4749
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: test
>            Reporter: Ivan Bella
>            Assignee: Jared R
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> There are some known cases at least in past versions where bulk loading may 
> fail leaving the ~blip in place but no transaction left to handle it.  This 
> will result in directories of files being left around that are not loaded.  
> We should create a continuous ingest variant that uses bulk loading instead.  
> Then if this is run with agitation, the continuous ingest verification can 
> find data that has been essentially orphaned.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
