[
https://issues.apache.org/jira/browse/ACCUMULO-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309738#comment-16309738
]
Jared R commented on ACCUMULO-4749:
-----------------------------------
So if we take the route you suggest, I would still need a class for the
generator (create the linked list of numbers -> write it to a file -> pass the
set of files from the directories to the loader) and a class for the loader
(ingest), which turns the files from the directory into RFiles and then loads
the batch of RFiles into Accumulo.
When I talked to Chris, we discussed a design where the generators put files of
random numbers into multiple directories. The files are then sequenced and sent
to the loader, which uses a MapReduce job to turn them into RFiles and then
loads them into Accumulo. I wish I had a way to post the diagram we made.
If this can be done without having to copy files, and can possibly reuse the
current BatchWriter and ingest code with some changes for the new classes, then
that's great.
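
A minimal sketch of the generator -> sequence -> loader -> verify flow described
above, written in Python only for brevity. Every name here (generate_batch,
sequence_files, load, verify, demo, the gen-a/gen-b directories) is hypothetical,
and a plain dict stands in for the Accumulo table; no RFiles, MapReduce job, or
Accumulo API is involved. It only illustrates why a directory of files that never
gets loaded should show up as dangling links during verification.

```python
import os
import random
import tempfile


def generate_batch(directory, count, prev_keys, rng):
    """Write one file of linked-list entries; return (path, new keys)."""
    os.makedirs(directory, exist_ok=True)
    index = len(os.listdir(directory))
    path = os.path.join(directory, "batch-%04d.txt" % index)
    new_keys = []
    with open(path, "w") as out:
        for _ in range(count):
            key = "%016x" % rng.getrandbits(63)
            # Link to a key from the previous batch; the first batch has no link.
            prev = rng.choice(prev_keys) if prev_keys else ""
            out.write("%s\t%s\n" % (key, prev))
            new_keys.append(key)
    return path, new_keys


def sequence_files(directories):
    """Collect the generated files from every directory in a stable order."""
    files = []
    for directory in directories:
        for name in os.listdir(directory):
            files.append(os.path.join(directory, name))
    return sorted(files)


def load(files):
    """Stand-in for the loader: read entries into a dict of key -> prev."""
    table = {}
    for path in files:
        with open(path) as src:
            for line in src:
                key, prev = line.rstrip("\n").split("\t")
                table[key] = prev
    return table


def verify(table):
    """Return referenced keys that were never loaded (dangling links)."""
    return sorted({prev for prev in table.values() if prev and prev not in table})


def demo(seed=42):
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, "gen-a"), os.path.join(root, "gen-b")]
        keys = []
        # Alternate directories so each batch links into the other directory.
        for i in range(4):
            _, keys = generate_batch(dirs[i % 2], 10, keys, rng)
        files = sequence_files(dirs)
        complete = verify(load(files))
        # Simulate a directory whose files were never bulk loaded.
        partial = verify(load([f for f in files if not f.startswith(dirs[1])]))
    return complete, partial


if __name__ == "__main__":
    complete, partial = demo()
    print("dangling links, everything loaded:", len(complete))
    print("dangling links, gen-b skipped:", len(partial))
```

With everything loaded, verify() finds no dangling links. When the files from one
directory are skipped, entries in the other directory still point at keys that
never arrived, which is the orphaned-data case the test is meant to catch.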
> Need a bulk loading test equivalent to continuous ingest
> --------------------------------------------------------
>
> Key: ACCUMULO-4749
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4749
> Project: Accumulo
> Issue Type: Improvement
> Components: test
> Reporter: Ivan Bella
> Assignee: Jared R
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> There are some known cases, at least in past versions, where bulk loading may
> fail, leaving the ~blip in place but no transaction left to handle it. This
> will result in directories of files being left around that are not loaded.
> We should create a continuous ingest variant that uses bulk loading instead.
> Then, if this is run with agitation, the continuous ingest verification can
> find data that has essentially been orphaned.