Enumerate the file locations (map) and put them in a queue such as RabbitMQ or Kafka (this persists the map). Then have a bunch of threads, workers, containers, or whatever you like pop items off the queue and process each one (reduce).
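
As a rough illustration of that pattern, here is a minimal single-machine sketch in Python. It is only a sketch under some assumptions: the standard library's in-memory `queue.Queue` stands in for RabbitMQ or Kafka (so nothing is actually persisted), threads stand in for workers or containers, and the names `enumerate_files`, `index_file`, `worker`, and `build_index` are made up for this example. The "indexing" is just a toy inverted index of whitespace-separated tokens.

```python
import os
import queue
import threading
from collections import defaultdict


def enumerate_files(root):
    """Map step: yield the path of every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def index_file(path):
    """Hypothetical per-file work: return the set of lowercase tokens in the file."""
    with open(path, "r", encoding="utf-8", errors="ignore") as fh:
        return set(fh.read().lower().split())


def worker(tasks, index, lock):
    """Pop paths off the queue until a None sentinel arrives."""
    while True:
        path = tasks.get()
        if path is None:
            tasks.task_done()
            break
        try:
            tokens = index_file(path)
            # Reduce step: merge this file's tokens into the shared index.
            with lock:
                for token in tokens:
                    index[token].add(path)
        except OSError:
            pass  # unreadable file: skip it in this sketch
        finally:
            tasks.task_done()


def build_index(root, num_workers=4):
    """Enumerate files, queue them, and let num_workers threads index them."""
    tasks = queue.Queue()  # in-memory stand-in for a real message broker
    index = defaultdict(set)
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, index, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for path in enumerate_files(root):
        tasks.put(path)
    for _ in threads:
        tasks.put(None)  # one stop sentinel per worker
    for t in threads:
        t.join()
    return index
```

Calling `build_index("/some/dir")` returns a mapping from token to the set of files containing it. With a real broker, the enumeration and the workers would be separate processes, possibly on separate machines, and a crashed worker's item could be redelivered. Threads are a reasonable fit when the work is mostly disk I/O; for CPU-heavy parsing, separate processes would likely do better.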

--
Rahul Singh
[email protected]
Anant Corporation

On May 20, 2018, 7:24 AM -0400, Raymond Xie <[email protected]>, wrote:
> I know how to do indexing on file system like single file or folder, but
> how do I do that in a parallel way? The data I need to index is of huge
> volume and can't be put on HDFS.
>
> Thank you
>
> *------------------------------------------------*
> *Sincerely yours,*
>
>
> *Raymond*
