Would the MapReduceIndexerTool be an option?
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html

On 7/18/15, 9:38 AM, "步青云" <mailliup...@qq.com> wrote:

>I need help. I have several hundred GB of files in HDFS, and I want to
>create indexes for these files so that I can search them quickly. How can
>I create indexes for these files in HDFS? I know that the Tika embedded in
>Solr can extract the content of files in the local file system, and Solr
>then indexes them. All I need to do is set the path of the file; then
>ContentStreamUpdateRequest extracts the content of the file using Tika and
>creates the index in Solr. The Java code is as follows:
>
>    public void indexFilesSolrCell(FileBean fileBean)
>            throws IOException, SolrServerException {
>        try {
>            SolrServer solr = FtrsSolrServer.getServer();
>            ContentStreamUpdateRequest up =
>                    new ContentStreamUpdateRequest("/update/extract");
>            // set the path of the file
>            up.addFile(new File(fileBean.getLocalPath()),
>                    fileBean.getContentType());
>            up.setParam("literal.id", UUID.randomUUID().toString());
>            up.setParam("literal.create_time", fileBean.getCreateTime());
>            up.setParam("literal.title", fileBean.getTitle());
>            up.setParam("literal.creator", fileBean.getCreator());
>            up.setParam("literal.description", fileBean.getDescription());
>            up.setParam("literal.file_name", fileBean.getFileName());
>            up.setParam("literal.folder_path", fileBean.getFolderPath());
>            up.setParam("literal.fid", fileBean.getFid());
>            up.setParam("fmap.content", "content");
>            up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>
>            solr.request(up);
>        } catch (Exception e) {
>            e.printStackTrace();
>        }
>    }
>
>But the code above does not work for files in HDFS. When I set the file
>path to "hdfs://hadoop1:8020/.....", errors occurred. The error message
>said something like "FileSystem should not start with 'hdfs://'".
>Can Tika not extract files in HDFS, or is there a mistake in my Java
>code? If Tika cannot extract them, how can I create indexes for the files
>in HDFS?
>Thanks for any reply. I urgently need help.
>Best wishes.
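The error is expected: ContentStreamUpdateRequest.addFile() takes a java.io.File, and java.io.File can only refer to the local file system, so an hdfs:// URI is rejected before Tika ever sees the file. A minimal sketch of that distinction, using only the plain JDK (no Hadoop or Solr dependency; the class name and paths are made up for illustration):

```java
import java.net.URI;

public class HdfsPathCheck {
    // addFile(new File(path), ...) only works when `path` is a local path.
    // A path with an hdfs:// scheme is not local, which is why the request
    // fails with "FileSystem should not start with 'hdfs://'".
    static boolean isLocalPath(String path) {
        URI uri = URI.create(path);
        return uri.getScheme() == null || uri.getScheme().equals("file");
    }

    public static void main(String[] args) {
        System.out.println(isLocalPath("hdfs://hadoop1:8020/data/report.pdf")); // false
        System.out.println(isLocalPath("/tmp/report.pdf"));                     // true
    }
}
```

So to keep using SolrCell you would first have to copy each file out of HDFS to local disk (for example with Hadoop's FileSystem.copyToLocalFile) and pass that local copy to addFile(). For hundreds of GB, though, the MapReduceIndexerTool mentioned above is the more scalable route, since it reads directly from HDFS and builds the indexes in parallel.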