Would the MapReduceIndexerTool be an option?
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-search/v1-latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html

On 7/18/15, 9:38 AM, "步青云" <mailliup...@qq.com> wrote:

>I need help. I have several hundred GB of files in HDFS, and I want to
>create indexes for these files so that I can search them quickly. How can
>I create indexes for these files in HDFS? I know that the Tika embedded in
>Solr can extract the content of files in the local file system, and Solr
>then indexes them. All I need to do is set the path of the file; then
>ContentStreamUpdateRequest extracts the content of the file using Tika and
>creates the index in Solr. The Java code is as follows:
>
>    public void indexFilesSolrCell(FileBean fileBean)
>            throws IOException, SolrServerException {
>        try {
>            SolrServer solr = FtrsSolrServer.getServer();
>            ContentStreamUpdateRequest up =
>                    new ContentStreamUpdateRequest("/update/extract");
>            // set the path of the file
>            up.addFile(new File(fileBean.getLocalPath()),
>                    fileBean.getContentType());
>            up.setParam("literal.id", UUID.randomUUID().toString());
>            up.setParam("literal.create_time", fileBean.getCreateTime());
>            up.setParam("literal.title", fileBean.getTitle());
>            up.setParam("literal.creator", fileBean.getCreator());
>            up.setParam("literal.description", fileBean.getDescription());
>            up.setParam("literal.file_name", fileBean.getFileName());
>            up.setParam("literal.folder_path", fileBean.getFolderPath());
>            up.setParam("literal.fid", fileBean.getFid());
>            up.setParam("fmap.content", "content");
>            up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>
>            solr.request(up);
>        } catch (Exception e) {
>            e.printStackTrace();
>        }
>    }
>
>But the code above does not work for files in HDFS. When I set the file
>path to "hdfs://hadoop1:8020/.....", errors occurred. The error message
>said something like "FileSystem should not start with 'hdfs://'".
>Can Tika not extract files in HDFS, or is there a mistake in my Java
>code? If Tika cannot extract them, how can I create indexes for the files
>in HDFS?
>Thanks for any reply. I urgently need help.
>Best wishes.
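The error is expected: ContentStreamUpdateRequest.addFile() takes a java.io.File, and java.io.File can only refer to the local file system, so an hdfs:// URI is rejected before Tika ever sees the file. A minimal sketch of that distinction, using only the plain JDK (no Hadoop or Solr dependency; the class name and paths are made up for illustration):

```java
import java.net.URI;

public class HdfsPathCheck {
    // addFile(new File(path), ...) only works when `path` is a local path.
    // A path with an hdfs:// scheme is not local, which is why the request
    // fails with "FileSystem should not start with 'hdfs://'".
    static boolean isLocalPath(String path) {
        URI uri = URI.create(path);
        return uri.getScheme() == null || uri.getScheme().equals("file");
    }

    public static void main(String[] args) {
        System.out.println(isLocalPath("hdfs://hadoop1:8020/data/report.pdf")); // false
        System.out.println(isLocalPath("/tmp/report.pdf"));                     // true
    }
}
```

So to keep using SolrCell you would first have to copy each file out of HDFS to local disk (for example with Hadoop's FileSystem.copyToLocalFile) and pass that local copy to addFile(). For hundreds of GB, though, the MapReduceIndexerTool mentioned above is the more scalable route, since it reads directly from HDFS and builds the indexes in parallel.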