Re: Parsing and indexing parts of the input file paths

Andrew Musselman Tue, 21 Jul 2015 13:42:14 -0700

Thanks. So by the time a document reaches an Analyzer, the file path is
no longer available?


https://cwiki.apache.org/confluence/display/solr/Analyzers

On Tue, Jul 21, 2015 at 1:27 PM, Upayavira <u...@odoko.co.uk> wrote:

> Solr generally does not interact with the file system in that way (with
> the exception of the DIH).
>
> It is the job of the code that pushes a file to Solr to process the
> filename and send that along with the request.
>
> See here for more info:
>
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
>
> You could provide literal.filename=blah/blah
>
> Upayavira
>
>
> On Tue, Jul 21, 2015, at 07:37 PM, Andrew Musselman wrote:
> > I'm not sure; it's a remote team, but I will get more info. For now,
> > assume that a certain directory is specified, like "/user/andrew/", and
> > a pattern is applied to capture anything two directories below it,
> > matching "*/*/*.pdf".
> >
> > Would there be a way to capture the wild-carded values and index them as
> > fields?
> >
> > On Tue, Jul 21, 2015 at 11:20 AM, Upayavira <u...@odoko.co.uk> wrote:
> >
> > > Keeping to the user list (the right place for this question).
> > >
> > > More information is needed here - how are you getting these documents
> > > into Solr? Are you posting them to /update/extract? Or using DIH, or?
> > >
> > > Upayavira
> > >
> > > On Tue, Jul 21, 2015, at 06:31 PM, Andrew Musselman wrote:
> > > > Dear user and dev lists,
> > > >
> > > > We are loading files from a directory and would like to index a
> > > > portion of each file path as a field as well as the text inside
> > > > the file.
> > > >
> > > > E.g., on HDFS we have this file path:
> > > >
> > > > /user/andrew/1234/1234/file.pdf
> > > >
> > > > And we would like the "1234" token parsed from the file path and
> > > > indexed as an additional field that can be searched on.
> > > >
> > > > From my initial searches I can't see how to do this easily, so would
> > > > I need to write some custom code, or a plugin?
> > > >
> > > > Thanks!
> > >
>
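A minimal sketch of the client-side approach suggested in the thread: the
code that pushes each file to Solr parses the path itself and passes the
captured pieces as literal.<fieldname> parameters on the /update/extract
request. The base directory comes from the example in the thread; the field
names (path_dir1_s, path_dir2_s, filename_s), the use of the path as the
document id, and the Solr URL in the usage note are assumptions made for
illustration, not anything the posters confirmed. The sketch only builds the
request URL; posting the file body is left to whatever HTTP client is in use.

```python
import re
from urllib.parse import urlencode

# Assumed base directory, taken from the example path in the thread.
BASE_DIR = "/user/andrew/"

# Regex equivalent of the "*/*/*.pdf" pattern under BASE_DIR:
# two directory levels, then a PDF file name.
PATH_PATTERN = re.compile(
    r"^" + re.escape(BASE_DIR) + r"([^/]+)/([^/]+)/([^/]+\.pdf)$"
)


def literal_params(path):
    """Return literal.* parameters for a matching path, or None otherwise.

    The field names are hypothetical; use names defined in your schema.
    """
    match = PATH_PATTERN.match(path)
    if match is None:
        return None
    first_dir, second_dir, filename = match.groups()
    return {
        "literal.id": path,  # assumption: the full path serves as the id
        "literal.path_dir1_s": first_dir,
        "literal.path_dir2_s": second_dir,
        "literal.filename_s": filename,
    }


def extract_url(solr_core_url, path):
    """Build the /update/extract URL carrying the captured path parts."""
    params = literal_params(path)
    if params is None:
        raise ValueError("path does not match the expected layout: " + path)
    return solr_core_url.rstrip("/") + "/update/extract?" + urlencode(params)
```

For example, extract_url("http://localhost:8983/solr/mycore",
"/user/andrew/1234/1234/file.pdf") yields a URL to which the PDF can be
posted; both "1234" tokens then arrive as ordinary fields alongside the
extracted text. The core URL here is a placeholder.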
