Re: Parsing and indexing parts of the input file paths

Andrew Musselman Wed, 22 Jul 2015 09:49:43 -0700

I'm trying to figure out how to parse the file path, which becomes the
"id" of each PDF document when I run the "cloud" instance.


Is that "id" field the thing to parse with PatternReplaceFilterFactory in
the config?  If not, is there a "file-path" field I can parse?

On Wed, Jul 22, 2015 at 9:42 AM, Erick Erickson <[email protected]>
wrote:

> I don't understand your question. If you're talking about two different
> fields, use copyField.
>
> On Wed, Jul 22, 2015 at 8:55 AM, Andrew Musselman
> <[email protected]> wrote:
> > Forwarding to the user list.
> >
> > ---------- Forwarded message ----------
> > From: Andrew Musselman <[email protected]>
> > Date: Wed, Jul 22, 2015 at 8:54 AM
> > Subject: Re: Parsing and indexing parts of the input file paths
> > To: [email protected]
> >
> >
> > Thanks, and tell it to index the "id" field, which eventually contains
> > the file path?
> >
> > On Wed, Jul 22, 2015 at 8:48 AM, Erick Erickson <[email protected]>
> > wrote:
> >
> >> PatternReplaceFilterFactory would be a configuration-only solution:
> >> construct a fieldType in schema.xml and you're done. It would require
> >> re-indexing, of course.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Jul 21, 2015 at 5:59 PM, Andrew Musselman
> >> <[email protected]> wrote:
> >> > Erik, thanks; the prefix starting with "/user/andrew/" will be known,
> >> > and can be put into config, let's assume.  Would this be config-only
> >> > or would it require some code, and could you point to some classes I
> >> > can start with if I need to write code, and some up-to-date docs?
> >> >
> >> > Same for the update processor, is there an example I could read?
> >> >
> >> > On Tue, Jul 21, 2015 at 11:19 AM, Erik Hatcher <[email protected]>
> >> > wrote:
> >> >>
> >> >> If this is only for search, then an analysis chain could be crafted,
> >> >> likely with the pattern regex filter in the mix, to pull out pieces
> >> >> of the path.  How will you know the prefix of the file though?
> >> >>
> >> >> There’s also the ability to do this sort of thing in an update
> >> >> processor, most easily using the script update processor, using a bit
> >> >> of JavaScript to pull out the piece(s) you want to index (and even
> >> >> store at this point).
> >> >>
> >> >> —
> >> >> Erik Hatcher, Senior Solutions Architect
> >> >> http://www.lucidworks.com
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Jul 21, 2015, at 1:31 PM, Andrew Musselman <[email protected]>
> >> >> wrote:
> >> >>
> >> >> Dear user and dev lists,
> >> >>
> >> >> We are loading files from a directory and would like to index a
> >> >> portion of each file path as a field as well as the text inside the
> >> >> file.
> >> >>
> >> >> E.g., on HDFS we have this file path:
> >> >>
> >> >> /user/andrew/1234/1234/file.pdf
> >> >>
> >> >> And we would like the "1234" token parsed from the file path and
> >> >> indexed as an additional field that can be searched on.
> >> >>
> >> >> From my initial searches I can't see how to do this easily, so would
> >> >> I need to write some custom code, or a plugin?
> >> >>
> >> >> Thanks!
> >> >>
> >> >>
> >> >
> >>
>
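
For the script update processor route suggested in the thread, a minimal
sketch of an update script might look like the following. It assumes the
"id" field holds the file path and that the path starts with the known
prefix "/user/andrew/", as in the example above. The target field name
"path_token" and the helper extractPathToken are hypothetical names chosen
for illustration, and the solrconfig.xml wiring for the processor is not
shown. The code sticks to ES5 syntax in case an older JavaScript engine
runs it.

```javascript
// Sketch of an update script for Solr's script update processor.
// Assumptions (not confirmed by the thread): "id" contains the file path,
// and "path_token" is a hypothetical field defined in the schema.

var KNOWN_PREFIX = "/user/andrew/";

// Return the first directory segment after the prefix, or null when the
// path does not start with the prefix or has no directory after it.
function extractPathToken(path, prefix) {
  if (typeof path !== "string" || path.indexOf(prefix) !== 0) {
    return null;
  }
  var rest = path.substring(prefix.length);
  var slash = rest.indexOf("/");
  if (slash <= 0) {
    return null; // nothing between the prefix and the next "/"
  }
  return rest.substring(0, slash);
}

// Called by the script update processor for each document being added.
function processAdd(cmd) {
  var doc = cmd.solrDoc; // the SolrInputDocument for this add command
  var token = extractPathToken(String(doc.getFieldValue("id")), KNOWN_PREFIX);
  if (token !== null) {
    doc.setField("path_token", token);
  }
}

// Remaining hooks, kept as no-ops; the example script that ships with
// Solr defines these as well.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

With the path from the original question,
extractPathToken("/user/andrew/1234/1234/file.pdf", "/user/andrew/")
yields "1234". Documents whose "id" does not start with the prefix are
left unchanged.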
