Fwding to user..

---------- Forwarded message ----------
From: Andrew Musselman <andrew.mussel...@gmail.com>
Date: Wed, Jul 22, 2015 at 8:54 AM
Subject: Re: Parsing and indexing parts of the input file paths
To: d...@lucene.apache.org
Thanks, and tell it to index the "id" field, which eventually contains the file path?

On Wed, Jul 22, 2015 at 8:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> PatternReplaceFilterFactory would be just a configuration solution:
> construct a fieldType in schema.xml and you're done. It would require
> re-indexing, of course.
>
> Best,
> Erick
>
> On Tue, Jul 21, 2015 at 5:59 PM, Andrew Musselman
> <andrew.mussel...@gmail.com> wrote:
> > Erik, thanks; the prefix starting with "/user/andrew/" will be known, and
> > can be put into config, let's assume. Would this be config-only or would it
> > require some code? Could you point to some classes I can start with if I
> > need to write code, and some up-to-date docs?
> >
> > Same for the update processor: is there an example I could read?
> >
> > On Tue, Jul 21, 2015 at 11:19 AM, Erik Hatcher <erik.hatc...@gmail.com>
> > wrote:
> >>
> >> If this is only for search, then an analysis chain could be crafted,
> >> likely with the pattern regex filter in the mix, to pull out pieces of the
> >> path. How will you know the prefix of the file, though?
> >>
> >> There’s also the ability to do this sort of thing in an update processor,
> >> most easily using the script update processor, using a bit of JavaScript to
> >> pull out the piece(s) you want to index (and even store at this point).
> >>
> >> —
> >> Erik Hatcher, Senior Solutions Architect
> >> http://www.lucidworks.com
> >>
> >> On Jul 21, 2015, at 1:31 PM, Andrew Musselman <andrew.mussel...@gmail.com>
> >> wrote:
> >>
> >> Dear user and dev lists,
> >>
> >> We are loading files from a directory and would like to index a portion of
> >> each file path as a field as well as the text inside the file.
> >>
> >> E.g., on HDFS we have this file path:
> >>
> >> /user/andrew/1234/1234/file.pdf
> >>
> >> And we would like the "1234" token parsed from the file path and indexed
> >> as an additional field that can be searched on.
> >>
> >> From my initial searches I can't see how to do this easily, so would I
> >> need to write some custom code, or a plugin?
> >>
> >> Thanks!
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
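A rough sketch of the extraction Erick describes, assuming the known prefix "/user/andrew/" and assuming the wanted token is the first directory after that prefix (the example path repeats "1234", so which segment is meant is itself an assumption). It is written in Python only to check the regex. In Solr the same idea would presumably be a fieldType whose analyzer uses solr.KeywordTokenizerFactory (so the whole path stays one token) followed by solr.PatternReplaceFilterFactory with pattern="^/user/andrew/([^/]+)/.*$", replacement="$1" and replace="all", plus a copyField from "id" into a field of that type. Note that Java regex writes the backreference as "$1" while Python writes it as "\1". The names PREFIX, PATH_PATTERN and path_token are hypothetical.

```python
import re

# Assumption: the known prefix from the thread; the wanted token is taken to be
# the first path segment after it. All names here are hypothetical.
PREFIX = "/user/andrew/"

# Anchored pattern: the escaped prefix, then capture one segment (no slashes),
# then require a "/" and swallow the rest of the path.
PATH_PATTERN = re.compile("^" + re.escape(PREFIX) + r"([^/]+)/.*$")


def path_token(path: str) -> str:
    """Return the first segment after PREFIX.

    If the path does not match, it is returned unchanged, similar to how a
    pattern-replace filter leaves a non-matching token alone.
    """
    # Python's re.sub uses \1 for the first group; Java/Solr would use $1.
    return PATH_PATTERN.sub(r"\1", path)


if __name__ == "__main__":
    print(path_token("/user/andrew/1234/1234/file.pdf"))  # prints 1234
```

Paths outside the prefix, or files sitting directly under "/user/andrew/" with no subdirectory, pass through unchanged, so a field built this way would hold the full path for those documents unless the pattern is adjusted.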