Also, the OpenNLP Solr POS filter [1] uses the TypeAsSynonymFilter to
store the POS:

" Index the POS for each token as a synonym, after prefixing the POS with @ "

I'm not sure how to use the POS after such indexing, but this looks like
an interesting approach.
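To make the idea concrete, here is a small pure-Python simulation (not Solr code) of the token stream that chain produces: each token is emitted once as itself and once more as `@` plus its POS tag at the same position, the way a synonym would be. As we read [1], in Solr the tags come from `OpenNLPPOSFilterFactory` and the prefixing from `TypeAsSynonymFilterFactory` with `prefix="@"`; the tagged input and Penn Treebank tags below are assumptions, and the helper names are hypothetical. Because the `@` terms share positions with the words, one way to use them afterwards is to look up which positions carry a given tag, e.g. to keep only nouns.

```python
from collections import defaultdict

def type_as_synonym(tagged_tokens, prefix="@"):
    """Simulate a token stream where each token's type (its POS tag) is
    emitted as an extra term at the same position, prefixed with `prefix`."""
    stream = []
    for position, (term, pos) in enumerate(tagged_tokens):
        stream.append((position, term))
        stream.append((position, prefix + pos))
    return stream

def terms_with_tag(stream, tag_prefix, prefix="@"):
    """Return the word terms whose position also carries a tag starting
    with `tag_prefix` (e.g. "NN" matches NN, NNS, NNP and NNPS)."""
    by_position = defaultdict(list)
    for position, term in stream:
        by_position[position].append(term)
    hits = []
    for position in sorted(by_position):
        terms = by_position[position]
        if any(t.startswith(prefix + tag_prefix) for t in terms):
            hits.extend(t for t in terms if not t.startswith(prefix))
    return hits

tagged = [("the", "DT"), ("ML", "NNP"), ("team", "NN"), ("runs", "VBZ")]
stream = type_as_synonym(tagged)
print(stream[:4])  # [(0, 'the'), (0, '@DT'), (1, 'ML'), (1, '@NNP')]
print(terms_with_tag(stream, "NN"))  # ['ML', 'team']
```

This only mirrors the index-side stream; how a query-side analyzer would treat a literal `@NNP` query term is a separate question.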

[1] http://lucene.apache.org/solr/guide/7_3/language-analysis.html#opennlp-part-of-speech-filter
On Fri, Oct 25, 2019 at 06:25:36PM +0200, Nicolas Paris wrote:
> > Do you use the POS tagger at query time, or just at index time? 
> 
> I have the POS tagger pipeline ready, but nothing has been done yet on the
> Solr side. Right now I am still figuring out how to use it and looking
> for a relevant implementation.
> 
> I guess having the POS information ready before indexing gives the
> flexibility to test multiple scenarios.
> 
> For acronyms, one possible approach is indeed to treat the user
> query terms as NOUNS and, on the index side, only keep the acronyms that
> are tagged as NOUNS (i.e. detect acronyms within the text, look up
> their POS, and remove them if they are not NOUNS).
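A minimal sketch of that index-side rule, assuming the tagger output is already available as (token, tag) pairs using Universal POS tags (NOUN, PROPN, VERB, ...). The acronym regex and the helper name `noun_acronyms` are illustrative assumptions, not part of any tagger's API. Since taggers may label acronyms as proper nouns rather than common nouns, the sketch accepts both NOUN and PROPN.

```python
import re

# Hypothetical acronym pattern: a capital letter followed by at least one
# more capital letter or digit (e.g. "IBM", "W3", "NLP").
ACRONYM = re.compile(r"^[A-Z][A-Z0-9]+$")

def noun_acronyms(tagged_tokens, noun_tags=("NOUN", "PROPN")):
    """Keep acronyms only where the tagger labelled them as nouns."""
    kept = []
    for token, tag in tagged_tokens:
        if ACRONYM.match(token) and tag in noun_tags:
            kept.append(token)
    return kept

doc = [("The", "DET"), ("HR", "PROPN"), ("team", "NOUN"),
       ("will", "AUX"), ("OK", "VERB"), ("the", "DET"), ("PTO", "NOUN")]
print(noun_acronyms(doc))  # ['HR', 'PTO']
```

Here "OK" matches the acronym pattern but is dropped because it was tagged as a verb.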
> 
> I definitely prefer the pre-processing approach for this over creating
> dedicated Solr analyzers, because my context is batch processing; it
> also simplifies testing and debugging, while offering a large panel
> of NLP tools to work with.
> 
> On Fri, Oct 25, 2019 at 04:09:29PM +0000, Audrey Lorberfeld - 
> audrey.lorberf...@ibm.com wrote:
> > Nicolas,
> > 
> > Do you use the POS tagger at query time, or just at index time? 
> > 
> > We are thinking of using it to filter the tokens we will eventually perform 
> > ML on. Basically, we have a bunch of acronyms in our corpus. However, many 
> > departments use the same acronyms but expand those acronyms to different 
> > things. Eventually, we are thinking of using ML on our index to determine 
> > which expansion is meant by a particular query according to the context we 
> > find in certain documents. However, since we don't want to run ML on all 
> > tokens in a query, and since we think that acronyms are usually the nouns 
> > in a multi-token query, we want to only feed nouns to the ML model (TBD).
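The query-side step could be sketched as follows: keep only the noun tokens of an already-tagged query and hand them to whatever component picks the expansion. Since the ML model is still TBD, a plain context-overlap score stands in for it here; that substitution, the expansion table, its context words and the function names are all hypothetical.

```python
def query_nouns(tagged_query, noun_tags=("NOUN", "PROPN")):
    """Return only the tokens the tagger marked as nouns."""
    return [token for token, tag in tagged_query if tag in noun_tags]

def pick_expansion(acronym, context_terms, expansions):
    """Stand-in for the TBD ML model: choose the expansion whose (hypothetical)
    context vocabulary shares the most words with the query context."""
    candidates = expansions.get(acronym, {})
    if not candidates:
        return None
    context = {t.lower() for t in context_terms}
    # max() keeps the first candidate on ties, so dict order breaks ties.
    return max(candidates, key=lambda exp: len(context & candidates[exp]))

# Hypothetical expansions of one acronym used by two departments.
EXPANSIONS = {
    "PM": {
        "project manager": {"project", "schedule", "team", "deadline"},
        "performance monitoring": {"latency", "server", "metrics", "alert"},
    }
}

query = [("PM", "PROPN"), ("server", "NOUN"), ("latency", "NOUN"),
         ("is", "AUX"), ("high", "ADJ")]
nouns = query_nouns(query)
print(nouns)  # ['PM', 'server', 'latency']
print(pick_expansion("PM", [n for n in nouns if n != "PM"], EXPANSIONS))
# performance monitoring
```

Filtering to nouns first keeps the input to the (expensive) model small, which matters if this runs at query time.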
> > 
> > Does that make sense? So, we'd want both an index-side POS tagger (could be 
> > slow), and also a query-side POS tagger (must be fast).
> > 
> > -- 
> > Audrey Lorberfeld
> > Data Scientist, w3 Search
> > IBM
> > audrey.lorberf...@ibm.com
> >  
> > 
> > On 10/25/19, 11:57 AM, "Nicolas Paris" <nicolas.pa...@riseup.net> wrote:
> > 
> >     We are also using the Stanford POS tagger for French. The processing time is
> >     mitigated by the spark-corenlp package, which distributes the processing over
> >     multiple nodes.
> >     
> >     I am also interested in how you use POS information within Solr
> >     queries or Solr fields.
> >     
> >     Thanks,
> >     On Fri, Oct 25, 2019 at 10:42:43AM -0400, David Hastings wrote:
> >     > Ah, yeah, it's not the fastest, but it proved to be the best for my
> >     > purposes. I use it to pre-process data before indexing, to add more
> >     > metadata to the documents in separate field(s).
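A rough sketch of that pre-processing step: run the NLP tools outside Solr and store their output in extra fields of each document before it is sent for indexing. The field names `nouns_ss` and `entities_ss` assume a schema with a `*_ss` multivalued string dynamic field, as in Solr's default configset; the function name and the shape of the tagger/NER output are assumptions too.

```python
import json

def enrich(doc, tagged_tokens, entities):
    """Return a copy of `doc` with NLP output stored in separate fields."""
    enriched = dict(doc)
    # dict.fromkeys deduplicates while keeping first-seen order.
    enriched["nouns_ss"] = list(dict.fromkeys(
        tok for tok, tag in tagged_tokens if tag in ("NOUN", "PROPN")))
    enriched["entities_ss"] = list(dict.fromkeys(entities))
    return enriched

doc = {"id": "42", "text": "IBM opened a lab in Paris."}
tagged = [("IBM", "PROPN"), ("opened", "VERB"), ("a", "DET"),
          ("lab", "NOUN"), ("in", "ADP"), ("Paris", "PROPN"), (".", "PUNCT")]
entities = ["IBM", "Paris"]  # e.g. output of an NER step run beforehand
payload = json.dumps([enrich(doc, tagged, entities)])
print(payload)
```

The resulting JSON array of documents is the kind of body a collection's `/update` handler accepts, so the expensive NLP work stays in the batch job rather than in a Solr analyzer.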
> >     > 
> >     > On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld -
> >     > audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
> >     > 
> >     > > No, I meant for part-of-speech tagging. But it's interesting that you
> >     > > use StanfordNLP. I've read that it's very slow, so we are concerned that
> >     > > it might not work for us at query time. Do you use it at query time, or
> >     > > just at index time?
> >     > >
> >     > >
> >     > >
> >     > > On 10/25/19, 10:30 AM, "David Hastings" <hastings.recurs...@gmail.com>
> >     > > wrote:
> >     > >
> >     > >     Do you mean for entity extraction?
> >     > >     I make a LOT of use of the Stanford NLP project: I extract the
> >     > >     entities and use them for different purposes in Solr.
> >     > >     -Dave
> >     > >
> >     > >     On Fri, Oct 25, 2019 at 10:16 AM Audrey Lorberfeld -
> >     > >     audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
> >     > >
> >     > >     > Hi All,
> >     > >     >
> >     > >     > Does anyone use a POS tagger with their Solr instance other than
> >     > >     > OpenNLP’s? We are considering OpenNLP, spaCy, and Watson.
> >     > >     >
> >     > >     > Thanks!
> >     > >     >
> >     > >     >
> >     > >     >
> >     > >
> >     > >
> >     > >
> >     
> >     -- 
> >     nicolas
> >     
> > 
> 
> -- 
> nicolas
> 

-- 
nicolas
