Re: POS Tagger

2019-10-25 Thread Nicolas Paris
Also, the OpenNLP Solr POS tagger [1] uses the typeAsSynonymFilter to store the POS: "Index the POS for each token as a synonym, after prefixing the POS with @". Not sure how to deal with the POS after such indexing, but this looks like an interesting approach. [1] http://lucene.apache.org/solr/guide/7_3
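
To make the quoted sentence concrete, here is a minimal sketch in Python. It is a simulation, not Solr code: it only illustrates what a token stream could look like once each POS tag is stacked as a synonym at the position of its word. The function name, the sample tags, and the (position, term) representation are assumptions made for illustration.

```python
from typing import List, Tuple


def stack_pos_as_synonyms(
    tagged: List[Tuple[str, str]], prefix: str = "@"
) -> List[Tuple[int, str]]:
    """Return (position, term) pairs: each word, then its prefixed POS tag
    at the same position, the way a synonym would be stacked."""
    stream = []
    for position, (word, pos) in enumerate(tagged):
        stream.append((position, word.lower()))
        # The POS tag shares the position of the word it describes.
        stream.append((position, prefix + pos))
    return stream


# Hypothetical tagger output; the tags are illustrative only.
tagged_sentence = [("Rush", "NNP"), ("Limbaugh", "NNP"), ("talked", "VBD")]
stream = stack_pos_as_synonyms(tagged_sentence)
for position, term in stream:
    print(position, term)
```

Because the tag sits at the same position as its word, a term such as @NNP becomes searchable like any other token; how best to exploit that at query time is the open question raised above.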

Re: POS Tagger

2019-10-25 Thread Dave
Yeah. My mistake in explanation. But it really does help with better relevance in the returned documents > On Oct 25, 2019, at 12:39 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com > wrote: > > Oh I see I see > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM > audrey.lorberf...

Re: Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Oh I see I see -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:21 PM, "David Hastings" wrote: oh i see what you mean, sorry, i explained it incorrectly. those sentences are what would be in the index, and a general search for 'rush

Re: Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
How can a field itself be tagged with a part of speech? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:12 PM, "David Hastings" wrote: nope, i boost the fields already tagged at query time against the query On Fri, Oct 25, 2019 at 12

Re: POS Tagger

2019-10-25 Thread Nicolas Paris
> Do you use the POS tagger at query time, or just at index time? I have the POS tagger pipeline ready but nothing done yet on the Solr part. Right now I am wondering how to use it, and I am still looking for a relevant implementation. I guess having the POS information ready before indexing gives the

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
oh i see what you mean, sorry, i explained it incorrectly. those sentences are what would be in the index, and a general search for 'rush limbaugh' would come back with results where he is an entity higher than if it was two words in a sentence On Fri, Oct 25, 2019 at 12:12 PM David Hastings < ha

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
nope, i boost the fields already tagged at query time against the query On Fri, Oct 25, 2019 at 12:11 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > So then you do run your POS tagger at query-time, Dave? > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM > audrey.lorberf...

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
So then you do run your POS tagger at query-time, Dave? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/25/19, 12:06 PM, "David Hastings" wrote: I use them for query boosting, so if someone searches for: i dont want to rush limbaugh out the do

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Nicolas, Do you use the POS tagger at query time, or just at index time? We are thinking of using it to filter the tokens we will eventually perform ML on. Basically, we have a bunch of acronyms in our corpus. However, many departments use the same acronyms but expand those acronyms to differe

Re: POS Tagger

2019-10-25 Thread David Hastings
I use them for query boosting, so if someone searches for: "i dont want to rush limbaugh out the door" vs "i talked to rush limbaugh through the door", my documents where 'rush limbaugh' is a known entity (noun) and a person (look at the sentence, it's obviously a person and the NLP finds that) have 'r
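
The boosting idea described here can be sketched as follows. This is a guess at the general shape, not the poster's actual setup: it assumes the extended DisMax parser (`defType=edismax`) with its `qf` parameter, and the field names `text` and `entities_person` and the boost value are hypothetical.

```python
from urllib.parse import urlencode


def build_boosted_query(
    user_query: str,
    text_field: str = "text",               # hypothetical main text field
    entity_field: str = "entities_person",  # hypothetical field filled at index time
    entity_boost: float = 5.0,              # arbitrary illustrative boost
) -> dict:
    """Build request parameters that weight matches in the entity field
    higher than matches in the plain text field."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": f"{text_field} {entity_field}^{entity_boost}",
    }


params = build_boosted_query("rush limbaugh")
print(urlencode(params))
```

With parameters like these, no tagger needs to run at query time: a document whose entity field contains 'rush limbaugh' simply scores higher than one where the two words only occur in the body text.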

Re: POS Tagger

2019-10-25 Thread Nicolas Paris
Also, we are using the Stanford POS tagger for French. The processing time is mitigated by the spark-corenlp package, which distributes the processing over multiple nodes. I am also interested in the way you use POS information within Solr queries or Solr fields. Thanks, On Fri, Oct 25, 2019 at 10:42:43A

Re: Re: POS Tagger

2019-10-25 Thread David Hastings
ah, yeah it's not the fastest but it proved to be the best for my purposes. I use it to pre-process data before indexing, to apply more metadata to the documents in separate field(s) On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > No, I meant for part-of-
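
A minimal sketch of that kind of pre-processing step, in Python. The extractor below is a stand-in stub, not the Stanford NLP API, and the `entities_<label>` field naming is an assumption for illustration.

```python
from typing import Callable, Dict, List, Tuple

Extractor = Callable[[str], List[Tuple[str, str]]]


def enrich_document(doc: Dict, extract_entities: Extractor) -> Dict:
    """Return a copy of doc with one extra multi-valued field per entity label."""
    enriched = dict(doc)
    for text, label in extract_entities(doc["text"]):
        field = f"entities_{label.lower()}"  # hypothetical field naming scheme
        values = enriched.setdefault(field, [])
        if text not in values:
            values.append(text)
    return enriched


def fake_extractor(text: str) -> List[Tuple[str, str]]:
    """Stand-in for a real NLP pipeline; it recognises one hard-coded name."""
    if "Rush Limbaugh" in text:
        return [("Rush Limbaugh", "PERSON")]
    return []


doc = {"id": "1", "text": "I talked to Rush Limbaugh through the door"}
enriched = enrich_document(doc, fake_extractor)
print(enriched)
```

The slow NLP work happens once per document before indexing, so its cost never reaches the query path.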

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
No, I meant for part-of-speech tagging. But it's interesting that you use StanfordNLP. I've read that it's very slow, so we are concerned that it might not work for us at query-time. Do you use it at query-time, or just index-time? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.l

Re: POS Tagger

2019-10-25 Thread David Hastings
https://nlp.stanford.edu/ On Fri, Oct 25, 2019 at 10:29 AM David Hastings < hastings.recurs...@gmail.com> wrote: > Do you mean for entity extraction? > I make a LOT of use of the Stanford NLP project, and get out the > entities and use them for different purposes in Solr > -Dave > > On Fri, Oct

Re: POS Tagger

2019-10-25 Thread David Hastings
Do you mean for entity extraction? I make a LOT of use of the Stanford NLP project, and get out the entities and use them for different purposes in Solr -Dave On Fri, Oct 25, 2019 at 10:16 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Hi All, > > Does anyone use a POS tagger with t