Nicolas,

Do you use the POS tagger at query time, or just at index time?

We are thinking of using it to filter the tokens we will eventually perform ML on. Basically, we have a bunch of acronyms in our corpus. However, many departments use the same acronyms but expand them to different things. Eventually, we want to use ML on our index to determine which expansion a particular query means, based on the context we find in certain documents. However, since we don't want to run ML on all tokens in a query, and since we think that acronyms are usually the nouns in a multi-token query, we want to feed only nouns to the ML model (TBD). Does that make sense?

So, we'd want both an index-side POS tagger (could be slow) and a query-side POS tagger (must be fast).

--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com

On 10/25/19, 11:57 AM, "Nicolas Paris" <nicolas.pa...@riseup.net> wrote:

    Also, we are using the Stanford POS tagger for French. The processing time
    is mitigated by the spark-corenlp package, which distributes the processing
    over multiple nodes.

    Also, I am interested in the way you use POS information within Solr
    queries or Solr fields.

    Thanks,

    On Fri, Oct 25, 2019 at 10:42:43AM -0400, David Hastings wrote:
    > ah, yeah it's not the fastest but it proved to be the best for my purposes,
    > I use it to pre-process data before indexing, to apply more metadata to the
    > documents in a separate field(s)
    >
    > On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld -
    > audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
    >
    > > No, I meant for part-of-speech tagging. But that's interesting that you
    > > use StanfordNLP. I've read that it's very slow, so we are concerned that it
    > > might not work for us at query-time. Do you use it at query-time, or just
    > > index-time?
    > >
    > > --
    > > Audrey Lorberfeld
    > > Data Scientist, w3 Search
    > > IBM
    > > audrey.lorberf...@ibm.com
    > >
    > >
    > > On 10/25/19, 10:30 AM, "David Hastings" <hastings.recurs...@gmail.com>
    > > wrote:
    > >
    > >     Do you mean for entity extraction?
    > >     I make a LOT of use of the Stanford NLP project, and get out the
    > >     entities
    > >     and use them for different purposes in Solr
    > >     -Dave
    > >
    > >     On Fri, Oct 25, 2019 at 10:16 AM Audrey Lorberfeld -
    > >     audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
    > >
    > >     > Hi All,
    > >     >
    > >     > Does anyone use a POS tagger with their Solr instance other than
    > >     > OpenNLP’s? We are considering OpenNLP, SpaCy, and Watson.
    > >     >
    > >     > Thanks!
    > >     >
    > >     > --
    > >     > Audrey Lorberfeld
    > >     > Data Scientist, w3 Search
    > >     > IBM
    > >     > audrey.lorberf...@ibm.com
    > >     >
    > >
    > >
    >

    --
    nicolas
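
P.S. To make the query-side idea at the top of this message concrete, here is a minimal sketch of the noun filter: tag the query, keep only noun-like tokens, and pass those on as acronym candidates. Everything named here is an assumption for illustration. `toy_tagger` and `TOY_LEXICON` are hypothetical stand-ins for whichever real tagger gets chosen, the `NOUN`/`PROPN` labels assume a Universal POS style tag set, and "PTO" is just an example acronym. The ML disambiguation step itself is still TBD and is not shown.

```python
from typing import Callable, List, Tuple

# Tags treated as "noun-like". Assumes a Universal POS style tag set;
# adjust to whatever tag set the chosen tagger actually emits.
NOUN_TAGS = {"NOUN", "PROPN"}

# Hypothetical stand-in lexicon so the sketch runs without an NLP library.
TOY_LEXICON = {
    "how": "ADV",
    "do": "AUX",
    "i": "PRON",
    "submit": "VERB",
    "a": "DET",
    "pto": "NOUN",
    "request": "NOUN",
}

# A tagger is any callable that maps a query string to (token, tag) pairs.
Tagger = Callable[[str], List[Tuple[str, str]]]


def toy_tagger(query: str) -> List[Tuple[str, str]]:
    """Tag whitespace-separated tokens by lexicon lookup; unknown tokens get 'X'."""
    return [(tok, TOY_LEXICON.get(tok.lower(), "X")) for tok in query.split()]


def noun_tokens(query: str, tagger: Tagger = toy_tagger) -> List[str]:
    """Return only the tokens that the tagger labels as nouns."""
    return [tok for tok, tag in tagger(query) if tag in NOUN_TAGS]


# Only these candidates would be handed to the (TBD) acronym-expansion model.
candidates = noun_tokens("how do I submit a PTO request")
print(candidates)  # ['PTO', 'request']
```

Keeping the tagger behind a plain callable means the slow index-side tagger and the fast query-side tagger could be swapped in without touching the filter itself.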