Also, the OpenNLP Solr POS tagger [1] uses the TypeAsSynonymFilter to store the POS:

"Index the POS for each token as a synonym, after prefixing the POS with @"

I am not sure how to make use of the POS after indexing it this way, but it looks like an interesting approach.

[1] http://lucene.apache.org/solr/guide/7_3/language-analysis.html#opennlp-part-of-speech-filter

On Fri, Oct 25, 2019 at 06:25:36PM +0200, Nicolas Paris wrote:
> > Do you use the POS tagger at query time, or just at index time?
>
> I have the POS tagger pipeline ready but nothing done yet on the Solr
> part. Right now I am wondering how to use it but still looking for
> a relevant implementation.
>
> I guess having the POS information ready before indexing gives the
> flexibility to test multiple scenarios.
>
> In the case of acronyms, one possible way is indeed to consider the user
> query as NOUNS, and on the index side, only keep the acronyms that
> are tagged as NOUNS (i.e. detect acronyms within the text, and look up
> their POS; remove an acronym if it is not a NOUN).
>
> Definitely, I prefer the pre-processing approach for this over creating
> dedicated Solr analysers, because my context is batch processing, and
> it also simplifies testing and debugging, while offering a large panel
> of NLP tools to work with.
>
> On Fri, Oct 25, 2019 at 04:09:29PM +0000, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com wrote:
> > Nicolas,
> >
> > Do you use the POS tagger at query time, or just at index time?
> >
> > We are thinking of using it to filter the tokens we will eventually perform
> > ML on. Basically, we have a bunch of acronyms in our corpus. However, many
> > departments use the same acronyms but expand those acronyms to different
> > things. Eventually, we are thinking of using ML on our index to determine
> > which expansion is meant by a particular query according to the context we
> > find in certain documents.
> > However, since we don't want to run ML on all
> > tokens in a query, and since we think that acronyms are usually the nouns
> > in a multi-token query, we want to only feed nouns to the ML model (TBD).
> >
> > Does that make sense? So, we'd want both an index-side POS tagger (could be
> > slow), and also a query-side POS tagger (must be fast).
> >
> > --
> > Audrey Lorberfeld
> > Data Scientist, w3 Search
> > IBM
> > audrey.lorberf...@ibm.com
> >
> >
> > On 10/25/19, 11:57 AM, "Nicolas Paris" <nicolas.pa...@riseup.net> wrote:
> >
> > Also we are using the Stanford POS tagger for French. The processing time is
> > mitigated by the spark-corenlp package, which distributes the processing over
> > multiple nodes.
> >
> > Also I am interested in the way you use POS information within Solr
> > queries, or Solr fields.
> >
> > Thanks,
> > On Fri, Oct 25, 2019 at 10:42:43AM -0400, David Hastings wrote:
> > > ah, yeah it's not the fastest but it proved to be the best for my purposes,
> > > I use it to pre-process data before indexing, to apply more metadata to the
> > > documents in a separate field(s)
> > >
> > > On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld -
> > > audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
> > >
> > > > No, I meant for part-of-speech tagging. But that's interesting that you
> > > > use StanfordNLP. I've read that it's very slow, so we are concerned that it
> > > > might not work for us at query-time. Do you use it at query-time, or just
> > > > index-time?
> > > >
> > > > --
> > > > Audrey Lorberfeld
> > > > Data Scientist, w3 Search
> > > > IBM
> > > > audrey.lorberf...@ibm.com
> > > >
> > > >
> > > > On 10/25/19, 10:30 AM, "David Hastings" <hastings.recurs...@gmail.com> wrote:
> > > >
> > > > Do you mean for entity extraction?
> > > > I make a LOT of use of the Stanford NLP project, and get out the
> > > > entities
> > > > and use them for different purposes in Solr
> > > > -Dave
> > > >
> > > > On Fri, Oct 25, 2019 at 10:16 AM Audrey Lorberfeld -
> > > > audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Does anyone use a POS tagger with their Solr instance other than
> > > > > OpenNLP’s? We are considering OpenNLP, SpaCy, and Watson.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > --
> > > > > Audrey Lorberfeld
> > > > > Data Scientist, w3 Search
> > > > > IBM
> > > > > audrey.lorberf...@ibm.com
> > > > >
> >
> > --
> > nicolas
> >
>
> --
> nicolas
>

--
nicolas
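
PS: to make the two ideas above concrete, here is a minimal pre-processing sketch in Python. It is only an illustration under assumptions, not what Solr or OpenNLP does internally: the input is a list of (token, POS) pairs produced by whatever tagger is used, and the names NOUN_TAGS, ACRONYM_RE, keep_noun_acronyms and with_pos_synonyms are hypothetical. The first function keeps only the acronyms tagged as nouns; the second mimics the "POS as a synonym" idea by emitting the "@"-prefixed tag at the same position as its token.

```python
import re

# Assumed noun tags; the real tag set depends on the tagger and the language.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "NOUN"}

# Crude acronym heuristic (an assumption): 2 to 6 upper-case letters.
ACRONYM_RE = re.compile(r"^[A-Z]{2,6}$")


def keep_noun_acronyms(tagged):
    """Return the acronyms among (token, pos) pairs that are tagged as nouns."""
    return [tok for tok, pos in tagged
            if ACRONYM_RE.match(tok) and pos in NOUN_TAGS]


def with_pos_synonyms(tagged, prefix="@"):
    """Emit (position, term) pairs: each token, then its prefixed POS
    at the same position, the way a synonym would be stacked."""
    out = []
    for position, (tok, pos) in enumerate(tagged):
        out.append((position, tok))
        out.append((position, prefix + pos))
    return out


# Hand-tagged sample sentence; "IT" is deliberately tagged as a pronoun.
tagged = [("The", "DT"), ("HR", "NNP"), ("team", "NN"),
          ("said", "VBD"), ("IT", "PRP"), ("works", "VBZ")]

print(keep_noun_acronyms(tagged))
print(with_pos_synonyms(tagged[:2]))
```

With the POS stored next to each token like this, the same filtering could in principle be applied to the user query before it reaches Solr, which keeps the whole thing testable outside of the analyser chain.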