Nicolas,

Do you use the POS tagger at query time, or just at index time? 

We are thinking of using it to filter the tokens we will eventually perform ML 
on. Basically, we have a bunch of acronyms in our corpus. However, many 
departments use the same acronyms but expand those acronyms to different 
things. Eventually, we are thinking of using ML on our index to determine which 
expansion is meant by a particular query according to the context we find in 
certain documents. However, since we don't want to run ML on all tokens in a 
query, and since we think that acronyms are usually the nouns in a multi-token 
query, we want to only feed nouns to the ML model (TBD).

Does that make sense? So, we'd want both an index-side POS tagger (could be 
slow), and also a query-side POS tagger (must be fast).
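
To make the query-side idea a bit more concrete, here is a minimal sketch of the noun filter, assuming the tagger returns (token, tag) pairs that use Universal POS tags. The names noun_tokens, filter_query, toy_tagger and TOY_LEXICON are hypothetical and exist only for this illustration. The toy tagger is a stand-in for whichever real tagger we end up choosing (OpenNLP, SpaCy, Watson), not a real API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

TaggedToken = Tuple[str, str]  # (token text, POS tag)

# Assumption: the tagger emits Universal POS tags, and acronyms are
# tagged either NOUN or PROPN.
NOUN_TAGS = frozenset({"NOUN", "PROPN"})


def noun_tokens(tagged: Iterable[TaggedToken],
                noun_tags: frozenset = NOUN_TAGS) -> List[str]:
    """Keep only the tokens whose tag is in noun_tags, preserving order."""
    return [token for token, tag in tagged if tag in noun_tags]


def filter_query(query: str,
                 tagger: Callable[[str], List[TaggedToken]],
                 noun_tags: frozenset = NOUN_TAGS) -> List[str]:
    """Tag the query with the supplied tagger, then keep only the nouns."""
    return noun_tokens(tagger(query), noun_tags)


# Hypothetical stand-in lexicon so the sketch runs without any NLP library.
TOY_LEXICON: Dict[str, str] = {
    "how": "ADV", "do": "AUX", "i": "PRON",
    "file": "VERB", "an": "DET", "claim": "NOUN",
}


def toy_tagger(query: str) -> List[TaggedToken]:
    """Toy tagger: multi-letter all-caps tokens are treated as acronyms
    (PROPN); everything else is looked up in TOY_LEXICON, defaulting to X."""
    tagged = []
    for token in query.split():
        if token.isupper() and len(token) > 1:
            tagged.append((token, "PROPN"))
        else:
            tagged.append((token, TOY_LEXICON.get(token.lower(), "X")))
    return tagged


if __name__ == "__main__":
    # Only these tokens would be passed on to the (TBD) ML model.
    print(filter_query("how do I file an HR claim", toy_tagger))
```

Since filter_query takes the tagger as a plain callable, the index-side (slow) and query-side (fast) taggers could be different implementations, as long as both return the same tag set.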

-- 
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
 

On 10/25/19, 11:57 AM, "Nicolas Paris" <nicolas.pa...@riseup.net> wrote:

    Also, we are using the Stanford POS tagger for French. The processing time is
    mitigated by the spark-corenlp package, which distributes the processing over
    multiple nodes.
    
    Also, I am interested in the way you use POS information within Solr
    queries or Solr fields.
    
    Thanks,
    On Fri, Oct 25, 2019 at 10:42:43AM -0400, David Hastings wrote:
    > ah, yeah it's not the fastest but it proved to be the best for my purposes,
    > I use it to pre-process data before indexing, to apply more metadata to the
    > documents in a separate field(s)
    > 
    > On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld -
    > audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
    > 
    > > No, I meant for part-of-speech tagging. But that's interesting that you
    > > use StanfordNLP. I've read that it's very slow, so we are concerned that it
    > > might not work for us at query-time. Do you use it at query-time, or just
    > > index-time?
    > >
    > > --
    > > Audrey Lorberfeld
    > > Data Scientist, w3 Search
    > > IBM
    > > audrey.lorberf...@ibm.com
    > >
    > >
    > > On 10/25/19, 10:30 AM, "David Hastings" <hastings.recurs...@gmail.com>
    > > wrote:
    > >
    > >     Do you mean for entity extraction?
    > >     I make a LOT of use of the Stanford NLP project, and get out the
    > >     entities and use them for different purposes in Solr
    > >     -Dave
    > >
    > >     On Fri, Oct 25, 2019 at 10:16 AM Audrey Lorberfeld -
    > >     audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
    > >
    > >     > Hi All,
    > >     >
    > >     > Does anyone use a POS tagger with their Solr instance other than
    > >     > OpenNLP’s? We are considering OpenNLP, SpaCy, and Watson.
    > >     >
    > >     > Thanks!
    > >     >
    > >     > --
    > >     > Audrey Lorberfeld
    > >     > Data Scientist, w3 Search
    > >     > IBM
    > >     > audrey.lorberf...@ibm.com
    > >     >
    > >     >
    > >
    > >
    > >
    
    -- 
    nicolas
    
