Strange Synonym Graph Filter Bug in Admin UI

2020-05-26 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, We are coming across a strange bug in the Analysis section of the Admin UI. For our non-English schema components, instead of the Synonym Graph Filter (SGF) showing in the UI, it's showing something called a "List Based Token Stream" (LBTS) in its place. We found an old issue that docum

RE: Indexing Korean

2020-05-04 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Oh wow, I had no idea this existed. Thank you so much! Best, Audrey On 5/1/20, 12:58 PM, "Markus Jelsma" wrote: Hello, Although it is not mentioned in Solr's language analysis page in the manual, Lucene has had support for Korean for quite a while now. https://urldefense.proofp

RE: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-05-01 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
over at snowballstem.org? On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > I agree with Erick. I think that's just how the cookie crumbles when > stemming. If you have some time on your hands, you can integrate > OpenNLP with

Indexing Korean

2020-05-01 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, My team would like to index Korean, but it looks like Solr OOTB does not have explicit support for Korean. If any of you have schema pipelines you could share for your Korean documents, I would love to see them! I'm assuming I would just use some combination of the OOTB CJK factories..

RE: Solr fields mapping

2020-04-30 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
but it does not fit in my use case. Reason is while giving as output we have to show each field with its value, with copy it combines the value but we do not know field and value relationship. regards sam On Wed, Apr 29, 2020 at 9:53 AM Audrey Lorberfeld - au

RE: Possible issue with Stemming and nouns ended with suffix 'ion'

2020-04-30 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I agree with Erick. I think that's just how the cookie crumbles when stemming. If you have some time on your hands, you can integrate OpenNLP with your Solr instance and start using the lemmas of tokens instead of the stems. In this case, I believe if you were to lemmatize both "identify" and "i

Re: Solr fields mapping

2020-04-29 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi, Sam! Have you tried creating a copyField? https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/copying-fields.html Best, Audrey On 4/28/20, 1:07 PM, "sambasivarao giddaluri" wrote: Hi All, Is there a way we can map fields in a single field? Ex: s

Japanese text handling in Solr

2020-03-31 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, We are adding Japanese to our index, and I would love to know if any of you have a synonyms file you use for Japanese? Thank you! Best, Audrey Lorberfeld

Re: Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

2020-03-16 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
"soap powder" anymore, rather it expands separate synonyms > for > "soap" and "powder". > > > > Best Regards, > Atin Janki > > > On Mon, Mar 16, 2020 at 4:59 PM Audrey Lorberfeld - > audrey.lorberf...@ib

Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

2020-03-16 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
"atin janki" wrote: Using sow=true does split the words on whitespace, but it will not look for synonyms of "soap powder" anymore; rather, it expands separate synonyms for "soap" and "powder". Best Regards, Atin

Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

2020-03-16 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Have you set sow=true in your search handler? I know that we have it set to false (sow = split on whitespace) because we WANT multi-token synonyms retained as multiple tokens. On 3/16/20, 10:49 AM, "atin janki" wrote: Hello everyone, I am using solr 8.3. After I include
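The sow setting discussed in this thread can also be passed per request rather than only in the handler defaults. A minimal sketch, assuming a hypothetical local Solr URL and a collection named mycollection; only the q, defType, and sow parameter names come from Solr itself:

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust for your deployment.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_query_url(user_query, split_on_whitespace=False):
    """Build a /select URL that sets the sow parameter explicitly."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # sow=false hands the whole query string to the field analyzer,
        # which is what multi-token synonyms such as "soap powder" need.
        "sow": "true" if split_on_whitespace else "false",
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_query_url("soap powder")
```

With sow left at false, the query analyzer sees "soap powder" as one string, so a multi-word synonym rule has a chance to match.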

Re: configuring suggester with api

2020-03-12 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi Manoj, In the handler, I think you are missing the suggest.dictionary parameter, which should be set to the name of your suggestion component. In this case, I believe it should be set to "titleSuggester". In this sample URL from the documentation, they have a suggest.dictionary field,
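A suggest request that names its dictionary might be assembled as below. This is a sketch under assumptions: the host, the collection name, and the /suggest handler path are hypothetical, while suggest, suggest.dictionary, and suggest.q are standard suggester request parameters and "titleSuggester" is the name mentioned in the message.

```python
from urllib.parse import urlencode

# Hypothetical handler location; the path depends on your solrconfig.xml.
SUGGEST_HANDLER = "http://localhost:8983/solr/mycollection/suggest"


def build_suggest_url(prefix, dictionary="titleSuggester"):
    """Build a suggest request that names the dictionary to query."""
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,  # must match the suggester's name
        "suggest.q": prefix,
    }
    return SUGGEST_HANDLER + "?" + urlencode(params)


suggest_url = build_suggest_url("sol")
```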

exactMatchFirst Solr Suggestion Component

2020-03-04 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, Would anyone be able to help me debug my suggestion component? Right now, our config looks like this: mySuggester FuzzyLookupFactory FileDictionaryFactory ./conf/queries_list_with_weights.txt , conf keywords_w3_en false We like the idea of the F

Re: Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-28 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
r search "bag". Thus, for a single search, S/D can be either 0 or 1 - you're right, it's binary! Hope this helps. Loved your questions! :) On Thu, 27 Feb 2020 at 22:21, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Paras, > >

Re: Re: Re: Re: Query Autocomplete Evaluation

2020-02-27 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
those searches where Selection was not made because there were no results while S/D will not count this - it only counts cases where the result was displayed. Hope I'm clear. :) On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > This article >

Re: Re: Re: Query Autocomplete Evaluation

2020-02-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
(i.e., the query is related to the context; 60% of the pairs) and unrelated (40% of the pairs)." On 2/25/20, 10:25 AM, "Audrey Lorberfeld - audrey.lorberf...@ibm.com" wrote: Thank you, Walter & Paras! So, from the MRR equation, I was under the impression the s

Re: Re: Query Autocomplete Evaluation

2020-02-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
"In practice, this is inverted to obtain the reciprocal rank, e.g., if the > searcher clicks on the 4th result, the reciprocal rank is 0.25. The average > of these reciprocal ranks is called the mean reciprocal rank (MRR)." > > nDCG may require human intervent
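The definition quoted above can be worked through in a few lines. A minimal sketch; treating a search with no click as a reciprocal rank of 0 is an assumption made here, not something the thread states:

```python
def reciprocal_rank(clicked_position):
    """Return 1/position for a 1-based click position.

    A search with no click (None) contributes 0.0; that convention is an
    assumption of this sketch.
    """
    if clicked_position is None:
        return 0.0
    return 1.0 / clicked_position


def mean_reciprocal_rank(clicked_positions):
    """Average the reciprocal ranks over all searches."""
    if not clicked_positions:
        return 0.0
    total = sum(reciprocal_rank(p) for p in clicked_positions)
    return total / len(clicked_positions)


# Three searches: click on result 1, click on result 4, no click.
mrr = mean_reciprocal_rank([1, 4, None])
```

Here the per-search values are 1.0, 0.25, and 0.0, so the mean is their sum divided by three.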

Re: Re: Query Autocomplete Evaluation

2020-02-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
ecomes apparent if you observe the URL after clicking a suggestion on dir.indiamart.com. However, not everything would benefit you. Do let me know for any related query or explanation. Hope this helps. :) On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld - audrey.lorberf...@ibm

Query Autocomplete Evaluation

2020-02-14 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi all, How do you all evaluate the success of your query autocomplete (i.e. suggester) component if you use it? We cannot use MRR for various reasons (I can go into them if you're interested), so we're thinking of using nDCG since we already use that for relevance eval of our system as a who
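For comparison with MRR, nDCG over a ranked suggestion list can be sketched as follows. The graded relevance labels and the log2 discount are the common textbook formulation, assumed here; the thread does not say which variant the team uses.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), i from 1."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    ranked = list(relevances)[:k] if k else list(relevances)
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0  # no relevant item at all
    return dcg(ranked) / ideal_dcg


# Hypothetical relevance grades for the suggestions shown, in display order.
score = ndcg([1, 3, 2])
```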

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-31 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
tandard /select queries. (this separate suggest collection would also have appropriate tokenization to match the partial words as the user types, like ngramming) Erik > On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:

Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-26 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
ead of adding weights in the document you can also use LTR >> with in Solr to rerank on the features. >> >> Regards, >> Lucky Sharma >> >> On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld - >> audrey.lorberf...@ibm.com, wrote: >>

Re: Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
th > in Solr to rerank on the features. > > Regards, > Lucky Sharma > > On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld - > audrey.lorberf...@ibm.com, > wrote: > > > Erik, > > > > Thank you! Yes, that's exactly

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
0 at 17:02, David Hastings wrote: > Not a bad idea at all, however ive never used an external file before, just > a field in the index, so not an area im familiar with > > On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld - > audrey.lorberf...@ibm.com

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
standard /select queries. (this separate suggest collection would also have appropriate tokenization to match the partial words as the user types, like ngramming) Erik > On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: >

Re: Re: Re: Re: Handling overlapping synonyms

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hm, I'm not sure what you mean, but I am pretty new to Solr. Apologies! On 1/20/20, 12:01 PM, "fiedzia" wrote: >From my understanding, if you want regional sales manager to be indexed as both director of sales and area manager, you >would have to type: > >Regional sales ma

Re: Re: Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
l make the start/reload times pretty slow On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Hi All, > > We plan to incorporate a query autocomplete functionality into our search > engine (like this: https://ur

Anyone have experience with Query Auto-Suggestor?

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, We plan to incorporate a query autocomplete functionality into our search engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html ). And I was wondering if anyone has personal experience with this component and would like to share? Basically, we are just looking for so

Re: Re: Re: Handling overlapping synonyms

2020-01-20 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
From my understanding, if you want regional sales manager to be indexed as both director of sales and area manager, you would have to type: Regional sales manager -> director of sales, area manager I do not believe you can chain synonyms. Re: bigrams/trigrams, I was more interested in you want

Re: Re: Handling overlapping synonyms

2020-01-17 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hmm what is the reasoning behind adding the bigrams and trigrams manually like that? Maybe if we knew the end goal, we could figure out a different strategy. Happy that at least the matching is working now! On 1/17/20, 10:28 AM, "fiedzia" wrote: > Doing it the other way (new york cit

Re: Handling overlapping synonyms

2020-01-17 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
If you instead write "new york => new_york, new_york_city" it should work (https://doc.lucidworks.com/fusion/3.1/Collections/Synonyms-Files.html) On 1/17/20, 6:29 AM, "fiedzia" wrote: Having synonyms defined for new york -> new_york new york city -> new_york_city I'd
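To make the rule format concrete, here is a small sketch that reads rules in the Solr synonyms-file style ("a => b, c" for explicit mappings, "a, b, c" for equivalents) into a lookup table. It is an illustration only: it ignores escaping and the filter's expand option, and it performs no token matching.

```python
def parse_synonym_rules(lines):
    """Parse synonym rules into {source term: [replacement terms]}."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            left, right = line.split("=>", 1)
            sources = [t.strip() for t in left.split(",") if t.strip()]
            targets = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalent terms: every term maps to the whole group.
            sources = targets = [t.strip() for t in line.split(",") if t.strip()]
        for source in sources:
            bucket = mapping.setdefault(source, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)
    return mapping


rules = parse_synonym_rules(["new york => new_york, new_york_city"])
```

Written this way, a single source phrase carries both replacements, which is the effect the suggested rule is after.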

Re: Ref Guide - Precision & Recall of Analyzers

2019-11-06 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I would also love to know what filter to use to ignore capitalized acronyms... which one can do this OOTB? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 11/6/19, 3:54 AM, "Paras Lehana" wrote: Hi Community, In Ref Guide 8.3's *Understanding An

Re: Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
query time against the query > > On Fri, Oct 25, 2019 at 12:11 PM Audrey Lorberfeld - > audrey.lorberf...@ibm.com wrote: > >> So then you do run your POS tagger at query-time, Dave? >> >> -- >> Audrey Lorberfeld >>

Re: Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
5, 2019 at 12:11 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > So then you do run your POS tagger at query-time, Dave? > > -- > Audrey Lorberfeld > Data Scientist, w3 Search > IBM > audrey.lorberf...@ibm.com > >

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
ts in a separate field(s) > > > > On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld - > > audrey.lorberf...@ibm.com wrote: > > > > > No, I meant for part-of-speech tagging. But that's interesting that > you > > > use Stan

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
the > documents in a separate field(s) > > On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld - > audrey.lorberf...@ibm.com wrote: > > > No, I meant for part-of-speech tagging. But that's interesting that you > > use StanfordNLP. I

Re: Re: POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
, 2019 at 10:16 AM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Hi All, > > Does anyone use a POS tagger with their Solr instance other than > OpenNLP’s? We are considering OpenNLP, SpaCy, and Watson. > > Thanks! > > -- >

POS Tagger

2019-10-25 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, Does anyone use a POS tagger with their Solr instance other than OpenNLP’s? We are considering OpenNLP, SpaCy, and Watson. Thanks! -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com

Re: Re: using the df parameter to set a default to search all fields

2019-10-22 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
en^1.5 url^0.5 -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/22/19, 1:50 PM, "Shawn Heisey" wrote: On 10/22/2019 11:42 AM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > I think you actually can search over all fields, but n

Re: Re: using the df parameter to set a default to search all fields

2019-10-22 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I think you actually can search over all fields, but not in the df parameter. We have a big list of fields we want to search over. So, we just put a dummy one in the df param field, and then we use the fl parameter. With the edismax parser, this works. It looks something like this:
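A request along these lines might look like the sketch below. Note that edismax reads its searchable fields from qf, while fl only controls which fields are returned, so the sketch assumes qf is what the message intends. The URL, collection, and field names (title_en, url, text) are hypothetical; the boost values mirror the style of the follow-up message.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_multifield_url(user_query):
    """Search several boosted fields via edismax; df is only a fallback."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "df": "text",                   # hypothetical placeholder default field
        "qf": "title_en^1.5 url^0.5",   # hypothetical fields with boosts
    }
    return SOLR_SELECT + "?" + urlencode(params)


multi_url = build_multifield_url("expense report")
```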

Re: Re: Query on autoGeneratePhraseQueries

2019-10-15 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
I'm not sure how your config file is set up, but I know that the way we do multi-token synonyms is to have the sow (split on whitespace) parameter set to False while using the edismax parser. I'm not sure if this would work with PhraseQueries, but it might be worth a try! In our config file we

Re: Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
e anything close to a decent server you wont notice it all. im at about 21 million documents, index varies between 450gb to 800gb depending on merges, and about 60k searches a day and stay sub second non stop, and this is on a single core/non cloud environment On Wed, Oct 9, 20

Re: Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
would have to retrieve every single document in our corpus and rank them. That's a high computational cost, no? -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 10/9/19, 2:31 PM, "Audrey Lorberfeld - audrey.lorberf...@ibm.com" wrote:

Re: Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
orner cases. Consider a > search for “to be or not to be” if they’re all stopwords. > > Best, > Erick > > > On Oct 9, 2019, at 9:38 AM, Audrey Lorberfeld - > audrey.lorberf...@ibm.com wrote: > > > > Hey Alex, > > >

Re: Re: Protecting Tokens from Any Analysis

2019-10-09 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
utions. E.g. "IT:ibm -> > term365". As long as it is done on both indexing and query, they will > still match. You may have to have a bunch of them or write some sort > of lookup map. > > Regards, >Alex. > > On Tue, 8 Oct

Protecting Tokens from Any Analysis

2019-10-08 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, This is likely a rudimentary question, but I can’t seem to find a straightforward answer on forums or the documentation… is there a way to protect tokens from ANY analysis? I know things like the KeywordMarkerFilterFactory protect tokens from stemming, but we have some terms we don’t e

Re: Re: SolR: How to sort (or boost) by Availability dates

2019-09-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Yay! -- Audrey Lorberfeld Data Scientist, w3 Search IBM audrey.lorberf...@ibm.com On 9/24/19, 10:15 AM, "digi_business" wrote: Hi all, reading your suggestions i've just come out of the darkness! Just for explaining, my problem is that i want to show all my items (not only

Re: SolR: How to sort (or boost) by Availability dates

2019-09-24 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi Federico, I am not sure exactly what syntax would get you the functionality that you're looking for, but I'd recommend writing a boost function. That's what we're doing right now for boosting more recent results in our search engine. You'd somehow have to work with date math and possibly mak
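One widely documented Solr pattern for recency boosting is recip(ms(NOW,date_field),3.16e-11,1,1), where recip(x,m,a,b) evaluates to a/(m*x+b). The sketch below reproduces that arithmetic in Python to show how the boost decays with age; it is not necessarily the function the poster's team uses, and the constants are the commonly cited example values.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # milliseconds in an average year


def recip(x, m, a, b):
    """Mirror of Solr's recip function query: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Boost for a document that is age_ms milliseconds old.

    With m = 3.16e-11 (roughly 1 / ms-per-year), the boost is 1.0 for a
    brand-new document and about 0.5 for a one-year-old one.
    """
    return recip(age_ms, 3.16e-11, 1, 1)


boost_new = recency_boost(0)
boost_one_year = recency_boost(MS_PER_YEAR)
```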

Re: Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-04 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
heck it ignores Keyword marked word) 3) RemoveDuplicatesTokenFilterFactory That may give what you are after without custom coding. Regards, Alex. On Tue, 3 Sep 2019 at 16:14, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > > Toke, >

Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
...@ibm.com On 9/3/19, 2:58 PM, "Toke Eskildsen" wrote: Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Do you find that searching over both the original title field and the normalized title > field increases the time it takes for your search engine to retri

Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
audrey.lorberf...@ibm.com On 8/31/19, 3:01 PM, "Toke Eskildsen" wrote: Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Just wanting to test the waters here – for those of you with search engines > that index multiple languages, do you use ASCII-folding in your sche

Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
d the like. > > MappingCFF works.. > >> On Aug 30, 2019, at 1:54 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: >> >> Aita, >> >> Thanks for that insight! >> >> As the conversation has pro

Re: Re: Re: Multi-lingual Search & Accent Marks

2019-09-03 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
9, at 1:54 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > > Aita, > > Thanks for that insight! > > As the conversation has progressed, we are now leaning towards not having the ASCII-folding filter in our pipelines in order to keep marks li

Re: Re: Multi-lingual Search & Accent Marks

2019-08-30 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
rman index, we neutralize accents before index i.e. umlauts to 'ae', 'ue'.. Etc and similar what we do at the query time too for an appropriate match. On Fri, Aug 30, 2019, 4:22 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote: > Hi All,
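The umlaut handling described here, applied identically at index and query time, can be sketched as a simple character mapping. The mapping of ß to "ss" is an added assumption; the message only mentions umlauts.

```python
# Transliteration table; the ß entries are an assumption of this sketch.
UMLAUT_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def neutralize_umlauts(text):
    """Replace German umlauts with their two-letter transliterations."""
    return text.translate(UMLAUT_MAP)


# The same function must run on documents and on queries so they match.
indexed_form = neutralize_umlauts("Müller Straße")
query_form = neutralize_umlauts("müller")
```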

Multi-lingual Search & Accent Marks

2019-08-30 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, Just wanting to test the waters here – for those of you with search engines that index multiple languages, do you use ASCII-folding in your schema? We are onboarding Spanish documents into our index right now and keep going back and forth on whether we should preserve accent marks. From

Re: Re: Multi-language Spellcheck

2019-08-29 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
away with a generic fieldtype that does not do > anything language specific, but I doubt. > > > Am 29.08.2019 um 16:20 schrieb Audrey Lorberfeld - > audrey.lorberf...@ibm.com : > > > > Hi All, > > > > We are starting up an intern

Multi-language Spellcheck

2019-08-29 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, We are starting up an internal search engine that has to work for many different languages. We are starting with a POC of Spanish and English documents, and we are using the DirectSolrSpellChecker. From reading others' threads online, I know that we have to have multiple spellcheckers