Hi All,
We are coming across a strange bug in the Analysis section of the Admin UI. For
our non-English schema components, instead of the Synonym Graph Filter (SGF)
showing in the UI, it's showing something called a "List Based Token Stream"
(LBTS) in its place. We found an old issue that docum
Oh wow, I had no idea this existed. Thank you so much!
Best,
Audrey
On 5/1/20, 12:58 PM, "Markus Jelsma" wrote:
Hello,
Although it is not mentioned in Solr's language analysis page in the
manual, Lucene has had support for Korean for quite a while now.
https://urldefense.proofp
over at snowballstem.org?
On Thu, Apr 30, 2020 at 4:08 PM Audrey Lorberfeld -
audrey.lorberf...@ibm.com wrote:
> I agree with Erick. I think that's just how the cookie crumbles when
> stemming. If you have some time on your hands, you can integrate
> OpenNLP with
Hi All,
My team would like to index Korean, but it looks like Solr OOTB does not have
explicit support for Korean. If any of you have schema pipelines you could
share for your Korean documents, I would love to see them! I'm assuming I would
just use some combination of the OOTB CJK factories.
but it does not fit my use case. The reason is that in the output we have to
show each field with its value; copyField combines the values, so we lose
track of which value came from which field.
regards
sam
On Wed, Apr 29, 2020 at 9:53 AM Audrey Lorberfeld -
au
I agree with Erick. I think that's just how the cookie crumbles when stemming.
If you have some time on your hands, you can integrate OpenNLP with your Solr
instance and start using the lemmas of tokens instead of the stems. In this
case, I believe if you were to lemmatize both "identify" and "i
Hi, Sam!
Have you tried creating a copyField?
https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/copying-fields.html
Best,
Audrey
On 4/28/20, 1:07 PM, "sambasivarao giddaluri"
wrote:
Hi All,
Is there a way we can map multiple fields into a single field?
Ex: s
Hi All,
We are adding Japanese to our index, and I would love to know if any of you
have a synonyms file you use for Japanese?
Thank you!
Best,
Audrey Lorberfeld
"soap powder" anymore; rather, it expands separate synonyms
> for "soap" and "powder".
>
> Best Regards,
> Atin Janki
>
>
> On Mon, Mar 16, 2020 at 4:59 PM Audrey Lorberfeld -
> audrey.lorberf...@ib
"atin janki" wrote:
Using sow=true does split the query on whitespace, but then it no longer looks
for synonyms of "soap powder"; instead, it expands separate synonyms for
"soap" and "powder".
Best Regards,
Atin
Have you set sow=true in your search handler? I know that we have it set to
false (sow = split on whitespace) because we WANT multi-token synonyms retained
as multiple tokens.
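To see why sow matters, here is a deliberately simplified model in plain Python (not Solr's SynonymGraphFilter, which builds a token graph): with sow=true each whitespace-separated chunk reaches the synonym step alone, so a two-word entry such as "soap powder" can never match. The synonym entries are hypothetical.

```python
# Hypothetical synonym entries: keys are token tuples, values are expansions.
SYNONYMS = {
    ("soap", "powder"): ["detergent"],
    ("soap",): ["cleanser"],
    ("powder",): ["dust"],
}
MAX_LEN = max(len(key) for key in SYNONYMS)


def expand(tokens):
    """Greedy longest-match expansion: keep the matched text, add its synonyms."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.append(" ".join(key))
                out.extend(SYNONYMS[key])
                i += n
                break
        else:  # no entry starts at this position; pass the token through
            out.append(tokens[i])
            i += 1
    return out


def analyze(query, sow):
    if sow:
        # sow=true: each whitespace-separated chunk is analyzed on its own,
        # so the synonym step never sees more than one word at a time.
        return [t for chunk in query.split() for t in expand([chunk])]
    # sow=false: the whole query is analyzed as one token sequence.
    return expand(query.split())


print(analyze("soap powder", sow=False))  # ['soap powder', 'detergent']
print(analyze("soap powder", sow=True))   # ['soap', 'cleanser', 'powder', 'dust']
```

The model matches what Atin reports: splitting first means only the single-word entries can fire.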
On 3/16/20, 10:49 AM, "atin janki" wrote:
Hello everyone,
I am using Solr 8.3.
After I include
Hi Manoj,
In the handler, I think you are missing the suggest.dictionary parameter, which
should be set to the name of your suggestion component. In this case, I believe
it should be set to "titleSuggester".
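A hedged sketch of a request that sets that parameter. The host, core name (mycore), handler path, and partial query are assumptions; the dictionary value must match the name given to the suggester in solrconfig.xml.

```python
from urllib.parse import urlencode

# Assumed deployment details (not from the thread): host, core, handler path.
BASE = "http://localhost:8983/solr/mycore/suggest"

params = {
    "suggest": "true",
    # Must equal the name configured for the suggester in solrconfig.xml.
    "suggest.dictionary": "titleSuggester",
    # Hypothetical partial input typed by the user.
    "suggest.q": "sola",
}
url = BASE + "?" + urlencode(params)
print(url)
# http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.dictionary=titleSuggester&suggest.q=sola
```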
In this sample URL from the documentation, they have a suggest.dictionary
field,
Hi All,
Would anyone be able to help me debug my suggestion component? Right now, our
config looks like this:
mySuggester
FuzzyLookupFactory
FileDictionaryFactory
./conf/queries_list_with_weights.txt
,
conf
keywords_w3_en
false
We like the idea of the F
r search "bag". Thus, for a single search, S/D can be either 0
or 1: you're right, it's binary!
Hope this helps. Loved your questions! :)
On Thu, 27 Feb 2020 at 22:21, Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> Paras,
>
>
those searches where no selection was made because there were no results,
while S/D does not count them: it only counts cases where results were
displayed.
Hope I'm clear. :)
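Reading S/D as selections divided by displays (an assumption; the abbreviation is not expanded in the surviving text), a minimal sketch over a hypothetical log format. Each displayed search scores 0 or 1, and searches where nothing was displayed are left out rather than counted as 0.

```python
from dataclasses import dataclass


@dataclass
class SuggestLog:
    """Hypothetical per-search log record."""
    displayed: bool  # were any suggestions shown for this search?
    selected: bool   # did the user pick one of them?


def selections_per_display(logs):
    """Average S/D over searches where suggestions were actually displayed."""
    shown = [log for log in logs if log.displayed]
    if not shown:
        return 0.0
    # Each displayed search contributes 1 (selection made) or 0 (no selection).
    return sum(1 for log in shown if log.selected) / len(shown)


logs = [SuggestLog(True, True), SuggestLog(True, False),
        SuggestLog(False, False), SuggestLog(True, True)]
print(selections_per_display(logs))  # 2 of 3 displayed searches had a selection
```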
On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> This article
>
(i.e., the query is related
to the context; 60% of the pairs) and unrelated (40% of the pairs)."
On 2/25/20, 10:25 AM, "Audrey Lorberfeld - audrey.lorberf...@ibm.com"
wrote:
Thank you, Walter & Paras!
So, from the MRR equation, I was under the impression the s
"In practice, this is inverted to obtain the reciprocal rank, e.g., if the
> searcher clicks on the 4th result, the reciprocal rank is 0.25. The average
> of these reciprocal ranks is called the mean reciprocal rank (MRR)."
>
> nDCG may require human intervent
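The quoted definition translates directly into code. A minimal sketch, assuming each search is logged with the 1-based rank of the clicked result, and that a search with no click contributes 0 (a common convention, not stated in the thread):

```python
def mean_reciprocal_rank(click_ranks):
    """MRR over a set of searches.

    click_ranks holds, per search, the 1-based rank of the clicked result,
    or None if nothing was clicked (that search contributes 0).
    """
    if not click_ranks:
        return 0.0
    total = sum(1.0 / r for r in click_ranks if r is not None)
    return total / len(click_ranks)


print(mean_reciprocal_rank([4]))           # 0.25, as in the quoted example
print(mean_reciprocal_rank([1, 2, None]))  # (1 + 0.5 + 0) / 3 = 0.5
```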
ecomes apparent if you observe
the URL after clicking a suggestion on dir.indiamart.com. However, not
everything would benefit you. Do let me know if you have any related questions
or need more explanation. Hope this helps. :)
On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld - audrey.lorberf...@ibm
Hi all,
How do you all evaluate the success of your query autocomplete (i.e. suggester)
component if you use it?
We cannot use MRR for various reasons (I can go into them if you're
interested), so we're thinking of using nDCG since we already use that for
relevance eval of our system as a who
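For comparison with MRR, nDCG rewards graded relevance at every position, discounted by log2 of the rank. A minimal sketch with linear gains (some formulations use 2^rel - 1 instead) and hypothetical 0-3 judgments for the suggestions shown for one prefix. The ideal ranking here is simply the observed judgments sorted, which ignores relevant suggestions that were never shown.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: sum of gain_i / log2(i + 1), i 1-based."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains, k=None):
    """nDCG@k: DCG of the observed ranking divided by DCG of the ideal ranking."""
    gains = list(gains)
    ideal = sorted(gains, reverse=True)
    if k is not None:
        gains, ideal = gains[:k], ideal[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


# Hypothetical graded judgments for five suggestions, in displayed order.
print(round(ndcg([3, 2, 3, 0, 1]), 3))  # ≈ 0.972
```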
tandard /select
queries. (this separate suggest collection would also have appropriate
tokenization to match the partial words as the user types, like ngramming)
Erik
> On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld -
audrey.lorberf...@ibm.com wrote:
ead of adding weights in the document, you can also use LTR
>> within Solr to rerank on the features.
>>
>> Regards,
>> Lucky Sharma
>>
>> On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld -
>> audrey.lorberf...@ibm.com, wrote:
>>
th
> in Solr to rerank on the features.
>
> Regards,
> Lucky Sharma
>
> On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld -
> audrey.lorberf...@ibm.com,
> wrote:
>
> > Erik,
> >
> > Thank you! Yes, that's exactly
0 at 17:02, David Hastings
wrote:
> Not a bad idea at all; however, I've never used an external file before,
> just a field in the index, so it's not an area I'm familiar with
>
> On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com
Hm, I'm not sure what you mean, but I am pretty new to Solr. Apologies!
On 1/20/20, 12:01 PM, "fiedzia" wrote:
>From my understanding, if you want regional sales manager to be indexed as
>both director of sales and area manager, you
>would have to type:
>
>Regional sales ma
l make the
start/reload times pretty slow
On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com wrote:
> Hi All,
>
> We plan to incorporate a query autocomplete functionality into our search
> engine (like this:
https://ur
Hi All,
We plan to incorporate a query autocomplete functionality into our search
engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html
). And I was wondering if anyone has personal experience with this component
and would like to share? Basically, we are just looking for so
From my understanding, if you want regional sales manager to be indexed as both
director of sales and area manager, you would have to type:
Regional sales manager => director of sales, area manager
I do not believe you can chain synonyms.
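A toy single-pass model of that behavior, using the explicit-mapping form of the rule. The extra "director of sales" rule is hypothetical and exists only to show that outputs are not re-expanded; to get "head of sales" as well, it would have to be listed on the first rule directly.

```python
# Hypothetical rules in the "a => b, c" explicit-mapping style.
RULES = {
    "regional sales manager": ["director of sales", "area manager"],
    "director of sales": ["head of sales"],
}


def apply_once(phrase):
    """Apply the rules in a single pass: outputs are never fed back in."""
    return RULES.get(phrase, [phrase])


print(apply_once("regional sales manager"))
# ['director of sales', 'area manager']  (no 'head of sales')
```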
Re: bigrams/trigrams, I was more interested in you want
Hmm, what is the reasoning behind adding the bigrams and trigrams manually
like that? Maybe if we knew the end goal, we could figure out a different
strategy. Happy that at least the matching is working now!
On 1/17/20, 10:28 AM, "fiedzia" wrote:
> Doing it the other way (new york cit
If you instead write "new york => new_york, new_york_city" it should work
(https://doc.lucidworks.com/fusion/3.1/Collections/Synonyms-Files.html)
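Why the combined rule helps can be modeled with a greedy longest-match rewrite, a simplification of how a synonym filter prefers the longest matching entry. This sketch assumes the same rules run at index and query time, and compares the two separate rules with the suggested one.

```python
def longest_match(tokens, rules):
    """Greedy longest-match rewrite: a simplified model of a synonym filter."""
    keys = sorted(rules, key=len, reverse=True)  # try longer entries first
    out, i = [], 0
    while i < len(tokens):
        for key in keys:
            if tuple(tokens[i:i + len(key)]) == key:
                out.extend(rules[key])  # "=>" replaces the matched words
                i += len(key)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


ORIGINAL = {("new", "york"): ["new_york"],
            ("new", "york", "city"): ["new_york_city"]}
SUGGESTED = {("new", "york"): ["new_york", "new_york_city"],
             ("new", "york", "city"): ["new_york_city"]}

for name, rules in (("original", ORIGINAL), ("suggested", SUGGESTED)):
    query = longest_match("new york".split(), rules)
    doc = longest_match("new york city".split(), rules)
    print(name, query, doc, bool(set(query) & set(doc)))
# original ['new_york'] ['new_york_city'] False
# suggested ['new_york', 'new_york_city'] ['new_york_city'] True
```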
On 1/17/20, 6:29 AM, "fiedzia" wrote:
Having synonyms defined for
new york -> new_york
new york city -> new_york_city
I'd
I would also love to know what filter to use to ignore capitalized acronyms...
which one can do this OOTB?
--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
On 11/6/19, 3:54 AM, "Paras Lehana" wrote:
Hi Community,
In Ref Guide 8.3's *Understanding An
query time against the query
>
> On Fri, Oct 25, 2019 at 12:11 PM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com wrote:
>
>> So then you do run your POS tagger at query-time, Dave?
>>
>> --
>> Audrey Lorberfeld
>>
5, 2019 at 12:11 PM Audrey Lorberfeld -
audrey.lorberf...@ibm.com wrote:
> So then you do run your POS tagger at query-time, Dave?
>
> --
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>
>
ts in a separate field(s)
> >
> > On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld -
> > audrey.lorberf...@ibm.com wrote:
> >
> > > No, I meant for part-of-speech tagging. But that's interesting that you
> > > use Stan
the
> documents in a separate field(s)
>
> On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld -
> audrey.lorberf...@ibm.com wrote:
>
> > No, I meant for part-of-speech tagging. But that's interesting that you
> > use StanfordNLP. I
, 2019 at 10:16 AM Audrey Lorberfeld -
audrey.lorberf...@ibm.com wrote:
> Hi All,
>
> Does anyone use a POS tagger with their Solr instance other than
> OpenNLP’s? We are considering OpenNLP, SpaCy, and Watson.
>
> Thanks!
>
> --
&g
Hi All,
Does anyone use a POS tagger with their Solr instance other than OpenNLP’s? We
are considering OpenNLP, spaCy, and Watson.
Thanks!
en^1.5 url^0.5
On 10/22/19, 1:50 PM, "Shawn Heisey" wrote:
On 10/22/2019 11:42 AM, Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
> I think you actually can search over all fields, but n
I think you actually can search over all fields, but not in the df parameter.
We have a big list of fields we want to search over. So, we just put a dummy
one in the df param field, and then we list the fields in the qf parameter.
With the edismax parser, this works. It looks something like this:
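For reference, in edismax the searched fields are listed in qf (with optional ^boosts), df is only a fallback when qf is absent, and fl controls which stored fields come back in the response. A hedged sketch of such a request; the host, core, field names, and boosts are all hypothetical.

```python
from urllib.parse import urlencode

params = {
    "defType": "edismax",
    "q": "soap powder",
    "df": "dummy_field",             # placeholder; edismax uses df only when qf is absent
    "qf": "title^2 body^1 url^0.5",  # the fields edismax actually searches
    "sow": "false",                  # keep multi-word synonyms together
    "fl": "id,title,url",            # fields returned, not fields searched
}
url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)
print(url)
```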
I'm not sure how your config file is set up, but I know that the way we do
multi-token synonyms is to have the sow (split on whitespace) parameter set to
false while using the edismax parser. I'm not sure if this would work with
PhraseQueries, but it might be worth a try!
In our config file we
e anything close to a decent server, you won't notice it at all. I'm
at about 21 million documents, the index varies between 450 GB and 800 GB
depending on merges, and with about 60k searches a day it stays sub-second
nonstop, and this is on a single-core, non-cloud environment.
On Wed, Oct 9, 20
would have to
retrieve every single document in our corpus and rank them. That's a high
computational cost, no?
On 10/9/19, 2:31 PM, "Audrey Lorberfeld - audrey.lorberf...@ibm.com"
wrote:
orner cases. Consider a
> search for “to be or not to be” if they’re all stopwords.
>
> Best,
> Erick
>
> > On Oct 9, 2019, at 9:38 AM, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com wrote:
> >
> > Hey Alex,
> >
>
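Erick's corner case is easy to reproduce: if every query term is a stopword, the stop filter leaves nothing to search. A minimal sketch with a hypothetical stopword list, plus an optional fallback that keeps the original tokens when all of them would be removed (a design choice for illustration, not a description of any particular Solr parser):

```python
STOPWORDS = {"to", "be", "or", "not"}  # hypothetical stopword list


def analyze_query(query, fallback=True):
    tokens = query.lower().split()
    kept = [t for t in tokens if t not in STOPWORDS]
    # Optional safeguard: if every token was a stopword, search the
    # original tokens instead of sending an empty query.
    if not kept and fallback:
        return tokens
    return kept


print(analyze_query("to be or not to be", fallback=False))  # []
print(analyze_query("to be or not to be"))  # ['to', 'be', 'or', 'not', 'to', 'be']
```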
utions. E.g. "IT:ibm ->
> term365". As long as it is done on both indexing and query, they will
> still match. You may have to have a bunch of them or write some sort
> of lookup map.
>
> Regards,
>Alex.
>
> On Tue, 8 Oct
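A minimal sketch of Alex's substitution idea, assuming a case-sensitive lookup map applied before lowercasing and stopword removal, and applied identically at index and query time so both sides produce the same placeholder. In Solr this step would presumably live in a char filter (for example a pattern-replace char filter); the map, stopword list, and placeholder names are hypothetical.

```python
import re

# Hypothetical case-sensitive map from protected terms to opaque placeholders.
PROTECTED = {"IT": "term365"}
STOPWORDS = {"it", "is", "the"}

_pattern = re.compile(r"\b(" + "|".join(map(re.escape, PROTECTED)) + r")\b")


def analyze(text):
    # Step 1 (char-filter style): swap protected terms before anything else.
    text = _pattern.sub(lambda m: PROTECTED[m.group(1)], text)
    # Step 2: ordinary lowercasing and stopword removal.
    return [t for t in text.lower().split() if t not in STOPWORDS]


print(analyze("IT is the help desk"))  # ['term365', 'help', 'desk']
print(analyze("IT help"))              # ['term365', 'help']
```

Because the same function runs on documents and queries, "IT" survives as `term365` on both sides, while a lowercase "it" is still dropped as a stopword.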
Hi All,
This is likely a rudimentary question, but I can’t seem to find a
straightforward answer on forums or in the documentation… Is there a way to
protect tokens from ANY analysis? I know things like the
KeywordMarkerFilterFactory protect tokens from stemming, but we have some terms
we don’t e
Yay!
On 9/24/19, 10:15 AM, "digi_business" wrote:
Hi all, reading your suggestions, I've just come out of the darkness!
To explain, my problem is that I want to show all my items (not
only
Hi Federico,
I am not sure exactly what syntax would get you the functionality that you're
looking for, but I'd recommend writing a boost function. That's what we're
doing right now for boosting more recent results in our search engine. You'd
somehow have to work with date math and possibly mak
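One common shape for such a boost is Solr's recip function, recip(x,m,a,b) = a/(m*x+b), applied to document age, e.g. recip(ms(NOW,mydatefield),3.16e-11,1,1), where mydatefield is a placeholder. With m close to 1/(milliseconds per year), the boost is about halved after a year. A small Python sketch of just the arithmetic:

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # ≈ 3.156e10 ms


def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


for age_years in (0, 1, 5):
    age_ms = age_years * MS_PER_YEAR
    print(age_years, round(recip(age_ms, 3.16e-11, 1, 1), 3))
# 0 1.0
# 1 0.501
# 5 0.167
```

Raising a and b together (e.g. 1000 and 1000) flattens the curve, so recency nudges rather than dominates the score.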
heck it ignores Keyword marked word)
3) RemoveDuplicatesTokenFilterFactory
That may give what you are after without custom coding.
Regards,
Alex.
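Alex's first step is cut off above. Assuming it was KeywordRepeatFilterFactory (the filter usually paired with RemoveDuplicatesTokenFilterFactory for this purpose), the chain keeps both the original and the stemmed form of each token, and drops the copy when stemming changed nothing. A toy model, where `toy_stem` is a hypothetical stand-in for a real stemmer:

```python
def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def keyword_repeat_stem_dedup(tokens):
    out = []
    for tok in tokens:
        # KeywordRepeat-style: an unstemmed (keyword-marked) copy plus a copy
        # that goes through the stemmer, both at the same position.
        variants = [tok, toy_stem(tok)]
        # RemoveDuplicates-style: drop identical tokens at one position.
        position = []
        for v in variants:
            if v not in position:
                position.append(v)
        out.append(position)
    return out


print(keyword_repeat_stem_dedup(["laptops", "case"]))
# [['laptops', 'laptop'], ['case']]
```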
On Tue, 3 Sep 2019 at 16:14, Audrey Lorberfeld -
audrey.lorberf...@ibm.com wrote:
>
> Toke,
>
...@ibm.com
On 9/3/19, 2:58 PM, "Toke Eskildsen" wrote:
Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> Do you find that searching over both the original title field and the
normalized title
> field increases the time it takes for your search engine to retri
audrey.lorberf...@ibm.com
On 8/31/19, 3:01 PM, "Toke Eskildsen" wrote:
Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> Just wanting to test the waters here – for those of you with search
engines
> that index multiple languages, do you use ASCII-folding in your sche
d the like.
>
> MappingCFF works..
>
>> On Aug 30, 2019, at 1:54 PM, Audrey Lorberfeld -
audrey.lorberf...@ibm.com wrote:
>>
>> Aita,
>>
>> Thanks for that insight!
>>
>> As the conversation has pro
9, at 1:54 PM, Audrey Lorberfeld -
audrey.lorberf...@ibm.com wrote:
>
> Aita,
>
> Thanks for that insight!
>
> As the conversation has progressed, we are now leaning towards not having
the ASCII-folding filter in our pipelines in order to keep marks li
rman index, we neutralize accents before indexing, i.e., umlauts to
'ae', 'ue', etc., and we do the same at query time so that queries
match appropriately.
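The difference between that German-specific normalization and generic ASCII folding shows up in a small sketch (the mapping table and function names are illustrative, not Solr's filters). Applying the same normalization at index and query time lets both spellings match.

```python
import unicodedata

# Illustrative German-style umlaut expansion.
GERMAN_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                            "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})


def german_normalize(text):
    return text.translate(GERMAN_MAP)


def ascii_fold(text):
    """Approximate generic folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(german_normalize("Müller"))  # Mueller
print(ascii_fold("Müller"))        # Muller
print(german_normalize("Müller") == german_normalize("Mueller"))  # True
```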
On Fri, Aug 30, 2019, 4:22 PM Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
> Hi All,
Hi All,
Just wanting to test the waters here – for those of you with search engines
that index multiple languages, do you use ASCII-folding in your schema? We are
onboarding Spanish documents into our index right now and keep going back and
forth on whether we should preserve accent marks. From
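One way to weigh the trade-off is to look at what folding does to Spanish tokens. This sketch approximates ASCII folding for accented Latin letters by decomposing to NFD and dropping combining marks; it is not Lucene's ASCIIFoldingFilter, which uses its own mapping table.

```python
import unicodedata


def fold(text):
    """Approximate ASCII folding: NFD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(fold("canción"))               # cancion: matches users who skip accents
print(fold("papá") == fold("papa"))  # True: two different words now collide
```

Folding helps recall for unaccented queries; keeping the marks protects precision for word pairs that differ only by an accent.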
away with a generic fieldtype that does not do
> anything language-specific, but I doubt it.
>
> > On 29.08.2019 at 16:20, Audrey Lorberfeld -
> audrey.lorberf...@ibm.com wrote:
> >
> > Hi All,
> >
> > We are starting up an intern
Hi All,
We are starting up an internal search engine that has to work for many
different languages. We are starting with a POC of Spanish and English
documents, and we are using the DirectSolrSpellChecker.
From reading others' threads online, I know that we have to have multiple
spellcheckers