Hi,
Alright, after trying and trying, I have managed to isolate the fields that are
causing the search to fail.
Now, all the fields are "" are
breaking up my search.
I changed the id-StrField to
And finally now it
Hi,
> Have you tried reindexing the documents and compare the results? No issues
> if you cannot do that - let's try something else. I was going through the
> whole mail and your files. You had said:
Yes, but since it hasn't worked as suggested, I kept as you suggested.
> As soon as I add dbId or
Hi Guilherme,
Have you tried reindexing the documents and compare the results? No issues
if you cannot do that - let's try something else. I was going through the
whole mail and your files. You had said:
As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then I
> don't get anythi
Hi Paras
No worries.
No I didn’t find anything. This is annoying now...
Yes! They do contain dbId. Absolutely all my docs contain dbId, and it is
actually my key, if you check the schema.xml again.
Cheers
Guilherme
> On 15 Nov 2019, at 05:37, Paras Lehana wrote:
>
>
> Hey Guilherme,
>
> I w
Hey Guilherme,
I was a bit busy for the past few days and couldn't read your mail. So, did
you find anything? Anyways, as I had expected, the culprit is definitely
among the qfs. Do the documents in question contain dbId? I suggest you
cross-check the fields in your document with those impacting
What I can't understand is:
I search for the exact term - "Immunoregulatory interactions between a Lymphoid
and a non-Lymphoid cell" and If i search "I search for the exact term -
Immunoregulatory interactions between a Lymphoid and non-Lymphoid cell" then it
works
> On 11 Nov 2019, at 12:24,
Thanks
> Removing stopwords is another story. I'm curious to find the reason
> assuming that you keep on using stopwords. In some cases, stopwords are
> really necessary.
Yes. It always makes sense the way we've been using.
> If q.alt is giving you responses, it's confirmed that your stopwords filt
Hi
So I don't think removing it completely is the way to go from the scenario
> we have
Removing stopwords is another story. I'm curious to find the reason
assuming that you keep on using stopwords. In some cases, stopwords are
really necessary.
Quite a considerable increase
If q.alt is givi
I use 3 word shingles with stopwords for my MLT ML trainer that worked
pretty well for such a solution, but for a full index the size became
prohibitive
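
For readers unfamiliar with the term, the 3-word shingles mentioned above can be sketched as follows. This is only an illustration in plain Python, not the poster's trainer or Solr's ShingleFilter; the function name `word_shingles` and the whitespace tokenization are assumptions made for the example.

```python
def word_shingles(text, size=3):
    """Return every run of `size` consecutive words, stopwords included.

    Hypothetical helper: splits on whitespace and lowercases, which is
    far simpler than a real analysis chain.
    """
    words = text.lower().split()
    # A text shorter than `size` words yields no shingles at all.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


if __name__ == "__main__":
    for shingle in word_shingles("lymphoid and a non-lymphoid cell"):
        print(shingle)
```

Every word position starts a new shingle, so an index built this way stores roughly one extra multi-word term per token, which is consistent with the size concern raised above.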
On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood
wrote:
> If we had IDF for phrases, they would be super effective. The 2X weight is
> a hack t
If we had IDF for phrases, they would be super effective. The 2X weight is a
hack that mostly works.
Infoseek had phrase IDF and it was a killer algorithm for relevance.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Nov 8, 2019, at 11:08 AM, David
the pf and qf fields are REALLY nice for this
On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood
wrote:
> I always enable phrase searching in edismax for exactly this reason.
>
> Something like:
>
>title^16 keywords^8 text^2
>
> To deal with concepts in queries, a classifier and/or named e
I always enable phrase searching in edismax for exactly this reason.
Something like:
title^16 keywords^8 text^2
To deal with concepts in queries, a classifier and/or named entity extractor
can be helpful. If you have a list of concepts (“controlled vocabulary”) that
includes “Lamin A”,
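
A minimal sketch of what enabling phrase searching in edismax could look like as request parameters. `defType`, `q`, `qf` and `pf` are standard edismax parameters; treating the boost string quoted above as the `pf` value and the `qf` weights used here are assumptions, since the original markup did not survive in the archive.

```python
from urllib.parse import urlencode


def edismax_params(query, qf, pf):
    """Build edismax request parameters with phrase boosting enabled."""
    return {
        "defType": "edismax",
        "q": query,
        "qf": qf,  # fields searched term by term
        "pf": pf,  # fields boosted when the whole query matches as a phrase
    }


if __name__ == "__main__":
    params = edismax_params(
        "lamin a",
        qf="title^8 keywords^4 text",    # assumed weights, not from the thread
        pf="title^16 keywords^8 text^2",  # boost string quoted in the message
    )
    print(urlencode(params))
```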
But when you change it to AND, a single misspelling means zero results. That is
usually not helpful.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Nov 8, 2019, at 10:43 AM, David Hastings
> wrote:
>
> is your default operator OR?
> change it to
Look at the “mm” parameter, try setting it to 100%. Although that’s not
entirely likely to do what you want either since virtually every doc will have
“a” in it. But at least you’d get docs that have both terms.
you may also be able to search for things like “Lamin A” _only as a phrase_ and
hav
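
To make the effect of “mm” concrete, here is a toy model in Python. It is not Solr code: the function names and the two sample documents are made up for the example, and it only mirrors the documented rule that a positive percentage is rounded down to a number of required clauses.

```python
def min_should_match(num_clauses, mm_percent):
    """Number of optional clauses that must match, rounded down."""
    return num_clauses * mm_percent // 100


def matches(doc_terms, query_terms, mm_percent):
    """True when enough query terms occur in the document's term set."""
    required = min_should_match(len(query_terms), mm_percent)
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= required


if __name__ == "__main__":
    query = ["lamin", "a"]
    docs = {
        "IFT A": {"ift", "a"},      # shares only the stopword-like term
        "Lamin A": {"lamin", "a"},  # contains both terms
    }
    for mm in (50, 100):
        matched = [name for name, terms in docs.items() if matches(terms, query, mm)]
        print(mm, matched)
```

With mm at 50% either term is enough, so both documents match; at 100% only the document containing both terms does.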
OR
OR
explicit
edismax
*:*
name
...
> On 8 Nov 2019, at 16:43, David Hastings wrote:
>
> is your default operator OR?
> change it to AND
>
>
> On Fri, Nov 8, 2019 at
is your default operator OR?
change it to AND
On Fri, Nov 8, 2019 at 11:30 AM Guilherme Viteri wrote:
> Hi Walter and Paras
>
> I indexed it removing all the references to StopWordFilter and I went from
> 121 results to near 20K as the search term q="Lymphoid and a non-Lymphoid
> cell" is match
Hi Walter and Paras
I indexed it removing all the references to StopWordFilter and I went from 121
results to near 20K as the search term q="Lymphoid and a non-Lymphoid cell" is
matching entities such as "IFT A" or "Lamin A". So I don't think removing it
completely is the way to go from the sc
Hi Guilherme
By accident, I ended up querying using the default handler (/select)
> and it worked.
You've just found the culprit. Thanks for giving the material I requested.
Your analysis chain is working as expected. I don't see any issue in either
StopWordFilter or your boosts. I also use
I normally use a weight of 8 for the most important field, like title. Other
fields might get a 4 or 2.
I add a “pf” field with the weights doubled, so that phrase matches have a
higher weight.
The weight of 8 comes from experience at Infoseek and Inktomi, two early web
search engines. With di
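
The weighting scheme described above (8 for the most important field, 4 or 2 for others, and a “pf” with the weights doubled) can be sketched as a small helper. The field names other than title and the helper names are assumptions chosen for illustration.

```python
def boost_string(weights):
    """Format a {field: weight} mapping as a Solr boost string."""
    return " ".join(f"{field}^{weight}" for field, weight in weights.items())


def phrase_weights(weights, factor=2):
    """Derive phrase-field weights by multiplying each field weight."""
    return {field: weight * factor for field, weight in weights.items()}


if __name__ == "__main__":
    # Assumed example fields; only "title" is named in the message above.
    qf_weights = {"title": 8, "keywords": 4, "text": 2}
    print("qf =", boost_string(qf_weights))
    print("pf =", boost_string(phrase_weights(qf_weights)))
```

Deriving pf from qf in one place keeps the two in step when a field weight changes.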
Hi Wunder,
My indexer takes quite a few hours to run. I am shortening it to run
faster, but I also need to make sure it gives what we are expecting. This
implementation has been there for more than 4 years and is heavily used.
> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I
Ha, funnily enough I still use qf/pf boosts starting at 100 and going down,
which gives me room to boost more fields by different amounts. Maybe
excessive, but I haven't noticed a performance issue.
On Thu, Nov 7, 2019 at 9:44 AM Walter Underwood
wrote:
> Thanks for posting the files. Looking at schema.xml
Thanks for posting the files. Looking at schema.xml, I see that you still are
using StopFilterFactory. The first advice we gave you was to remove that.
Remove StopFilterFactory everywhere and reindex.
You will continue to have problems matching stopwords until you do that.
In your edismax handl
Hi Paras, everyone
Thank you again for your inputs and suggestions. I'm sorry to hear you had
trouble with the attachments; I will host them somewhere and share the links.
I don't tweak my index. I get the data from the graph database, create the
documents as they are and save them to Solr.
So, I am sendin
Hi Guilherme.
I am sending the analysis result and the json result as requested.
Thanks for the effort. Luckily, I can see your attachments (low quality
though).
From the analysis screen, the analysis is working as expected. One of the
reasons for query="lymphoid and *a* non-lymphoid cell" no
I don’t see the attachments, maybe I deleted old e-mails or some such. The
Apache server is fairly aggressive about stripping attachments though, so it’s
also possible they didn’t make it through.
> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri wrote:
>
> Thanks Erick.
>
>> First, your index a
Thanks Erick.
> First, your index and analysis chains are considerably different, this can
> easily be a source of problems. In particular, using two different tokenizers
> is a huge red flag. I _strongly_ recommend against this unless you’re totally
> sure you understand the consequences. Addi
First, your index and analysis chains are considerably different, this can
easily be a source of problems. In particular, using two different tokenizers
is a huge red flag. I _strongly_ recommend against this unless you’re totally
sure you understand the consequences. Additionally, your use of t
Hi Walter,
The solr.StopFilter removes all tokens that are stopwords. Those words will
> not be in the index, so they can never match a query.
I think the OP's concern is different results when adding a stopword. I
think he's using the filter factory correctly - the query chain includes
the filt
No.
The solr.StopFilter removes all tokens that are stopwords. Those words will not
be in the index, so they can never match a query.
1. Remove the lines with solr.StopFilter from every analysis chain in
schema.xml.
2. Reload the collection, restart Solr, or whatever to read the new config.
3.
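
The point about stopwords never reaching the index can be illustrated with a toy analyzer. This is a simplified model, not Solr's implementation: the stopword set and the whitespace tokenizer are assumptions made for the example.

```python
STOPWORDS = frozenset({"a", "and"})  # assumed sample list, not stopwords.txt


def analyze(text, stopwords=frozenset()):
    """Lowercase, split on whitespace and drop any stopwords."""
    return [token for token in text.lower().split() if token not in stopwords]


if __name__ == "__main__":
    with_filter = analyze("Lamin A", STOPWORDS)
    without_filter = analyze("Lamin A")
    print(with_filter)     # the token "a" was dropped before indexing
    print(without_filter)  # the token "a" is kept and can be matched
    print("a" in with_filter, "a" in without_filter)
```

Once the filter has dropped a token at index time, no query term can match it in that field, whatever the query side does.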
Ok. I am kind of lost now.
If I open up the console > analysis and perform it, that's the final result.
Your suggestion is: get rid of the in the schema.xml and
during index phase replaceAll("in stopwords.txt"," ") then add to solr. Is that
correct ?
Thanks David
> On 5 Nov 2019, at 14:48, D
no,
is still using stopwords and should be removed, in my opinion of course,
based on your use case may be different, but i generally axe any reference
to them at all
On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri wrote:
> Thanks.
> Haven't I done this here ?
>positionIncre
Thanks.
Haven't I done this here ?
> On 5 Nov 2019, at 14:15, David Hastings wrote:
>
> Fwd to another server
>
> The first thing you should do is remove any reference to stop words and
>
The first thing you should do is remove any reference to stop words and
never use them, then re-index your data and try it again.
On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri wrote:
> Hi,
>
> I am performing a search to match a name (text_field), however this term
> contains 'and' and 'a' and
Hi,
I am performing a search to match a name (text_field), however this term
contains 'and' and 'a' and it doesn't return any records. If I remove 'a' then
it works.
e.g
Search Term: lymphoid and a non-lymphoid cell
doesn't work:
https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymph