Yes, sure, thanks for your advice.
I'm still waiting for my server to arrive before I can scale up my system and
do the testing. Right now, the Solr instance running on my 4GB RAM system will
crash if I try to scale up, as there's not enough memory to support it.
Regards,
Edwin
On 11 May 2015 at 19:11, A
2015-05-11 4:44 GMT+01:00 Zheng Lin Edwin Yeo :
I've managed to run the synonyms with 10 different synonym files. Each
synonym file is 1MB in size and consists of about 1000 tokens, and each
token has about 40-50 words. These files are more extreme than what I will
probably use in the real environment; for now they are only for testing
purposes
Thank you for your suggestions.
I can't do proper testing on that yet, as I'm currently using a normal PC
with 4GB RAM, and all of this probably requires more RAM than I have.
I've tried running the setup with 20 synonym files, and the system ran out
of memory before I could test anything.
This is quite a big synonym corpus!
If it's not feasible to have only 1 big synonym file (I haven't checked,
so I assume the 1MB limit is true, even if strange), I would do an
experiment:
1) Test query time with a classic Solr config.
2) Use an ad hoc Solr core to manage synonyms (in this way
So does that mean having more than 10 or 20 synonym files locally will still
be faster than accessing an external service?
I found out that ZooKeeper only allows the synonym.txt file to be a
maximum of 1MB, and as my potential synonym file is more than 20MB, I'll
need to split the file into more than
Accessing an external service (such as a thesaurus website) for each query
can slow down your system a lot.
Having the synonyms locally, with the Solr integration, is much better.
Cheers
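A minimal sketch of the file-splitting workaround mentioned above (a 20MB source file, ZooKeeper's 1MB limit on synonym.txt), assuming plain-text synonym rules with one rule per line. The function names `split_synonym_rules` and `write_chunks` and the `synonyms_part_N.txt` file names are hypothetical. Each rule is kept whole, so no synonym group is broken across files. As far as I know, the `synonyms` attribute of SynonymFilterFactory accepts a comma-separated list of files, so the printed names could go there, but please verify that against your Solr version. I believe the ~1MB figure is ZooKeeper's default znode size limit (`jute.maxbuffer`).

```python
import os
import tempfile


def split_synonym_rules(lines, max_bytes):
    """Group synonym rules into chunks whose UTF-8 size stays within max_bytes.

    Each rule (one line of the file) is kept whole, so no comma-separated
    synonym group is ever split across two files.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        line = line.rstrip("\n") + "\n"
        n = len(line.encode("utf-8"))
        if n > max_bytes:
            raise ValueError(f"a single rule is {n} bytes, over the {max_bytes}-byte limit")
        # Start a new chunk if this rule would push the current one over the limit.
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks


def write_chunks(chunks, directory, prefix="synonyms_part"):
    """Write each chunk to <directory>/<prefix>_<i>.txt and return the file names."""
    names = []
    for i, chunk in enumerate(chunks, start=1):
        name = f"{prefix}_{i}.txt"
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.writelines(chunk)
        names.append(name)
    return names


if __name__ == "__main__":
    rules = [
        "Titanium Dioxides, titanium oxide, pigment",
        "pigment, colour, colouring material",
    ]
    with tempfile.TemporaryDirectory() as out_dir:
        # A tiny 50-byte limit just to force a split; the real limit would be about 1MB.
        names = write_chunks(split_synonym_rules(rules, 50), out_dir)
        print(",".join(names))  # synonyms_part_1.txt,synonyms_part_2.txt
```

For the real file you would read the 20MB source line by line and pass something safely below 1MB as `max_bytes`, leaving some headroom.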
2015-05-08 11:46 GMT+01:00 Zheng Lin Edwin Yeo :
The document seems to point to using the AutoPhrasingTokenFilter, joining
multi-term synonyms with an underscore, or switching to index-time synonyms.
I'm also thinking of putting the synonyms into a database, or querying some
thesaurus website when the user enters the search key, instead of using the
SynonymFilter
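To make the underscore idea concrete, here is a toy sketch in plain Python, outside Solr: known multi-word phrases are joined into single underscore tokens before splitting, so a synonym rule could then refer to `titanium_oxide` as one token. The function `auto_phrase` and the phrase list are hypothetical; this mimics the idea behind auto-phrasing and is not the actual AutoPhrasingTokenFilter implementation. It assumes the same phrase list is applied at both index and query time.

```python
import re


def auto_phrase(text, phrases):
    """Lowercase and split text, first joining known multi-word phrases with '_'.

    Toy pre-processing only: it illustrates the auto-phrasing idea, not the
    real AutoPhrasingTokenFilter code.
    """
    # Try longer phrases first so they win over shorter overlapping ones.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        # Match the words case-insensitively, allowing any whitespace between them.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        joined = "_".join(w.lower() for w in words)
        # A function replacement avoids any backslash handling in the replacement string.
        text = re.sub(pattern, lambda m: joined, text, flags=re.IGNORECASE)
    return text.lower().split()


if __name__ == "__main__":
    phrases = ["titanium dioxides", "titanium oxide", "colouring material"]
    print(auto_phrase("White Titanium  Oxide is a colouring material", phrases))
    # ['white', 'titanium_oxide', 'is', 'a', 'colouring_material']
```

Because `_` counts as a word character for `\b`, a shorter phrase cannot match again inside a token that has already been joined.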
I found this very interesting article that I think can help in
understanding the problem better:
http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
And this:
http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so
Thanks for the explanation.
Currently I'm only using the comma-separated list of words, and only using
the synonym filter at query time. I find that when I set expand=true,
quite a number of irrelevant results come back, and this doesn't happen
when I set expand=false.
I
Let me explain a little better here:
First of all, the SynonymFilter is a token filter, and as a token filter
it can be part of an analysis pipeline at both indexing and query time.
As the different types of analysis explicitly state when the filtering
happens, let's go into the details of the syn
Just an update: the tokenizer class I'm using is
StandardTokenizerFactory, and I'm on Solr 5.0.
On 8 May 2015 16:24, "Zheng Lin Edwin Yeo" wrote:
Hi,
I would like to check something about the SynonymFilterFactory. I have the
following in my synonyms.txt:
Titanium Dioxides, titanium oxide, pigment
pigment, colour, colouring material
If I set expand=false and search for q=pigment, I get results that
match pigment, Titanium Dioxides and titanium
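A small sketch of how the two comma-separated lines above are turned into mappings under each `expand` setting, as I understand Solr's parser: with expand=true every term maps to every term on its line, with expand=false every term maps only to the first term on its line, and mappings from lines that share a term are merged. The function name `build_synonym_map` is hypothetical, lowercasing stands in for ignoreCase="true", and the sketch ignores tokenization and multi-word matching, which is exactly where the real filter gets more complicated.

```python
from collections import defaultdict


def build_synonym_map(rules, expand=True):
    """Map each term to its replacement terms for comma-separated synonym rules.

    expand=True:  every term maps to every term on its line.
    expand=False: every term maps only to the first term on its line.
    Mappings from several lines that share a term are merged.
    """
    mapping = defaultdict(list)
    for rule in rules:
        rule = rule.strip()
        # This sketch skips blank lines, comments and explicit "a => b" rules.
        if not rule or rule.startswith("#") or "=>" in rule:
            continue
        # Lowercasing stands in for ignoreCase="true".
        terms = [t.strip().lower() for t in rule.split(",") if t.strip()]
        outputs = terms if expand else terms[:1]
        for term in terms:
            for out in outputs:
                if out not in mapping[term]:
                    mapping[term].append(out)
    return dict(mapping)


if __name__ == "__main__":
    rules = [
        "Titanium Dioxides, titanium oxide, pigment",
        "pigment, colour, colouring material",
    ]
    print(build_synonym_map(rules, expand=False)["pigment"])
    # ['titanium dioxides', 'pigment']
    print(build_synonym_map(rules, expand=True)["pigment"])
    # ['titanium dioxides', 'titanium oxide', 'pigment', 'colour', 'colouring material']
```

Because "pigment" appears on both lines, with expand=true a query for pigment also pulls in colour and colouring material. That may be one source of the less relevant results mentioned earlier in the thread.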