Re: Queries on SynonymFilterFactory

2015-05-11 Thread Zheng Lin Edwin Yeo
Yes sure, thanks for your advice. I'm still waiting for my server to arrive before I can scale up my system and do the testing. Right now, Solr running on my 4GB RAM system will crash if I try to scale up, as there's not enough memory to support it. Regards, Edwin On 11 May 2015 at 19:11, A

Re: Queries on SynonymFilterFactory

2015-05-11 Thread Alessandro Benedetti
2015-05-11 4:44 GMT+01:00 Zheng Lin Edwin Yeo: > I've managed to run the synonyms with 10 different synonym files. Each > synonym file is 1MB in size and consists of about 1000 tokens, and each > token has about 40-50 words. These lists of files are more extreme, which I > probably won't us

Re: Queries on SynonymFilterFactory

2015-05-10 Thread Zheng Lin Edwin Yeo
I've managed to run the synonyms with 10 different synonym files. Each synonym file is 1MB in size and consists of about 1000 tokens, and each token has about 40-50 words. These lists of files are more extreme, which I probably won't use for the real environment, except now for the testing pu

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Thank you for your suggestions. I can't do proper testing on that yet as I'm currently using a normal PC with 4GB RAM, and all of this probably requires more RAM than I have. I've tried running the setup with 20 synonym files, and the system went Out of Memory before I could test anything.

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
This is quite a big synonym corpus! If it's not feasible to have only 1 big synonym file (I haven't checked, so I assume the 1 MB limit is true, even if strange), I would do an experiment: 1) test query time with a classic Solr config 2) use an ad hoc Solr core to manage synonyms (in this way

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
So does that mean having more than 10 or 20 synonym files locally will still be faster than accessing an external service? I found out that ZooKeeper only allows the synonym.txt file to be a maximum of 1MB, and as my potential synonym file is more than 20MB, I'll need to split the file to more than
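
To illustrate the splitting step mentioned above, here is a minimal Python sketch that groups synonym rules into chunks below a size limit without ever cutting a rule in half. The function names, the `synonyms_part` file prefix, and the 1,000,000-byte default are assumptions for illustration, not anything from the thread; adjust the limit to whatever your ZooKeeper setup actually accepts.

```python
from typing import Iterable, List


def split_synonym_lines(lines: Iterable[str], max_bytes: int = 1_000_000) -> List[List[str]]:
    """Group synonym rules into chunks whose UTF-8 size stays within max_bytes.

    Each input line is treated as one rule and is never split across chunks.
    The default limit is an assumed threshold, chosen to stay under 1MB.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for line in lines:
        rule = line.rstrip("\n")
        size = len(rule.encode("utf-8")) + 1  # +1 for the newline written back
        if size > max_bytes:
            raise ValueError("a single rule exceeds the chunk size limit")
        if current and current_size + size > max_bytes:
            # Current chunk is full: close it and start a new one.
            chunks.append(current)
            current, current_size = [], 0
        current.append(rule)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


def write_chunks(chunks: List[List[str]], prefix: str = "synonyms_part") -> List[str]:
    """Write each chunk to its own file (hypothetical naming scheme) and return the names."""
    names = []
    for i, chunk in enumerate(chunks, start=1):
        name = f"{prefix}{i}.txt"
        with open(name, "w", encoding="utf-8") as f:
            f.write("\n".join(chunk) + "\n")
        names.append(name)
    return names
```

A possible usage would be `write_chunks(split_synonym_lines(open("synonyms.txt", encoding="utf-8")))`. Splitting only addresses the per-file size limit; the total memory needed to load all the rules stays the same.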

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
Accessing an external service (such as a thesaurus website) for each query can slow down your system a lot. Having the synonyms locally, with the Solr integration, is much better. Cheers 2015-05-08 11:46 GMT+01:00 Zheng Lin Edwin Yeo: > The document seems to point to using AutoPhrasingTokenFilter

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
The document seems to point to using AutoPhrasingTokenFilter, putting an underscore in the multi-term, or changing to index-time synonyms. I'm also thinking of putting the synonyms in a database or querying some thesaurus website when the user enters the search key, instead of using the SynonymFilte

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
I found this very interesting article that I think can help in better understanding the problem: http://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ And this: http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Thanks for the explanation. Currently I'm only using the comma-separated list of words and only using the synonym filter at query time. I find that when I set expand = true, quite a number of irrelevant results come back, and this doesn't happen when I set expand = false. I

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Alessandro Benedetti
Let me explain a little bit better here: First of all, the SynonymFilter is a Token Filter, and being a Token Filter it can be part of an analysis pipeline at indexing and query time. As the type of analysis explicitly states when the filtering happens, let's go to the details of the syn

Re: Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Just an update: the tokenizer class I'm using is StandardTokenizerFactory, and I'm using Solr 5.0. On 8 May 2015 16:24, "Zheng Lin Edwin Yeo" wrote: > Hi, > > I'd like to check, for the SynonymFilterFactory, I have the following in > my synonyms.txt: > > Titanium Dioxides, titanium oxide,

Queries on SynonymFilterFactory

2015-05-08 Thread Zheng Lin Edwin Yeo
Hi, I'd like to check, for the SynonymFilterFactory, I have the following in my synonyms.txt: Titanium Dioxides, titanium oxide, pigment pigment, colour, colouring material If I set expand=false, and I search for q=pigment, I will get results that match pigment, Titanium Dioxides and titanium
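
As a rough aid for reasoning about the question above, the following Python sketch models how comma-separated synonym rules behave under the two `expand` settings: with `expand=true` every term in a rule maps to all terms in that rule, and with `expand=false` every term maps to the first term only. It is a simplified whole-term model, not Solr code. It ignores tokenization (so multi-word synonyms are treated as single units), it lowercases everything as if case were ignored, and it assumes the flattened synonyms.txt above was originally two lines, split after the first `pigment`. The function name `build_synonym_map` is made up for this illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_synonym_map(rules: List[str], expand: bool) -> Dict[str, List[str]]:
    """Simplified model of comma-separated synonym rules (not the Solr implementation).

    expand=True:  each term in a rule maps to every term in that rule.
    expand=False: each term in a rule maps to the first term of that rule.
    Targets from different rules that share a term are merged.
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    for rule in rules:
        # Assumption: case is ignored, so terms are lowercased.
        terms = [t.strip().lower() for t in rule.split(",") if t.strip()]
        if not terms:
            continue
        targets = terms if expand else terms[:1]
        for term in terms:
            for target in targets:
                if target not in mapping[term]:
                    mapping[term].append(target)
    return dict(mapping)


# Assumed two-line reading of the synonyms.txt quoted in the message.
rules = [
    "Titanium Dioxides, titanium oxide, pigment",
    "pigment, colour, colouring material",
]

reduced = build_synonym_map(rules, expand=False)
expanded = build_synonym_map(rules, expand=True)

print("expand=false, pigment ->", reduced["pigment"])
print("expand=true,  pigment ->", expanded["pigment"])
```

Because `pigment` appears in both rules, it picks up targets from both: under this model `expand=false` sends it to the first term of each rule, while `expand=true` sends it to every term in either rule, which is one plausible reason a query for `pigment` returns more loosely related results with `expand=true`.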