Re: Which Tokenizer to use at searching
Hi,
Oops, my bad. I actually meant: while indexing "A,B", A and B should each give a result, but "A B" should not.

Also, I will look at the analyser.

Thanks,
Abhishek

Original Message
From: Erick Erickson
Sent: Monday, 10 March 2014 01:38
To: abhishek jain
Subject: Re: Which Tokenizer to use at searching

Then I don't see the problem. StandardTokenizer
(see the "text_general" fieldType) should do all this
for you automatically.

Did you look at the analysis page? I really recommend it.

Best,
Erick

On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain wrote:
> Hi Erick,
> Thanks for replying.
>
> I want to index A,B (with or without a space around the comma) as separate
> words, and also want results returned when A and B are searched
> individually, and also for "A,B".
>
> Please let me know your views.
> Let me know if I still haven't explained it correctly; I will try again.
>
> Thanks,
> abhishek
>
> On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson wrote:
>> You've contradicted yourself, so it's hard to say. Or
>> I'm mis-reading your messages.
>>
>> bq: During indexing i want to token on all punctuations, so i can use
>> StandardTokenizer, but at search time i want to consider punctuations as
>> part of text,
>>
>> and in your second message:
>>
>> bq: when i search for "A,B" it should return result. [for input "A,B"]
>>
>> If, indeed, you "... at search time i want to consider punctuations as
>> part of text", then "A,B" should NOT match the document.
>>
>> The admin/analysis page is your friend. I strongly suggest you spend
>> some time looking at the various transformations performed by
>> the various analyzers and tokenizers.
>>
>> Best,
>> Erick
>>
>> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain wrote:
>> > Hi,
>> > Thanks for replying promptly. An example:
>> >
>> > I want to index A,B,
>> > but when I search A AND B, it should return the result;
>> > when I search for "A,B" it should return the result.
>> >
>> > Also, ideally, when I search for "A , B" (with spaces) it should
>> > return the result.
>> >
>> > Please advise.
>> > Thanks,
>> > abhishek
>> >
>> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI wrote:
>> >
>> >> Hi;
>> >>
>> >> Firstly, keep in mind that if you don't index punctuation, it will
>> >> not be visible for search. On the other hand, you can have different
>> >> analyzers for index and search. You have to give more detail about
>> >> your situation. What will your tokenizer be at search time,
>> >> WhiteSpaceTokenizer?
>> >> You can have a look here:
>> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> >>
>> >> If you can give some examples of what you want for indexing and
>> >> searching, I can help you combine index and search
>> >> analyzers/tokenizers/token filters.
>> >>
>> >> Thanks;
>> >> Furkan KAMACI
>> >>
>> >> 2014-03-09 18:06 GMT+02:00 abhishek jain:
>> >>
>> >> > Hi Friends,
>> >> >
>> >> > I am concerned about the Tokenizer; my scenario is:
>> >> >
>> >> > During indexing I want to tokenize on all punctuation, so I can use
>> >> > StandardTokenizer, but at search time I want to consider
>> >> > punctuation as part of the text.
>> >> >
>> >> > I don't store contents, only indexes.
>> >> >
>> >> > What should I use?
>> >> >
>> >> > Any advice?
>> >> >
>> >> > --
>> >> > Thanks and kind Regards,
>> >> > Abhishek jain
>> >
>> > --
>> > Thanks and kind Regards,
>> > Abhishek jain
>> > +91 9971376767

--
Thanks and kind Regards,
Abhishek jain
+91 9971376767
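Erick's advice above can be sketched as a schema fieldType; this is an illustrative sketch modeled on the stock "text_general" type (names are as shipped with Solr's example schema, but verify against your own schema.xml):

```xml
<!-- Sketch of a text_general-style fieldType. StandardTokenizer discards
     the comma, so "A,B" is indexed as the two tokens A and B. Searches for
     A or B individually then match. Note the subtlety the thread circles
     around: the phrase queries "A,B" and "A B" analyze to the SAME token
     stream here, so both match -- the admin/analysis page makes this
     visible. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Pasting the sample text "A,B" into the admin/analysis page for this field shows each stage's output for both the index and query chains.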
Re: Which Tokenizer to use at searching
Hi,
I meant that when searching, A and B should each return the result individually, and also when searched together with AND. "A B" as a phrase should not give a result, though A,B is indexed with StandardTokenizer.

Thanks,
Abhishek

Original Message
From: Furkan KAMACI
Sent: Monday, 10 March 2014 06:11
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Which Tokenizer to use at searching

Hi;

What do you mean here: "While indexing A,B A and B should give result"?

Thanks;
Furkan KAMACI

2014-03-09 22:36 GMT+02:00:
> Hi
> Oops my bad. I actually meant
> While indexing A,B
> A and B should give result but
> "A B" should not give result.
>
> Also I will look at analyser.
>
> Thanks
> Abhishek
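Furkan's point that index-time and query-time analysis can differ might look like the sketch below (a hypothetical fieldType, not a recommendation). It also illustrates the contradiction Erick flags: keeping punctuation at query time means "A,B" stays one token and no longer matches the index-time tokens A and B:

```xml
<!-- Hypothetical sketch: separate analyzers for index and query.
     StandardTokenizer at index time splits "A,B" into A and B, but
     WhitespaceTokenizer at query time keeps "A,B" as a single token,
     which will NOT match the indexed tokens -- the mismatch Erick
     warns about. The admin/analysis page shows both chains side by side. -->
<fieldType name="text_mixed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```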
Re: Optimizing RAM
Hi,
If I go with a copyField, then will it increase I/O load, considering I have RAM less than one third of the total index size?

Thanks,
Abhishek

Original Message
From: Erick Erickson
Sent: Monday, 10 March 2014 01:37
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Optimizing RAM

I'd go for a copyField, keeping the stemmed and unstemmed versions in the same index.

An alternative (and I think there's a JIRA for this, if not an outright patch) is to implement a "special" filter that, say, puts the original token in with a special character, say $ at the end; i.e. if indexing "running", you'd index both "running$" and "run". Then when you want an exact match, you search for "running$".

Best,
Erick

On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain wrote:
> hi friends,
> I want to index a good amount of data, keeping both stemmed and
> unstemmed versions.
> I am confused whether I should keep two separate indexes, or one index
> with two versions of the column, i.e. col1_stemmed and col2_unstemmed.
>
> I have a multicore, multi-shard configuration.
> My server has 32 GB RAM, and the stemmed index size (without content) I
> calculated as 60 GB.
> I don't want to put too much load and I/O load on a decent server with
> some 5 other replicated servers, and I want to use the servers for other
> purposes also.
>
> Also, is it advised to serve queries from the master server, or only
> from slaves?
> --
> Thanks,
> Abhishek
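Erick's copyField suggestion might look like the following schema.xml fragment. This is a sketch with hypothetical field names; it assumes a stemming type such as Solr's stock "text_en" (which includes a Porter-style stemmer) and a non-stemming type such as "text_general":

```xml
<!-- Hypothetical sketch: keep stemmed and unstemmed versions of the same
     content in one index. Both fields are indexed but not stored, matching
     the "I don't store contents" setup in the thread. -->
<field name="body_stemmed"   type="text_en"      indexed="true" stored="false"/>
<field name="body_unstemmed" type="text_general" indexed="true" stored="false"/>

<!-- Documents are sent with body_stemmed populated; Solr copies the raw
     value into body_unstemmed before each field's analysis chain runs. -->
<copyField source="body_stemmed" dest="body_unstemmed"/>
```

Queries then target body_stemmed for recall and body_unstemmed for exact-form matches. Note copyField copies the raw input, not the analyzed tokens, so each destination field applies its own analyzer independently.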
Re: Optimizing RAM
Hi all,
What should the ideal RAM-to-index-size ratio be? Please reply.

I expect the index to be about 60 GB in size, and I don't store contents.

Thanks,
Abhishek

Original Message
From: abhishek.netj...@gmail.com
Sent: Monday, 10 March 2014 09:25
To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Optimizing RAM

> Hi,
> If I go with a copyField, then will it increase I/O load, considering I
> have RAM less than one third of the total index size?
>
> Thanks,
> Abhishek
Re: Strange behavior while deleting
Hi,
These settings are commented out in the schema. These are two different Solr servers with almost identical schemas, the exception being one stemmed field. The same Solr versions are running.

Please help.

Thanks,
Abhishek

Original Message
From: Jack Krupansky
Sent: Monday, 31 March 2014 14:54
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Do the two cores have identical schema and solrconfig files? Are the delete and merge config settings identical?

Are these two cores running on the same Solr server, or two separate Solr servers? If the latter, are they both running the same release of Solr?

How big is the discrepancy: just a few, dozens, 10%, 50%?

-- Jack Krupansky

-----Original Message-----
From: abhishek jain
Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

hi friends,
I have observed some strange behavior. I have two indexes with the same ids and the same number of docs, and I am using a JSON file to delete records from both indexes. After deleting the ids, the resulting indexes show different counts of docs. Not sure why; I used curl with the same JSON file to delete from both indexes.

Please advise ASAP.

thanks
--
Thanks and kind Regards,
Abhishek
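For reference, deleting by id from two cores with the same JSON file might look like the following. This is a sketch: the host, core names, and file name are hypothetical, and the explicit commit is there so the post-delete counts are immediately comparable:

```shell
# Hypothetical delete-by-id using Solr's JSON update format.
# deletes.json contains: {"delete": ["id1", "id2", "id3"]}
curl 'http://localhost:8983/solr/core1/update?commit=true' \
     -H 'Content-Type: application/json' --data-binary @deletes.json
curl 'http://localhost:8983/solr/core2/update?commit=true' \
     -H 'Content-Type: application/json' --data-binary @deletes.json

# Compare doc counts afterwards (numFound in the response):
curl 'http://localhost:8983/solr/core1/select?q=*:*&rows=0&wt=json'
curl 'http://localhost:8983/solr/core2/select?q=*:*&rows=0&wt=json'
```

If the counts still differ after an identical delete, comparing numFound before the delete on both cores would show whether the discrepancy predates it.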