Re: Which Tokenizer to use at searching

2014-03-09 Thread abhishek . netjain
Hi
Oops, my bad. I actually meant:
While indexing A,B 
A and B should give result but
"A B" should not give result.

Also, I will look at the analyser.

Thanks 
Abhishek

  Original Message  
From: Erick Erickson
Sent: Monday, 10 March 2014 01:38
To: abhishek jain
Subject: Re: Which Tokenizer to use at searching

Then I don't see the problem. StandardTokenizer
(see the "text_general" fieldType) should do all this
for you automatically.

Did you look at the analysis page? I really recommend it.

Best,
Erick
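
For reference, here is a minimal sketch of a fieldType along these lines for schema.xml (the field name "mytext" and the exact filter list are illustrative assumptions, not the stock text_general definition); it also illustrates Furkan's point below about separate index-time and query-time analyzers:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <!-- index time: StandardTokenizer splits on punctuation and whitespace, then lowercase -->
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- query time: the same chain, so a query like "A,B" is analyzed the same way as the indexed text -->
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- indexed but not stored, matching the requirement below -->
  <field name="mytext" type="text_general" indexed="true" stored="false"/>

The Analysis screen in the admin UI shows exactly which tokens each side of this chain produces for a given input.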

On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain
 wrote:
> Hi Erick,
> Thanks for replying,
>
> I want to index A,B (with or without a space around the comma) as separate words, and
> also want results to be returned when A and B are searched individually, and also for
> "A,B".
>
> Please let me know your views.
> Let me know if I still haven't explained it correctly. I will try again.
>
> Thanks
> abhishek
>
>
> On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson 
> wrote:
>>
>> You've contradicted yourself, so it's hard to say. Or
>> I'm mis-reading your messages.
>>
>> bq: During indexing I want to tokenize on all punctuation, so I can use
>> StandardTokenizer, but at search time I want to consider punctuation as
>> part of the text,
>>
>> and in your second message:
>>
>> bq: when I search for "A,B" it should return a result. [for input "A,B"]
>>
>> If, indeed, you "... at search time I want to consider punctuation as
>> part of the text", then "A,B" should NOT match the document.
>>
>> The admin/analysis page is your friend; I strongly suggest you spend
>> some time looking at the various transformations performed by
>> the various analyzers and tokenizers.
>>
>> Best,
>> Erick
>>
>> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
>>  wrote:
>> > Hi,
>> >
>> > Thanks for replying promptly.
>> > An example:
>> >
>> > I want to index for A,B
>> > but when I search A AND B, it should return a result,
>> > and when I search for "A,B" it should return a result.
>> >
>> > Also, ideally, when I search for "A , B" (with a space) it should return
>> > a result.
>> >
>> >
>> > Please advise.
>> > thanks
>> > abhishek
>> >
>> >
>> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI
>> > wrote:
>> >
>> >> Hi;
>> >>
>> >> Firstly, keep in mind that if you don't index punctuation, it will
>> >> not be visible to search. On the other hand, you can have different
>> >> analyzers for indexing and searching. You have to give more detail about your
>> >> situation. What will be your tokenizer at search time,
>> >> WhitespaceTokenizer?
>> >> You can have a look here:
>> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> >>
>> >> If you can give some examples of what you want for indexing and searching,
>> >> I can help you combine index and search analyzers/tokenizers/token
>> >> filters.
>> >>
>> >> Thanks;
>> >> Furkan KAMACI
>> >>
>> >>
>> >> 2014-03-09 18:06 GMT+02:00 abhishek jain :
>> >>
>> >> > Hi Friends,
>> >> >
>> >> > I am concerned about the tokenizer; my scenario is:
>> >> >
>> >> > During indexing I want to tokenize on all punctuation, so I can use
>> >> > StandardTokenizer, but at search time I want to consider punctuation
>> >> > as
>> >> > part of the text,
>> >> >
>> >> > I don't store contents, only indexes.
>> >> >
>> >> > What should I use?
>> >> >
>> >> > Any advice?
>> >> >
>> >> >
>> >> > --
>> >> > Thanks and kind Regards,
>> >> > Abhishek jain
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks and kind Regards,
>> > Abhishek jain
>> > +91 9971376767
>
>
>
>
> --
> Thanks and kind Regards,
> Abhishek jain
> +91 9971376767


Re: Which Tokenizer to use at searching

2014-03-09 Thread abhishek . netjain

Hi,
I meant that when searching, A and B should each return a result individually, and also
when combined with an AND.

I want "A B" to not give a result, even though A,B is indexed with
StandardTokenizer.

Thanks 
Abhishek
  Original Message  
From: Furkan KAMACI
Sent: Monday, 10 March 2014 06:11
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Which Tokenizer to use at searching

Hi;

What do you mean here:

"While indexing A,B
A and B should give result "

Thanks;
Furkan KAMACI


2014-03-09 22:36 GMT+02:00 :

> Hi
> Oops, my bad. I actually meant:
> While indexing A,B
> A and B should give result but
> "A B" should not give result.
>
> Also, I will look at the analyser.
>
> Thanks
> Abhishek
>
> Original Message
> From: Erick Erickson
> Sent: Monday, 10 March 2014 01:38
> To: abhishek jain
> Subject: Re: Which Tokenizer to use at searching
>
> Then I don't see the problem. StandardTokenizer
> (see the "text_general" fieldType) should do all this
> for you automatically.
>
> Did you look at the analysis page? I really recommend it.
>
> Best,
> Erick
>
> On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain
>  wrote:
> > Hi Erick,
> > Thanks for replying,
> >
> > I want to index A,B (with or without a space around the comma) as separate words,
> > and
> > also want results to be returned when A and B are searched individually, and also for
> > "A,B".
> >
> > Please let me know your views.
> > Let me know if I still haven't explained it correctly. I will try again.
> >
> > Thanks
> > abhishek
> >
> >
> > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson  >
> > wrote:
> >>
> >> You've contradicted yourself, so it's hard to say. Or
> >> I'm mis-reading your messages.
> >>
> >> bq: During indexing I want to tokenize on all punctuation, so I can use
> >> StandardTokenizer, but at search time I want to consider punctuation as
> >> part of the text,
> >>
> >> and in your second message:
> >>
> >> bq: when I search for "A,B" it should return a result. [for input "A,B"]
> >>
> >> If, indeed, you "... at search time I want to consider punctuation as
> >> part of the text", then "A,B" should NOT match the document.
> >>
> >> The admin/analysis page is your friend; I strongly suggest you spend
> >> some time looking at the various transformations performed by
> >> the various analyzers and tokenizers.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain
> >>  wrote:
> >> > Hi,
> >> >
> >> > Thanks for replying promptly.
> >> > An example:
> >> >
> >> > I want to index for A,B
> >> > but when I search A AND B, it should return a result,
> >> > and when I search for "A,B" it should return a result.
> >> >
> >> > Also, ideally, when I search for "A , B" (with a space) it should return
> >> > a result.
> >> >
> >> >
> >> > Please advise.
> >> > thanks
> >> > abhishek
> >> >
> >> >
> >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI
> >> > wrote:
> >> >
> >> >> Hi;
> >> >>
> >> >> Firstly, keep in mind that if you don't index punctuation, it will
> >> >> not be visible to search. On the other hand, you can have different
> >> >> analyzers for indexing and searching. You have to give more detail about
> >> >> your
> >> >> situation. What will be your tokenizer at search time,
> >> >> WhitespaceTokenizer?
> >> >> You can have a look here:
> >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >> >>
> >> >> If you can give some examples of what you want for indexing and
> >> >> searching,
> >> >> I can help you combine index and search analyzers/tokenizers/token
> >> >> filters.
> >> >>
> >> >> Thanks;
> >> >> Furkan KAMACI
> >> >>
> >> >>
> >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain  >:
> >> >>
> >> >> > Hi Friends,
> >> >> >
> >> >> > I am concerned about the tokenizer; my scenario is:
> >> >> >
> >> >> > During indexing I want to tokenize on all punctuation, so I can use
> >> >> > StandardTokenizer, but at search time I want to consider punctuation
> >> >> > as
> >> >> > part of the text,
> >> >> >
> >> >> > I don't store contents, only indexes.
> >> >> >
> >> >> > What should I use?
> >> >> >
> >> >> > Any advice?
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks and kind Regards,
> >> >> > Abhishek jain
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and kind Regards,
> >> > Abhishek jain
> >> > +91 9971376767
> >
> >
> >
> >
> > --
> > Thanks and kind Regards,
> > Abhishek jain
> > +91 9971376767
>


Re: Optimizing RAM

2014-03-09 Thread abhishek . netjain
Hi,
If I go with a copyField, then will it increase the I/O load, considering I have RAM
less than one third of the total index size?

Thanks 
Abhishek

  Original Message  
From: Erick Erickson
Sent: Monday, 10 March 2014 01:37
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Optimizing RAM

I'd go for a copyField and keep the stemmed and unstemmed
versions in the same index.

An alternative (and I think there's a JIRA for this, if not an
outright patch) is to implement a "special" filter that, say, puts
the original token in with a special character, say $ at the
end, i.e. if indexing "running", you'd index both "running$" and
"run". Then, when you want an exact match, you search for "running$".

Best,
Erick
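
A minimal sketch of the copyField approach (the type and field names "text_stemmed", "text_exact", "body_stemmed", and "body_exact" are illustrative assumptions, not from the thread):

  <fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- indexed but not stored, per the thread -->
  <field name="body_stemmed" type="text_stemmed" indexed="true" stored="false"/>
  <field name="body_exact"   type="text_exact"   indexed="true" stored="false"/>

  <!-- the raw value sent to body_stemmed is also routed to body_exact and analyzed there -->
  <copyField source="body_stemmed" dest="body_exact"/>

The "$"-marker idea above would need a custom filter; a built-in relative is KeywordRepeatFilterFactory followed by the stemmer and RemoveDuplicatesTokenFilterFactory, which keeps both the original and the stemmed token in a single field at the cost of a somewhat larger index.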

On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain
 wrote:
> Hi friends,
> I want to index a good amount of data, and I want to keep both stemmed and
> unstemmed versions.
> I am confused whether I should keep two separate indexes, or keep one index with two
> versions/columns, i.e. col1_stemmed and col2_unstemmed.
>
> I have a multicore, multi-shard configuration.
> My server has 32 GB RAM, and the stemmed index size (without content) I
> calculated as 60 GB.
> I don't want to put too much load and I/O load on a decent server with some 5
> other replicated servers, and I want to use the servers for other purposes also.
>
>
> Also, is it advised to serve queries from the master server, or only from slaves?
> --
> Thanks,
> Abhishek


Re: Optimizing RAM

2014-03-11 Thread abhishek . netjain
Hi all,
What should be the ideal RAM to index size ratio?

Please reply; I expect the index to be about 60 GB in size, and I don't store contents.
Thanks 
Abhishek

  Original Message  
From: abhishek.netj...@gmail.com
Sent: Monday, 10 March 2014 09:25
To: solr-user@lucene.apache.org
Cc: Erick Erickson
Subject: Re: Optimizing RAM

Hi,
If I go with a copyField, then will it increase the I/O load, considering I have RAM
less than one third of the total index size?

Thanks 
Abhishek

  Original Message  
From: Erick Erickson
Sent: Monday, 10 March 2014 01:37
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Optimizing RAM

I'd go for a copyField and keep the stemmed and unstemmed
versions in the same index.

An alternative (and I think there's a JIRA for this, if not an
outright patch) is to implement a "special" filter that, say, puts
the original token in with a special character, say $ at the
end, i.e. if indexing "running", you'd index both "running$" and
"run". Then, when you want an exact match, you search for "running$".

Best,
Erick

On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain
 wrote:
> Hi friends,
> I want to index a good amount of data, and I want to keep both stemmed and
> unstemmed versions.
> I am confused whether I should keep two separate indexes, or keep one index with two
> versions/columns, i.e. col1_stemmed and col2_unstemmed.
>
> I have a multicore, multi-shard configuration.
> My server has 32 GB RAM, and the stemmed index size (without content) I
> calculated as 60 GB.
> I don't want to put too much load and I/O load on a decent server with some 5
> other replicated servers, and I want to use the servers for other purposes also.
>
>
> Also, is it advised to serve queries from the master server, or only from slaves?
> --
> Thanks,
> Abhishek


Re: Strange behavior while deleting

2014-03-31 Thread abhishek . netjain
Hi,
These settings are commented out in the schema. These are two different Solr servers, and
the schemas are almost identical, with the exception of one stemmed field.

The same Solr version is running on both.
Please help.

Thanks 
Abhishek

  Original Message  
From: Jack Krupansky
Sent: Monday, 31 March 2014 14:54
To: solr-user@lucene.apache.org
Reply To: solr-user@lucene.apache.org
Subject: Re: Strange behavior while deleting

Do the two cores have identical schema and solrconfig files? Are the delete
and merge config settings identical?

Are these two cores running on the same Solr server, or two separate Solr 
servers? If the latter, are they both running the same release of Solr?

How big is the discrepancy - just a few, dozens, 10%, 50%?

-- Jack Krupansky

-Original Message- 
From: abhishek jain
Sent: Monday, March 31, 2014 3:26 AM
To: solr-user@lucene.apache.org
Subject: Strange behavior while deleting

Hi friends,
I have observed some strange behavior:

I have two indexes with the same IDs and the same number of docs, and I am using a
JSON file to delete records from both indexes.
After deleting the IDs, the resulting indexes show different document counts.

I'm not sure why.
I used curl with the same JSON file to delete from both indexes.

Please advise asap,
thanks

-- 
Thanks and kind Regards,
Abhishek
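
For reference, a delete of this kind is roughly (the core name "core1" and the IDs are illustrative assumptions): a deletes.json file containing

  {"delete": ["id1", "id2", "id3"]}

sent to each core with

  curl -H "Content-Type: application/json" \
       --data-binary @deletes.json \
       "http://localhost:8983/solr/core1/update?commit=true"

The commit=true (or a later commit) is needed before the changed document counts become visible on either index.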