Nutch and Solr search on the fly
Hi all, I am a newbie to Nutch and Solr. Well, relatively much newer to Solr than Nutch :) I have been using Nutch for the past two weeks, and I wanted to know if I can query or search my Nutch crawls on the fly (before a crawl completes). I am asking because the websites I am crawling are really huge and it takes around 3-4 days for a crawl to complete. I want to analyze some quick results while the Nutch crawler is still crawling the URLs. Someone suggested that Solr would make this possible. I followed the steps in http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ for this. With this process, I see only the injected URLs in the Solr search. I know I did something really foolish and the crawl never happened; I feel I am missing some information here. I think somewhere in the process there should be crawling happening and I missed it. Just wanted to see if someone could help me by pointing out where I went wrong in the process. Forgive my foolishness and thanks for your patience. Cheers, Abi
Re: Nutch and Solr search on the fly
Hi Markus, I am sorry for not being clear; I meant to say that... Suppose a URL, say www.somehost.com/gifts/greetingcard.html (which in turn contains links to a.html, b.html, c.html, d.html), is injected into seed.txt. After the whole process I was expecting a bunch of other pages crawled from this seed URL. However, at the end of it, all I see is the content from only this page, www.somehost.com/gifts/greetingcard.html, and I do not see any of the other pages (here a.html, b.html, c.html, d.html) crawled from it. The crawling happens only for the URLs mentioned in seed.txt and does not proceed further from there. So I am just a bit confused: why is it not crawling the linked pages (a.html, b.html, c.html and d.html)? I get a feeling that I am missing something that the author of the blog (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) assumed everyone would know. Thanks, Abi

On Wed, Feb 9, 2011 at 7:09 PM, Markus Jelsma wrote:
> The parsed data is only sent to the Solr index if you tell a segment to be
> indexed; solrindex
>
> If you did this only once after injecting and then the consequent
> fetch,parse,update,index sequence then you, of course, only see those
> URL's.
> If you don't index a segment after it's being parsed, you need to do it
> later on.
>
> [...]
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
Re: Nutch and Solr search on the fly
Hi Erick, thanks a bunch for the response. It could be the case, but all I am wondering is where to specify the depth in the whole process described at http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/. I tried specifying it during the fetcher phase but it was just ignored :( Thanks, Abi

On Wed, Feb 9, 2011 at 10:11 PM, Erick Erickson wrote:
> WARNING: I don't do Nutch much, but could it be that your
> crawl depth is 1? See:
> http://wiki.apache.org/nutch/NutchTutorial
> and search for "depth"
> Best
> Erick
>
> [...]
Re: Nutch and Solr search on the fly
Hi Charan, thanks for the clarifications. The link I have been referring to (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) does not say anything about using the crawl command. Do I have to run it after the last step mentioned? Thanks, Abi

On Thu, Feb 10, 2011 at 12:58 AM, charan kumar wrote:
> Hi Abishek,
>
> depth is a param of the crawl command, not the fetch command.
>
> If you are using a custom script calling the individual stages of a Nutch
> crawl, then depth N means running that script N times. You can put a
> loop in the script.
>
> Thanks,
> Charan
>
> [...]
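For reference, when driving Nutch through the individual commands from that blog post rather than the all-in-one crawl command, the "depth" is simply the number of generate/fetch/parse/updatedb rounds you run; each round crawls one more level of links and can be indexed to Solr as it completes. A minimal sketch of such a loop, assuming a Nutch 1.x install with placeholder paths and Solr URL (exact solrindex arguments vary a little between Nutch versions):

    #!/bin/sh
    # Placeholder paths: urls/ holds seed.txt, crawl/ holds the crawl data.
    DEPTH=3
    SOLR_URL=http://localhost:8983/solr

    bin/nutch inject crawl/crawldb urls

    for i in $(seq 1 $DEPTH); do
      # Select URLs due for fetching and create a new segment.
      bin/nutch generate crawl/crawldb crawl/segments
      SEGMENT=$(ls -d crawl/segments/* | tail -1)
      # Fetch and parse the segment, then feed discovered links
      # back into the crawldb so the next round goes one level deeper.
      bin/nutch fetch $SEGMENT
      bin/nutch parse $SEGMENT
      bin/nutch updatedb crawl/crawldb $SEGMENT
      # Index this round's segment so results are searchable mid-crawl.
      bin/nutch invertlinks crawl/linkdb -dir crawl/segments
      bin/nutch solrindex $SOLR_URL crawl/crawldb crawl/linkdb $SEGMENT
    done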
Re: Custom cache for Solr Cloud mode
Thanks for the response. Eric, are you suggesting downloading this file from ZooKeeper and uploading it again after changing it? Mikhail, thanks, I will try the solrCore.SolrConfig.userCacheConfigs option. Any idea why CoreContainer.getCores() would be returning an empty list for me? (CoreAdminRequest.setAction(CoreAdminAction.STATUS); CoreAdminRequest.process(solrClient); gives me the list of cores correctly.) -Abhishek
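For anyone following along: user caches in Solr are declared per core in solrconfig.xml (which, in cloud mode, lives in ZooKeeper, hence the download/upload suggestion) and are then reachable from a searcher. A minimal sketch, with the cache name made up:

    <!-- solrconfig.xml, inside <query>: a user-defined cache. -->
    <cache name="myCustomCache"
           class="solr.LRUCache"
           size="512"
           initialSize="128"
           autowarmCount="0"/>

    // From a custom component, via the current searcher:
    SolrCache<String, Object> cache =
        req.getSearcher().getCache("myCustomCache");
    if (cache != null) {
      cache.put("someKey", someValue);
    }

Note the cache is per searcher, so entries are discarded on commit unless autowarmed.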
new data structure for some fields
Hello all, I am facing a requirement where an id p1 is associated with some category_ids c1, c2, c3, c4, each paired with an integer b1, b2, b3, b4. We need to sort the Solr query results on the basis of b1/b2/b3/b4 depending on the given category_id. Right now we map the category_ids into a multi-valued attribute, [c1,c2,c3,c4], and query against it. But now we also need to find which integer b1, b2, b3... is associated with a given category and sort the whole result set on it. Sorry for any typos. Regards, Abhishek
Re: new data structure for some fields
Hi Binoy, thanks for the reply. By sort I mean sorting the result set on the basis of the integer values given for that category. For any document, say an id P1, the associated categories are c1, c2, c3, c4 (using a multivalued field). In the new implementation a number is similarly associated with each category, say c1---b1, c2---b2, c3---b3, c4---b4. Now when we query Solr for the ids which have c1 in their categories (q=category_id:c1), I want the result of this query sorted on the basis of the number (b) associated with c1 throughout the result. The number of associations is usually less than 20 (meaning an id can't be mapped to more than 20 category_ids).

On Mon, Dec 21, 2015 at 3:59 PM, Binoy Dalal wrote:
> When you say sort, do you mean search on the basis of category and
> integers? Or score the docs based on their category and integer values?
>
> Also, for any given document, how many categories or integers are
> associated with it?
>
> [...]
Re: new data structure for some fields
Hi Binoy, that will not work, as category and integer is a one-to-one mapping: if category_id is multivalued, the same goes for the integer. You need some mechanism that identifies which integer to pick for a given category_id in the search; only then can you sort on it.

On Mon, Dec 21, 2015 at 5:27 PM, Binoy Dalal wrote:
> Small edit:
> The sort parameter in the solrconfig goes in the request handler
> declaration that you're using. So if it's select, put it in the
> <lst name="defaults"> list.
>
> On Mon, 21 Dec 2015, 17:21 Binoy Dalal wrote:
>
> > OK. You will only be able to sort based on the integers if the integer
> > field is single valued, i.e. only one integer is associated with one
> > category id.
> >
> > To do this you've to use the sort parameter.
> > You can either specify it in your solrconfig.xml like so:
> > <str name="sort">integer asc</str>
> > Field name followed by the order - asc/desc
> >
> > Or you can specify it along with your query by appending it like so:
> > /select?q=query&sort=integer%20asc
> >
> > If you want to apply these sorting rules for all docs, then specify the
> > sorting in your solrconfig. If you only want it for a certain subset
> > then apply the parameter from code at the app level.
> >
> > [...]
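One common workaround, not from this thread and assuming the set of categories is reasonably small and known at index time, is to flatten each category/integer pair into its own single-valued dynamic field; single-valued fields can be sorted on directly:

    <!-- schema.xml: one sortable integer per category, e.g. rank_c1, rank_c2 -->
    <dynamicField name="rank_*" type="tint" indexed="true" stored="false"/>

A document P1 with c1---b1 and c2---b2 would then be indexed with rank_c1=b1 and rank_c2=b2, and the query for a category sorts on that category's own field:

    /select?q=category_id:c1&sort=rank_c1%20asc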
Stable Versions in Solr 4
Hi All, I am trying to determine a stable version of Solr 4. Is there a blog we can refer to? I understand we can read through the release notes, but I am interested in user reviews and challenges seen with the various versions of Solr 4. Appreciate your contribution. Thanks, Abhishek
Determine if Merge is triggered in SOLR
Hi All, is there a way in Solr to determine whether a merge has been triggered? Is there an API exposed to query this? If it's not available, is there a way to do the same using the Lucene jar files available in the Solr libs? Appreciate your help. Best Regards, Abhishek
Re: Determine if Merge is triggered in SOLR
Hi All, any suggestions/ideas? Thanks, Abhishek

On Tue, Jan 26, 2016 at 9:16 PM, abhi Abhishek wrote:
> Hi All,
> is there a way in Solr to determine whether a merge has been triggered?
> Is there an API exposed to query this?
>
> If it's not available, is there a way to do the same using the Lucene jar
> files available in the Solr libs?
>
> Appreciate your help.
>
> Best Regards,
> Abhishek
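There is no direct "did a merge fire?" API in this era of Solr, but merges can be observed indirectly: a merge collapses segments, so the segment count drops while the document count does not. A rough sketch using the Lucene 4.x jars shipped in Solr's lib directory (the index path is a placeholder; on Lucene 5+ FSDirectory.open takes a Path instead of a File):

    import java.io.File;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.FSDirectory;

    public class SegmentWatch {
      public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
        // Point-in-time view of the last commit.
        DirectoryReader reader = DirectoryReader.open(dir);
        // One leaf per segment: if this number falls between two
        // commits while maxDoc holds steady or grows, a merge ran.
        System.out.println("segments=" + reader.leaves().size()
            + " maxDoc=" + reader.maxDoc());
        reader.close();
        dir.close();
      }
    }

Running this periodically (or comparing the segments_N generation in the data directory) gives a crude merge detector.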
Need a group custom function(fieldcollapsing)
Hi all, we are running Solr 5.2.1. A requirement has come up where we need to order the data based on an algorithm, applied to the results obtained from a query. The best option seems to be using group.field, group.main, and group.func, where group.func uses a custom function that runs the algorithm. My doubt is where the custom function code needs to go - in which file? I found an article related to this, https://dzone.com/articles/how-write-custom-solr, but it does not explain where to put the code or in which file. Regards, Abhishek
Re: Need a group custom function(fieldcollapsing)
Any update on this???

On Mon, Mar 14, 2016 at 4:06 PM, Abhishek Mishra wrote:
> [...]
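For reference, a custom function is not dropped into an existing Solr source file: it is a class compiled into your own jar, placed on the core's classpath via a <lib> directive, and registered in solrconfig.xml. A minimal sketch against the 5.x API (package, class and function names are made up):

    package com.example;

    import org.apache.lucene.queries.function.ValueSource;
    import org.apache.lucene.queries.function.valuesource.DoubleConstValueSource;
    import org.apache.solr.search.FunctionQParser;
    import org.apache.solr.search.SyntaxError;
    import org.apache.solr.search.ValueSourceParser;

    // Registered in solrconfig.xml as:
    //   <valueSourceParser name="myalgo" class="com.example.MyAlgoParser"/>
    // and then usable as group.func=myalgo(somefield).
    public class MyAlgoParser extends ValueSourceParser {
      @Override
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        // Parse the argument, e.g. a field or a nested function.
        ValueSource arg = fp.parseValueSource();
        // A real implementation would wrap `arg` in a custom ValueSource
        // computing the algorithm; a constant keeps the sketch compilable.
        return new DoubleConstValueSource(1.0d);
      }
    }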
Solr 4 replication
Hi all, is Solr 4 replication push or pull? Best Regards, Abhishek
Re: Solr 4 replication
Thanks Mikhail. Is there a way to have push replication? Any contributions or anything that could help in this case? Thanks, Abhishek

On Tue, Apr 5, 2016 at 1:29 AM, Mikhail Khludnev wrote:
> It's pull, but you can trigger pulling.
>
> [...]
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> <http://www.griddynamics.com>
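For what it's worth, the usual approximation of push with the legacy ReplicationHandler is to have the master (or a post-commit hook or cron job) tell each slave to pull on demand with the fetchindex command:

    # Ask the slave to pull the latest index now; host/core are placeholders.
    curl 'http://slave-host:8983/solr/core1/replication?command=fetchindex'

    # Optionally override which master to pull from for this one call:
    curl 'http://slave-host:8983/solr/core1/replication?command=fetchindex&masterUrl=http://master-host:8983/solr/core1/replication'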
SOLR Upgrade 3.x to 4.10
Hi All, I have Solr 3.6 running currently and I am planning to upgrade to Solr 4.10. Below are the approaches we could come up with.

1. In-place upgrade
I would make the Solr 4.10 instance a slave of 3.6, copy the indexes, and optimize the index. Will optimizing the Lucene 3.3 index on the Solr 4 instance (with Lucene 4.10) change the index structure to Lucene 4.10? If not, what would the version be? If I enable docValues on certain fields before issuing the optimize, will it be able to incorporate that (create .dvd & .dvm files) in the newly created index?

2. Re-index the data

Seeking advice on the minimum-time path to upgrade with most features of Solr 4.10. Thanks in advance. Best Regards, Abhishek
Re: SOLR Upgrade 3.x to 4.10
Thanks Erick and Shawn for the input. It makes more sense to move to Solr 5.x, but we would like to get there in a few iterations, gradually making incremental changes to have a smooth cut-over. Our index size is 3TB (10 shards of 300GB each), so I was looking for an alternate route that would save me from the pain of re-indexing. Any thoughts on this would help. Best Regards, Abhishek

On Wed, Apr 13, 2016 at 6:18 AM, Shawn Heisey wrote:
> On 4/12/2016 6:10 AM, abhi Abhishek wrote:
> > [...]
>
> Yes, the optimize will change the index structure, but the contents of
> the index will not change, even if changes in Solr's analysis components
> would have resulted in different info going into the index based on your
> schema. Because the *query* analysis may also change with the upgrade,
> this might cause queries to no longer work the same, unless you reindex
> and verify that your analysis still does what you require. A few
> changes to analysis components in later versions can be changed back to
> earlier behavior with luceneMatchVersion, but this typically only
> happens with big changes -- such as the major bugfix for
> WordDelimiterFilterFactory in version 4.8.
>
> Reindexing for all upgrades is recommended when possible.
>
> > if i enable docvalues on certain fields before issuing optimize, will
> > it be able to incorporate ( create .dvd & .dvm files ) that in the
> > newly created index?
>
> No. You must entirely reindex to add docValues. Optimize just rewrites
> what's already present in the Lucene index.
>
> > 2. Re-Index the data
> >
> > Seeking advice for minimum time to upgrade this with most features of
> > SOLR 4.10
>
> This is impossible to answer. It will depend on how long it takes to
> index your data. That is very difficult to predict even if a lot of
> information is available.
>
> Thanks,
> Shawn
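If the goal is only to shed the 3.x on-disk format (this does not add docValues or re-run analysis, as Shawn notes above), one possible shortcut is Lucene's IndexUpgrader tool run against each shard's index with the 4.10 jars, e.g.:

    # Rewrites every segment of one shard into the Lucene 4.10 format.
    # Paths and jar version are placeholders; back up the index first.
    java -cp lucene-core-4.10.4.jar \
      org.apache.lucene.index.IndexUpgrader -verbose /path/to/shard1/index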
working of Sharded Query in SOLR 3.6
Hi, I have a question about distributed querying in Solr (https://wiki.apache.org/solr/DistributedSearch). Consider the call below being made to a Solr server:

https://server1:8080/solr/core1/select?shards=server1:8080/solr/core1,server2:8070/solr/core2,server3:8090/solr/core3&q=*:*&rows=10&start=0

Please correct me if my understanding of the query processing here is wrong: server1 acts as the coordinating server for this request; it spawns requests to server1, server2 and server3 for the given query and waits for the responses from all the requests before returning the response to the client. If this is the case (server1 waits on all the sharded calls to respond), how does it join the results from all the sharded calls? If this is not how the processing works, can you please help me understand it? Thanks in advance. Thanks and Best Regards, Abhishek Das
Re: working of Sharded Query in SOLR 3.6
Hi, thanks for the reply, Shawn and Mugeesh. I was just trying to understand the working of distributed querying in Solr. Thanks, Abhishek Das

On Wed, Sep 9, 2015 at 8:18 PM, Mugeesh Husain wrote:
> You are correct for distributed search.
> Don't worry about the join; Solr will aggregate results from all cores.
> Share your requirement - what do you want?
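For anyone curious about the join step: the coordinator does not pull whole result sets. Each shard returns only its top start+rows document ids plus sort values; the coordinator merges those small lists, then issues a second request to fetch stored fields for just the final page. A toy sketch of the merge step, assuming score-descending order:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.PriorityQueue;

    class ShardDoc {
      final String shard, id;
      final float score;
      ShardDoc(String shard, String id, float score) {
        this.shard = shard; this.id = id; this.score = score;
      }
    }

    public class MergeSketch {
      // Merge per-shard top-N lists into one global top-N by score.
      static List<ShardDoc> merge(List<List<ShardDoc>> perShard, int n) {
        PriorityQueue<ShardDoc> pq =
            new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
        for (List<ShardDoc> docs : perShard) pq.addAll(docs);
        List<ShardDoc> out = new ArrayList<>();
        for (int i = 0; i < n && !pq.isEmpty(); i++) out.add(pq.poll());
        return out;
      }
    }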
SOLR Backup and Restore - Solr 3.6.1
Hello, we have Solr 3.6.1 in our environment and we are trying to analyze backup and recovery solutions for it. Is there a way to compress the backup taken? We have explored the replicationHandler with the backup command, but as our index is in the hundreds of GBs, we would like a solution that provides compression to reduce the storage overhead. Thanks in advance. Regards, Abhishek
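The ReplicationHandler's backup command writes an uncompressed copy of the index files, so compression has to be a post-step. A sketch, with URL and paths as placeholders (snapshot naming and parameters differ slightly across 3.x releases):

    # 1. Trigger a snapshot; it appears as snapshot.<timestamp> in the data dir.
    curl 'http://localhost:8983/solr/core1/replication?command=backup'

    # 2. Compress the newest snapshot and drop the raw copy.
    SNAP=$(ls -dt /var/solr/data/core1/data/snapshot.* | head -1)
    tar czf "$SNAP.tar.gz" -C "$(dirname "$SNAP")" "$(basename "$SNAP")" \
      && rm -rf "$SNAP"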
data import
Solr indexing is taking too much time. What should I do to reduce the time? Working on Solr 4.0.
not able to import Data through DIH solr 4.2.1
Please provide the basic steps to resolve this issue. I am getting the following error:

Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: not able to import Data through DIH solr 4.2.1
Alex, thanks for replying. My solrconfig:

<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-new.xml</str>
  </lst>
</requestHandler>

On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch wrote:
> > Could not load driver: com.mysql.jdbc.Driver
>
> Looks like a custom driver. Is the driver name correct? Is the library
> declared in solrconfig.xml? Is the library path correct (use absolute
> path if in doubt).
>
> Regards,
> Alex.
>
> [...]
Re: not able to import Data through DIH solr 4.2.1
But it is still not working.

On Thu, Mar 19, 2015 at 10:41 AM, Alexandre Rafalovitch wrote:
> Try an absolute path to the jar directory. Hard to tell whether the
> relative path is correct without knowing exactly how you are running it.
>
> Regards,
> Alex.
>
> [...]
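For the record, the two things that usually resolve this particular error are absolute <lib> paths and the MySQL connector jar itself; the DIH jar does not contain the JDBC driver, so it has to be added separately. A sketch with placeholder paths:

    <!-- solrconfig.xml: load DIH and the JDBC driver from absolute paths. -->
    <lib dir="/opt/solr/dist/" regex="solr-dataimporthandler-.*\.jar"/>
    <lib dir="/opt/solr/lib/" regex="mysql-connector-java-.*\.jar"/>

Alternatively, dropping the connector jar into the core's instanceDir/lib folder achieves the same thing without any <lib> directive.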
Re: data import
Hi,

Architecture: master (1) - slaves (3)

solrconfig:
<autoCommit>
  <maxDocs>500</maxDocs>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

schema:
<field name="selling_price" type="tfloat" indexed="true" stored="true" />
<field name="third_price" type="tfloat" indexed="true" stored="true" />
<field name="discount_percentage" type="tfloat" indexed="true" stored="true" />
<field name="sort_2" type="tint" indexed="true" stored="true" />
<field name="show_metacategory" type="variantFacet" indexed="true" stored="true" />
<field name="products" type="tint" indexed="true" stored="true" />
<field name="by_drive_supported" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_primary_camera" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_dial_shape" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_features" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="speaker_configuration" type="text_path_new" indexed="true" stored="true" multiValued="true"/>

<uniqueKey>id</uniqueKey>

<copyField source="product" dest="product_keyword"/>
<copyField source="list_price" dest="text"/>
<copyField source="seo_name" dest="text"/>

On Fri, Mar 13, 2015 at 2:25 PM, Antonio Jesús Sánchez Padial wrote:
> Maybe you should add some info about:
>
> - your architecture, number of servers, etc.
> - your schema.xml
> - and the data (amount, type, ...) you are indexing
>
> Best.
>
> [...]
>
> --
> Antonio Jesús Sánchez Padial
SOLR Index in shared/Network folder
Greetings, I am trying to use a network shared location as my index directory. Are there any known problems with using a network file system to run a Solr instance? Thanks in advance. Best Regards, Abhishek
ZFS File System for SOLR 3.6 and SOLR 4
Hello, I am trying to use ZFS as the filesystem for my Linux environment. Are there any performance implications of using a filesystem other than ext3/ext4 with Solr? Thanks in advance. Best Regards, Abhishek
Re: SOLR Index in shared/Network folder
Hello, thanks for the suggestions. My aim is to reduce disk space usage. I have 1 master with 2 slaves configured, where the slaves are used for searching and the master ingests new data that is replicated to the slaves. But as my index size is in the hundreds of GBs, we see a 3x space overhead. I would like to reduce this overhead; can you suggest something for this? Thanks in advance. Best Regards, Abhishek

On Sat, Mar 28, 2015 at 12:13 AM, Erick Erickson wrote:
> To pile on: If you're talking about pointing two Solr instances at the
> _same_ index, it doesn't matter whether you are on NFS or not, you'll
> have all sorts of problems. And if this is a SolrCloud installation,
> it's particularly hard to get right.
>
> Please do not do this unless you have a very good reason, and please
> tell us what the reason is so we can perhaps suggest alternatives.
>
> Best,
> Erick
>
> On Fri, Mar 27, 2015 at 8:08 AM, Walter Underwood wrote:
> > Several years ago, I accidentally put Solr indexes on an NFS volume
> > and it was 100X slower.
> >
> > If you have enough RAM, query speed should be OK, but startup time
> > (loading indexes into file buffers) could be really long. Indexing
> > could be quite slow.
> >
> > wunder
> > Walter Underwood
> > http://observer.wunderwood.org/ (my blog)
> >
> > On Mar 26, 2015, at 11:31 PM, Shawn Heisey wrote:
> >
> >> On 3/27/2015 12:06 AM, abhi Abhishek wrote:
> >>> [...]
> >>
> >> It is not recommended. You will probably need to change the lockType;
> >> the default "native" probably will not work, and you might need to
> >> change it to "none" to get it working ... but that disables an
> >> important safety mechanism that prevents index corruption.
> >>
> >> http://stackoverflow.com/questions/9599529/solr-over-nfs-problems
> >>
> >> Thanks,
> >> Shawn
Errors during Indexing in SOLR 4.6
Hi All, we recently migrated from Solr 3.6 to Solr 4. While indexing in Solr 4 we are getting the exception below:

Apr 1, 2015 9:22:57 AM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Exception writing document id 932684555 to the index; possible analysis error.
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
Caused by: java.lang.IllegalArgumentException: first position increment must be > 0 (got 0) for field 'DataEnglish'
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:131)

This works perfectly fine in Solr 3.6. Can someone help debug this? Any fixes/solutions? Thanks in advance. Best Regards, Abhishek
Unable to identify why faceting is taking so much time
I am trying to facet over some data. My query is:

http://localhost:9020/search/p1-umShard-1/select?q=*:*&fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])&debug=timing&wt=json&rows=0

{
  "responseHeader": { "status": 0, "QTime": 45 },
  "response": { "numFound": 137, "start": 0, "docs": [] },
  "debug": {
    "timing": {
      "time": 45,
      "prepare": { "time": 0, "query": {"time": 0}, "facet": {"time": 0},
        "mlt": {"time": 0}, "highlight": {"time": 0}, "stats": {"time": 0},
        "debug": {"time": 0} },
      "process": { "time": 45, "query": {"time": 45}, "facet": {"time": 0},
        "mlt": {"time": 0}, "highlight": {"time": 0}, "stats": {"time": 0},
        "debug": {"time": 0} }
    }
  }
}

According to this there are 137 records. Now I am faceting over these 137 records with facet.method=fc. Ideally it should just iterate over these 137 records and sum up the facets. The facet query is:

http://localhost:9020/search/p1-umShard-1/select?q=*:*&fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])&facet.field=conversationId&facet=true&indent=on&wt=json&rows=0&facet.method=fc&debug=timing

{
  "responseHeader": { "status": 0, "QTime": 395103 },
  "response": { "numFound": 137, "start": 0, "docs": [] },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "conversationId": [
        "t_mid.1429800181915:43409a654f429a7279", 14,
        "t_mid.1430066755916:3f1df73a90f3f56b24", 12,
        "t_mid.1424867675391:7a0ce173662f6b3230", 10,
        "t_mid.1429264970537:d53579af6852fdd409", 8,
        "t_mid.1429968009539:ad97aa3fcfc933ac32", 6,
        "t_mid.1429076620603:cf8c8da6cc7c0f7a40", 5,
        "t_mid.1429967431080:6f1037c42bc6d10921", 4,
        "t_mid.1430335716379:e8d2d7390c6d999689", 4,
        "t_mid.1430591984365:9c66f4b3f67a973193", 4,
        "t_mid.1431105168474:f5d294b79df5e97a26", 4,
        "t_id.539747739369904", 3,
        "t_mid.1423253619046:ef3da504f704e12448", 3,
        "t_mid.1424454328414:91f82976dc8196e034", 3,
        "t_mid.1429967443439:dacb57b0f96b00cb63", 3,
        "t_mid.1430734315969:e5002ecd489b51cc19", 3,
        "t_mid.1423229143533:71f3dd0f3714f44232", 2,
        "t_mid.1429076490131:87feb49fa82041dd77", 2,
        "t_mid.1429080523489:00a85a2b07980c9a19", 2,
        "t_mid.1429913551113:5870b4366960dc5c10", 2,
        "t_mid.1429917749072:7cbdaf3d8c2d15ef78", 2,
        "t_mid.1429966041997:616561349e22cb7001", 2,
        "t_mid.1429968203236:bcd0c539ae66947618", 2,
        "t_mid.1429982604402:6e509023526a0f5b09", 2,
        "t_mid.1430475210140:8a963390e62e26f497", 2,
        "t_mid.1430746574833:59b08895c5287a2998", 2,
        "t_mid.1423229237215:d03fb607be18b2d089", 1,
        "t_mid.1423256045556:63089c5cc77c800113", 1,
        "t_mid.1426870505993:a5b69b271bea481730", 1,
        "t_mid.1428776595760:d5ebc1f3b922952e41", 1,
        "t_mid.1429079296566:f9f0e4c24071e55444", 1,
        "t_mid.1429315090481:9b7d59d6d483999d57", 1,
        "t_mid.1429498786426:04f58597d3f5461330", 1,
        "t_mid.1429878261810:4bdc3e6442db876c21", 1,
        "t_mid.1429906605359:0f89faf08295015957", 1,
        "t_mid.1429915168615:365578d261795d6140", 1,
        "t_mid.1429968022645:2a362d85be63c2ab95", 1,
        "t_mid.1429968121564:2effeb664562bd9b26", 1,
        "t_mid.1429969582192:5aca482f37dca9d843", 1,
        "t_mi
Re: Unable to identify why faceting is taking so much time
Toke, thanks for the quick reply. I am still confused; please find my doubts inline.

On Mon, May 11, 2015 at 1:22 PM Toke Eskildsen wrote:
> On Mon, 2015-05-11 at 05:48 +0000, Abhishek Gupta wrote:
> > According to this there are 137 records. Now I am faceting over these
> > 137 records with facet.method=fc. Ideally it should just iterate over
> > these 137 records and sum up the facets.
>
> That is only the ideal method if you are not planning on issuing
> subsequent calls: facet.method=fc does more work up front to ensure that
> later calls are fast.
>
> [...]
>
> 6½ minutes is a long time, even for a first call. Do you have tens to
> hundreds of millions of documents in your index? Or do you have a
> similar amount of unique values in your facet?

Yes, we have that many documents (exact count: 522664425), but I am not sure why that matters, because what I understood from the documentation (https://wiki.apache.org/solr/SimpleFacetParameters#facet.method) is that *fc* will only work on the documents filtered by the filter query and query. For my query there are only 137 documents for fc to work on and to build the *FieldCache* from. But seeing the faceting result it seems that faceting is being applied to all the documents, which is not according to the documentation: "The facet counts are calculated by iterating over documents that match the query and summing the terms that appear in each document". I am not able to understand why fc is calculating facets over all the documents. Just for your information, the cardinality of the field (conversationId) on which I am faceting is very high, but there are only about 100 possible values for this field matching my query and filter query.

> Either way, subsequent faceting calls should be much faster and a switch
> to DocValues should lower your first-call time significantly.

Subsequent calls are also not fast:
First call time: 297572
Second call time (made within 2 sec): 249287
Yes, I agree docValues will reduce the time.

> Toke Eskildsen, State and University Library, Denmark
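For context on why the whole-index size matters here: facet.method=fc first un-inverts the field into a FieldCache/UnInvertedField structure covering every document in the index; only after that structure exists are the 137 matching documents counted against it. That up-front build is what the first call pays for, and docValues moves the same work to index time. A sketch of the schema change (it requires a full re-index):

    <!-- schema.xml: keep the facet field's values as docValues. -->
    <field name="conversationId" type="string" indexed="true" stored="true"
           docValues="true"/>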
Re: [EXTERNAL] Re: Does anybody crawl to a database and then index from the database to Solr?
Clayton, you could also try running an optimize on the Solr index as a weekly/bi-weekly maintenance task to keep the segment count in check and the maxDoc and numDocs counts as close as possible (in DB terms, de-fragmenting the Solr indexes). Best Regards, Abhishek

On Sun, May 15, 2016 at 7:18 PM, Pryor, Clayton J wrote:
> Thank you for your feedback. I really appreciate you taking the time to
> write it up for me (and hopefully others who might be considering the
> same). My first thought for dealing with deleted docs was to delete the
> contents and rebuild the index from scratch, but my primary customer for
> the deleted docs functionality wants to see it immediately. I wrote a
> connector for transferring the contents of one Solr index to another (I
> call it a Solr connector) and that takes a half hour. As a side note,
> the reason I have multiple indexes is because we currently have physical
> servers for development and production but, as part of my effort, I am
> transitioning us to new VMs for development, quality, and production.
> For quality control purposes I wanted to be able to reset each with the
> same set of data - thus the Solr connector.
>
> Yes, by connector I am talking about a Java program (using SolrJ) that
> reads from the database and populates the Solr index. For now I have had
> our enterprise DBAs create a single table to hold the current index
> schema fields plus some that I can think of that we might use outside of
> the index. So far it is a completely flat structure so it will be easy
> to index to Solr, but I can see, as requirements change, we may have to
> have a more sophisticated database (with multiple tables and greater
> normalization), in which case the connector will have to flatten the
> data for the Solr index.
>
> Thanks again, your response has been very reassuring!
>
> :)
>
> Clay
>
> -----Original Message-----
> From: Erick Erickson
> Sent: Friday, May 13, 2016 5:57 PM
> To: solr-user
> Subject: [EXTERNAL] Re: Does anybody crawl to a database and then index
> from the database to Solr?
>
> Clayton:
>
> I think you've done a pretty thorough investigation, I think you're
> spot-on. The only thing I would add is that you _will_ reindex your
> entire corpus multiple times. Count on it. Sometime, somewhere, somebody
> will say "gee, wouldn't it be nice if we could ". And to
> support it you'll have to change your Solr schema... which will almost
> certainly require you to re-index.
>
> The other thing people have done for deleting documents is to create
> triggers in your DB to insert the deleted doc IDs into, say, a "deleted"
> table along with a timestamp. Whenever necessary/desirable, run a
> cleanup task that finds all the IDs since the last time you ran your
> deleting program to remove docs that have been flagged since then.
> Obviously you also have to keep a record around of the timestamp of the
> last successful run of this program.
>
> Or, frankly, since it takes so little time to rebuild from scratch,
> people have foregone any of that complexity and simply rebuild the
> entire index periodically. You can use "collection aliasing" to do this
> in the background and then switch searches atomically; it depends
> somewhat on how long you can wait until you need to see (well, _not_
> see) the deleted docs.
>
> But this is all refinements, I think you're going down the right path.
>
> And when you say "connector", are you talking DIH or an external (say
> SolrJ) program?
>
> Best,
> Erick
>
> On Fri, May 13, 2016 at 2:04 PM, John Bickerstaff wrote:
> > I've been working on a less-complex thing along the same lines -
> > taking all the data from our corporate database and pumping it into
> > Kafka for long-term storage -- and the ability to "play back" all the
> > Kafka messages any time we need to re-index.
> >
> > That simpler scenario has worked like a charm. I don't need to
> > massage the data much once it's at rest in Kafka, so that was a
> > straightforward solution, although I could have gone with a DB and
> > just stored the solr documents with their ID's one per row in a
> > RDBMS...
> >
> > The rest sounds like good ideas for your situation as Solr isn't the
> > best candidate for the kind of manipulation of data you're proposing
> > and a database excels at that. It's more work, but you get a lot more
> > flexibility and you de-couple Solr from the data crawling as you say.
> >
> > It all sounds pretty good to me, but I've only been o
Proximity Search using edismax parser.
Hi All, how does a proximity query work in Solr? For example, if I am running a query like the one below against a field containing the text "India registered a historical test match win against the arch rival Pakistan here in Lords, England on Sunday":

Query: "Test match India Pakistan"~10

I am interested in understanding the intermediate steps involved here, to understand the search behavior and determine how results are matched to the search phrase. Thanks in Advance, Abhishek
Re: Proximity Search using edismax parser.
Thanks for the suggestions, Erik and Vrindavda. I was trying to understand how the above query works when the slop is set to 10. The debug output of the Solr query gave the terms being looked up, but the transpositions used to match the phrase weren't exposed. I found the following Stack Overflow link, which describes the transpositions applied when looking for a phrase with slop 4. Is there a guide to understanding this?

https://stackoverflow.com/questions/25558195/lucene-proximity-search-for-phrase-with-more-than-two-words

Thanks in advance. Best Regards, Abhishek

On Mon, Jun 12, 2017 at 5:41 PM, Erik Hatcher wrote:
> Adding &debug=true to your search requests will give you the parsing
> details, so you can see how edismax interprets the query string and
> parameters to turn it into the underlying dismax and phrase queries.
>
> Erik
>
> [...]
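As a rough summary of the mechanism: the slop is a budget of single-position moves allowed to make the document's term positions line up with the phrase, and Lucene's sloppy phrase matching accepts the document if any alignment fits within the budget. A small worked illustration, assuming a document indexed as quick(1) brown(2) fox(3):

    "quick fox"~1  -> match: fox sits one position further right than the
                      phrase expects, costing one move.
    "fox quick"~3  -> match: reordering the two terms costs three moves.
    "fox quick"~2  -> no match: the cheapest alignment needs three.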
Odd Boolean Query behavior in SOLR 3.6
Hi Everyone, I have hit some odd behavior with a Boolean query. When I run the query with the parameters below, it does not behave as expected. Can you please help me understand the behavior here?

q=*:*&fq=((-documentTypeId:3)+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
=> returns 0 matches
filter_queries: ((-documentTypeId:3) AND companyId:29096)
parsed_filter_queries: +(-documentTypeId:3) +companyId:29096

q=*:*&fq=(-documentTypeId:3+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
=> returns 1600 matches
filter_queries: (-documentTypeId:3 AND companyId:29096)
parsed_filter_queries: -documentTypeId:3 +companyId:29096

Can you please help me understand what I am missing here? Thanks in Advance. Thanks & Best Regards, Abhishek
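A likely explanation, in case anyone finds this thread later (it is the standard Lucene pure-negative-clause behavior, though not confirmed here): a BooleanQuery consisting only of negative terms matches nothing by itself. Solr special-cases a lone negative query at the top level, but once -documentTypeId:3 is wrapped in parentheses it becomes a nested BooleanQuery with no positive clause, so (-documentTypeId:3) selects zero documents, and ANDing companyId:29096 with it still yields zero. The usual fix is to give the nested clause an explicit positive universe:

    fq=((*:* -documentTypeId:3) AND companyId:29096)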
SOLR Metric Reporting to graphite
Hi All, I am trying to set up the Graphite reporter for Solr 6.5.0. I've started a sample Docker instance of Graphite with statsd (https://github.com/hopsoft/docker-graphite-statsd) and added the Graphite metrics reporter to the node's solr.xml, with host localhost, port 2003, and period 1. However, after doing this I don't see any data getting posted to Graphite (https://cwiki.apache.org/confluence/display/solr/Metrics+Reporting).

The Graphite container's mapped ports (host -> container / service):
80   -> 80   nginx
2003 -> 2003 carbon receiver - plaintext
2004 -> 2004 carbon receiver - pickle
2023 -> 2023 carbon aggregator - plaintext
2024 -> 2024 carbon aggregator - pickle
8125 -> 8125 statsd
8126 -> 8126 statsd admin

Please advise if I am doing something wrong here. Thanks, Abhishek
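Since the list archive strips XML, here is what the reporter element in solr.xml is documented to look like for the 6.x metrics system, with the host/port/period values from the mail above (the class name and group attribute are taken from the reference guide, so verify them against your version; note also that solr.xml changes only take effect after a node restart, and a period of 1 second is far more frequent than the default of 60):

    <solr>
      <metrics>
        <reporter name="graphite" group="node, jvm, core"
                  class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
          <str name="host">localhost</str>
          <int name="port">2003</int>
          <int name="period">1</int>
        </reporter>
      </metrics>
    </solr>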
edismax parsing confusion
Hi all, I am running a Solr query with these parameters:

bf: "sum(product(new_popularity,100),if(exists(third_price),50,0))"
qf: "test_product^5 category_path_tf^4 product_id gender"
q: "handbags between rs150 and rs 400"
defType: "edismax"

The parsed query for q is:

(+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 | gender:handbag | test_product:handbag^5.0 | product_id:handbags)) DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between | test_product:between^5.0 | product_id:between)) +DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 | test_product:rs150^5.0 | product_id:rs150)) +DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs | test_product:rs^5.0 | product_id:rs)) DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 | test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"handbags between rs150 ? rs 400")) (DisjunctionMaxQuery(("":"handbags between")) DisjunctionMaxQuery(("":"between rs150")) DisjunctionMaxQuery(("":"rs 400"))) (DisjunctionMaxQuery(("":"handbags between rs150")) DisjunctionMaxQuery(("":"between rs150")) DisjunctionMaxQuery(("":"rs150 ? rs")) DisjunctionMaxQuery(("":"? rs 400"))) FunctionQuery(sum(product(float(new_popularity),const(100)),if(exists(float(third_price)),const(50),const(0)/no_coord

But with the dismax parser it works as expected:

(+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 | gender:handbag | test_product:handbag^5.0 | product_id:handbags)) DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between | test_product:between^5.0 | product_id:between)) DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 | test_product:rs150^5.0 | product_id:rs150)) DisjunctionMaxQuery((product_id:and)) DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs | test_product:rs^5.0 | product_id:rs)) DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 | test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"handbags between rs150 ? rs 400")) FunctionQuery(sum(product(float(new_popularity),const(100)),if(exists(float(third_price)),const(50),const(0)/no_coord

As I understand it, the difference between dismax and edismax should just be some extra features plus the handling of boosting functions, so I don't understand why the parses differ.

Regards, Abhishek
Re: edismax parsing confusion
Hello guys, sorry for the late response. @Steve: I am using Solr 5.2. @Greg: I am using the default mm from the config file (as I understand it, the default mm is 1). Regards, Abhishek

On Tue, Apr 4, 2017 at 5:27 AM, Greg Pendlebury wrote:
> eDismax uses 'mm', so knowing what that has been set to is important, or
> if it has been left unset/default you would need to consider whether
> 'q.op' has been set. Or the default operator from the config file.
>
> Ta,
> Greg
>
> On 3 April 2017 at 23:56, Steve Rowe wrote:
> > Hi Abhishek,
> >
> > Which version of Solr are you using?
> >
> > I can see that the parsed queries are different, but they're also very
> > similar, and there's a lot of detail there - can you be more specific
> > about what the problem is?
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > [...]
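What the two parses show is edismax treating the bare lowercase "and" in the query text as the boolean AND operator: the term disappears and the neighbouring rs150/rs clauses become required (+), whereas dismax keeps it as a plain term (product_id:and). If the literal-term behaviour is wanted, edismax has a switch for exactly this case, e.g.:

    /select?defType=edismax&lowercaseOperators=false&q=handbags+between+rs150+and+rs+400&qf=...

(lowercaseOperators defaults to true on this version, so lowercase and/or act as operators unless it is turned off.)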
Which Tokenizer to use at searching
Hi Friends, I have a question about tokenizers. My scenario is: during indexing I want to tokenize on all punctuation, so I can use the StandardTokenizer; but at search time I want to treat punctuation as part of the text. I don't store contents, only indexes. What should I use? Any advice? -- Thanks and kind Regards, Abhishek jain
Re: Which Tokenizer to use at searching
Hi, thanks for replying promptly. An example: I want to index "A,B", but when I search for A AND B it should return the result; when I search for "A,B" it should return the result; and ideally, when I search for "A , B" (with spaces) it should also return the result. Please advise. Thanks, abhishek

On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI wrote:
> Hi;
>
> Firstly you have to keep in mind that if you don't index punctuation
> they will not be visible for search. On the other hand you can have a
> different analyzer for index and search. You have to give more detail
> about your situation. What will be your tokenizer at search time,
> WhitespaceTokenizer? You can have a look here:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> If you can give some examples of what you want for indexing and
> searching, I can help you combine index and search
> analyzer/tokenizer/token filters.
>
> Thanks;
> Furkan KAMACI
>
> [...]
Optimizing RAM
hi friends, I want to index a good amount of data, and I want to keep both stemmed and unstemmed versions. I am confused whether I should keep two separate indexes, or one index with two versions of the column, i.e. col1_stemmed and col2_unstemmed. I have a multicore, multi-shard configuration. My server has 32 GB RAM, and I calculated the stemmed index size (without content) as 60 GB. I don't want to put too much load and I/O load on a decent server with some 5 other replicated servers, and I want to use the servers for other purposes also. Also, is it advised to serve queries from the master server or only from slaves? -- Thanks, Abhishek
Re: Which Tokenizer to use at searching
Hi Erick, Thanks for replying. I want to index A,B (with or without a space around the comma) as separate words, and also want to return results when A and B are searched individually and also as "A,B". Please let me know your views. Let me know if I still haven't explained it correctly; I will try again. Thanks abhishek On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson wrote: > You've contradicted yourself, so it's hard to say. Or > I'm mis-reading your messages. > > bq: During indexing i want to token on all punctuations, so i can use > StandardTokenizer, but at search time i want to consider punctuations as > part of text, > > and in your second message: > > bq: when i search for "A,B" it should return result. [for input "A,B"] > > If, indeed, you "... at search time i want to consider punctuations as > part of text" then "A,B" should NOT match the document. > > The admin/analysis page is your friend, I strongly suggest you spend > some time looking at the various transformations performed by > the various analyzers and tokenizers. > > Best, > Erick > > On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain > wrote: > > hi, > > > > Thanks for replying promptly, > > an example: > > > > I want to index for A,B > > but when i search A AND B, it should return result, > > when i search for "A,B" it should return result. > > > > Also Ideally when i search for "A , B" (with space) it should return > result. > > > > > > please advice > > thanks > > abhishek > > > > > > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI >wrote: > > > >> Hi; > >> > >> Firstly you have to keep in mind that if you don't index punctuation > they > >> will not be visible for search. On the other hand you can have different > >> analyzer for index and search. You have to give more detail about your > >> situation. What will be your tokenizer at search time, > WhiteSpaceTokenizer? > >> You can have a look at here: > >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > >> > >> If you can give some examples what you want for indexing and searching I > >> can help you to combine index and search analyzer/tokenizer/token > filters. > >> > >> Thanks; > >> Furkan KAMACI > >> > >> > >> 2014-03-09 18:06 GMT+02:00 abhishek jain : > >> > >> > Hi Friends, > >> > > >> > I am concerned on Tokenizer, my scenario is: > >> > > >> > During indexing i want to token on all punctuations, so i can use > >> > StandardTokenizer, but at search time i want to consider punctuations > as > >> > part of text, > >> > > >> > I dont store contents but only indexes. > >> > > >> > What should i use. > >> > > >> > Any advices ? > >> > > >> > > >> > -- > >> > Thanks and kind Regards, > >> > Abhishek jain > >> > > >> > > > > > > > > -- > > Thanks and kind Regards, > > Abhishek jain > > +91 9971376767 > -- Thanks and kind Regards, Abhishek jain +91 9971376767
Re: Which Tokenizer to use at searching
Hi Oops my bad. I actually meant While indexing A,B A and B should give result but "A B" should not give result. Also I will look at analyser. Thanks Abhishek Original Message From: Erick Erickson Sent: Monday, 10 March 2014 01:38 To: abhishek jain Subject: Re: Which Tokenizer to use at searching Then I don't see the problem. StandardTokenizer (see the "text_general" fieldType) should do all this for you automatically. Did you look at the analysis page? I really recommend it. Best, Erick On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain wrote: > Hi Erick, > Thanks for replying, > > I want to index A,B (with or without space with comma) as separate words and > also want to return results when A and B searched individually and also > "A,B" . > > Please let me know your views. > Let me know if i still havent explained correctly. I will try again. > > Thanks > abhishek > > > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson > wrote: >> >> You've contradicted yourself, so it's hard to say. Or >> I'm mis-reading your messages. >> >> bq: During indexing i want to token on all punctuations, so i can use >> StandardTokenizer, but at search time i want to consider punctuations as >> part of text, >> >> and in your second message: >> >> bq: when i search for "A,B" it should return result. [for input "A,B"] >> >> If, indeed, you "... at search time i want to consider punctuations as >> part of text" then "A,B" should NOT match the document. >> >> The admin/analysis page is your friend, I strongly suggest you spend >> some time looking at the various transformations performed by >> the various analyzers and tokenizers. >> >> Best, >> Erick >> >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain >> wrote: >> > hi, >> > >> > Thanks for replying promptly, >> > an example: >> > >> > I want to index for A,B >> > but when i search A AND B, it should return result, >> > when i search for "A,B" it should return result. >> > >> > Also Ideally when i search for "A , B" (with space) it should return >> > result. >> > >> > >> > please advice >> > thanks >> > abhishek >> > >> > >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI >> > wrote: >> > >> >> Hi; >> >> >> >> Firstly you have to keep in mind that if you don't index punctuation >> >> they >> >> will not be visible for search. On the other hand you can have >> >> different >> >> analyzer for index and search. You have to give more detail about your >> >> situation. What will be your tokenizer at search time, >> >> WhiteSpaceTokenizer? >> >> You can have a look at here: >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters >> >> >> >> If you can give some examples what you want for indexing and searching >> >> I >> >> can help you to combine index and search analyzer/tokenizer/token >> >> filters. >> >> >> >> Thanks; >> >> Furkan KAMACI >> >> >> >> >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain : >> >> >> >> > Hi Friends, >> >> > >> >> > I am concerned on Tokenizer, my scenario is: >> >> > >> >> > During indexing i want to token on all punctuations, so i can use >> >> > StandardTokenizer, but at search time i want to consider punctuations >> >> > as >> >> > part of text, >> >> > >> >> > I dont store contents but only indexes. >> >> > >> >> > What should i use. >> >> > >> >> > Any advices ? >> >> > >> >> > >> >> > -- >> >> > Thanks and kind Regards, >> >> > Abhishek jain >> >> > >> >> >> > >> > >> > >> > -- >> > Thanks and kind Regards, >> > Abhishek jain >> > +91 9971376767 > > > > > -- > Thanks and kind Regards, > Abhishek jain > +91 9971376767
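A cut-down sketch of the stock "text_general" type Erick refers to (names as in the example Solr schema; the real type also carries stopword and synonym filters). StandardTokenizer splits "A,B" into the two tokens A and B at both index and query time, which is why A, B and "A,B" all match; it is also why the later requirement that the phrase "A B" must not match cannot be satisfied by this analyzer alone:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- splits on punctuation and whitespace: A,B -> [A] [B] -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>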
Re: Which Tokenizer to use at searching
Hi, I meant that while searching A AND B should return result individually and when together with a AND. I want "A B" should not give result. Though A,B is indexed with StandardTokenizer. Thanks Abhishek Original Message From: Furkan KAMACI Sent: Monday, 10 March 2014 06:11 To: solr-user@lucene.apache.org Reply To: solr-user@lucene.apache.org Cc: Erick Erickson Subject: Re: Which Tokenizer to use at searching Hi; What do you mean at here: "While indexing A,B A and B should give result " Thanks; Furkan KAMACI 2014-03-09 22:36 GMT+02:00 : > Hi > Oops my bad. I actually meant > While indexing A,B > A and B should give result but > "A B" should not give result. > > Also I will look at analyser. > > Thanks > Abhishek > > Original Message > From: Erick Erickson > Sent: Monday, 10 March 2014 01:38 > To: abhishek jain > Subject: Re: Which Tokenizer to use at searching > > Then I don't see the problem. StandardTokenizer > (see the "text_general" fieldType) should do all this > for you automatically. > > Did you look at the analysis page? I really recommend it. > > Best, > Erick > > On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain > wrote: > > Hi Erick, > > Thanks for replying, > > > > I want to index A,B (with or without space with comma) as separate words > and > > also want to return results when A and B searched individually and also > > "A,B" . > > > > Please let me know your views. > > Let me know if i still havent explained correctly. I will try again. > > > > Thanks > > abhishek > > > > > > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson > > > wrote: > >> > >> You've contradicted yourself, so it's hard to say. Or > >> I'm mis-reading your messages. > >> > >> bq: During indexing i want to token on all punctuations, so i can use > >> StandardTokenizer, but at search time i want to consider punctuations as > >> part of text, > >> > >> and in your second message: > >> > >> bq: when i search for "A,B" it should return result. [for input "A,B"] > >> > >> If, indeed, you "... at search time i want to consider punctuations as > >> part of text" then "A,B" should NOT match the document. > >> > >> The admin/analysis page is your friend, I strongly suggest you spend > >> some time looking at the various transformations performed by > >> the various analyzers and tokenizers. > >> > >> Best, > >> Erick > >> > >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain > >> wrote: > >> > hi, > >> > > >> > Thanks for replying promptly, > >> > an example: > >> > > >> > I want to index for A,B > >> > but when i search A AND B, it should return result, > >> > when i search for "A,B" it should return result. > >> > > >> > Also Ideally when i search for "A , B" (with space) it should return > >> > result. > >> > > >> > > >> > please advice > >> > thanks > >> > abhishek > >> > > >> > > >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI > >> > wrote: > >> > > >> >> Hi; > >> >> > >> >> Firstly you have to keep in mind that if you don't index punctuation > >> >> they > >> >> will not be visible for search. On the other hand you can have > >> >> different > >> >> analyzer for index and search. You have to give more detail about > your > >> >> situation. What will be your tokenizer at search time, > >> >> WhiteSpaceTokenizer? > >> >> You can have a look at here: > >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > >> >> > >> >> If you can give some examples what you want for indexing and > searching > >> >> I > >> >> can help you to combine index and search analyzer/tokenizer/token > >> >> filters. 
> >> >> > >> >> Thanks; > >> >> Furkan KAMACI > >> >> > >> >> > >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain >: > >> >> > >> >> > Hi Friends, > >> >> > > >> >> > I am concerned on Tokenizer, my scenario is: > >> >> > > >> >> > During indexing i want to token on all punctuations, so i can use > >> >> > StandardTokenizer, but at search time i want to consider > punctuations > >> >> > as > >> >> > part of text, > >> >> > > >> >> > I dont store contents but only indexes. > >> >> > > >> >> > What should i use. > >> >> > > >> >> > Any advices ? > >> >> > > >> >> > > >> >> > -- > >> >> > Thanks and kind Regards, > >> >> > Abhishek jain > >> >> > > >> >> > >> > > >> > > >> > > >> > -- > >> > Thanks and kind Regards, > >> > Abhishek jain > >> > +91 9971376767 > > > > > > > > > > -- > > Thanks and kind Regards, > > Abhishek jain > > +91 9971376767 >
Re: Optimizing RAM
Hi, If I go with copyField, then will it increase I/O load, considering I have RAM less than one third of the total index size? Thanks Abhishek Original Message From: Erick Erickson Sent: Monday, 10 March 2014 01:37 To: solr-user@lucene.apache.org Reply To: solr-user@lucene.apache.org Subject: Re: Optimizing RAM I'd go for a copyField, keep the stemmed and unstemmed version in the same index. An alternative (and I think there's a JIRA for this if not an outright patch) is to implement a "special" filter that, say, puts the original token in with a special character, say $ at the end, i.e. if indexing "running", you'd index both "running$" and "run". Then when you want exact match, you search for "running$". Best, Erick On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain wrote: > hi friends, > I want to index some good amount of data, i want to keep both stemmed and > unstemmed versions , > I am confused should i keep two separate indexes or keep one index with two > versions or column , i mean col1_stemmed and col2_unstemmed. > > I have multicore with multi shard configuration. > My server have 32 GB RAM and stemmed index size (without content) i > calculated as 60 GB . > I want to not put too much load and I/O load on a decent server with some 5 > other replicated servers and want to use servers for other purposes also. > > > Also is it advised to server queries from master server or only from slaves? > -- > Thanks, > Abhishek
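A minimal sketch of the copyField layout Erick suggests (field and type names are hypothetical, and the stemmed/unstemmed analyzer types are assumed to be defined elsewhere in schema.xml). copyField duplicates the incoming value before analysis, so one submitted value is indexed both ways:

    <field name="body_stemmed"   type="text_stemmed"   indexed="true" stored="false"/>
    <field name="body_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
    <!-- the raw incoming value of body_stemmed is also fed to body_unstemmed -->
    <copyField source="body_stemmed" dest="body_unstemmed"/>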
Re: Which Tokenizer to use at searching
Hi, As a solution, i have tried a combination of PatternTokenizerFactory and PatternReplaceFilterFactory . In both query and indexer i have written: What i am trying to do is tokenizing on space and then rewriting every special character as " punct " . So, A,B becomes A punct B . but the problem is A punct B is still one word and not tokenized further application of filter, Is there a way i can tokenize after application of filter, please suggest i know i am missing something basic. thanks abhishek On Mon, Mar 10, 2014 at 2:06 AM, wrote: > Hi > Oops my bad. I actually meant > While indexing A,B > A and B should give result but > "A B" should not give result. > > Also I will look at analyser. > > Thanks > Abhishek > > Original Message > From: Erick Erickson > Sent: Monday, 10 March 2014 01:38 > To: abhishek jain > Subject: Re: Which Tokenizer to use at searching > > Then I don't see the problem. StandardTokenizer > (see the "text_general" fieldType) should do all this > for you automatically. > > Did you look at the analysis page? I really recommend it. > > Best, > Erick > > On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain > wrote: > > Hi Erick, > > Thanks for replying, > > > > I want to index A,B (with or without space with comma) as separate words > and > > also want to return results when A and B searched individually and also > > "A,B" . > > > > Please let me know your views. > > Let me know if i still havent explained correctly. I will try again. > > > > Thanks > > abhishek > > > > > > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson > > > wrote: > >> > >> You've contradicted yourself, so it's hard to say. Or > >> I'm mis-reading your messages. > >> > >> bq: During indexing i want to token on all punctuations, so i can use > >> StandardTokenizer, but at search time i want to consider punctuations as > >> part of text, > >> > >> and in your second message: > >> > >> bq: when i search for "A,B" it should return result. [for input "A,B"] > >> > >> If, indeed, you "... at search time i want to consider punctuations as > >> part of text" then "A,B" should NOT match the document. > >> > >> The admin/analysis page is your friend, I strongly suggest you spend > >> some time looking at the various transformations performed by > >> the various analyzers and tokenizers. > >> > >> Best, > >> Erick > >> > >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain > >> wrote: > >> > hi, > >> > > >> > Thanks for replying promptly, > >> > an example: > >> > > >> > I want to index for A,B > >> > but when i search A AND B, it should return result, > >> > when i search for "A,B" it should return result. > >> > > >> > Also Ideally when i search for "A , B" (with space) it should return > >> > result. > >> > > >> > > >> > please advice > >> > thanks > >> > abhishek > >> > > >> > > >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI > >> > wrote: > >> > > >> >> Hi; > >> >> > >> >> Firstly you have to keep in mind that if you don't index punctuation > >> >> they > >> >> will not be visible for search. On the other hand you can have > >> >> different > >> >> analyzer for index and search. You have to give more detail about > your > >> >> situation. What will be your tokenizer at search time, > >> >> WhiteSpaceTokenizer? > >> >> You can have a look at here: > >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > >> >> > >> >> If you can give some examples what you want for indexing and > searching > >> >> I > >> >> can help you to combine index and search analyzer/tokenizer/token > >> >> filters. 
> >> >> > >> >> Thanks; > >> >> Furkan KAMACI > >> >> > >> >> > >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain >: > >> >> > >> >> > Hi Friends, > >> >> > > >> >> > I am concerned on Tokenizer, my scenario is: > >> >> > > >> >> > During indexing i want to token on all punctuations, so i can use > >> >> > StandardTokenizer, but at search time i want to consider > punctuations > >> >> > as > >> >> > part of text, > >> >> > > >> >> > I dont store contents but only indexes. > >> >> > > >> >> > What should i use. > >> >> > > >> >> > Any advices ? > >> >> > > >> >> > > >> >> > -- > >> >> > Thanks and kind Regards, > >> >> > Abhishek jain > >> >> > > >> >> > >> > > >> > > >> > > >> > -- > >> > Thanks and kind Regards, > >> > Abhishek jain > >> > +91 9971376767 > > > > > > > > > > -- > > Thanks and kind Regards, > > Abhishek jain > > +91 9971376767 > -- Thanks and kind Regards, Abhishek jain +91 9971376767
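The ordering problem in the message above (filters always run after the tokenizer, so the rewritten text is never re-tokenized) is usually solved with a CharFilter, which rewrites the raw text before the tokenizer sees it. A sketch of that approach, assuming the stated goal of turning every punctuation character into the literal token "punct":

    <analyzer>
      <!-- runs on the raw input, before tokenizing -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="\p{Punct}" replacement=" punct "/>
      <!-- now "A,B" arrives as "A punct B" and splits into three tokens -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>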
Re: Optimizing RAM
Hi all, What should be the ideal RAM to index size ratio? Please reply. I expect the index to be of a size of 60 GB, and I don't store contents. Thanks Abhishek Original Message From: abhishek.netj...@gmail.com Sent: Monday, 10 March 2014 09:25 To: solr-user@lucene.apache.org Cc: Erick Erickson Subject: Re: Optimizing RAM Hi, If I go with copy field than will it increase I/O load considering I have RAM less than one third of total index size? Thanks Abhishek Original Message From: Erick Erickson Sent: Monday, 10 March 2014 01:37 To: solr-user@lucene.apache.org Reply To: solr-user@lucene.apache.org Subject: Re: Optimizing RAM I'd go for a copyField, keep the stemmed and unstemmed version in the same index. An alternative (and I think there's a JIRA for this if not an outright patch) is to implement a "special" filter that, say, puts the original token in with a special character, say $ at the end, i.e. if indexing "running", you'd index both "running$" and "run". Then when you want exact match, you search for "running$". Best, Erick On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain wrote: > hi friends, > I want to index some good amount of data, i want to keep both stemmed and > unstemmed versions , > I am confused should i keep two separate indexes or keep one index with two > versions or column , i mean col1_stemmed and col2_unstemmed. > > I have multicore with multi shard configuration. > My server have 32 GB RAM and stemmed index size (without content) i > calculated as 60 GB . > I want to not put too much load and I/O load on a decent server with some 5 > other replicated servers and want to use servers for other purposes also. > > > Also is it advised to server queries from master server or only from slaves? > -- > Thanks, > Abhishek
Re: Optimizing RAM
hi Shawn, Thanks for the reply. Is there a way to optimize RAM, or does Solr do it automatically? I have multiple shards, and I know I will be querying only 30% of the shards most of the time! And I have 6 slaves, so I am considering dedicating more slaves to the 30% most-used shards. Another question: Is it advised to serve queries from the master or only from slaves? Or does it not matter? thanks Abhishek On Tue, Mar 11, 2014 at 9:12 PM, Shawn Heisey wrote: > On 3/11/2014 6:14 AM, abhishek.netj...@gmail.com wrote: > > Hi all, > > What should be the ideal RAM index size ratio. > > > > please reply I expect index to be of size of 60 gb and I dont store > contents. > > Ideally, your total system RAM will be equal to the size of all your > program's heap requirements, plus the size of all the data for all the > programs. > > If Solr is the only thing on the box, then the ideal memory size is > roughly the Solr heap plus the size of all the Solr indexes that live on > that machine. So if your heap is 8GB and your index is 60GB, you'll > want at least 68GB of RAM for an ideal setup. I don't know how big your > heap is, so I am guessing here. > > You said your index does not store much content. That means you will > need a higher percentage of your total index size to be in RAM for good > performance. I would estimate that you want a minimum of two thirds of > your index in RAM, which indicates a minimum RAM size of 48GB if we > assume your heap is 8GB. 64GB would be better. > > http://wiki.apache.org/solr/SolrPerformanceProblems#General_information > > Thanks, > Shawn > > -- Thanks and kind Regards, Abhishek jain +91 9971376767
AND not as a boolean operator in Phrase
hi friends, when I search for "A and B" it gives me results for A, B; I am not sure why. Please guide me: how can I do an exact match when it is within a phrase/quotes? -- Thanks and kind Regards, Abhishek jain
Re: AND not as a boolean operator in Phrase
Hi Jack, You are right, I am using 'and' as a stop word in both indexing and query. Should I use it only during indexing? thanks On Tue, Mar 25, 2014 at 11:09 PM, Jack Krupansky wrote: > What does your field type analyzer look like? > > I suspect that you have a stop filter which causes "and" to be removed. > > -- Jack Krupansky > > -Original Message- From: abhishek jain Sent: Tuesday, March 25, > 2014 1:29 PM To: solr-user@lucene.apache.org Subject: AND not as a > boolean operator in Phrase > hi friends, > > when i search for "A and B" it gives me result for A , B , i am not sure > why? > > Please guide how can i exact match when it is within phrase/quotes. > > -- > Thanks and kind Regards, > Abhishek jain > -- Thanks and kind Regards, Abhishek jain +91 9971376767
Strange behavior while deleting
hi friends, I have observed a strange behavior. I have two indexes with the same ids and the same number of docs, and I am using a json file to delete records from both indexes. After deleting the ids, the resulting indexes now show different counts of docs; I am not sure why. I used curl with the same json file to delete from both indexes. Please advise asap, thanks -- Thanks and kind Regards, Abhishek
Re: Strange behavior while deleting
Hi, These settings are commented out in the schema. These are two different Solr servers with almost identical schemas, with the exception of one stemmed field. The same Solr versions are running. Please help. Thanks Abhishek Original Message From: Jack Krupansky Sent: Monday, 31 March 2014 14:54 To: solr-user@lucene.apache.org Reply To: solr-user@lucene.apache.org Subject: Re: Strange behavior while deleting Do the two cores have identical schema and solrconfig files? Are the delete and merge config settings the same? Are these two cores running on the same Solr server, or two separate Solr servers? If the latter, are they both running the same release of Solr? How big is the discrepancy - just a few, dozens, 10%, 50%? -- Jack Krupansky -Original Message- From: abhishek jain Sent: Monday, March 31, 2014 3:26 AM To: solr-user@lucene.apache.org Subject: Strange behavior while deleting hi friends, I have observed a strange behavior, I have two indexes of same ids and same number of docs, and i am using a json file to delete records from both the indexes, after deleting the ids, the resulting indexes now show different count of docs, Not sure why I used curl with the same json file to delete from both the indexes. Please advise asap, thanks -- Thanks and kind Regards, Abhishek
Re: AND not as a boolean operator in Phrase
Hi, Ok thanks. I want to search for the phrase "A and B" with the *and* word sandwiched between A and B. I don't want "and" to work as a boolean operator when it is within quotes. I have "and" as a stop word, and I don't want to reindex the data. What is my best bet? thanks abhishek jain On Sun, Mar 30, 2014 at 2:33 AM, Bob Laferriere wrote: > If you are using edismax you need to use AND. So A AND B will ignore the > stop word and apply the Boolean operator. You can configure edismax to > ignore Boolean stop words that are lowercase. > > Regards, > > Bob > > > On Mar 26, 2014, at 2:39 AM, abhishek jain > wrote: > > > > Hi Jack, > > You are right, i am using 'and' as a stop word in both indexing and > query, > > > > Should i use it only during indexing? > > > > thanks > > > > > > > > On Tue, Mar 25, 2014 at 11:09 PM, Jack Krupansky < > j...@basetechnology.com>wrote: > > > >> What does your field type analyzer look like? > >> > >> I suspect that you have a stop filter which cause "and" to be removed. > >> > >> -- Jack Krupansky > >> > >> -Original Message- From: abhishek jain Sent: Tuesday, March 25, > >> 2014 1:29 PM To: solr-user@lucene.apache.org Subject: AND not as a > >> boolean operator in Phrase > >> hi friends, > >> > >> when i search for "A and B" it gives me result for A , B , i am not sure > >> why? > >> > >> Please guide how can i exact match when it is within phrase/quotes. > >> > >> -- > >> Thanks and kind Regards, > >> Abhishek jain > > > > > > > > -- > > Thanks and kind Regards, > > Abhishek jain > > +91 9971376767 > -- Thanks and kind Regards, Abhishek jain +91 9971376767
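The edismax setting Bob alludes to is the lowercaseOperators parameter: with it turned off, only uppercase AND/OR act as boolean operators, so a lowercase "and" inside a phrase stays an ordinary term. A sketch of the request defaults (note this does not bring back a term that a stop filter already removed from the index):

    <lst name="defaults">
      <str name="defType">edismax</str>
      <!-- only uppercase AND / OR are treated as boolean operators -->
      <str name="lowercaseOperators">false</str>
    </lst>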
Error handling in Solr.
hi friends, While browsing through the logs of Solr, I noticed a few null pointer exceptions; I am concerned about what could be the reason.

ERROR org.apache.solr.core.SolrCore – java.lang.NullPointerException
at org.apache.solr.handler.admin.ShowFileRequestHandler.showFromFileSystem(ShowFileRequestHandler.java:212)
at org.apache.solr.handler.admin.ShowFileRequestHandler.handleRequestBody(ShowFileRequestHandler.java:122)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)

Please help, -- Thanks and kind Regards, Abhishek jain +91 9971376767
Stopping Solr instance
Hi friends, What is the best way to stop Solr from the command line? The command with the stop port and secret key, as given in most online help links, doesn't work for me all the time; I have to kill it most times! I have, though, noted excessive swap usage when I have to kill it. Is there a link between swap usage and Solr not stopping? Please let me know the best way to stop a Solr instance. Thanks Abhi
Typecast non stored string field for sorting
Hi friends, I have a string field which I created by mistake; it should have been int. It is not stored, just indexed. I want to sort it numerically, and hence I want a function which can, at query time, convert it to integer or double, so that I can then apply the sort. Is it possible? If not, can I create a new field with the value from the non-stored field? Please advise. Thanks Abhishek -- Thanks and kind Regards, Abhishek jain +91 9971376767
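As far as I know there is no query-time function that casts an indexed-only string into a number, so the practical route is the second option: declare a numeric twin and copy into it. copyField acts only on documents as they are indexed, so existing documents pick up the new field only when reindexed. A sketch with hypothetical field names (assumes a TrieIntField type named "int" exists in the schema, as in the stock configs):

    <field name="rank_int" type="int" indexed="true" stored="false"/>
    <!-- values must parse as integers or the document will fail to index -->
    <copyField source="rank_str" dest="rank_int"/>

After a reindex, sort=rank_int asc can then replace the string sort.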
explanation of query processing in SOLR
Hello, I am fairly new to SOLR; can someone please help me understand how a query is processed in SOLR? What I want to understand is, from the time it hits Solr, what files it refers to in order to process the query, i.e., the order in which the .tvx, .tvd files and others are accessed. Basically, I would like to understand the code path of the search functionality, and also the significance of the various files in the Solr directory, such as .tvx, .tcd, .frq, etc. Regards, Abhishek Das
Re: explanation of query processing in SOLR
Thanks Alex and Jack for the direction, actually what i was trying to understand was how various files had an effect on the search. Thanks, Abhishek On Fri, Aug 8, 2014 at 6:35 PM, Alexandre Rafalovitch wrote: > Abhishek, > > Your first part of the question is interesting, but your specific > details are probably the wrong level for you to concentrate on. The > issues you will be facing are not about which file does what. That's > more performance and inner details. I feel you should worry more about > the fields, default search fields, multiterms, whitespaces, etc. > > One way to do that is to enable debug and see if you actually > understand what those different debug entries do. And don't use string > or basic tokenizer. Pick something that has complex analyzer chain and > see how that affects debug. > > Regards, >Alex. > Personal: http://www.outerthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 > > > On Fri, Aug 8, 2014 at 1:59 PM, abhi Abhishek wrote: > > Hello, > > I am fairly new to SOLR, can someone please help me understand how a > > query is processed in SOLR, i.e, what i want to understand is from the > time > > it hits solr what files it refers to process the query, i.e, order in > which > > .tvx, .tvd files and others are accessed. basically i would like to > > understand the code path of the search functionality also significance of > > various files in the solr directory such as .tvx, .tcd, .frq, etc. > > > > > > Regards, > > Abhishek Das >
Special character search in Solr and boosting without altering the resultset
Hi friends, I am facing a strange problem. When I search for a term, e.g. .Net, Solr searches for Net and does not include the '.'. Is dot a special character in Solr? I tried escaping it with a backslash in the URL call to Solr, but no use; same resultset. Also, is there a way to boost some terms within a resultset? I mean, I want to boost a term within a result, and I don't want to fire a separate query. I couldn't use the OR operator, as it would modify the resultset. I want to use a single query and boost. I don't want to use a dismax query as well. Please advise. Thanks, Abhishek
RE: Special character search in Solr and boosting without altering the resultset
Hi, Ok thanks, will look more into it, Any info on boosting without altering the resultset? Thanks Abhishek > -Original Message- > > Hi Abhishek, > > dot is not a special character. Your field type / analyzer is stripping > that character. Please see similar discussions and alternative > solutions. > > http://search-lucene.com/m/6dbI9zMSob1 > http://search-lucene.com/m/Ac71G0KlGz > http://search-lucene.com/m/RRD2D1p1mi > > Ahmet > > > > On Friday, January 31, 2014 8:23 PM, abhishek jain > wrote: > Hi friends, > > I am facing a strange problem, When I search a term eg .Net , the > solr searches for Net and not includes '.' > > Is dot a special character in Solr? I tried escaping it with backslash > in the url call to solr, but no use same resultset, > > > > Also , is there a way to boost some terms within a resultset. > > I mean I want to boost a term within a result and I don't want to fire > a separate query. I couldn't use OR operator as it will modify the > resultset. > I want to use a single query and boost. I don't want to use dismax > query as well, > > > > Please advice. > > > > Thanks, > > Abhishek
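Since the dot is being dropped by the analysis chain rather than by the query parser, one hedged illustration of a type that keeps ".Net" as a single token is whitespace-only tokenization (the type name here is made up; this keeps all other punctuation too, so it changes matching behaviour and needs a reindex):

    <fieldType name="text_keepdots" class="solr.TextField">
      <analyzer>
        <!-- splits only on whitespace, so ".Net" survives as one token -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>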
RE: Special character search in Solr and boosting without altering the resultset
Hi, Thanks for replying, but if I understand right: q=term1 term2^0.6 means it will search for term1 and term2, with a somewhat lower boost for term2. I want to search only for term1, and if term2 exists, boost by a positive factor. I am not able to make such a query. Thanks Abhishek > -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com] > Sent: Saturday, February 1, 2014 8:51 PM > To: solr-user@lucene.apache.org > Subject: Re: Special character search in Solr and boosting without > altering the resultset > > Hi, > > Can you elaborate your boosting requirement? There is a carat operator > to boost query terms. > > for example : q=term1 term2^0.6 > > > > > On Saturday, February 1, 2014 1:51 PM, abhishek jain > wrote: > Hi, > Ok thanks, will look more into it, > > Any info on boosting without altering the resultset? > > Thanks > Abhishek > > > > -Original Message- > > > > Hi Abhishek, > > > > dot is not a special character. Your field type / analyzer is > > stripping that character. Please see similar discussions and > > alternative solutions. > > > > http://search-lucene.com/m/6dbI9zMSob1 > > http://search-lucene.com/m/Ac71G0KlGz > > http://search-lucene.com/m/RRD2D1p1mi > > > > Ahmet > > > > > > > > On Friday, January 31, 2014 8:23 PM, abhishek jain > > wrote: > > Hi friends, > > > > I am facing a strange problem, When I search a term eg .Net , the > > solr searches for Net and not includes '.' > > > > Is dot a special character in Solr? I tried escaping it with > backslash > > in the url call to solr, but no use same resultset, > > > > > > > > Also , is there a way to boost some terms within a resultset. > > > > I mean I want to boost a term within a result and I don't want to fire > > a separate query. I couldn't use OR operator as it will modify the > > resultset. > > I want to use a single query and boost. I don't want to use dismax > > query as well, > > > > > > > > Please advice. > > > > > > > > Thanks, > > > > Abhishek
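With the standard lucene parser this is expressible through the required-clause operator, assuming the default q.op=OR: the "+" makes term1 mandatory while term2 stays purely optional, so the match set is exactly the term1 documents and term2 only adds score. A sketch (the terms are placeholders):

    q=+term1 term2^5

With edismax or dismax the same effect is what the bq (boost query) parameter is for, but the query above stays within the standard parser the poster wants to keep.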
Remove stemming without reindexing - currently using KStem
Hi Friends, Is it possible to remove stemming without having to reindex the entire data? I am using KStem. Can we do so via the query itself? I am not sure how. I am not using dismax. Thanks Abhishek
"facet.mincount=0" returns facet values with 0 counts for "q=*" query
Hi, Can anyone help me understand what it means to have facet results like this - "values": [ "4th of july flags", 0, "angela moore", 0, "anklets", 0, "applique flags", 0, "army national guard", 0, "bangles", 0, "beatriz ball" ] for a q=* query with facet.mincount=0? What do the results signify? In what condition can we have a facet count of 0 for a q=* query?
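The zero-count entries are terms that still exist in the field's term dictionary but are carried by no document in the current result set; with a match-all query and no filters, that typically means documents that were deleted but whose segments have not been merged away yet. A sketch of the usual way to hide them (the facet field name is hypothetical):

    facet=true&facet.field=brand&facet.mincount=1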
Solr Memory Usage - How to reduce memory footprint for solr
Q - I am forced to set the Java Xmx as high as 3.5g for my Solr app. If I keep this low, my CPU hits 100% and the response time for indexing increases a lot, and I have hit an OOM error as well when this value is low. Is this too high? If so, how can I reduce it?

Machine Details: 4 GB RAM, SSD

Solr App Details (standalone Solr app, no shards):
1. num. of Solr cores = 5
2. index size - 2 GB
3. num. of search hits per sec - 10 [IMP - all search queries have faceting]
4. num. of times re-indexing per hour per core - 10 (it may happen at the same moment for all 5 cores)
5. query result cache, document cache and filter cache are all default size - 4 kb.

top stats -
    VIRT     RES     SHR    S  %CPU  %MEM
    6446600  3.478g  18308  S  11.3  94.6

iotop stats -
    DISK READ    DISK WRITE   SWAPIN  IO>
    0-1200 K/s   0-100 K/s    0       0-5%
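One heap knob worth knowing about in solrconfig.xml: the cache "size" attributes count entries, not kilobytes, and each filterCache entry can cost roughly maxDoc/8 bytes. A sketch with illustrative sizes (these numbers are assumptions, not a recommendation for this particular app):

    <filterCache class="solr.FastLRUCache" size="256" initialSize="256" autowarmCount="32"/>
    <queryResultCache class="solr.LRUCache" size="256" initialSize="256" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>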
DocValues with docValuesFormat="Disk"
Hi all, I am trying to experiment with DocValues (http://wiki.apache.org/solr/DocValues) and use the "Disk" docValuesFormat. Here's what my field type declaration looks like:

    <fieldType name="stringDv" ... sortMissingLast="true" omitNorms="true" docValuesFormat="Disk"/>

I don't even have any fields using that type. Also I've updated solrconfig.xml with:

    <luceneMatchVersion>LUCENE_42</luceneMatchVersion>

Am running with solr-4.2.1. My solr core is totally empty, and there is nothing in the data dir. Am getting this weird error while starting up the solr core:

org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:735)
... 13 more

Apr 23, 2013 3:34:06 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: p5-upsShard-1
at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
... 10 more
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:735)
... 13 more

Is there any other config change that I need to do? I've read http://wiki.apache.org/solr/DocValues multiple times, but am unable to see any light to solve the problem. -- - Cheers, Abhishek
Re: DocValues with docValuesFormat="Disk"
Answering myself - adding this line in solrconfig.xml made it work: On 4/23/13 3:42 PM, Abhishek Sanoujam wrote: Hi all, I am trying to experiment with DocValues (http://wiki.apache.org/solr/DocValues) and use the "Disk" docValuesFormat. Here's how my field type declaration looks like: sortMissingLast="true" omitNorms="true" docValuesFormat="Disk"/> I don't even have any fields using that type. Also I've updated solrconfig.xml with: LUCENE_42 Am running with solr-4.2.1. My solr core is totally empty, and there is nothing in the data dir. Am getting this weird error while starting up the solr core: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.(SolrCore.java:822) at org.apache.solr.core.SolrCore.(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870) at org.apache.solr.core.SolrCore.(SolrCore.java:735) ... 13 more Apr 23, 2013 3:34:06 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: p5-upsShard-1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.(SolrCore.java:822) at org.apache.solr.core.SolrCore.(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 
10 more Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870) at org.apache.solr.core.SolrCore.(SolrCore.java:735) ... 13 more Is there any other config change that I need to do? I've read http://wiki.apache.org/solr/DocValues multiple times, but am unable to see any light to solve the problem. -- - Cheers, Abhishek -- - Cheers, Abhishek
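The line that the mail archive stripped out of the reply above is presumably the schema-aware codec factory, which is what the "the codec does not support it" error is asking for. A sketch of the likely solrconfig.xml addition:

    <codecFactory class="solr.SchemaCodecFactory"/>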
Solr performance issues for simple query - q=*:* with start and rows
We have a solr core with about 115 million documents. We are trying to migrate data and running a simple query with a *:* query and with start and rows params. The performance is becoming too slow in solr; it's taking almost 2 mins to get 4000 rows and the migration is being just too slow. Logs snippet below:

INFO: [coreName] webapp=/solr path=/select params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=168308
INFO: [coreName] webapp=/solr path=/select params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=122771
INFO: [coreName] webapp=/solr path=/select params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=137615
INFO: [coreName] webapp=/solr path=/select params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141223
INFO: [coreName] webapp=/solr path=/select params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=97474
INFO: [coreName] webapp=/solr path=/select params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=98115
INFO: [coreName] webapp=/solr path=/select params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=143822
INFO: [coreName] webapp=/solr path=/select params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118066
INFO: [coreName] webapp=/solr path=/select params={start=5547&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=121498
INFO: [coreName] webapp=/solr path=/select params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=164062
INFO: [coreName] webapp=/solr path=/select params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=165518
INFO: [coreName] webapp=/solr path=/select params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118163
INFO: [coreName] webapp=/solr path=/select params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141642
INFO: [coreName] webapp=/solr path=/select params={start=5549&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=145037

I've taken some thread dumps in the solr server, and most of the time the threads seem to be busy in the following stacks mostly:

… at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1491)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)

Is there anything that can be done to improve the performance? Is it a known issue? It's very surprising that querying for just some rows starting at some points is taking in the order of minutes.

-- - Cheers, Abhishek
Re: Solr performance issues for simple query - q=*:* with start and rows
We have a single shard, and all the data is in a single box only. Definitely looks like "deep-paging" is having problems. Just to understand, is the searcher looping over the result set everytime and skipping the first "start" count? This will definitely take a toll when we reach higher "start" values. On 4/29/13 2:28 PM, Jan Høydahl wrote: Hi, How many shards do you have? This is a known issue with deep paging with multi shard, see https://issues.apache.org/jira/browse/SOLR-1726 You may be more successful in going to each shard, one at a time (with &distrib=false) to avoid this issue. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam : We have a solr core with about 115 million documents. We are trying to migrate data and running a simple query with *:* query and with start and rows param. The performance is becoming too slow in solr, its taking almost 2 mins to get 4000 rows and migration is being just too slow. Logs snippet below: INFO: [coreName] webapp=/solr path=/select params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=168308 INFO: [coreName] webapp=/solr path=/select params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=122771 INFO: [coreName] webapp=/solr path=/select params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=137615 INFO: [coreName] webapp=/solr path=/select params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141223 INFO: [coreName] webapp=/solr path=/select params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=97474 INFO: [coreName] webapp=/solr path=/select params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=98115 INFO: [coreName] webapp=/solr path=/select params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=143822 INFO: [coreName] webapp=/solr path=/select params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118066 INFO: [coreName] webapp=/solr path=/select params={start=5547&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=121498 INFO: [coreName] webapp=/solr path=/select params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=164062 INFO: [coreName] webapp=/solr path=/select params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=165518 INFO: [coreName] webapp=/solr path=/select params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118163 INFO: [coreName] webapp=/solr path=/select params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141642 INFO: [coreName] webapp=/solr path=/select params={start=5549&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=145037 I've taken some thread dumps in the solr server and most of the time the threads seem to be busy in the following stacks mostly: Is there anything that can be done to improve the performance? Is it a known issue? Its very surprising that querying for some just rows starting at some points is taking in order of minutes. 
"395883378@qtp-162198005-7" prio=10 tid=0x7f4aa0636000 nid=0x295a runnable [0x7f42865dd000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252) at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184) at org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61) at org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639) at org.apache.solr.servlet.SolrDispatchFilter.doFilter
Need solr query help
We are doing a spatial search with the following logic: a) there are shops in a city, and each provides the facility of home delivery; b) each shop has a different max_delivery_distance. Now my query: suppose someone is searching from point P1 with radius R. The user wants the result of shops that can deliver to him (the distance between P1 and shop s1, say d1, should be less than the max delivery distance, say md1). How can I implement this with a Solr spatial query?
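One way this is commonly expressed is a geofilt for the user's radius plus a function-range filter comparing each shop's own distance to its per-document limit. A sketch in which shop_location and max_delivery_distance are assumed field names, and P1_LAT, P1_LON and R stand in for the user's point and radius:

    fq={!geofilt sfield=shop_location pt=P1_LAT,P1_LON d=R}
    fq={!frange u=0}sub(geodist(shop_location,P1_LAT,P1_LON),max_delivery_distance)

The frange keeps only documents where geodist() - max_delivery_distance <= 0, i.e. the shop is within its own delivery distance of P1.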
Need help on Solr
Hello, I am trying to index a pdf file on Solr. I am currently running Solr on Apache Tomcat 6. When I try to index it I get the below error. Please help; I was not able to rectify this error with the help of the internet.

ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer; Unable to create core: collection1
org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
…
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:176)
at org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
... 1 more

with regards, Abhishek Bansal
Re: Need help on Solr
Yeah I know, out of the box there is one id field. I removed it from schema.xml I have also added below code to automatically generate an ID. with regards, Abhishek Bansal On 20 June 2013 21:49, Shreejay wrote: > org.apache.solr.common.SolrException: [schema.xml] Duplicate field > definition for 'id' > > You might have defined an id field in the schema file. The out of box > schema file already contains an id field . > > -- > Shreejay > > > On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote: > > > Hello, > > > > I am trying to index a pdf file on Solr. I am running icurrently Solr on > > Apache Tomcat 6. > > > > When I try to index it I get below error. Please help. I was not able to > > rectify this error with help of internet. > > > > > > > > > > ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer; > Unable > > to create core: collection1 > > org.apache.solr.common.SolrException: [schema.xml] Duplicate field > > definition for 'id' > > > [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, > > required=true}]]] and > > > [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, > > required=true}]]] > > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502) > > at org.apache.solr.schema.IndexSchema.(IndexSchema.java:176) > > at > > > org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62) > > at > > > org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36) > > at > > > org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946) > > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984) > > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597) > > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > > at java.lang.Thread.run(Thread.java:662) > > ERROR - 2013-06-20 20:43:41.551; org.apache.solr.common.SolrException; > > null:org.apache.solr.common.SolrException: Unable to create core: > > collection1 > > at > > > org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450) > > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993) > > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597) > > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > > at java.lang.Thread.run(Thread.java:662) > > Caused 
by: org.apache.solr.common.SolrException: [schema.xml] Duplicate > > field definition for 'id' > > > [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, > > required=true}]]] and > > > [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, > > required=true}]]] > > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502) > > at org.apache.solr.schema.IndexSchema.(IndexSchema.java:176) > > at > > > org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62) > > at > > > org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36) > > at > > > org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946) > > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:9
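The field declaration that the archive stripped from the message above (only a trailing multiValued="false" survives in later quotes) was presumably something like the classic auto-UUID recipe; a sketch:

    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>

Note that on Solr 4.x releases the supported way to auto-generate ids moved to UUIDUpdateProcessorFactory, so the default="NEW" form may itself be part of the problem here.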
Re: Need help on Solr
As I am running Solr on Windows + Tomcat, I am using the below command to index the pdf. I hope this command is not faulty. Please check:

java -jar -Durl="http://localhost:8080/solr-4.3.0/update/extract?literal.id=1&commit=true" post.jar sample.pdf

with regards,
Abhishek Bansal

On 20 June 2013 21:56, Abhishek Bansal wrote:
> Yeah I know, out of the box there is one id field. I removed it from
> schema.xml. I have also added the below code to automatically generate an ID.
>
> multiValued="false"/>
>
> with regards,
> Abhishek Bansal
>
> On 20 June 2013 21:49, Shreejay wrote:
>> org.apache.solr.common.SolrException: [schema.xml] Duplicate field
>> definition for 'id'
>>
>> You might have defined an id field in the schema file. The out of the box
>> schema file already contains an id field.
>>
>> --
>> Shreejay
>>
>> On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote:
>> > Hello,
>> >
>> > I am trying to index a pdf file on Solr. I am currently running Solr on
>> > Apache Tomcat 6.
>> >
>> > When I try to index it I get the below error. Please help. I was not able
>> > to rectify this error with the help of the internet.
>> >
>> > [stack trace snipped; identical to the trace quoted in the previous message]
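For reference, the conventional form of this invocation in the Solr examples puts the system property before -jar, e.g.:

    java -Durl="http://localhost:8080/solr-4.3.0/update/extract?literal.id=1&commit=true" -jar post.jar sample.pdf

Note that literal.id=1 pins the document id to a fixed value; if the id is instead auto-generated by an update processor chain, the literal.id parameter can be dropped.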
Re: help to build query
Jack, thanks for your response. We have a deals web application with a free-text search in it (free text meaning the user can type anything). We have deals of different categories, tagged at different merchant locations. Per the requirements I have to make some tweaks in the search, for example a user can search deals like:

a) cat1 in location1, location2 (e.g. "spa in Malviya Nagar, Ashok Vihar", where spa = cat1, Malviya Nagar = location1, Ashok Vihar = location2)
b) cat1 and cat2 in location1
c) cat1 in location1 and location2

Hope I have been able to explain it better.

On Wed, Jan 30, 2013 at 9:06 PM, Jack Krupansky wrote:
> Start by expressing the specific semantics of those queries in strict
> boolean form. I mean, what exactly do you mean by "in", "location1,
> location2", and "location1, loc2 and loc3"? Is the latter an AND or an OR?
>
> Or at least fully express those two queries, unambiguously, in plain
> English. There is too much ambiguity present to give you any solid
> direction.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Abhishek tiwari
> Sent: Wednesday, January 30, 2013 12:55 AM
> To: solr-user@lucene.apache.org
> Subject: help to build query
>
> I want to execute queries like:
> a) cat in location1, location2
> b) cat1 and cat2 in location1, loc2 and loc3
> in our search.
>
> Our challenges:
> 1) picking the right keywords (category and locality) from the query entered
> 2) mapping them to the relevant entity
>
> How should I proceed?
>
> We have localities and categories data indexed.
>
> thanks in advance.
> ~abhishek
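Once the category and locality keywords have been extracted from the free text, one way to express query (a) is a main query on the category plus a filter over the localities. A sketch of the Solr parameters (URL-encoding omitted; the category and locality field names are assumptions, not from the thread):

    q=spa
    defType=edismax
    qf=category^2 title
    fq=locality:("malviya nagar" OR "ashok vihar")

Query (b) would AND two category clauses in q, and (c) would AND the locality clauses in fq; as Jack points out, the hard part is fixing those AND/OR semantics before writing any query.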
Re: Writing a french Solr book - Ecrire un livre en français
If you are thinking about it, then do it; why do you want people to tell you what you should do? Best of luck!

On Sun, Jan 29, 2012 at 8:20 PM, SR wrote:
> Hi there,
>
> Have you heard of any existing Solr book in French? If not, I'm thinking of
> writing one. Do you think this could be useful for the francophone community?
>
> Thanks
> -SR

--
Abhishek Tyagi
Let's just say.. I'm the Frankenstein's Monster.
Re: schema design help
Thanks for replying.

In our RDBMS schema we have Establishment/Event/Movie master relations. Establishment has fields like title, description, ratings, tags, cuisines (multivalued), services (multivalued) and features (multivalued); similarly Event has title, description, category (multivalued) and venue (multivalued); and Movie has name, start date, end date, genre, theater, rating and review.

We have nearly 1M records for each entity; movies and events expire frequently and we have to update them on expiry. We also keep stored data in addition to the indexed data, to reduce RDBMS queries.

Please suggest how to proceed with the schema design: a single core, or a separate core for each entity?

On Tue, Mar 6, 2012 at 7:40 PM, Gora Mohanty wrote:
> On 6 March 2012 18:01, Abhishek tiwari wrote:
> > I am new to Solr and want help with schema design. I have multiple entities
> > like Event, Establishment and Movie, each with different types of
> > relations. Should I make a different core for each entity?
>
> It depends on your use case, i.e., what would your typical searches
> be on. Normally, using a separate core for each entity would be
> unusual, and instead one would flatten out typical RDBMS data for
> Solr.
>
> Please describe what you want to achieve, and people might be
> better able to help you.
>
> Regards,
> Gora
Re: schema design help
Please suggest when one should create multiple cores?

On Thu, Mar 8, 2012 at 12:12 AM, Walter Underwood wrote:
> Solr is not relational, so you will probably need to take a fresh look at
> your data.
>
> Here is one method.
>
> 1. Sketch your search results page.
> 2. Each result is a document in Solr.
> 3. Each displayed item is a stored field in Solr.
> 4. Each searched item is an indexed field in Solr.
>
> It may help to think of this as a big flat materialized view in your DBMS.
>
> wunder
> Search Guy, Chegg.com
>
> On Mar 6, 2012, at 10:56 PM, Abhishek tiwari wrote:
> > Thanks for replying. [...]
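Wunder's "materialized view" method maps directly onto a flat schema. A sketch of what that could look like for these entities (the field names are illustrative, not from the thread):

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="entity_type" type="string" indexed="true" stored="true"/> <!-- establishment | event | movie -->
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="description" type="text_general" indexed="true" stored="true"/>
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="start_date" type="date" indexed="true" stored="true"/> <!-- events and movies only -->
    <field name="end_date" type="date" indexed="true" stored="true"/>

Each document carries only the fields that apply to its entity, and expired events/movies can be removed with a delete-by-query on end_date.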
Re: schema design help
My page layout is as follows:

*All tab*: contains all entities (Establishment/Event/Movie)
Establishment tab: contains Establishment search results
Event tab: contains Event search results
Movie tab: contains Movie search results

Please suggest how to design my schema.

On Thu, Mar 8, 2012 at 10:21 AM, Walter Underwood wrote:
> You should create multiple cores when each core is an independent search.
> If you have three separate search pages, you may want three separate cores.
>
> wunder
> Search Guy, Chegg.com
>
> On Mar 7, 2012, at 8:48 PM, Abhishek tiwari wrote:
> > Please suggest when one should create multiple cores? [...]
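With a single flat core and an entity_type discriminator field (as sketched earlier, and purely illustrative), the tab layout above maps to one query per tab that differs only in its filter:

    All tab:            q=<user query>
    Establishment tab:  q=<user query>&fq=entity_type:establishment
    Event tab:          q=<user query>&fq=entity_type:event
    Movie tab:          q=<user query>&fq=entity_type:movie

Since fq filters are cached separately from the main query, the per-tab restriction is cheap.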
Re: schema design help
Gora, we do not have the related-search case you mentioned (*will a search on an Establishment also require results from Movie, such as what movies are showing at the establishment*). Establishment does not require Movie results; each entity has its own separate search.

On Thu, Mar 8, 2012 at 10:49 AM, Gora Mohanty wrote:
> On 8 March 2012 10:40, Abhishek tiwari wrote:
> > My page layout is as follows:
> > *All tab*: contains all entities (Establishment/Event/Movie) [...]
> > Please suggest how to design my schema.
>
> You will need to think more about your search requirements, and
> provide more details. E.g., will a search on an Establishment
> also require results from Movie, such as what movies are showing
> at the establishment? Similarly, will results from an Event search
> require a list of Movies showing at the events? As Solr is not an
> RDBMS, if you need such correlated data, you should typically use
> a single, flat index, rather than multiple cores.
>
> IMHO, a multi-core setup would be unusual for what you are
> trying to do. However, this is difficult to say for sure without an
> insight into your search requirements.
>
> Regards,
> Gora
Re: schema design help
Hi Gora,

Thanks. One more concern: though Establishments, Events and Movies are not related to each other, I would have to make 3 search queries to their independent cores and club the data for display. Will that affect my relevancy? For example, there is a Movie with the title "Striker" and an Establishment with the title "Striker". So which one is better:
- 3 queries to independent cores, clubbing the data afterwards
- a single query to one core which contains all the data

Thanks
Abhishek

On Thu, Mar 8, 2012 at 11:07 AM, Gora Mohanty wrote:
> On 8 March 2012 11:05, Abhishek tiwari wrote:
> > Gora, we do not have the related-search case you mentioned (*will a search
> > on an Establishment also require results from Movie, such as what movies
> > are showing at the establishment*).
> >
> > Establishment does not require Movie results; each entity has its own
> > separate search.
>
> In that case, multiple cores should be OK.
>
> Regards,
> Gora
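One point that bears on the choice: scores from separate cores are not directly comparable, since each core computes relevancy against its own term statistics. On a single combined core, field collapsing can even return the per-entity buckets in one request; a sketch using the illustrative entity_type field from earlier:

    q=striker&group=true&group.field=entity_type&group.limit=10

Each group can then feed one tab, with every score computed against the same index statistics.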
query help
Hi,
I have a multivalued field and I want to sort the docs by the position at which a particular value, e.g. 'B1', was added. How should I query? ad_text is the multivalued field. The documents look like:

doc1: ad_text = [B1, B2, B3]
doc2: ad_text = [B2, B1, B3]
doc3: ad_text = [B1, B2, B3]
doc4: ad_text = [B3, B2, B1]
Re: query help
a) No, I do not want to sort the content within a document; I want to sort the documents.

b) As I explained, I have a result set (documents) and each document contains a field "*ad_text*" (along with other fields) which is multivalued, storing some tags, say "B1, B2, B3"; but the order of the tags differs per doc, say (B1, B2, B3) *for doc1*, (B3, B1, B2) *for doc2*, (B1, B3, B2) *for doc3*, (B2, B3, B1) *for doc4*.

If I search for B1, the results should come in the order doc1, doc3, doc2, doc4 (as B1 is the first value of the multivalued field in doc1 and doc3, the 2nd value in doc2, and the 3rd in doc4). If I search for B2, the results should come in the order doc4, doc1, doc3, doc2.

I do not know whether it is possible or not, but please suggest how it can be done.

On Thu, Mar 29, 2012 at 5:18 PM, Erick Erickson wrote:
> Hmmm, I don't quite get this. Are you saying that you want
> to sort the documents or sort the content within the document?
>
> Sorting documents (i.e. the results list) requires a single-valued
> field. So you'd have to, at index time, sort the entries.
>
> Sorting the content within the document is something you'd
> have to do when you index; Solr doesn't rearrange the
> contents of a document.
>
> If all you want to do is display the results within the document
> in order, your app can do that as it builds the display page.
>
> Best
> Erick
>
> On Wed, Mar 28, 2012 at 9:02 AM, Abhishek tiwari wrote:
> > Hi,
> > I have a multivalued field and I want to sort the docs by the position at
> > which a particular value, e.g. 'B1', was added. How should I query?
> > ad_text is the multivalued field. The documents look like:
> > doc1: ad_text = [B1, B2, B3]
> > doc2: ad_text = [B2, B1, B3]
> > doc3: ad_text = [B1, B2, B3]
> > doc4: ad_text = [B3, B2, B1]
Re: query help
Can I achieve this with the help of a boosting technique?

On Thu, Mar 29, 2012 at 10:42 PM, Erick Erickson wrote:
> Solr doesn't support sorting on multiValued fields, so I don't think this
> is possible OOB.
>
> I can't come up with a clever indexing solution that does this either,
> sorry.
>
> Best
> Erick
>
> On Thu, Mar 29, 2012 at 8:27 AM, Abhishek tiwari wrote:
> > a) No, I do not want to sort the content within a document; I want to
> > sort the documents. [...]
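One index-time workaround in the spirit of Erick's "sort the entries when you index" remark: when the tag vocabulary is small and known, store each tag's position in its own single-valued int field and sort on that. Field names here are illustrative. For doc2 = [B2, B1, B3] you would index:

    pos_B2 = 1, pos_B1 = 2, pos_B3 = 3

and a search for B1 becomes:

    q=ad_text:B1&sort=pos_B1 asc

This yields exactly the doc1, doc3, doc2, doc4 ordering asked for above, but it only scales to a small, fixed set of tags.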
Re: Error
I am using Solr version 3.4... please assist...

On Thu, Apr 12, 2012 at 8:41 PM, Erick Erickson wrote:
> Please review:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> You haven't said whether, for instance, you're using trunk, which
> is the only version that supports the "termfreq" function.
>
> Best
> Erick
>
> On Thu, Apr 12, 2012 at 4:08 AM, Abhishek tiwari wrote:
> > http://xyz.com:8080/newschema/mainsearch/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&sort=termfreq%28cuisine_priorities_list,%27Chinese%27%29%20desc
> >
> > Error : HTTP Status 400 - Missing sort order.
> > Why am I getting this error?
Searching .msg files
Hello Everyone,

In my company we store a lot of old emails (.msg files) in a database (done for the purposes of legal compliance). The users have been asking us for search functionality over the old emails. One of the primary requirements is that when people search, they should only be able to search their own emails (emails in which they were in the to, cc or bcc list).

How can Solr be used here? From what I know about this product, it only searches XML content, so I will have to extract the body of each email and convert it to XML, right? How will I limit the search results to only those emails where the searching user was in the to, cc or bcc list?

Please recommend an approach for providing a solution to our requirement.
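Two pointers for this kind of requirement, sketched under assumed core and field names. First, Solr is not limited to XML: the extracting request handler (Solr Cell, backed by Tika) can parse binary formats such as .msg directly, and literal.* parameters can capture the recipient lists as fields. Second, the per-user restriction is typically a filter query that the application appends server-side, so the user never controls it:

    # index an email, capturing recipients as fields
    curl "http://localhost:8983/solr/emails/update/extract?literal.id=msg-001&literal.to=alice@corp.com&literal.cc=bob@corp.com&commit=true" --data-binary @mail001.msg -H "Content-Type: application/octet-stream"

    # search as alice: the app adds the ACL filter, the user only supplies q
    curl "http://localhost:8983/solr/emails/select?q=quarterly+report&fq=to:alice@corp.com+OR+cc:alice@corp.com+OR+bcc:alice@corp.com"

In practice the to/cc/bcc values would come from the metadata Tika extracts from the .msg file rather than hand-typed literals; the sketch just shows the shape of the solution.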
Comparison of Solr with Sharepoint Search
Has anyone done a functionality comparison of Solr with Sharepoint/Fast Search? If yes, kindly share a few details here. Thanks for your help in advance! Regards, Abhishek.
Question on Tokenizing email address
Hello Everyone,

I have a field in my Solr schema which stores emails. The way I want the emails to be tokenized is like this: if the email address is abc@alpha-xyz.com, the user should be able to search on 1. abc@alpha-xyz.com (the whole address) 2. abc 3. def 4. alpha-xyz. Which tokenizer should I use?

Also, is there a feature like "Must Match" in Solr? In my schema there is a field called "from" which contains the email address of the person who sent an email. For this field I don't want any tokenization: when a user issues a search, the user's email ID must exactly match the "from" field value for that document/record to be returned. How can I do this?

Regards,
Abhishek
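A common pattern here, sketched with illustrative names: index the address twice, once through an analyzer that keeps the whole address and splits out its parts, and once untouched for exact matching. Solr's UAX29URLEmailTokenizer recognizes complete email addresses as single tokens, and a word-delimiter filter can then break out the parts (exactly which parts survive, e.g. whether alpha-xyz stays as one token, needs tuning against your data):

    <!-- tokenized view: whole address plus its parts -->
    <fieldType name="email_text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- exact view for the "from" field: no tokenization at all -->
    <field name="from" type="string" indexed="true" stored="true"/>

Querying the string-typed field (from:"user@example.com") then behaves like a must-match: only the full, exact value matches.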
regarding 'sharedlib' in solr
Hi,

I want to share a folder containing text files among different cores in Solr, so that if the folder is updated, the change is reflected in all the cores that reference that path. The problem I am facing: I am using sharedLib in solr.xml and specifying the default path there, and I am also updating the schema.xml of my core, but when I load the core it gives an 'unsafe loading' error and does not reload.

Please help me with this.

--
Thanks,
Abhishek Agarwal
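For reference, sharedLib is declared at the top level of solr.xml and points at a directory of jars that every core loads; a minimal sketch (the path is illustrative):

    <solr>
      <str name="sharedLib">/var/solr/sharedlib</str>
    </solr>

Note that sharedLib is intended for shared plugin jars, not shared data files. Recent Solr versions deliberately refuse to load resources from paths outside a core's instance dir or configset, which is the usual source of "unsafe loading" errors, so a folder of text files referenced from schema.xml generally needs to live inside (or be linked into) each core's conf directory.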
Re: Error:Missing Required Fields for Atomic Updates
The update handler expects all the required fields to be present, even for an atomic update request. See DocumentBuilder:
https://github.com/apache/lucene-solr/blob/branch_7_5/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java

Hope this helps!

    // Now validate required fields or add default values
    // fields with default values are defacto 'required'
    // Note: We don't need to add default fields if this document is to be used for
    // in-place updates, since this validation and population of default fields would've
    // happened during the full indexing initially.
    if (!forInPlaceUpdate) {
      for (SchemaField field : schema.getRequiredFields()) {
        if (out.getField(field.getName()) == null) {
          if (field.getDefaultValue() != null) {
            addField(out, field, field.getDefaultValue(), false);
          } else {
            String msg = getID(doc, schema) + "missing required field: " + field.getName();
            throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, msg);
          }
        }
      }
    }

Cheers!
Abhishek

On Tue, Nov 20, 2018 at 11:47 AM Rahul Goswami wrote:
> What is the Router name for your collection? Is it "implicit" (you can
> know this from the "Overview" of your collection in the admin UI)? If yes,
> what is the router.field parameter the collection was created with?
>
> Rahul
>
> On Mon, Nov 19, 2018 at 11:19 PM Rajeswari Kolluri <
> rajeswari.koll...@oracle.com> wrote:
> > Hi Rahul
> >
> > Below is part of the schema; entityid is my unique id field. Getting the
> > exception "missing required field" for "category" during atomic updates.
> >
> > entityid
> > required="true" multiValued="false" />
> > required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > required="true" docValues="true" />
> > required="false" multiValued="true" />
> >
> > Thanks
> > Rajeswari
> >
> > -----Original Message-----
> > From: Rahul Goswami [mailto:rahul196...@gmail.com]
> > Sent: Tuesday, November 20, 2018 9:33 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Error:Missing Required Fields for Atomic Updates
> >
> > What's your update query?
> >
> > You need to provide the unique id field of the document you are updating.
> >
> > Rahul
> >
> > On Mon, Nov 19, 2018 at 10:58 PM Rajeswari Kolluri <
> > rajeswari.koll...@oracle.com> wrote:
> > > Hi,
> > >
> > > Using Solr 7.5.0. While performing atomic updates on a document on
> > > Solr Cloud using SolrJ, getting exceptions "Missing Required Field".
> > >
> > > Please let me know the solution; I would not want to update the
> > > required fields during atomic updates.
> > >
> > > Thanks
> > > Rajeswari
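For comparison, a minimal SolrJ 7.x atomic update (the collection name, ZooKeeper address and field values are illustrative): the unique key is mandatory, and each updated field carries a modifier map such as "set":

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class AtomicUpdateExample {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:2181"), Optional.empty()).build()) {
          client.setDefaultCollection("mycollection");
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("entityid", "E-1001");                                 // unique key: always required
          doc.addField("category", Collections.singletonMap("set", "books")); // atomic 'set' on one field
          client.add(doc);
          client.commit();
        }
      }
    }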
Re: Reg:- Create Solr Core Using Command Line
Hello,

I followed the steps outlined in your mail and was able to get a running core up fine. The only thing I can think of in your case is the config directory not having all the files required for the Solr core to initialize. Can you check that you have all the Solr config files in the conf directory you specified on the command line (i.e., schema.xml, solrconfig.xml, and the various supporting files referred to from schema.xml)? Can you share the conf directory, if possible?

Cheers!
Abhishek

On Tue, Feb 6, 2018 at 9:30 AM, @Nandan@ wrote:
> Hi Sadiki,
> I checked the sample techproducts conf folder. Inside that folder there are
> numerous files, so again my question is how those files came to be.
> I want to create a core from scratch and want to create and check each and
> every config file myself; only then can I understand what and which files
> are needed by the different Solr search functions.
> I hope you can understand my query.
>
> Thanks
>
> On Tue, Feb 6, 2018 at 11:48 AM, Sadiki Latty wrote:
> > If I'm not mistaken the command requires that the books_data folder
> > already exists with a conf folder inside and the various required files
> > (solrconfig.xml, solr.xml, etc). To get an idea of what you should have in
> > your conf folder you can check out the included configsets
> > (sample_techproducts_configs for example). These configsets have the
> > required files and you can copy and modify to accommodate your own needs. I
> > am not 100% sure where to find them on a Windows installation but I believe
> > it would be C:\solr\server\configsets\ or another subfolder of the server
> > folder.
> >
> > -----Original Message-----
> > From: @Nandan@ [mailto:nandanpriyadarshi...@gmail.com]
> > Sent: Monday, February 5, 2018 9:46 PM
> > To: solr-user@lucene.apache.org
> > Subject: Reg:- Create Solr Core Using Command Line
> >
> > Hi,
> > This question might be very basic, but I need to clarify my basic
> > understanding.
> > I am using Solr Version 7.2.1.
> > I have one CSV file named books_data.csv which contains 2 records.
> > Now I want to create a solr core to start my basic search using the Solr UI.
> > Steps which I follow:
> > 1) Go to the bin directory and start solr
> > C:\solr\bin>solr start -p 8983
> > 2) books_data.csv is in the C:\solr location
> > 3) Now I try to create a solr core.
> > C:\solr\bin>solr create_core -c books_data -d C:\solr
> > Got Error: No conf subfolder or solrconfig.xml file present.
> > 4) Then I created the folder "books_data" in the C:\solr location, created a
> > conf subfolder under the books_data folder and put solrconfig.xml inside the
> > conf subfolder.
> > 5) Again I tried to execute:
> > C:\solr\bin>solr create_core -c books_data -d C:\solr\books_data
> > Got Error: core already exists.
> > When I checked the Solr Admin UI, it showed the error message SolrCore
> > Initialization Failures
> >
> > - *books_data:*
> > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Could not load conf for core books_data: Error loading solr config from
> > C:\solr\bin\books_data\conf\solrconfig.xml
> >
> > Please tell me where am I doing wrong?
> > Thanks.
Re: Reg:- Create Solr Core Using Command Line
You can try using the post tool: https://lucene.apache.org/solr/guide/6_6/post-tool.html

bin/post -c films example/books_data.csv

Cheers!
Abhishek

On Tue, Feb 6, 2018 at 1:22 PM, @Nandan@ wrote:
> Hi,
> I created a core named "films". Now I am trying to insert my csv file with
> the step below:
> C:\solr>curl "http://localhost:8983/solr/films/update?commit=true" --data-binary @example/books_data.csv -H 'Content-type:application/csv'
> Got the below result:
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":279}}
>
> But in the Solr Admin UI I am not able to see any data.
> Please tell me where am I wrong?
> Thanks
>
> On Tue, Feb 6, 2018 at 1:42 PM, Shawn Heisey wrote:
> > On 2/5/2018 10:39 PM, Shawn Heisey wrote:
> >> In order for this solr script command to work, the argument to the -d
> >> option (which you have as C:\solr) would have to be a config directory,
> >> containing a minimum of solrconfig.xml and the schema.
> >
> > Replying to myself because I made an error here.
> >
> > The directory provided with -d needs to contain a "conf" subdirectory,
> > which in turn must contain the files that I mentioned.
> >
> > Thanks,
> > Shawn
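One thing worth checking in the curl invocation above: Windows cmd.exe does not treat single quotes as quoting characters, so -H 'Content-type:application/csv' may reach Solr mangled and the payload may not be parsed as CSV even though the response reports status 0. A sketch using double quotes throughout, plus a query to verify that documents actually landed:

    C:\solr>curl "http://localhost:8983/solr/films/update?commit=true" --data-binary @example/books_data.csv -H "Content-Type: application/csv"
    C:\solr>curl "http://localhost:8983/solr/films/select?q=*:*&rows=5"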
SOLR 7.x stable version
Hi All - I am using SOLR Cloud v6.5.0 and looking to upgrade to SOLR 7.x; any suggestions on which is the most stable version in the SOLR 7.x series? From my initial reading I see that until SOLR 7.2 there were issues with CDCR updates. Thank you for your suggestions.

Thanks,
Abhishek
Inconsistent recovery status of replicas
Hello guys,

I am using Solr Cloud 7.7 on Kubernetes. When adding a replica we sometimes see inconsistency: after a successful addition, nodes go into recovery status, and sometimes it takes 2-3 minutes to recover while sometimes it takes more than an hour. We are getting the error below. We have 4 shards, and each shard has around 7GB of data. Looking at system metrics we see that bandwidth usage between the leader and the new replica node is high. Do we have any way to rate-limit the bandwidth exchange, like the configuration we had for it in master-slave? maxMbpersec, something like that?

Error

> 2020-12-01 13:40:34.983 ERROR (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec s:shard2 r:core_node3956) [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 x:olxid-20200531_d6e431ec_shard2_replica_p3955] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr-olxid-statefulset-tlog-7.solr-olxid-statefulset-headless.relevance:8983/solr/olxid-20200531_d6e431ec_shard2_replica_t139
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:287)
> at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:215)
> at org.apache.solr.cloud.RecoveryStrategy.doReplicateOnlyRecovery(RecoveryStrategy.java:382)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:328)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
> at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
> at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
> at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
> at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
> at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
> at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > at > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > at > org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > at > org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) > at > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > at > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > at > org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120) > at > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) > at > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) > at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) > at > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) > at > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) >
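The master-slave knob being remembered here is most likely maxWriteMBPerSec on the replication handler, which throttles how fast a fetching node writes the copied index files. Since SolrCloud recovery for pull/tlog replicas goes through the same /replication machinery, it is worth trying there as well; a sketch (the exact placement should be confirmed against the 7.7 reference guide, treat this as an assumption):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="defaults">
        <str name="maxWriteMBPerSec">50</str>
      </lst>
    </requestHandler>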
Migrating from solr 7.7 to solr 8.6 issues
We are trying to migrate from Solr 7.7 to Solr 8.6 on Kubernetes, using zookeeper-3.4.13. Adding a replica to the cluster returns a 500 status code, while in the background the replica is sometimes added successfully and sometimes left as an inactive node. We are using http2 without SSL. Error: > { "responseHeader":{ "status":500, "QTime":307}, "failure":{ "solr-pklatest-statefulset-pull-0.solr-pklatest-statefulset-headless.relevance:8983_solr":"org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: null"}, "Operation addreplica caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: ADDREPLICA failed to create replica", "exception":{ "msg":"ADDREPLICA failed to create replica", "rspCode":500}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"ADDREPLICA failed to create replica", "trace":"org.apache.solr.common.SolrException: ADDREPLICA failed to create replica\n\tat org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:65)\n\tat org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:286)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:257)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:854)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:818)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThr
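When ADDREPLICA does enough work to outlive the HTTP request, the synchronous call can report a 500 even though the replica is eventually created, which matches the "sometimes added successfully" behaviour described above. One standard workaround is to run the operation asynchronously and poll for the result (collection and shard names are illustrative):

    curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&type=pull&async=add-pull-1"
    curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=add-pull-1"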
solrcloud with EKS kubernetes
Hello guys,

We are facing some issues (like timeouts etc.) which are very inconsistent. By any chance could this be related to EKS? We are using Solr 7.7 and zookeeper 3.4.13. Should we move to ECS?

Regards,
Abhishek
Re: solrcloud with EKS kubernetes
Hi Houston,

Sorry for the late reply. Each shard is around 9GB in size. Yes, we are providing enough resources to the pods. We are currently using c5.4xlarge; Xms and Xmx are 16GB, and the machine has 32 GB and 16 cores. No, I haven't run it outside Kubernetes, but I do have colleagues who ran 7.2 that way and didn't face any issue with it. The storage volume is gp2, 50GB. It's not the search queries where we are facing inconsistencies or timeouts; it seems some internal admin APIs sometimes have issues, so adding a new replica to a cluster sometimes results in inconsistencies, e.g. recovery taking more than an hour.

Regards,
Abhishek

On Thu, Dec 10, 2020 at 10:23 AM Houston Putman wrote:
> Hello Abhishek,
>
> It's really hard to provide any advice without knowing any information
> about your setup/usage.
>
> Are you giving your Solr pods enough resources on EKS?
> Have you run Solr in the same configuration outside of kubernetes in the
> past without timeouts?
> What type of storage volumes are you using to store your data?
> Are you using headless services to connect your Solr Nodes, or ingresses?
>
> If this is the first time that you are using this data + Solr
> configuration, maybe it's just that your data within Solr isn't optimized
> for the type of queries that you are doing.
> If you have run it successfully in the past outside of Kubernetes, then I
> would look at the resources that you are giving your pods and the storage
> volumes that you are using.
> If you are using Ingresses, that might be causing slow connections between
> nodes, or between your client and Solr.
>
> - Houston
>
> On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra wrote:
> > Hello guys, [...]
Custom cache for Solr Cloud mode
Hi,

I am trying to make use of the user-defined cache functionality to optimise a particular workflow. We are using Solr 7.4.

Step 1. I noticed that first we would have to add a custom cache entry in solrconfig.xml. What is its Config API alternative for SolrCloud? I couldn't find one at https://lucene.apache.org/solr/guide/7_4/config-api.html (or maybe I missed it). Could anyone point me to some link?

Step 2. To insert into the required cache, I can see there is a cacheInsert() method available on the SolrIndexSearcher class. I am not sure how to build an object of this class. I started with a CoreContainer object, which just needs SOLR_HOME for initialisation. From this I was trying to get SolrCore objects, and then build an object of SolrIndexSearcher from those SolrCore objects:

SolrIndexSearcher newSearcher = new SolrIndexSearcher(_core, _core.getNewIndexDir(), _core.getLatestSchema(), _core.getSolrConfig().indexConfig, "test query", false, _core.getDirectoryFactory());

But getAllCoreNames returns an empty list of SolrCore names, so it didn't work. Not sure what I am missing; any pointer would be greatly appreciated.

Regards,
Abhishek
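For step 2, the more usual route avoids constructing a SolrIndexSearcher by hand altogether: declare the cache in solrconfig.xml and use it from inside a Solr plugin, where the live searcher is handed to you per request. A sketch, with myUserCache and MyCacheComponent as illustrative names (and, as far as I can tell, the cache declaration has no dedicated Config API command in 7.4, so it goes in through the configset):

    <!-- solrconfig.xml, inside the <query> section -->
    <cache name="myUserCache" class="solr.LRUCache" size="4096" initialSize="1024"/>

    // A minimal SearchComponent that reads/writes the declared user cache
    import java.io.IOException;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.SolrIndexSearcher;

    public class MyCacheComponent extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare for this sketch
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        SolrIndexSearcher searcher = rb.req.getSearcher(); // live searcher; no manual construction
        String key = rb.getQueryString();
        Object value = searcher.cacheLookup("myUserCache", key);
        if (value == null) {
          value = expensiveComputation(key);               // hypothetical placeholder
          searcher.cacheInsert("myUserCache", key, value);
        }
        rb.rsp.add("myCachedValue", value);
      }

      private Object expensiveComputation(String key) {
        return key == null ? 0 : key.length();
      }

      @Override
      public String getDescription() {
        return "demo of a user-defined cache";
      }
    }

The cache is per-searcher, so entries vanish on each commit unless a regenerator is configured for autowarming.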
Re: solrcloud with EKS kubernetes
Hi Jonathan,

Merry Christmas. Thanks for the suggestion. To manage IOPS, can we do something on the rate-limiting front?

Regards,
Abhishek

On Thu, Dec 17, 2020 at 5:07 AM Jonathan Tan wrote:
> Hi Abhishek,
>
> We're running Solr Cloud 8.6 on GKE.
> 3 node cluster, running 4 cpus (configured) and 8gb of min & max JVM
> configured, all with anti-affinity so they never exist on the same node.
> It's got 2 collections of ~13documents each, 6 shards, 3 replicas each;
> disk usage on each node is ~54gb (we've got all the shards replicated to
> all nodes).
>
> We're also using a 200gb zonal SSD, which *has* been necessary just so that
> we've got the right IOPS & bandwidth. (That's approximately 6000 IOPS for
> read & write each, and 96MB/s for read & write each.)
>
> Various lessons learnt...
> You definitely don't want them ever on the same kubernetes node. From a
> resilience perspective, yes, but also when one SOLR node gets busy, they
> tend to all get busy, so now you'll have resource contention. Recovery can
> also get very busy and resource intensive, and again, sitting on the same
> node is problematic. We also saw the need to move to SSDs because of how
> IOPS bound we were.
>
> Did I mention use SSDs? ;)
>
> Good luck!
>
> On Mon, Dec 14, 2020 at 5:34 PM Abhishek Mishra wrote:
> > Hi Houston,
> > Sorry for the late reply. [...]
How pull replica works
I want to know how a pull replica replicates from the leader in practice. Does it internally use an admin API to get data from the leader in batches?

Regards,
Abhishek
Re: How pull replica works
Thanks, Tomás. It was really helpful.

Regards,
Abhishek

On Thu, Jan 7, 2021 at 7:03 AM Tomás Fernández Löbbe wrote:
> Hi Abhishek,
> Pull replicas use the "/replication" endpoint to copy full segment
> files (sections of the index) from the leader. It works in a similar way to
> the legacy leader/follower replication. This[1] talk tries to explain the
> different replica types and how they work.
>
> HTH,
>
> Tomás
>
> [1] https://www.youtube.com/watch?v=C8C9GRTCSzY
>
> On Tue, Jan 5, 2021 at 10:29 PM Abhishek Mishra wrote:
> > I want to know how a pull replica replicates from the leader in practice.
> > Does it internally use an admin API to get data from the leader in batches?
> >
> > Regards,
> > Abhishek
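Since pull replicas reuse the /replication endpoint that Tomás mentions, the legacy replication commands are handy for observing what a pull replica is actually doing (the core name is illustrative):

    curl "http://localhost:8983/solr/mycoll_shard1_replica_p1/replication?command=details"
    curl "http://localhost:8983/solr/mycoll_shard1_replica_p1/replication?command=indexversion"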
Re: Solr background merge in case of pull replicas
Hi Kshitij,

Here is my guess: pull replicas replicate segments from the tlog replicas, so whenever a merge happens on a tlog replica it decreases the number of segments, which is a much bigger change than the usual case (i.e., adding one new segment). AFAIK adding/deleting segments is something of a stop-the-world moment. This could be the reason for the increase in response time.

Regards,
Abhishek

On Thu, Jan 7, 2021 at 12:43 PM kshitij tyagi wrote:
> Hi,
>
> I am not querying the tlog replicas; the Solr version is 8.6, with a setup of
> 2 tlog and 4 pull replicas.
>
> Why should pull replicas be affected during background segment merges?
>
> Regards,
> kshitij
>
> On Wed, Jan 6, 2021 at 9:48 PM Ritvik Sharma wrote:
> > Hi
> > It may be the cause of rebalancing, and querying is not available on
> > tlog at that moment.
> > You can check the tlog and pull logs when you are facing this issue.
> >
> > May I know which version of solr you are using? And what is the ratio of
> > tlog and pull nodes?
> >
> > On Wed, 6 Jan 2021 at 2:46 PM, kshitij tyagi wrote:
> > > Hi,
> > >
> > > I am having a tlog + pull replica solr cloud setup.
> > >
> > > 1. I am observing that whenever a background segment merge is triggered
> > > automatically, I see high response times on all of my solr nodes.
> > >
> > > As far as I know merges must be happening on the tlog replicas, hence the
> > > increased response time; I am not able to understand why my pull replicas
> > > are affected during background index merges.
> > >
> > > Can someone give some insights on this? What is affecting my pull replicas
> > > during index merges?
> > >
> > > Regards,
> > > kshitij
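If the merges themselves are the trigger, their size and frequency can be tuned on the indexing side via the merge policy in solrconfig.xml. A sketch with illustrative starting values, not recommendations:

    <indexConfig>
      <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
        <double name="maxMergedSegmentMB">5000</double>
      </mergePolicyFactory>
    </indexConfig>

Raising segmentsPerTier makes merges rarer but larger; lowering maxMergedSegmentMB caps how much data a single merge, and hence a single pull-replica fetch, can touch.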