Nutch and Solr search on the fly
Hi all, I am a newbie to Nutch and Solr. Well, relatively much newer to Solr than Nutch :) I have been using Nutch for the past two weeks, and I wanted to know if I can query or search my Nutch crawls on the fly (before a crawl completes). I am asking because the websites I am crawling are really huge and it takes around 3-4 days for a crawl to complete. I want to analyze some quick results while the Nutch crawler is still crawling the URLs. Someone suggested that Solr would make this possible. I followed the steps in http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ for this. With this process, I see only the injected URLs in the Solr search. I know I did something really foolish and the crawl never happened; I feel I am missing some information here. I think somewhere in the process there should be crawling happening and I missed it. Just wanted to see if someone could help me by pointing out where I went wrong in the process. Forgive my foolishness and thanks for your patience. Cheers, Abi
Re: Nutch and Solr search on the fly
Hi Markus, I am sorry for not being clear; I meant to say that... Suppose a URL, say www.somehost.com/gifts/greetingcard.html (which in turn contains links to a.html, b.html, c.html, d.html), is injected into seed.txt. After the whole process I was expecting a bunch of other pages crawled from this seed URL. However, at the end of it, all I see is the content from only this page, www.somehost.com/gifts/greetingcard.html, and I do not see any of the other pages (here a.html, b.html, c.html, d.html) crawled from it. The crawling happens only for the URLs mentioned in seed.txt and does not proceed further from there. So I am just a bit confused: why is it not crawling the linked pages (a.html, b.html, c.html and d.html)? I get a feeling that I am missing something that the author of the blog (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) assumed everyone would know. Thanks, Abi

On Wed, Feb 9, 2011 at 7:09 PM, Markus Jelsma wrote:
> The parsed data is only sent to the Solr index if you tell a segment to be
> indexed; solrindex
>
> If you did this only once after injecting and then the consequent
> fetch,parse,update,index sequence then you, of course, only see those
> URL's.
> If you don't index a segment after it's being parsed, you need to do it
> later on.
>
> [...]
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
Re: Nutch and Solr search on the fly
Hi Erick, thanks a bunch for the response. It could be the case, but all I am wondering is where to specify the depth in the whole process described at http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/. I tried specifying it during the fetcher phase but it was just ignored :( Thanks, Abi

On Wed, Feb 9, 2011 at 10:11 PM, Erick Erickson wrote:
> WARNING: I don't do Nutch much, but could it be that your
> crawl depth is 1? See:
> http://wiki.apache.org/nutch/NutchTutorial
> and search for "depth"
> Best
> Erick
>
> [...]
Re: Nutch and Solr search on the fly
Hi Charan, thanks for the clarifications. The link I have been referring to (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) does not say anything about using the crawl command. Do I have to run it after the last step mentioned? Thanks, Abi

On Thu, Feb 10, 2011 at 12:58 AM, charan kumar wrote:
> Hi Abishek,
>
> depth is a param of the crawl command, not the fetch command.
>
> If you are using a custom script calling the individual stages of a Nutch
> crawl, then depth N means running that script N times. You can put a
> loop in the script.
>
> Thanks,
> Charan
>
> [...]
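For reference, when driving Nutch through the individual commands from that blog post rather than the all-in-one crawl command, the "depth" is simply the number of generate/fetch/parse/updatedb rounds you run; each round crawls one more level of links and can be indexed to Solr as it completes. A minimal sketch of such a loop, assuming a Nutch 1.x install with placeholder paths and Solr URL (exact solrindex arguments vary a little between Nutch versions):

    #!/bin/sh
    # Placeholder paths: urls/ holds seed.txt, crawl/ holds the crawl data.
    DEPTH=3
    SOLR_URL=http://localhost:8983/solr

    bin/nutch inject crawl/crawldb urls

    for i in $(seq 1 $DEPTH); do
      # Select URLs due for fetching and create a new segment.
      bin/nutch generate crawl/crawldb crawl/segments
      SEGMENT=$(ls -d crawl/segments/* | tail -1)
      # Fetch and parse the segment, then feed discovered links
      # back into the crawldb so the next round goes one level deeper.
      bin/nutch fetch $SEGMENT
      bin/nutch parse $SEGMENT
      bin/nutch updatedb crawl/crawldb $SEGMENT
      # Index this round's segment so results are searchable mid-crawl.
      bin/nutch invertlinks crawl/linkdb -dir crawl/segments
      bin/nutch solrindex $SOLR_URL crawl/crawldb crawl/linkdb $SEGMENT
    done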
Re: Custom cache for Solr Cloud mode
Thanks for the response. Eric, are you suggesting downloading this file from ZooKeeper and uploading it again after changing it? Mikhail, thanks, I will try the solrCore.SolrConfig.userCacheConfigs option. Any idea why CoreContainer.getCores() would be returning an empty list for me? (CoreAdminRequest.setAction(CoreAdminAction.STATUS); CoreAdminRequest.process(solrClient); gives me the list of cores correctly.) -Abhishek
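For anyone following along: user caches in Solr are declared per core in solrconfig.xml (which, in cloud mode, lives in ZooKeeper, hence the download/upload suggestion) and are then reachable from a searcher. A minimal sketch, with the cache name made up:

    <!-- solrconfig.xml, inside <query>: a user-defined cache. -->
    <cache name="myCustomCache"
           class="solr.LRUCache"
           size="512"
           initialSize="128"
           autowarmCount="0"/>

    // From a custom component, via the current searcher:
    SolrCache<String, Object> cache =
        req.getSearcher().getCache("myCustomCache");
    if (cache != null) {
      cache.put("someKey", someValue);
    }

Note the cache is per searcher, so entries are discarded on commit unless autowarmed.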
new data structure for some fields
Hello all, I am facing a requirement where an id p1 is associated with some category_ids c1, c2, c3, c4, each paired with an integer b1, b2, b3, b4. We need to sort the Solr query results on the basis of b1/b2/b3/b4 depending on the given category_id. Right now we map the category_ids into a multi-valued attribute, [c1,c2,c3,c4], and query against it. But now we also need to find which integer b1, b2, b3... is associated with a given category and sort the whole result set on it. Sorry for any typos. Regards, Abhishek
Re: new data structure for some fields
Hi Binoy, thanks for the reply. By sort I mean sorting the result set on the basis of the integer values given for that category. For any document, say an id P1, the associated categories are c1, c2, c3, c4 (using a multivalued field). In the new implementation a number is similarly associated with each category, say c1---b1, c2---b2, c3---b3, c4---b4. Now when we query Solr for the ids which have c1 in their categories (q=category_id:c1), I want the result of this query sorted on the basis of the number (b) associated with c1 throughout the result. The number of associations is usually less than 20 (meaning an id can't be mapped to more than 20 category_ids).

On Mon, Dec 21, 2015 at 3:59 PM, Binoy Dalal wrote:
> When you say sort, do you mean search on the basis of category and
> integers? Or score the docs based on their category and integer values?
>
> Also, for any given document, how many categories or integers are
> associated with it?
>
> [...]
Re: new data structure for some fields
Hi Binoy, that will not work, as category and integer is a one-to-one mapping: if category_id is multivalued, the same goes for the integer. You need some mechanism that identifies which integer to pick for a given category_id in the search; only then can you sort on it.

On Mon, Dec 21, 2015 at 5:27 PM, Binoy Dalal wrote:
> Small edit:
> The sort parameter in the solrconfig goes in the request handler
> declaration that you're using. So if it's select, put it in the
> <lst name="defaults"> list.
>
> On Mon, 21 Dec 2015, 17:21 Binoy Dalal wrote:
>
> > OK. You will only be able to sort based on the integers if the integer
> > field is single valued, i.e. only one integer is associated with one
> > category id.
> >
> > To do this you've to use the sort parameter.
> > You can either specify it in your solrconfig.xml like so:
> > <str name="sort">integer asc</str>
> > Field name followed by the order - asc/desc
> >
> > Or you can specify it along with your query by appending it like so:
> > /select?q=query&sort=integer%20asc
> >
> > If you want to apply these sorting rules for all docs, then specify the
> > sorting in your solrconfig. If you only want it for a certain subset
> > then apply the parameter from code at the app level.
> >
> > [...]
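One common workaround, not from this thread and assuming the set of categories is reasonably small and known at index time, is to flatten each category/integer pair into its own single-valued dynamic field; single-valued fields can be sorted on directly:

    <!-- schema.xml: one sortable integer per category, e.g. rank_c1, rank_c2 -->
    <dynamicField name="rank_*" type="tint" indexed="true" stored="false"/>

A document P1 with c1---b1 and c2---b2 would then be indexed with rank_c1=b1 and rank_c2=b2, and the query for a category sorts on that category's own field:

    /select?q=category_id:c1&sort=rank_c1%20asc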
Stable Versions in Solr 4
Hi All, I am trying to determine a stable version of Solr 4. Is there a blog we can refer to? I understand we can read through the release notes, but I am interested in user reviews and challenges seen with the various versions of Solr 4. Appreciate your contribution. Thanks, Abhishek
Determine if Merge is triggered in SOLR
Hi All, is there a way in Solr to determine whether a merge has been triggered? Is there an API exposed to query this? If it's not available, is there a way to do the same using the Lucene jar files available in the Solr libs? Appreciate your help. Best Regards, Abhishek
Re: Determine if Merge is triggered in SOLR
Hi All, any suggestions/ideas? Thanks, Abhishek

On Tue, Jan 26, 2016 at 9:16 PM, abhi Abhishek wrote:
> Hi All,
> is there a way in Solr to determine whether a merge has been triggered?
> Is there an API exposed to query this?
>
> If it's not available, is there a way to do the same using the Lucene jar
> files available in the Solr libs?
>
> Appreciate your help.
>
> Best Regards,
> Abhishek
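There is no direct "did a merge fire?" API in this era of Solr, but merges can be observed indirectly: a merge collapses segments, so the segment count drops while the document count does not. A rough sketch using the Lucene 4.x jars shipped in Solr's lib directory (the index path is a placeholder; on Lucene 5+ FSDirectory.open takes a Path instead of a File):

    import java.io.File;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.FSDirectory;

    public class SegmentWatch {
      public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
        // Point-in-time view of the last commit.
        DirectoryReader reader = DirectoryReader.open(dir);
        // One leaf per segment: if this number falls between two
        // commits while maxDoc holds steady or grows, a merge ran.
        System.out.println("segments=" + reader.leaves().size()
            + " maxDoc=" + reader.maxDoc());
        reader.close();
        dir.close();
      }
    }

Running this periodically (or comparing the segments_N generation in the data directory) gives a crude merge detector.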
Need a group custom function(fieldcollapsing)
Hi all, we are running Solr 5.2.1. A requirement has come up where we need to order the data based on an algorithm, applied to the results obtained from a query. The best option seems to be using group.field, group.main, and group.func, where group.func uses a custom function that runs the algorithm. My doubt is where the custom function code needs to go - in which file? I found an article related to this, https://dzone.com/articles/how-write-custom-solr, but it does not explain where to put the code or in which file. Regards, Abhishek
Re: Need a group custom function(fieldcollapsing)
Any update on this???

On Mon, Mar 14, 2016 at 4:06 PM, Abhishek Mishra wrote:
> [...]
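For reference, a custom function is not dropped into an existing Solr source file: it is a class compiled into your own jar, placed on the core's classpath via a <lib> directive, and registered in solrconfig.xml. A minimal sketch against the 5.x API (package, class and function names are made up):

    package com.example;

    import org.apache.lucene.queries.function.ValueSource;
    import org.apache.lucene.queries.function.valuesource.DoubleConstValueSource;
    import org.apache.solr.search.FunctionQParser;
    import org.apache.solr.search.SyntaxError;
    import org.apache.solr.search.ValueSourceParser;

    // Registered in solrconfig.xml as:
    //   <valueSourceParser name="myalgo" class="com.example.MyAlgoParser"/>
    // and then usable as group.func=myalgo(somefield).
    public class MyAlgoParser extends ValueSourceParser {
      @Override
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        // Parse the argument, e.g. a field or a nested function.
        ValueSource arg = fp.parseValueSource();
        // A real implementation would wrap `arg` in a custom ValueSource
        // computing the algorithm; a constant keeps the sketch compilable.
        return new DoubleConstValueSource(1.0d);
      }
    }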
Solr 4 replication
Hi all, is Solr 4 replication push or pull? Best Regards, Abhishek
Re: Solr 4 replication
Thanks Mikhail. Is there a way to have push replication? Any contributions or anything that could help in this case? Thanks, Abhishek

On Tue, Apr 5, 2016 at 1:29 AM, Mikhail Khludnev wrote:
> It's pull, but you can trigger pulling.
>
> [...]
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> <http://www.griddynamics.com>
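For what it's worth, the usual approximation of push with the legacy ReplicationHandler is to have the master (or a post-commit hook or cron job) tell each slave to pull on demand with the fetchindex command:

    # Ask the slave to pull the latest index now; host/core are placeholders.
    curl 'http://slave-host:8983/solr/core1/replication?command=fetchindex'

    # Optionally override which master to pull from for this one call:
    curl 'http://slave-host:8983/solr/core1/replication?command=fetchindex&masterUrl=http://master-host:8983/solr/core1/replication'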
SOLR Upgrade 3.x to 4.10
Hi All, I have Solr 3.6 running currently and I am planning to upgrade to Solr 4.10. Below are the approaches we could come up with.

1. In-place upgrade
I would make the Solr 4.10 instance a slave of 3.6, copy the indexes, and optimize the index. Will optimizing the Lucene 3.3 index on the Solr 4 instance (with Lucene 4.10) change the index structure to Lucene 4.10? If not, what would the version be? If I enable docValues on certain fields before issuing the optimize, will it be able to incorporate that (create .dvd & .dvm files) in the newly created index?

2. Re-index the data

Seeking advice on the minimum-time path to upgrade with most features of Solr 4.10. Thanks in advance. Best Regards, Abhishek
Re: SOLR Upgrade 3.x to 4.10
Thanks Erick and Shawn for the input. It makes more sense to move to Solr 5.x, but we would like to get there in a few iterations, gradually making incremental changes to have a smooth cut-over. Our index size is 3TB (10 shards of 300GB each), so I was looking for an alternate route that would save me from the pain of re-indexing. Any thoughts on this would help. Best Regards, Abhishek

On Wed, Apr 13, 2016 at 6:18 AM, Shawn Heisey wrote:
> On 4/12/2016 6:10 AM, abhi Abhishek wrote:
> > [...]
>
> Yes, the optimize will change the index structure, but the contents of
> the index will not change, even if changes in Solr's analysis components
> would have resulted in different info going into the index based on your
> schema. Because the *query* analysis may also change with the upgrade,
> this might cause queries to no longer work the same, unless you reindex
> and verify that your analysis still does what you require. A few
> changes to analysis components in later versions can be changed back to
> earlier behavior with luceneMatchVersion, but this typically only
> happens with big changes -- such as the major bugfix for
> WordDelimiterFilterFactory in version 4.8.
>
> Reindexing for all upgrades is recommended when possible.
>
> > if i enable docvalues on certain fields before issuing optimize, will
> > it be able to incorporate ( create .dvd & .dvm files ) that in the
> > newly created index?
>
> No. You must entirely reindex to add docValues. Optimize just rewrites
> what's already present in the Lucene index.
>
> > 2. Re-Index the data
> >
> > Seeking advice for minimum time to upgrade this with most features of
> > SOLR 4.10
>
> This is impossible to answer. It will depend on how long it takes to
> index your data. That is very difficult to predict even if a lot of
> information is available.
>
> Thanks,
> Shawn
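If the goal is only to shed the 3.x on-disk format (this does not add docValues or re-run analysis, as Shawn notes above), one possible shortcut is Lucene's IndexUpgrader tool run against each shard's index with the 4.10 jars, e.g.:

    # Rewrites every segment of one shard into the Lucene 4.10 format.
    # Paths and jar version are placeholders; back up the index first.
    java -cp lucene-core-4.10.4.jar \
      org.apache.lucene.index.IndexUpgrader -verbose /path/to/shard1/index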
working of Sharded Query in SOLR 3.6
Hi, I have a question about distributed querying in Solr (https://wiki.apache.org/solr/DistributedSearch). Consider the call below being made to a Solr server:

https://server1:8080/solr/core1/select?shards=server1:8080/solr/core1,server2:8070/solr/core2,server3:8090/solr/core3&q=*:*&rows=10&start=0

Please correct me if my understanding of the query processing here is wrong: server1 acts as the coordinating server for this request; it spawns requests to server1, server2 and server3 for the given query and waits for the responses from all the requests before returning the response to the client. If this is the case (server1 waits on all the sharded calls to respond), how does it join the results from all the sharded calls? If this is not how the processing works, can you please help me understand it? Thanks in advance. Thanks and Best Regards, Abhishek Das
Re: working of Sharded Query in SOLR 3.6
Hi, thanks for the reply, Shawn and Mugeesh. I was just trying to understand the working of distributed querying in Solr. Thanks, Abhishek Das

On Wed, Sep 9, 2015 at 8:18 PM, Mugeesh Husain wrote:
> You are correct for distributed search.
> Don't worry about the join; Solr will aggregate results from all cores.
> Share your requirement - what do you want?
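For anyone curious about the join step: the coordinator does not pull whole result sets. Each shard returns only its top start+rows document ids plus sort values; the coordinator merges those small lists, then issues a second request to fetch stored fields for just the final page. A toy sketch of the merge step, assuming score-descending order:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.PriorityQueue;

    class ShardDoc {
      final String shard, id;
      final float score;
      ShardDoc(String shard, String id, float score) {
        this.shard = shard; this.id = id; this.score = score;
      }
    }

    public class MergeSketch {
      // Merge per-shard top-N lists into one global top-N by score.
      static List<ShardDoc> merge(List<List<ShardDoc>> perShard, int n) {
        PriorityQueue<ShardDoc> pq =
            new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
        for (List<ShardDoc> docs : perShard) pq.addAll(docs);
        List<ShardDoc> out = new ArrayList<>();
        for (int i = 0; i < n && !pq.isEmpty(); i++) out.add(pq.poll());
        return out;
      }
    }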
SOLR Backup and Restore - Solr 3.6.1
Hello, we have Solr 3.6.1 in our environment and we are trying to analyze backup and recovery solutions for it. Is there a way to compress the backup taken? We have explored the replicationHandler with the backup command, but as our index is in the hundreds of GBs, we would like a solution that provides compression to reduce the storage overhead. Thanks in advance. Regards, Abhishek
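The ReplicationHandler's backup command writes an uncompressed copy of the index files, so compression has to be a post-step. A sketch, with URL and paths as placeholders (snapshot naming and parameters differ slightly across 3.x releases):

    # 1. Trigger a snapshot; it appears as snapshot.<timestamp> in the data dir.
    curl 'http://localhost:8983/solr/core1/replication?command=backup'

    # 2. Compress the newest snapshot and drop the raw copy.
    SNAP=$(ls -dt /var/solr/data/core1/data/snapshot.* | head -1)
    tar czf "$SNAP.tar.gz" -C "$(dirname "$SNAP")" "$(basename "$SNAP")" \
      && rm -rf "$SNAP"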
data import
Solr indexing is taking too much time. What should I do to reduce the time? Working on Solr 4.0.
not able to import Data through DIH solr 4.2.1
Please provide the basic steps to resolve this issue. I am getting the following error:

Full Import failed: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1
Re: not able to import Data through DIH solr 4.2.1
Alex, thanks for replying. My solrconfig:

<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-new.xml</str>
  </lst>
</requestHandler>

On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch wrote:
> > Could not load driver: com.mysql.jdbc.Driver
>
> Looks like a custom driver. Is the driver name correct? Is the library
> declared in solrconfig.xml? Is the library path correct (use absolute
> path if in doubt).
>
> Regards,
> Alex.
>
> [...]
Re: not able to import Data through DIH solr 4.2.1
But it is still not working.

On Thu, Mar 19, 2015 at 10:41 AM, Alexandre Rafalovitch wrote:
> Try an absolute path to the jar directory. Hard to tell whether the
> relative path is correct without knowing exactly how you are running it.
>
> Regards,
> Alex.
>
> [...]
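For the record, the two things that usually resolve this particular error are absolute <lib> paths and the MySQL connector jar itself; the DIH jar does not contain the JDBC driver, so it has to be added separately. A sketch with placeholder paths:

    <!-- solrconfig.xml: load DIH and the JDBC driver from absolute paths. -->
    <lib dir="/opt/solr/dist/" regex="solr-dataimporthandler-.*\.jar"/>
    <lib dir="/opt/solr/lib/" regex="mysql-connector-java-.*\.jar"/>

Alternatively, dropping the connector jar into the core's instanceDir/lib folder achieves the same thing without any <lib> directive.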
Re: data import
Hi,

Architecture: master (1) - slaves (3)

solrconfig:
<autoCommit>
  <maxDocs>500</maxDocs>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

schema:
<field name="selling_price" type="tfloat" indexed="true" stored="true" />
<field name="third_price" type="tfloat" indexed="true" stored="true" />
<field name="discount_percentage" type="tfloat" indexed="true" stored="true" />
<field name="sort_2" type="tint" indexed="true" stored="true" />
<field name="show_metacategory" type="variantFacet" indexed="true" stored="true" />
<field name="products" type="tint" indexed="true" stored="true" />
<field name="by_drive_supported" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_primary_camera" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_dial_shape" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="by_features" type="text_path_new" indexed="true" stored="true" multiValued="true"/>
<field name="speaker_configuration" type="text_path_new" indexed="true" stored="true" multiValued="true"/>

<uniqueKey>id</uniqueKey>

<copyField source="product" dest="product_keyword"/>
<copyField source="list_price" dest="text"/>
<copyField source="seo_name" dest="text"/>

On Fri, Mar 13, 2015 at 2:25 PM, Antonio Jesús Sánchez Padial wrote:
> Maybe you should add some info about:
>
> - your architecture, number of servers, etc.
> - your schema.xml
> - and the data (amount, type, ...) you are indexing
>
> Best.
>
> [...]
>
> --
> Antonio Jesús Sánchez Padial
SOLR Index in shared/Network folder
Greetings, I am trying to use a network shared location as my index directory. Are there any known problems with using a network file system to run a Solr instance? Thanks in advance. Best Regards, Abhishek
ZFS File System for SOLR 3.6 and SOLR 4
Hello, I am trying to use ZFS as the filesystem for my Linux environment. Are there any performance implications of using a filesystem other than ext3/ext4 with Solr? Thanks in advance. Best Regards, Abhishek
Re: SOLR Index in shared/Network folder
Hello, thanks for the suggestions. My aim is to reduce disk space usage. I have 1 master with 2 slaves configured, where the slaves are used for searching and the master ingests new data that is replicated to the slaves. But as my index size is in the hundreds of GBs, we see a 3x space overhead. I would like to reduce this overhead; can you suggest something for this? Thanks in advance. Best Regards, Abhishek

On Sat, Mar 28, 2015 at 12:13 AM, Erick Erickson wrote:
> To pile on: If you're talking about pointing two Solr instances at the
> _same_ index, it doesn't matter whether you are on NFS or not, you'll
> have all sorts of problems. And if this is a SolrCloud installation,
> it's particularly hard to get right.
>
> Please do not do this unless you have a very good reason, and please
> tell us what the reason is so we can perhaps suggest alternatives.
>
> Best,
> Erick
>
> On Fri, Mar 27, 2015 at 8:08 AM, Walter Underwood wrote:
> > Several years ago, I accidentally put Solr indexes on an NFS volume
> > and it was 100X slower.
> >
> > If you have enough RAM, query speed should be OK, but startup time
> > (loading indexes into file buffers) could be really long. Indexing
> > could be quite slow.
> >
> > wunder
> > Walter Underwood
> > http://observer.wunderwood.org/ (my blog)
> >
> > On Mar 26, 2015, at 11:31 PM, Shawn Heisey wrote:
> >
> >> On 3/27/2015 12:06 AM, abhi Abhishek wrote:
> >>> [...]
> >>
> >> It is not recommended. You will probably need to change the lockType;
> >> the default "native" probably will not work, and you might need to
> >> change it to "none" to get it working ... but that disables an
> >> important safety mechanism that prevents index corruption.
> >>
> >> http://stackoverflow.com/questions/9599529/solr-over-nfs-problems
> >>
> >> Thanks,
> >> Shawn
Errors during Indexing in SOLR 4.6
Hi All, we recently migrated from Solr 3.6 to Solr 4. While indexing in Solr 4 we are getting the exception below:

Apr 1, 2015 9:22:57 AM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Exception writing document id 932684555 to the index; possible analysis error.
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
Caused by: java.lang.IllegalArgumentException: first position increment must be > 0 (got 0) for field 'DataEnglish'
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:131)

This works perfectly fine in Solr 3.6. Can someone help debug this? Any fixes/solutions? Thanks in advance. Best Regards, Abhishek
Unable to identify why faceting is taking so much time
I am trying to facet over some data. My query is:

http://localhost:9020/search/p1-umShard-1/select?q=*:*&fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])&debug=timing&wt=json&rows=0

{
  "responseHeader": { "status": 0, "QTime": 45 },
  "response": { "numFound": 137, "start": 0, "docs": [] },
  "debug": {
    "timing": {
      "time": 45,
      "prepare": { "time": 0, "query": {"time": 0}, "facet": {"time": 0},
        "mlt": {"time": 0}, "highlight": {"time": 0}, "stats": {"time": 0},
        "debug": {"time": 0} },
      "process": { "time": 45, "query": {"time": 45}, "facet": {"time": 0},
        "mlt": {"time": 0}, "highlight": {"time": 0}, "stats": {"time": 0},
        "debug": {"time": 0} }
    }
  }
}

According to this there are 137 records. Now I am faceting over these 137 records with facet.method=fc. Ideally it should just iterate over these 137 records and sum up the facets. The facet query is:

http://localhost:9020/search/p1-umShard-1/select?q=*:*&fq=(msgType:38+AND+snCreatedTime:[2015-04-15T00:00:00Z%20TO%20*])&facet.field=conversationId&facet=true&indent=on&wt=json&rows=0&facet.method=fc&debug=timing

{
  "responseHeader": { "status": 0, "QTime": 395103 },
  "response": { "numFound": 137, "start": 0, "docs": [] },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "conversationId": [
        "t_mid.1429800181915:43409a654f429a7279", 14,
        "t_mid.1430066755916:3f1df73a90f3f56b24", 12,
        "t_mid.1424867675391:7a0ce173662f6b3230", 10,
        "t_mid.1429264970537:d53579af6852fdd409", 8,
        "t_mid.1429968009539:ad97aa3fcfc933ac32", 6,
        "t_mid.1429076620603:cf8c8da6cc7c0f7a40", 5,
        "t_mid.1429967431080:6f1037c42bc6d10921", 4,
        "t_mid.1430335716379:e8d2d7390c6d999689", 4,
        "t_mid.1430591984365:9c66f4b3f67a973193", 4,
        "t_mid.1431105168474:f5d294b79df5e97a26", 4,
        "t_id.539747739369904", 3,
        "t_mid.1423253619046:ef3da504f704e12448", 3,
        "t_mid.1424454328414:91f82976dc8196e034", 3,
        "t_mid.1429967443439:dacb57b0f96b00cb63", 3,
        "t_mid.1430734315969:e5002ecd489b51cc19", 3,
        "t_mid.1423229143533:71f3dd0f3714f44232", 2,
        "t_mid.1429076490131:87feb49fa82041dd77", 2,
        "t_mid.1429080523489:00a85a2b07980c9a19", 2,
        "t_mid.1429913551113:5870b4366960dc5c10", 2,
        "t_mid.1429917749072:7cbdaf3d8c2d15ef78", 2,
        "t_mid.1429966041997:616561349e22cb7001", 2,
        "t_mid.1429968203236:bcd0c539ae66947618", 2,
        "t_mid.1429982604402:6e509023526a0f5b09", 2,
        "t_mid.1430475210140:8a963390e62e26f497", 2,
        "t_mid.1430746574833:59b08895c5287a2998", 2,
        "t_mid.1423229237215:d03fb607be18b2d089", 1,
        "t_mid.1423256045556:63089c5cc77c800113", 1,
        "t_mid.1426870505993:a5b69b271bea481730", 1,
        "t_mid.1428776595760:d5ebc1f3b922952e41", 1,
        "t_mid.1429079296566:f9f0e4c24071e55444", 1,
        "t_mid.1429315090481:9b7d59d6d483999d57", 1,
        "t_mid.1429498786426:04f58597d3f5461330", 1,
        "t_mid.1429878261810:4bdc3e6442db876c21", 1,
        "t_mid.1429906605359:0f89faf08295015957", 1,
        "t_mid.1429915168615:365578d261795d6140", 1,
        "t_mid.1429968022645:2a362d85be63c2ab95", 1,
        "t_mid.1429968121564:2effeb664562bd9b26", 1,
        "t_mid.1429969582192:5aca482f37dca9d843", 1,
        "t_mi
Re: Unable to identify why faceting is taking so much time
Toke, thanks for the quick reply. I am still confused; please find my doubts inline.

On Mon, May 11, 2015 at 1:22 PM Toke Eskildsen wrote:
> On Mon, 2015-05-11 at 05:48 +0000, Abhishek Gupta wrote:
> > According to this there are 137 records. Now I am faceting over these
> > 137 records with facet.method=fc. Ideally it should just iterate over
> > these 137 records and sum up the facets.
>
> That is only the ideal method if you are not planning on issuing
> subsequent calls: facet.method=fc does more work up front to ensure that
> later calls are fast.
>
> [...]
>
> 6½ minutes is a long time, even for a first call. Do you have tens to
> hundreds of millions of documents in your index? Or do you have a
> similar amount of unique values in your facet?

Yes, we have that many documents (exact count: 522664425), but I am not sure why that matters, because what I understood from the documentation (https://wiki.apache.org/solr/SimpleFacetParameters#facet.method) is that *fc* will only work on the documents filtered by the filter query and query. For my query there are only 137 documents for fc to work on and to build the *FieldCache* from. But seeing the faceting result it seems that faceting is being applied to all the documents, which is not according to the documentation: "The facet counts are calculated by iterating over documents that match the query and summing the terms that appear in each document". I am not able to understand why fc is calculating facets over all the documents. Just for your information, the cardinality of the field (conversationId) on which I am faceting is very high, but there are only about 100 possible values for this field matching my query and filter query.

> Either way, subsequent faceting calls should be much faster and a switch
> to DocValues should lower your first-call time significantly.

Subsequent calls are also not fast:
First call time: 297572
Second call time (made within 2 sec): 249287
Yes, I agree docValues will reduce the time.

> Toke Eskildsen, State and University Library, Denmark
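For context on why the whole-index size matters here: facet.method=fc first un-inverts the field into a FieldCache/UnInvertedField structure covering every document in the index; only after that structure exists are the 137 matching documents counted against it. That up-front build is what the first call pays for, and docValues moves the same work to index time. A sketch of the schema change (it requires a full re-index):

    <!-- schema.xml: keep the facet field's values as docValues. -->
    <field name="conversationId" type="string" indexed="true" stored="true"
           docValues="true"/>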
Re: [EXTERNAL] Re: Does anybody crawl to a database and then index from the database to Solr?
Clayton, you could also try running an optimize on the Solr index as a weekly/bi-weekly maintenance task to keep the segment count in check and the maxDoc and numDocs counts as close as possible (in DB terms, de-fragmenting the Solr indexes). Best Regards, Abhishek

On Sun, May 15, 2016 at 7:18 PM, Pryor, Clayton J wrote:
> Thank you for your feedback. I really appreciate you taking the time to
> write it up for me (and hopefully others who might be considering the
> same). My first thought for dealing with deleted docs was to delete the
> contents and rebuild the index from scratch, but my primary customer for
> the deleted docs functionality wants to see it immediately. I wrote a
> connector for transferring the contents of one Solr index to another (I
> call it a Solr connector) and that takes a half hour. As a side note,
> the reason I have multiple indexes is because we currently have physical
> servers for development and production but, as part of my effort, I am
> transitioning us to new VMs for development, quality, and production.
> For quality control purposes I wanted to be able to reset each with the
> same set of data - thus the Solr connector.
>
> Yes, by connector I am talking about a Java program (using SolrJ) that
> reads from the database and populates the Solr index. For now I have had
> our enterprise DBAs create a single table to hold the current index
> schema fields plus some that I can think of that we might use outside of
> the index. So far it is a completely flat structure so it will be easy
> to index to Solr, but I can see, as requirements change, we may have to
> have a more sophisticated database (with multiple tables and greater
> normalization), in which case the connector will have to flatten the
> data for the Solr index.
>
> Thanks again, your response has been very reassuring!
>
> :)
>
> Clay
>
> -----Original Message-----
> From: Erick Erickson
> Sent: Friday, May 13, 2016 5:57 PM
> To: solr-user
> Subject: [EXTERNAL] Re: Does anybody crawl to a database and then index
> from the database to Solr?
>
> Clayton:
>
> I think you've done a pretty thorough investigation, I think you're
> spot-on. The only thing I would add is that you _will_ reindex your
> entire corpus multiple times. Count on it. Sometime, somewhere, somebody
> will say "gee, wouldn't it be nice if we could ". And to
> support it you'll have to change your Solr schema... which will almost
> certainly require you to re-index.
>
> The other thing people have done for deleting documents is to create
> triggers in your DB to insert the deleted doc IDs into, say, a "deleted"
> table along with a timestamp. Whenever necessary/desirable, run a
> cleanup task that finds all the IDs since the last time you ran your
> deleting program to remove docs that have been flagged since then.
> Obviously you also have to keep a record around of the timestamp of the
> last successful run of this program.
>
> Or, frankly, since it takes so little time to rebuild from scratch,
> people have foregone any of that complexity and simply rebuild the
> entire index periodically. You can use "collection aliasing" to do this
> in the background and then switch searches atomically; it depends
> somewhat on how long you can wait until you need to see (well, _not_
> see) the deleted docs.
>
> But this is all refinements, I think you're going down the right path.
>
> And when you say "connector", are you talking DIH or an external (say
> SolrJ) program?
>
> Best,
> Erick
>
> On Fri, May 13, 2016 at 2:04 PM, John Bickerstaff wrote:
> > I've been working on a less-complex thing along the same lines -
> > taking all the data from our corporate database and pumping it into
> > Kafka for long-term storage -- and the ability to "play back" all the
> > Kafka messages any time we need to re-index.
> >
> > That simpler scenario has worked like a charm. I don't need to
> > massage the data much once it's at rest in Kafka, so that was a
> > straightforward solution, although I could have gone with a DB and
> > just stored the solr documents with their ID's one per row in a
> > RDBMS...
> >
> > The rest sounds like good ideas for your situation as Solr isn't the
> > best candidate for the kind of manipulation of data you're proposing
> > and a database excels at that. It's more work, but you get a lot more
> > flexibility and you de-couple Solr from the data crawling as you say.
> >
> > It all sounds pretty good to me, but I've only been o
Proximity Search using edismax parser.
Hi All, how does a proximity query work in Solr? For example, if I am running a query like the one below against a field containing the text "India registered a historical test match win against the arch rival Pakistan here in Lords, England on Sunday":

Query: "Test match India Pakistan"~10

I am interested in understanding the intermediate steps involved here, to understand the search behavior and determine how results are matched to the search phrase. Thanks in Advance, Abhishek
Re: Proximity Search using edismax parser.
Thanks for the suggestions, Erik and Vrindavda. I was trying to understand how the above query works when the slop is set to 10. The debug output of the Solr query gave the terms being looked up, but the transpositions used to match the phrase weren't exposed. I found the following Stack Overflow link, which describes the transpositions applied when looking for a phrase with slop 4. Is there a guide to understanding this?

https://stackoverflow.com/questions/25558195/lucene-proximity-search-for-phrase-with-more-than-two-words

Thanks in advance. Best Regards, Abhishek

On Mon, Jun 12, 2017 at 5:41 PM, Erik Hatcher wrote:
> Adding &debug=true to your search requests will give you the parsing
> details, so you can see how edismax interprets the query string and
> parameters to turn it into the underlying dismax and phrase queries.
>
> Erik
>
> [...]
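As a rough summary of the mechanism: the slop is a budget of single-position moves allowed to make the document's term positions line up with the phrase, and Lucene's sloppy phrase matching accepts the document if any alignment fits within the budget. A small worked illustration, assuming a document indexed as quick(1) brown(2) fox(3):

    "quick fox"~1  -> match: fox sits one position further right than the
                      phrase expects, costing one move.
    "fox quick"~3  -> match: reordering the two terms costs three moves.
    "fox quick"~2  -> no match: the cheapest alignment needs three.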
Odd Boolean Query behavior in SOLR 3.6
Hi Everyone, I have hit some odd behavior with a Boolean query. When I run the query with the parameters below, it does not behave as expected. Can you please help me understand the behavior here?

q=*:*&fq=((-documentTypeId:3)+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
=> returns 0 matches
filter_queries: ((-documentTypeId:3) AND companyId:29096)
parsed_filter_queries: +(-documentTypeId:3) +companyId:29096

q=*:*&fq=(-documentTypeId:3+AND+companyId:29096)&version=2.2&start=0&rows=10&indent=on&debugQuery=true
=> returns 1600 matches
filter_queries: (-documentTypeId:3 AND companyId:29096)
parsed_filter_queries: -documentTypeId:3 +companyId:29096

Can you please help me understand what I am missing here? Thanks in Advance. Thanks & Best Regards, Abhishek
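A likely explanation, in case anyone finds this thread later (it is the standard Lucene pure-negative-clause behavior, though not confirmed here): a BooleanQuery consisting only of negative terms matches nothing by itself. Solr special-cases a lone negative query at the top level, but once -documentTypeId:3 is wrapped in parentheses it becomes a nested BooleanQuery with no positive clause, so (-documentTypeId:3) selects zero documents, and ANDing companyId:29096 with it still yields zero. The usual fix is to give the nested clause an explicit positive universe:

    fq=((*:* -documentTypeId:3) AND companyId:29096)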
SOLR Metric Reporting to graphite
Hi All, I am trying to set up the Graphite reporter for Solr 6.5.0. I've started a sample Docker instance of Graphite with statsd (https://github.com/hopsoft/docker-graphite-statsd) and added the Graphite metrics reporter to the node's solr.xml, with host localhost, port 2003, and period 1. However, after doing this I don't see any data getting posted to Graphite (https://cwiki.apache.org/confluence/display/solr/Metrics+Reporting).

The Graphite container's mapped ports (host -> container / service):
80   -> 80   nginx
2003 -> 2003 carbon receiver - plaintext
2004 -> 2004 carbon receiver - pickle
2023 -> 2023 carbon aggregator - plaintext
2024 -> 2024 carbon aggregator - pickle
8125 -> 8125 statsd
8126 -> 8126 statsd admin

Please advise if I am doing something wrong here. Thanks, Abhishek
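Since the list archive strips XML, here is what the reporter element in solr.xml is documented to look like for the 6.x metrics system, with the host/port/period values from the mail above (the class name and group attribute are taken from the reference guide, so verify them against your version; note also that solr.xml changes only take effect after a node restart, and a period of 1 second is far more frequent than the default of 60):

    <solr>
      <metrics>
        <reporter name="graphite" group="node, jvm, core"
                  class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
          <str name="host">localhost</str>
          <int name="port">2003</int>
          <int name="period">1</int>
        </reporter>
      </metrics>
    </solr>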
edismax parsing confusion
Hi all, I am running a Solr query with these parameters:

bf: "sum(product(new_popularity,100),if(exists(third_price),50,0))"
qf: "test_product^5 category_path_tf^4 product_id gender"
q: "handbags between rs150 and rs 400"
defType: "edismax"

The parsed query for q is:

(+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 | gender:handbag | test_product:handbag^5.0 | product_id:handbags)) DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between | test_product:between^5.0 | product_id:between)) +DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 | test_product:rs150^5.0 | product_id:rs150)) +DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs | test_product:rs^5.0 | product_id:rs)) DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 | test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"handbags between rs150 ? rs 400")) (DisjunctionMaxQuery(("":"handbags between")) DisjunctionMaxQuery(("":"between rs150")) DisjunctionMaxQuery(("":"rs 400"))) (DisjunctionMaxQuery(("":"handbags between rs150")) DisjunctionMaxQuery(("":"between rs150")) DisjunctionMaxQuery(("":"rs150 ? rs")) DisjunctionMaxQuery(("":"? rs 400"))) FunctionQuery(sum(product(float(new_popularity),const(100)),if(exists(float(third_price)),const(50),const(0)/no_coord

But with the dismax parser it works as expected:

(+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 | gender:handbag | test_product:handbag^5.0 | product_id:handbags)) DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between | test_product:between^5.0 | product_id:between)) DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 | test_product:rs150^5.0 | product_id:rs150)) DisjunctionMaxQuery((product_id:and)) DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs | test_product:rs^5.0 | product_id:rs)) DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 | test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"handbags between rs150 ? rs 400")) FunctionQuery(sum(product(float(new_popularity),const(100)),if(exists(float(third_price)),const(50),const(0)/no_coord

As I understand it, the difference between dismax and edismax should just be some extra features plus the handling of boosting functions, so I don't understand why the parses differ.

Regards, Abhishek
Re: edismax parsing confusion
Hello guys, sorry for the late response. @Steve: I am using Solr 5.2. @Greg: I am using the default mm from the config file (as I understand it, the default mm is 1). Regards, Abhishek

On Tue, Apr 4, 2017 at 5:27 AM, Greg Pendlebury wrote:
> eDismax uses 'mm', so knowing what that has been set to is important, or
> if it has been left unset/default you would need to consider whether
> 'q.op' has been set. Or the default operator from the config file.
>
> Ta,
> Greg
>
> On 3 April 2017 at 23:56, Steve Rowe wrote:
> > Hi Abhishek,
> >
> > Which version of Solr are you using?
> >
> > I can see that the parsed queries are different, but they're also very
> > similar, and there's a lot of detail there - can you be more specific
> > about what the problem is?
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > [...]
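What the two parses show is edismax treating the bare lowercase "and" in the query text as the boolean AND operator: the term disappears and the neighbouring rs150/rs clauses become required (+), whereas dismax keeps it as a plain term (product_id:and). If the literal-term behaviour is wanted, edismax has a switch for exactly this case, e.g.:

    /select?defType=edismax&lowercaseOperators=false&q=handbags+between+rs150+and+rs+400&qf=...

(lowercaseOperators defaults to true on this version, so lowercase and/or act as operators unless it is turned off.)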
Which Tokenizer to use at searching
Hi Friends, I have a question about tokenizers. My scenario is: during indexing I want to tokenize on all punctuation, so I can use the StandardTokenizer; but at search time I want to treat punctuation as part of the text. I don't store contents, only indexes. What should I use? Any advice? -- Thanks and kind Regards, Abhishek jain
Re: Which Tokenizer to use at searching
Hi, thanks for replying promptly. An example: I want to index "A,B", but when I search for A AND B it should return the result; when I search for "A,B" it should return the result; and ideally, when I search for "A , B" (with spaces) it should also return the result. Please advise. Thanks, abhishek

On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI wrote:
> Hi;
>
> Firstly you have to keep in mind that if you don't index punctuation
> they will not be visible for search. On the other hand you can have a
> different analyzer for index and search. You have to give more detail
> about your situation. What will be your tokenizer at search time,
> WhitespaceTokenizer? You can have a look here:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> If you can give some examples of what you want for indexing and
> searching, I can help you combine index and search
> analyzer/tokenizer/token filters.
>
> Thanks;
> Furkan KAMACI
>
> [...]
Optimizing RAM
hi friends, I want to index a good amount of data, and I want to keep both stemmed and unstemmed versions. I am confused whether I should keep two separate indexes, or one index with two versions of the column, i.e. col1_stemmed and col2_unstemmed. I have a multicore, multi-shard configuration. My server has 32 GB RAM, and I calculated the stemmed index size (without content) as 60 GB. I don't want to put too much load and I/O load on a decent server with some 5 other replicated servers, and I want to use the servers for other purposes also. Also, is it advised to serve queries from the master server or only from slaves? -- Thanks, Abhishek
Re: Which Tokenizer to use at searching
Hi Erick, Thanks for replying. I want to index A,B (with or without a space around the comma) as separate words, and also want to return results when A and B are searched individually and also as "A,B". Please let me know your views. Let me know if I still haven't explained it correctly; I will try again. Thanks abhishek On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson wrote: > You've contradicted yourself, so it's hard to say. Or > I'm mis-reading your messages. > > bq: During indexing i want to token on all punctuations, so i can use > StandardTokenizer, but at search time i want to consider punctuations as > part of text, > > and in your second message: > > bq: when i search for "A,B" it should return result. [for input "A,B"] > > If, indeed, you "... at search time i want to consider punctuations as > part of text" then "A,B" should NOT match the document. > > The admin/analysis page is your friend, I strongly suggest you spend > some time looking at the various transformations performed by > the various analyzers and tokenizers. > > Best, > Erick > > On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain > wrote: > > hi, > > > > Thanks for replying promptly, > > an example: > > > > I want to index for A,B > > but when i search A AND B, it should return result, > > when i search for "A,B" it should return result. > > > > Also Ideally when i search for "A , B" (with space) it should return > result. > > > > > > please advice > > thanks > > abhishek > > > > > > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI >wrote: > > > >> Hi; > >> > >> Firstly you have to keep in mind that if you don't index punctuation > they > >> will not be visible for search. On the other hand you can have different > >> analyzer for index and search. You have to give more detail about your > >> situation. What will be your tokenizer at search time, > WhiteSpaceTokenizer? > >> You can have a look at here: > >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > >> > >> If you can give some examples what you want for indexing and searching I > >> can help you to combine index and search analyzer/tokenizer/token > filters. > >> > >> Thanks; > >> Furkan KAMACI > >> > >> > >> 2014-03-09 18:06 GMT+02:00 abhishek jain : > >> > >> > Hi Friends, > >> > > >> > I am concerned on Tokenizer, my scenario is: > >> > > >> > During indexing i want to token on all punctuations, so i can use > >> > StandardTokenizer, but at search time i want to consider punctuations > as > >> > part of text, > >> > > >> > I dont store contents but only indexes. > >> > > >> > What should i use. > >> > > >> > Any advices ? > >> > > >> > > >> > -- > >> > Thanks and kind Regards, > >> > Abhishek jain > >> > > >> > > > > > > > > -- > > Thanks and kind Regards, > > Abhishek jain > > +91 9971376767 > -- Thanks and kind Regards, Abhishek jain +91 9971376767
Re: Which Tokenizer to use at searching
Hi Oops my bad. I actually meant While indexing A,B A and B should give result but "A B" should not give result. Also I will look at analyser. Thanks Abhishek Original Message From: Erick Erickson Sent: Monday, 10 March 2014 01:38 To: abhishek jain Subject: Re: Which Tokenizer to use at searching Then I don't see the problem. StandardTokenizer (see the "text_general" fieldType) should do all this for you automatically. Did you look at the analysis page? I really recommend it. Best, Erick On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain wrote: > Hi Erick, > Thanks for replying, > > I want to index A,B (with or without space with comma) as separate words and > also want to return results when A and B searched individually and also > "A,B" . > > Please let me know your views. > Let me know if i still havent explained correctly. I will try again. > > Thanks > abhishek > > > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson > wrote: >> >> You've contradicted yourself, so it's hard to say. Or >> I'm mis-reading your messages. >> >> bq: During indexing i want to token on all punctuations, so i can use >> StandardTokenizer, but at search time i want to consider punctuations as >> part of text, >> >> and in your second message: >> >> bq: when i search for "A,B" it should return result. [for input "A,B"] >> >> If, indeed, you "... at search time i want to consider punctuations as >> part of text" then "A,B" should NOT match the document. >> >> The admin/analysis page is your friend, I strongly suggest you spend >> some time looking at the various transformations performed by >> the various analyzers and tokenizers. >> >> Best, >> Erick >> >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain >> wrote: >> > hi, >> > >> > Thanks for replying promptly, >> > an example: >> > >> > I want to index for A,B >> > but when i search A AND B, it should return result, >> > when i search for "A,B" it should return result. >> > >> > Also Ideally when i search for "A , B" (with space) it should return >> > result. >> > >> > >> > please advice >> > thanks >> > abhishek >> > >> > >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI >> > wrote: >> > >> >> Hi; >> >> >> >> Firstly you have to keep in mind that if you don't index punctuation >> >> they >> >> will not be visible for search. On the other hand you can have >> >> different >> >> analyzer for index and search. You have to give more detail about your >> >> situation. What will be your tokenizer at search time, >> >> WhiteSpaceTokenizer? >> >> You can have a look at here: >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters >> >> >> >> If you can give some examples what you want for indexing and searching >> >> I >> >> can help you to combine index and search analyzer/tokenizer/token >> >> filters. >> >> >> >> Thanks; >> >> Furkan KAMACI >> >> >> >> >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain : >> >> >> >> > Hi Friends, >> >> > >> >> > I am concerned on Tokenizer, my scenario is: >> >> > >> >> > During indexing i want to token on all punctuations, so i can use >> >> > StandardTokenizer, but at search time i want to consider punctuations >> >> > as >> >> > part of text, >> >> > >> >> > I dont store contents but only indexes. >> >> > >> >> > What should i use. >> >> > >> >> > Any advices ? >> >> > >> >> > >> >> > -- >> >> > Thanks and kind Regards, >> >> > Abhishek jain >> >> > >> >> >> > >> > >> > >> > -- >> > Thanks and kind Regards, >> > Abhishek jain >> > +91 9971376767 > > > > > -- > Thanks and kind Regards, > Abhishek jain > +91 9971376767
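A cut-down sketch of the stock "text_general" type Erick refers to (names as in the example Solr schema; the real type also carries stopword and synonym filters). StandardTokenizer splits "A,B" into the two tokens A and B at both index and query time, which is why A, B and "A,B" all match; it is also why the later requirement that the phrase "A B" must not match cannot be satisfied by this analyzer alone:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- splits on punctuation and whitespace: A,B -> [A] [B] -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>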
Re: Which Tokenizer to use at searching
Hi, I meant that while searching A AND B should return result individually and when together with a AND. I want "A B" should not give result. Though A,B is indexed with StandardTokenizer. Thanks Abhishek Original Message From: Furkan KAMACI Sent: Monday, 10 March 2014 06:11 To: solr-user@lucene.apache.org Reply To: solr-user@lucene.apache.org Cc: Erick Erickson Subject: Re: Which Tokenizer to use at searching Hi; What do you mean at here: "While indexing A,B A and B should give result " Thanks; Furkan KAMACI 2014-03-09 22:36 GMT+02:00 : > Hi > Oops my bad. I actually meant > While indexing A,B > A and B should give result but > "A B" should not give result. > > Also I will look at analyser. > > Thanks > Abhishek > > Original Message > From: Erick Erickson > Sent: Monday, 10 March 2014 01:38 > To: abhishek jain > Subject: Re: Which Tokenizer to use at searching > > Then I don't see the problem. StandardTokenizer > (see the "text_general" fieldType) should do all this > for you automatically. > > Did you look at the analysis page? I really recommend it. > > Best, > Erick > > On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain > wrote: > > Hi Erick, > > Thanks for replying, > > > > I want to index A,B (with or without space with comma) as separate words > and > > also want to return results when A and B searched individually and also > > "A,B" . > > > > Please let me know your views. > > Let me know if i still havent explained correctly. I will try again. > > > > Thanks > > abhishek > > > > > > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson > > > wrote: > >> > >> You've contradicted yourself, so it's hard to say. Or > >> I'm mis-reading your messages. > >> > >> bq: During indexing i want to token on all punctuations, so i can use > >> StandardTokenizer, but at search time i want to consider punctuations as > >> part of text, > >> > >> and in your second message: > >> > >> bq: when i search for "A,B" it should return result. [for input "A,B"] > >> > >> If, indeed, you "... at search time i want to consider punctuations as > >> part of text" then "A,B" should NOT match the document. > >> > >> The admin/analysis page is your friend, I strongly suggest you spend > >> some time looking at the various transformations performed by > >> the various analyzers and tokenizers. > >> > >> Best, > >> Erick > >> > >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain > >> wrote: > >> > hi, > >> > > >> > Thanks for replying promptly, > >> > an example: > >> > > >> > I want to index for A,B > >> > but when i search A AND B, it should return result, > >> > when i search for "A,B" it should return result. > >> > > >> > Also Ideally when i search for "A , B" (with space) it should return > >> > result. > >> > > >> > > >> > please advice > >> > thanks > >> > abhishek > >> > > >> > > >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI > >> > wrote: > >> > > >> >> Hi; > >> >> > >> >> Firstly you have to keep in mind that if you don't index punctuation > >> >> they > >> >> will not be visible for search. On the other hand you can have > >> >> different > >> >> analyzer for index and search. You have to give more detail about > your > >> >> situation. What will be your tokenizer at search time, > >> >> WhiteSpaceTokenizer? > >> >> You can have a look at here: > >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > >> >> > >> >> If you can give some examples what you want for indexing and > searching > >> >> I > >> >> can help you to combine index and search analyzer/tokenizer/token > >> >> filters. 
> >> >> > >> >> Thanks; > >> >> Furkan KAMACI > >> >> > >> >> > >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain >: > >> >> > >> >> > Hi Friends, > >> >> > > >> >> > I am concerned on Tokenizer, my scenario is: > >> >> > > >> >> > During indexing i want to token on all punctuations, so i can use > >> >> > StandardTokenizer, but at search time i want to consider > punctuations > >> >> > as > >> >> > part of text, > >> >> > > >> >> > I dont store contents but only indexes. > >> >> > > >> >> > What should i use. > >> >> > > >> >> > Any advices ? > >> >> > > >> >> > > >> >> > -- > >> >> > Thanks and kind Regards, > >> >> > Abhishek jain > >> >> > > >> >> > >> > > >> > > >> > > >> > -- > >> > Thanks and kind Regards, > >> > Abhishek jain > >> > +91 9971376767 > > > > > > > > > > -- > > Thanks and kind Regards, > > Abhishek jain > > +91 9971376767 >
Re: Optimizing RAM
Hi, If I go with copyField, then will it increase I/O load, considering I have RAM less than one third of the total index size? Thanks Abhishek Original Message From: Erick Erickson Sent: Monday, 10 March 2014 01:37 To: solr-user@lucene.apache.org Reply To: solr-user@lucene.apache.org Subject: Re: Optimizing RAM I'd go for a copyField, keep the stemmed and unstemmed version in the same index. An alternative (and I think there's a JIRA for this if not an outright patch) is to implement a "special" filter that, say, puts the original token in with a special character, say $ at the end, i.e. if indexing "running", you'd index both "running$" and "run". Then when you want exact match, you search for "running$". Best, Erick On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain wrote: > hi friends, > I want to index some good amount of data, i want to keep both stemmed and > unstemmed versions , > I am confused should i keep two separate indexes or keep one index with two > versions or column , i mean col1_stemmed and col2_unstemmed. > > I have multicore with multi shard configuration. > My server have 32 GB RAM and stemmed index size (without content) i > calculated as 60 GB . > I want to not put too much load and I/O load on a decent server with some 5 > other replicated servers and want to use servers for other purposes also. > > > Also is it advised to server queries from master server or only from slaves? > -- > Thanks, > Abhishek
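A minimal sketch of the copyField layout Erick suggests (field and type names are hypothetical, and the stemmed/unstemmed analyzer types are assumed to be defined elsewhere in schema.xml). copyField duplicates the incoming value before analysis, so one submitted value is indexed both ways:

    <field name="body_stemmed"   type="text_stemmed"   indexed="true" stored="false"/>
    <field name="body_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
    <!-- the raw incoming value of body_stemmed is also fed to body_unstemmed -->
    <copyField source="body_stemmed" dest="body_unstemmed"/>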
Re: Which Tokenizer to use at searching
Hi, As a solution, i have tried a combination of PatternTokenizerFactory and PatternReplaceFilterFactory . In both query and indexer i have written: What i am trying to do is tokenizing on space and then rewriting every special character as " punct " . So, A,B becomes A punct B . but the problem is A punct B is still one word and not tokenized further application of filter, Is there a way i can tokenize after application of filter, please suggest i know i am missing something basic. thanks abhishek On Mon, Mar 10, 2014 at 2:06 AM, wrote: > Hi > Oops my bad. I actually meant > While indexing A,B > A and B should give result but > "A B" should not give result. > > Also I will look at analyser. > > Thanks > Abhishek > > Original Message > From: Erick Erickson > Sent: Monday, 10 March 2014 01:38 > To: abhishek jain > Subject: Re: Which Tokenizer to use at searching > > Then I don't see the problem. StandardTokenizer > (see the "text_general" fieldType) should do all this > for you automatically. > > Did you look at the analysis page? I really recommend it. > > Best, > Erick > > On Sun, Mar 9, 2014 at 3:04 PM, abhishek jain > wrote: > > Hi Erick, > > Thanks for replying, > > > > I want to index A,B (with or without space with comma) as separate words > and > > also want to return results when A and B searched individually and also > > "A,B" . > > > > Please let me know your views. > > Let me know if i still havent explained correctly. I will try again. > > > > Thanks > > abhishek > > > > > > On Sun, Mar 9, 2014 at 11:49 PM, Erick Erickson > > > wrote: > >> > >> You've contradicted yourself, so it's hard to say. Or > >> I'm mis-reading your messages. > >> > >> bq: During indexing i want to token on all punctuations, so i can use > >> StandardTokenizer, but at search time i want to consider punctuations as > >> part of text, > >> > >> and in your second message: > >> > >> bq: when i search for "A,B" it should return result. [for input "A,B"] > >> > >> If, indeed, you "... at search time i want to consider punctuations as > >> part of text" then "A,B" should NOT match the document. > >> > >> The admin/analysis page is your friend, I strongly suggest you spend > >> some time looking at the various transformations performed by > >> the various analyzers and tokenizers. > >> > >> Best, > >> Erick > >> > >> On Sun, Mar 9, 2014 at 1:54 PM, abhishek jain > >> wrote: > >> > hi, > >> > > >> > Thanks for replying promptly, > >> > an example: > >> > > >> > I want to index for A,B > >> > but when i search A AND B, it should return result, > >> > when i search for "A,B" it should return result. > >> > > >> > Also Ideally when i search for "A , B" (with space) it should return > >> > result. > >> > > >> > > >> > please advice > >> > thanks > >> > abhishek > >> > > >> > > >> > On Sun, Mar 9, 2014 at 9:52 PM, Furkan KAMACI > >> > wrote: > >> > > >> >> Hi; > >> >> > >> >> Firstly you have to keep in mind that if you don't index punctuation > >> >> they > >> >> will not be visible for search. On the other hand you can have > >> >> different > >> >> analyzer for index and search. You have to give more detail about > your > >> >> situation. What will be your tokenizer at search time, > >> >> WhiteSpaceTokenizer? > >> >> You can have a look at here: > >> >> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > >> >> > >> >> If you can give some examples what you want for indexing and > searching > >> >> I > >> >> can help you to combine index and search analyzer/tokenizer/token > >> >> filters. 
> >> >> > >> >> Thanks; > >> >> Furkan KAMACI > >> >> > >> >> > >> >> 2014-03-09 18:06 GMT+02:00 abhishek jain >: > >> >> > >> >> > Hi Friends, > >> >> > > >> >> > I am concerned on Tokenizer, my scenario is: > >> >> > > >> >> > During indexing i want to token on all punctuations, so i can use > >> >> > StandardTokenizer, but at search time i want to consider > punctuations > >> >> > as > >> >> > part of text, > >> >> > > >> >> > I dont store contents but only indexes. > >> >> > > >> >> > What should i use. > >> >> > > >> >> > Any advices ? > >> >> > > >> >> > > >> >> > -- > >> >> > Thanks and kind Regards, > >> >> > Abhishek jain > >> >> > > >> >> > >> > > >> > > >> > > >> > -- > >> > Thanks and kind Regards, > >> > Abhishek jain > >> > +91 9971376767 > > > > > > > > > > -- > > Thanks and kind Regards, > > Abhishek jain > > +91 9971376767 > -- Thanks and kind Regards, Abhishek jain +91 9971376767
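The ordering problem in the message above (filters always run after the tokenizer, so the rewritten text is never re-tokenized) is usually solved with a CharFilter, which rewrites the raw text before the tokenizer sees it. A sketch of that approach, assuming the stated goal of turning every punctuation character into the literal token "punct":

    <analyzer>
      <!-- runs on the raw input, before tokenizing -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="\p{Punct}" replacement=" punct "/>
      <!-- now "A,B" arrives as "A punct B" and splits into three tokens -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>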
Re: Optimizing RAM
Hi all, What should be the ideal RAM to index size ratio? Please reply. I expect the index to be of a size of 60 GB, and I don't store contents. Thanks Abhishek Original Message From: abhishek.netj...@gmail.com Sent: Monday, 10 March 2014 09:25 To: solr-user@lucene.apache.org Cc: Erick Erickson Subject: Re: Optimizing RAM Hi, If I go with copy field than will it increase I/O load considering I have RAM less than one third of total index size? Thanks Abhishek Original Message From: Erick Erickson Sent: Monday, 10 March 2014 01:37 To: solr-user@lucene.apache.org Reply To: solr-user@lucene.apache.org Subject: Re: Optimizing RAM I'd go for a copyField, keep the stemmed and unstemmed version in the same index. An alternative (and I think there's a JIRA for this if not an outright patch) is to implement a "special" filter that, say, puts the original token in with a special character, say $ at the end, i.e. if indexing "running", you'd index both "running$" and "run". Then when you want exact match, you search for "running$". Best, Erick On Sun, Mar 9, 2014 at 2:55 PM, abhishek jain wrote: > hi friends, > I want to index some good amount of data, i want to keep both stemmed and > unstemmed versions , > I am confused should i keep two separate indexes or keep one index with two > versions or column , i mean col1_stemmed and col2_unstemmed. > > I have multicore with multi shard configuration. > My server have 32 GB RAM and stemmed index size (without content) i > calculated as 60 GB . > I want to not put too much load and I/O load on a decent server with some 5 > other replicated servers and want to use servers for other purposes also. > > > Also is it advised to server queries from master server or only from slaves? > -- > Thanks, > Abhishek
Re: Optimizing RAM
hi Shawn, Thanks for the reply. Is there a way to optimize RAM, or does Solr do it automatically? I have multiple shards, and I know I will be querying only 30% of the shards most of the time! And I have 6 slaves, so I am considering dedicating more slaves to the 30% most-used shards. Another question: Is it advised to serve queries from the master or only from slaves? Or does it not matter? thanks Abhishek On Tue, Mar 11, 2014 at 9:12 PM, Shawn Heisey wrote: > On 3/11/2014 6:14 AM, abhishek.netj...@gmail.com wrote: > > Hi all, > > What should be the ideal RAM index size ratio. > > > > please reply I expect index to be of size of 60 gb and I dont store > contents. > > Ideally, your total system RAM will be equal to the size of all your > program's heap requirements, plus the size of all the data for all the > programs. > > If Solr is the only thing on the box, then the ideal memory size is > roughly the Solr heap plus the size of all the Solr indexes that live on > that machine. So if your heap is 8GB and your index is 60GB, you'll > want at least 68GB of RAM for an ideal setup. I don't know how big your > heap is, so I am guessing here. > > You said your index does not store much content. That means you will > need a higher percentage of your total index size to be in RAM for good > performance. I would estimate that you want a minimum of two thirds of > your index in RAM, which indicates a minimum RAM size of 48GB if we > assume your heap is 8GB. 64GB would be better. > > http://wiki.apache.org/solr/SolrPerformanceProblems#General_information > > Thanks, > Shawn > > -- Thanks and kind Regards, Abhishek jain +91 9971376767
AND not as a boolean operator in Phrase
hi friends, when I search for "A and B" it gives me results for A, B; I am not sure why. Please guide me: how can I do an exact match when it is within a phrase/quotes? -- Thanks and kind Regards, Abhishek jain
Re: AND not as a boolean operator in Phrase
Hi Jack, You are right, I am using 'and' as a stop word in both indexing and query. Should I use it only during indexing? thanks On Tue, Mar 25, 2014 at 11:09 PM, Jack Krupansky wrote: > What does your field type analyzer look like? > > I suspect that you have a stop filter which causes "and" to be removed. > > -- Jack Krupansky > > -Original Message- From: abhishek jain Sent: Tuesday, March 25, > 2014 1:29 PM To: solr-user@lucene.apache.org Subject: AND not as a > boolean operator in Phrase > hi friends, > > when i search for "A and B" it gives me result for A , B , i am not sure > why? > > Please guide how can i exact match when it is within phrase/quotes. > > -- > Thanks and kind Regards, > Abhishek jain > -- Thanks and kind Regards, Abhishek jain +91 9971376767
Strange behavior while deleting
hi friends, I have observed a strange behavior. I have two indexes with the same ids and the same number of docs, and I am using a json file to delete records from both indexes. After deleting the ids, the resulting indexes now show different counts of docs; I am not sure why. I used curl with the same json file to delete from both indexes. Please advise asap, thanks -- Thanks and kind Regards, Abhishek
Re: Strange behavior while deleting
Hi, These settings are commented out in the schema. These are two different Solr servers with almost identical schemas, with the exception of one stemmed field. The same Solr versions are running. Please help. Thanks Abhishek Original Message From: Jack Krupansky Sent: Monday, 31 March 2014 14:54 To: solr-user@lucene.apache.org Reply To: solr-user@lucene.apache.org Subject: Re: Strange behavior while deleting Do the two cores have identical schema and solrconfig files? Are the delete and merge config settings the same? Are these two cores running on the same Solr server, or two separate Solr servers? If the latter, are they both running the same release of Solr? How big is the discrepancy - just a few, dozens, 10%, 50%? -- Jack Krupansky -Original Message- From: abhishek jain Sent: Monday, March 31, 2014 3:26 AM To: solr-user@lucene.apache.org Subject: Strange behavior while deleting hi friends, I have observed a strange behavior, I have two indexes of same ids and same number of docs, and i am using a json file to delete records from both the indexes, after deleting the ids, the resulting indexes now show different count of docs, Not sure why I used curl with the same json file to delete from both the indexes. Please advise asap, thanks -- Thanks and kind Regards, Abhishek
Re: AND not as a boolean operator in Phrase
Hi, Ok thanks. I want to search for the phrase "A and B" with the *and* word sandwiched between A and B. I don't want "and" to work as a boolean operator when it is within quotes. I have "and" as a stop word, and I don't want to reindex the data. What is my best bet? thanks abhishek jain On Sun, Mar 30, 2014 at 2:33 AM, Bob Laferriere wrote: > If you are using edismax you need to use AND. So A AND B will ignore the > stop word and apply the Boolean operator. You can configure edismax to > ignore Boolean stop words that are lowercase. > > Regards, > > Bob > > > On Mar 26, 2014, at 2:39 AM, abhishek jain > wrote: > > > > Hi Jack, > > You are right, i am using 'and' as a stop word in both indexing and > query, > > > > Should i use it only during indexing? > > > > thanks > > > > > > > > On Tue, Mar 25, 2014 at 11:09 PM, Jack Krupansky < > j...@basetechnology.com>wrote: > > > >> What does your field type analyzer look like? > >> > >> I suspect that you have a stop filter which cause "and" to be removed. > >> > >> -- Jack Krupansky > >> > >> -Original Message- From: abhishek jain Sent: Tuesday, March 25, > >> 2014 1:29 PM To: solr-user@lucene.apache.org Subject: AND not as a > >> boolean operator in Phrase > >> hi friends, > >> > >> when i search for "A and B" it gives me result for A , B , i am not sure > >> why? > >> > >> Please guide how can i exact match when it is within phrase/quotes. > >> > >> -- > >> Thanks and kind Regards, > >> Abhishek jain > > > > > > > > -- > > Thanks and kind Regards, > > Abhishek jain > > +91 9971376767 > -- Thanks and kind Regards, Abhishek jain +91 9971376767
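The edismax setting Bob alludes to is the lowercaseOperators parameter: with it turned off, only uppercase AND/OR act as boolean operators, so a lowercase "and" inside a phrase stays an ordinary term. A sketch of the request defaults (note this does not bring back a term that a stop filter already removed from the index):

    <lst name="defaults">
      <str name="defType">edismax</str>
      <!-- only uppercase AND / OR are treated as boolean operators -->
      <str name="lowercaseOperators">false</str>
    </lst>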
Error handling in Solr.
hi friends, While browsing through the logs of Solr, I noticed a few null pointer exceptions; I am concerned about what could be the reason.

ERROR org.apache.solr.core.SolrCore – java.lang.NullPointerException
at org.apache.solr.handler.admin.ShowFileRequestHandler.showFromFileSystem(ShowFileRequestHandler.java:212)
at org.apache.solr.handler.admin.ShowFileRequestHandler.handleRequestBody(ShowFileRequestHandler.java:122)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)

Please help, -- Thanks and kind Regards, Abhishek jain +91 9971376767
Stopping Solr instance
Hi friends, What is the best way to stop Solr from the command line? The command with the stop port and secret key, as given in most online help links, doesn't work for me all the time; I have to kill it most times! I have, though, noted excessive swap usage when I have to kill it. Is there a link between swap usage and Solr not stopping? Please let me know the best way to stop a Solr instance. Thanks Abhi
Typecast non stored string field for sorting
Hi friends, I have a string field which I created by mistake; it should have been int. It is not stored, just indexed. I want to sort it numerically, and hence I want a function which can, at query time, convert it to integer or double, so that I can then apply the sort. Is it possible? If not, can I create a new field with the value from the non-stored field? Please advise. Thanks Abhishek -- Thanks and kind Regards, Abhishek jain +91 9971376767
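As far as I know there is no query-time function that casts an indexed-only string into a number, so the practical route is the second option: declare a numeric twin and copy into it. copyField acts only on documents as they are indexed, so existing documents pick up the new field only when reindexed. A sketch with hypothetical field names (assumes a TrieIntField type named "int" exists in the schema, as in the stock configs):

    <field name="rank_int" type="int" indexed="true" stored="false"/>
    <!-- values must parse as integers or the document will fail to index -->
    <copyField source="rank_str" dest="rank_int"/>

After a reindex, sort=rank_int asc can then replace the string sort.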
explanation of query processing in SOLR
Hello, I am fairly new to SOLR; can someone please help me understand how a query is processed in SOLR? What I want to understand is, from the time it hits Solr, what files it refers to in order to process the query, i.e., the order in which the .tvx, .tvd files and others are accessed. Basically, I would like to understand the code path of the search functionality, and also the significance of the various files in the Solr directory, such as .tvx, .tcd, .frq, etc. Regards, Abhishek Das
Re: explanation of query processing in SOLR
Thanks Alex and Jack for the direction, actually what i was trying to understand was how various files had an effect on the search. Thanks, Abhishek On Fri, Aug 8, 2014 at 6:35 PM, Alexandre Rafalovitch wrote: > Abhishek, > > Your first part of the question is interesting, but your specific > details are probably the wrong level for you to concentrate on. The > issues you will be facing are not about which file does what. That's > more performance and inner details. I feel you should worry more about > the fields, default search fields, multiterms, whitespaces, etc. > > One way to do that is to enable debug and see if you actually > understand what those different debug entries do. And don't use string > or basic tokenizer. Pick something that has complex analyzer chain and > see how that affects debug. > > Regards, >Alex. > Personal: http://www.outerthoughts.com/ and @arafalov > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 > > > On Fri, Aug 8, 2014 at 1:59 PM, abhi Abhishek wrote: > > Hello, > > I am fairly new to SOLR, can someone please help me understand how a > > query is processed in SOLR, i.e, what i want to understand is from the > time > > it hits solr what files it refers to process the query, i.e, order in > which > > .tvx, .tvd files and others are accessed. basically i would like to > > understand the code path of the search functionality also significance of > > various files in the solr directory such as .tvx, .tcd, .frq, etc. > > > > > > Regards, > > Abhishek Das >
Special character search in Solr and boosting without altering the resultset
Hi friends, I am facing a strange problem. When I search for a term, e.g. .Net, Solr searches for Net and does not include the '.'. Is dot a special character in Solr? I tried escaping it with a backslash in the URL call to Solr, but no use; same resultset. Also, is there a way to boost some terms within a resultset? I mean, I want to boost a term within a result, and I don't want to fire a separate query. I couldn't use the OR operator, as it would modify the resultset. I want to use a single query and boost. I don't want to use a dismax query as well. Please advise. Thanks, Abhishek
RE: Special character search in Solr and boosting without altering the resultset
Hi, Ok thanks, will look more into it, Any info on boosting without altering the resultset? Thanks Abhishek > -Original Message- > > Hi Abhishek, > > dot is not a special character. Your field type / analyzer is stripping > that character. Please see similar discussions and alternative > solutions. > > http://search-lucene.com/m/6dbI9zMSob1 > http://search-lucene.com/m/Ac71G0KlGz > http://search-lucene.com/m/RRD2D1p1mi > > Ahmet > > > > On Friday, January 31, 2014 8:23 PM, abhishek jain > wrote: > Hi friends, > > I am facing a strange problem, When I search a term eg .Net , the > solr searches for Net and not includes '.' > > Is dot a special character in Solr? I tried escaping it with backslash > in the url call to solr, but no use same resultset, > > > > Also , is there a way to boost some terms within a resultset. > > I mean I want to boost a term within a result and I don't want to fire > a separate query. I couldn't use OR operator as it will modify the > resultset. > I want to use a single query and boost. I don't want to use dismax > query as well, > > > > Please advice. > > > > Thanks, > > Abhishek
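Since the dot is being dropped by the analysis chain rather than by the query parser, one hedged illustration of a type that keeps ".Net" as a single token is whitespace-only tokenization (the type name here is made up; this keeps all other punctuation too, so it changes matching behaviour and needs a reindex):

    <fieldType name="text_keepdots" class="solr.TextField">
      <analyzer>
        <!-- splits only on whitespace, so ".Net" survives as one token -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>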
RE: Special character search in Solr and boosting without altering the resultset
Hi, Thanks for replying, but if I understand right: q=term1 term2^0.6 means it will search for term1 and term2, with a somewhat lower boost for term2. I want to search only for term1, and if term2 exists, boost by a positive factor. I am not able to make such a query. Thanks Abhishek > -Original Message- > From: Ahmet Arslan [mailto:iori...@yahoo.com] > Sent: Saturday, February 1, 2014 8:51 PM > To: solr-user@lucene.apache.org > Subject: Re: Special character search in Solr and boosting without > altering the resultset > > Hi, > > Can you elaborate your boosting requirement? There is a carat operator > to boost query terms. > > for example : q=term1 term2^0.6 > > > > > On Saturday, February 1, 2014 1:51 PM, abhishek jain > wrote: > Hi, > Ok thanks, will look more into it, > > Any info on boosting without altering the resultset? > > Thanks > Abhishek > > > > -Original Message- > > > > Hi Abhishek, > > > > dot is not a special character. Your field type / analyzer is > > stripping that character. Please see similar discussions and > > alternative solutions. > > > > http://search-lucene.com/m/6dbI9zMSob1 > > http://search-lucene.com/m/Ac71G0KlGz > > http://search-lucene.com/m/RRD2D1p1mi > > > > Ahmet > > > > > > > > On Friday, January 31, 2014 8:23 PM, abhishek jain > > wrote: > > Hi friends, > > > > I am facing a strange problem, When I search a term eg .Net , the > > solr searches for Net and not includes '.' > > > > Is dot a special character in Solr? I tried escaping it with > backslash > > in the url call to solr, but no use same resultset, > > > > > > > > Also , is there a way to boost some terms within a resultset. > > > > I mean I want to boost a term within a result and I don't want to fire > > a separate query. I couldn't use OR operator as it will modify the > > resultset. > > I want to use a single query and boost. I don't want to use dismax > > query as well, > > > > > > > > Please advice. > > > > > > > > Thanks, > > > > Abhishek
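With the standard lucene parser this is expressible through the required-clause operator, assuming the default q.op=OR: the "+" makes term1 mandatory while term2 stays purely optional, so the match set is exactly the term1 documents and term2 only adds score. A sketch (the terms are placeholders):

    q=+term1 term2^5

With edismax or dismax the same effect is what the bq (boost query) parameter is for, but the query above stays within the standard parser the poster wants to keep.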
Remove stemming without reindexing - currently using KStem
Hi Friends, Is it possible to remove stemming without having to reindex the entire data? I am using KStem. Can we do so via the query itself? I am not sure how. I am not using dismax. Thanks Abhishek
"facet.mincount=0" returns facet values with 0 counts for "q=*" query
Hi, Can anyone help me understand what it means to have facet results like this - "values": [ "4th of july flags", 0, "angela moore", 0, "anklets", 0, "applique flags", 0, "army national guard", 0, "bangles", 0, "beatriz ball" ] for a q=* query with facet.mincount=0? What do the results signify? In what condition can we have a facet count of 0 for a q=* query?
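The zero-count entries are terms that still exist in the field's term dictionary but are carried by no document in the current result set; with a match-all query and no filters, that typically means documents that were deleted but whose segments have not been merged away yet. A sketch of the usual way to hide them (the facet field name is hypothetical):

    facet=true&facet.field=brand&facet.mincount=1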
Solr Memory Usage - How to reduce memory footprint for solr
Q - I am forced to set the Java Xmx as high as 3.5g for my Solr app. If I keep this low, my CPU hits 100% and the response time for indexing increases a lot, and I have hit an OOM error as well when this value is low. Is this too high? If so, how can I reduce it?

Machine Details: 4 GB RAM, SSD

Solr App Details (standalone Solr app, no shards):
1. num. of Solr cores = 5
2. index size - 2 GB
3. num. of search hits per sec - 10 [IMP - all search queries have faceting]
4. num. of times re-indexing per hour per core - 10 (it may happen at the same moment for all 5 cores)
5. query result cache, document cache and filter cache are all default size - 4 kb.

top stats -
    VIRT     RES     SHR    S  %CPU  %MEM
    6446600  3.478g  18308  S  11.3  94.6

iotop stats -
    DISK READ    DISK WRITE   SWAPIN  IO>
    0-1200 K/s   0-100 K/s    0       0-5%
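One heap knob worth knowing about in solrconfig.xml: the cache "size" attributes count entries, not kilobytes, and each filterCache entry can cost roughly maxDoc/8 bytes. A sketch with illustrative sizes (these numbers are assumptions, not a recommendation for this particular app):

    <filterCache class="solr.FastLRUCache" size="256" initialSize="256" autowarmCount="32"/>
    <queryResultCache class="solr.LRUCache" size="256" initialSize="256" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>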
DocValues with docValuesFormat="Disk"
Hi all, I am trying to experiment with DocValues (http://wiki.apache.org/solr/DocValues) and use the "Disk" docValuesFormat. Here's what my field type declaration looks like:

    <fieldType name="stringDv" ... sortMissingLast="true" omitNorms="true" docValuesFormat="Disk"/>

I don't even have any fields using that type. Also I've updated solrconfig.xml with:

    <luceneMatchVersion>LUCENE_42</luceneMatchVersion>

Am running with solr-4.2.1. My solr core is totally empty, and there is nothing in the data dir. Am getting this weird error while starting up the solr core:

org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:735)
... 13 more

Apr 23, 2013 3:34:06 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: p5-upsShard-1
at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:680)
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
... 10 more
Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:735)
... 13 more

Is there any other config change that I need to do? I've read http://wiki.apache.org/solr/DocValues multiple times, but am unable to see any light to solve the problem. -- - Cheers, Abhishek
Re: DocValues with docValuesFormat="Disk"
Answering myself - adding this line in solrconfig.xml made it work: On 4/23/13 3:42 PM, Abhishek Sanoujam wrote: Hi all, I am trying to experiment with DocValues (http://wiki.apache.org/solr/DocValues) and use the "Disk" docValuesFormat. Here's how my field type declaration looks like: sortMissingLast="true" omitNorms="true" docValuesFormat="Disk"/> I don't even have any fields using that type. Also I've updated solrconfig.xml with: LUCENE_42 Am running with solr-4.2.1. My solr core is totally empty, and there is nothing in the data dir. Am getting this weird error while starting up the solr core: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.(SolrCore.java:822) at org.apache.solr.core.SolrCore.(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870) at org.apache.solr.core.SolrCore.(SolrCore.java:735) ... 13 more Apr 23, 2013 3:34:06 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: p5-upsShard-1 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.(SolrCore.java:822) at org.apache.solr.core.SolrCore.(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 
10 more Caused by: org.apache.solr.common.SolrException: FieldType 'stringDv' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:870) at org.apache.solr.core.SolrCore.(SolrCore.java:735) ... 13 more Is there any other config change that I need to do? I've read http://wiki.apache.org/solr/DocValues multiple times, but am unable to see any light to solve the problem. -- - Cheers, Abhishek -- - Cheers, Abhishek
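The line that the mail archive stripped out of the reply above is presumably the schema-aware codec factory, which is what the "the codec does not support it" error is asking for. A sketch of the likely solrconfig.xml addition:

    <codecFactory class="solr.SchemaCodecFactory"/>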
Solr performance issues for simple query - q=*:* with start and rows
We have a solr core with about 115 million documents. We are trying to migrate data and running a simple query with a *:* query and with start and rows params. The performance is becoming too slow in solr; it's taking almost 2 mins to get 4000 rows and the migration is being just too slow. Logs snippet below:

INFO: [coreName] webapp=/solr path=/select params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=168308
INFO: [coreName] webapp=/solr path=/select params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=122771
INFO: [coreName] webapp=/solr path=/select params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=137615
INFO: [coreName] webapp=/solr path=/select params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141223
INFO: [coreName] webapp=/solr path=/select params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=97474
INFO: [coreName] webapp=/solr path=/select params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=98115
INFO: [coreName] webapp=/solr path=/select params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=143822
INFO: [coreName] webapp=/solr path=/select params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118066
INFO: [coreName] webapp=/solr path=/select params={start=5547&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=121498
INFO: [coreName] webapp=/solr path=/select params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=164062
INFO: [coreName] webapp=/solr path=/select params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=165518
INFO: [coreName] webapp=/solr path=/select params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118163
INFO: [coreName] webapp=/solr path=/select params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141642
INFO: [coreName] webapp=/solr path=/select params={start=5549&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=145037

I've taken some thread dumps in the solr server, and most of the time the threads seem to be busy in the following stacks mostly:

… at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1491)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)

Is there anything that can be done to improve the performance? Is it a known issue? It's very surprising that querying for just some rows starting at some points is taking in the order of minutes.

-- - Cheers, Abhishek
Re: Solr performance issues for simple query - q=*:* with start and rows
We have a single shard, and all the data is in a single box only. Definitely looks like "deep-paging" is having problems. Just to understand, is the searcher looping over the result set everytime and skipping the first "start" count? This will definitely take a toll when we reach higher "start" values. On 4/29/13 2:28 PM, Jan Høydahl wrote: Hi, How many shards do you have? This is a known issue with deep paging with multi shard, see https://issues.apache.org/jira/browse/SOLR-1726 You may be more successful in going to each shard, one at a time (with &distrib=false) to avoid this issue. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com 29. apr. 2013 kl. 09:17 skrev Abhishek Sanoujam : We have a solr core with about 115 million documents. We are trying to migrate data and running a simple query with *:* query and with start and rows param. The performance is becoming too slow in solr, its taking almost 2 mins to get 4000 rows and migration is being just too slow. Logs snippet below: INFO: [coreName] webapp=/solr path=/select params={start=55438000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=168308 INFO: [coreName] webapp=/solr path=/select params={start=55446000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=122771 INFO: [coreName] webapp=/solr path=/select params={start=55454000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=137615 INFO: [coreName] webapp=/solr path=/select params={start=5545&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141223 INFO: [coreName] webapp=/solr path=/select params={start=55462000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=97474 INFO: [coreName] webapp=/solr path=/select params={start=55458000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=98115 INFO: [coreName] webapp=/solr path=/select params={start=55466000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=143822 INFO: [coreName] webapp=/solr path=/select params={start=55474000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118066 INFO: [coreName] webapp=/solr path=/select params={start=5547&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=121498 INFO: [coreName] webapp=/solr path=/select params={start=55482000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=164062 INFO: [coreName] webapp=/solr path=/select params={start=55478000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=165518 INFO: [coreName] webapp=/solr path=/select params={start=55486000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=118163 INFO: [coreName] webapp=/solr path=/select params={start=55494000&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=141642 INFO: [coreName] webapp=/solr path=/select params={start=5549&q=*:*&wt=javabin&version=2&rows=4000} hits=115760479 status=0 QTime=145037 I've taken some thread dumps in the solr server and most of the time the threads seem to be busy in the following stacks mostly: Is there anything that can be done to improve the performance? Is it a known issue? Its very surprising that querying for some just rows starting at some points is taking in order of minutes. 
"395883378@qtp-162198005-7" prio=10 tid=0x7f4aa0636000 nid=0x295a runnable [0x7f42865dd000] java.lang.Thread.State: RUNNABLE at org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:252) at org.apache.lucene.util.PriorityQueue.pop(PriorityQueue.java:184) at org.apache.lucene.search.TopDocsCollector.populateResults(TopDocsCollector.java:61) at org.apache.lucene.search.TopDocsCollector.topDocs(TopDocsCollector.java:156) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1499) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1366) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639) at org.apache.solr.servlet.SolrDispatchFilter.doFilter
Need solr query help
We are doing a spatial search with the following logic: a) there are shops in a city, and each provides the facility of home delivery; b) each shop has a different max_delivery_distance. Now my query: suppose someone is searching from point P1 with radius R. The user wants the result of shops that can deliver to him (the distance between P1 and shop s1, say d1, should be less than the max delivery distance, say md1). How can I implement this with a Solr spatial query?
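One way this is commonly expressed is a geofilt for the user's radius plus a function-range filter comparing each shop's own distance to its per-document limit. A sketch in which shop_location and max_delivery_distance are assumed field names, and P1_LAT, P1_LON and R stand in for the user's point and radius:

    fq={!geofilt sfield=shop_location pt=P1_LAT,P1_LON d=R}
    fq={!frange u=0}sub(geodist(shop_location,P1_LAT,P1_LON),max_delivery_distance)

The frange keeps only documents where geodist() - max_delivery_distance <= 0, i.e. the shop is within its own delivery distance of P1.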
Need help on Solr
Hello, I am trying to index a pdf file on Solr. I am currently running Solr on Apache Tomcat 6. When I try to index it I get the below error. Please help; I was not able to rectify this error with the help of the internet.

ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer; Unable to create core: collection1
org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
…
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: [schema.xml] Duplicate field definition for 'id' [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]] and [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, required=true}]]]
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:176)
at org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62)
at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
... 1 more

with regards, Abhishek Bansal
Re: Need help on Solr
Yeah I know, out of the box there is one id field. I removed it from schema.xml I have also added below code to automatically generate an ID. with regards, Abhishek Bansal On 20 June 2013 21:49, Shreejay wrote: > org.apache.solr.common.SolrException: [schema.xml] Duplicate field > definition for 'id' > > You might have defined an id field in the schema file. The out of box > schema file already contains an id field . > > -- > Shreejay > > > On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote: > > > Hello, > > > > I am trying to index a pdf file on Solr. I am running icurrently Solr on > > Apache Tomcat 6. > > > > When I try to index it I get below error. Please help. I was not able to > > rectify this error with help of internet. > > > > > > > > > > ERROR - 2013-06-20 20:43:41.549; org.apache.solr.core.CoreContainer; > Unable > > to create core: collection1 > > org.apache.solr.common.SolrException: [schema.xml] Duplicate field > > definition for 'id' > > > [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, > > required=true}]]] and > > > [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, > > required=true}]]] > > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502) > > at org.apache.solr.schema.IndexSchema.(IndexSchema.java:176) > > at > > > org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62) > > at > > > org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36) > > at > > > org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946) > > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984) > > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597) > > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > > at java.lang.Thread.run(Thread.java:662) > > ERROR - 2013-06-20 20:43:41.551; org.apache.solr.common.SolrException; > > null:org.apache.solr.common.SolrException: Unable to create core: > > collection1 > > at > > > org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450) > > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993) > > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597) > > at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) > > at > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) > > at java.lang.Thread.run(Thread.java:662) > > Caused 
by: org.apache.solr.common.SolrException: [schema.xml] Duplicate > > field definition for 'id' > > > [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, > > required=true}]]] and > > > [[[id{type=string,properties=indexed,stored,omitNorms,omitTermFreqAndPositions,sortMissingLast,required, > > required=true}]]] > > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502) > > at org.apache.solr.schema.IndexSchema.(IndexSchema.java:176) > > at > > > org.apache.solr.schema.ClassicIndexSchemaFactory.create(ClassicIndexSchemaFactory.java:62) > > at > > > org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:36) > > at > > > org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:946) > > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:9
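The field declaration that the archive stripped from the message above (only a trailing multiValued="false" survives in later quotes) was presumably something like the classic auto-UUID recipe; a sketch:

    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>

Note that on Solr 4.x releases the supported way to auto-generate ids moved to UUIDUpdateProcessorFactory, so the default="NEW" form may itself be part of the problem here.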
Re: Need help on Solr
As I am running Solr on Windows + Tomcat, I am using the below command to index the pdf. I hope this command is not faulty. Please check:

java -jar -Durl="http://localhost:8080/solr-4.3.0/update/extract?literal.id=1&commit=true" post.jar sample.pdf

with regards,
Abhishek Bansal

On 20 June 2013 21:56, Abhishek Bansal wrote:
> Yeah I know, out of the box there is one id field. I removed it from
> schema.xml. I have also added the below code to automatically generate an ID.
>
> multiValued="false"/>
>
> with regards,
> Abhishek Bansal
>
> On 20 June 2013 21:49, Shreejay wrote:
>> org.apache.solr.common.SolrException: [schema.xml] Duplicate field
>> definition for 'id'
>>
>> You might have defined an id field in the schema file. The out of the box
>> schema file already contains an id field.
>>
>> --
>> Shreejay
>>
>> On Thursday, June 20, 2013 at 9:16, Abhishek Bansal wrote:
>> > Hello,
>> >
>> > I am trying to index a pdf file on Solr. I am currently running Solr on
>> > Apache Tomcat 6.
>> >
>> > When I try to index it I get the below error. Please help. I was not able
>> > to rectify this error with the help of the internet.
>> >
>> > [stack trace snipped; identical to the trace quoted in the previous message]
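For reference, the conventional form of this invocation in the Solr examples puts the system property before -jar, e.g.:

    java -Durl="http://localhost:8080/solr-4.3.0/update/extract?literal.id=1&commit=true" -jar post.jar sample.pdf

Note that literal.id=1 pins the document id to a fixed value; if the id is instead auto-generated by an update processor chain, the literal.id parameter can be dropped.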
Re: help to build query
Jack, thanks for your response. We have a deals web application with a free-text search in it (free text meaning the user can type anything). We have deals of different categories, tagged at different merchant locations. Per the requirements I have to make some tweaks in the search, for example a user can search deals like:

a) cat1 in location1, location2 (e.g. "spa in Malviya Nagar, Ashok Vihar", where spa = cat1, Malviya Nagar = location1, Ashok Vihar = location2)
b) cat1 and cat2 in location1
c) cat1 in location1 and location2

Hope I have been able to explain it better.

On Wed, Jan 30, 2013 at 9:06 PM, Jack Krupansky wrote:
> Start by expressing the specific semantics of those queries in strict
> boolean form. I mean, what exactly do you mean by "in", "location1,
> location2", and "location1, loc2 and loc3"? Is the latter an AND or an OR?
>
> Or at least fully express those two queries, unambiguously, in plain
> English. There is too much ambiguity present to give you any solid
> direction.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Abhishek tiwari
> Sent: Wednesday, January 30, 2013 12:55 AM
> To: solr-user@lucene.apache.org
> Subject: help to build query
>
> I want to execute queries like:
> a) cat in location1, location2
> b) cat1 and cat2 in location1, loc2 and loc3
> in our search.
>
> Our challenges:
> 1) picking the right keywords (category and locality) from the query entered
> 2) mapping them to the relevant entity
>
> How should I proceed?
>
> We have localities and categories data indexed.
>
> thanks in advance.
> ~abhishek
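Once the category and locality keywords have been extracted from the free text, one way to express query (a) is a main query on the category plus a filter over the localities. A sketch of the Solr parameters (URL-encoding omitted; the category and locality field names are assumptions, not from the thread):

    q=spa
    defType=edismax
    qf=category^2 title
    fq=locality:("malviya nagar" OR "ashok vihar")

Query (b) would AND two category clauses in q, and (c) would AND the locality clauses in fq; as Jack points out, the hard part is fixing those AND/OR semantics before writing any query.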
Re: Writing a french Solr book - Ecrire un livre en français
If you are thinking about it, then do it; why do you want people to tell you what you should do? Best of luck!

On Sun, Jan 29, 2012 at 8:20 PM, SR wrote:
> Hi there,
>
> Have you heard of any existing Solr book in French? If not, I'm thinking of
> writing one. Do you think this could be useful for the francophone community?
>
> Thanks
> -SR

--
Abhishek Tyagi
Let's just say.. I'm the Frankenstein's Monster.
Re: schema design help
Thanks for replying.

In our RDBMS schema we have Establishment/Event/Movie master relations. Establishment has fields like title, description, ratings, tags, cuisines (multivalued), services (multivalued) and features (multivalued); similarly Event has title, description, category (multivalued) and venue (multivalued); and Movie has name, start date, end date, genre, theater, rating and review.

We have nearly 1M records for each entity; movies and events expire frequently and we have to update them on expiry. We also keep stored data in addition to the indexed data, to reduce RDBMS queries.

Please suggest how to proceed with the schema design: a single core, or a separate core for each entity?

On Tue, Mar 6, 2012 at 7:40 PM, Gora Mohanty wrote:
> On 6 March 2012 18:01, Abhishek tiwari wrote:
> > I am new to Solr and want help with schema design. I have multiple entities
> > like Event, Establishment and Movie, each with different types of
> > relations. Should I make a different core for each entity?
>
> It depends on your use case, i.e., what would your typical searches
> be on. Normally, using a separate core for each entity would be
> unusual, and instead one would flatten out typical RDBMS data for
> Solr.
>
> Please describe what you want to achieve, and people might be
> better able to help you.
>
> Regards,
> Gora
Re: schema design help
Please suggest when one should create multiple cores?

On Thu, Mar 8, 2012 at 12:12 AM, Walter Underwood wrote:
> Solr is not relational, so you will probably need to take a fresh look at
> your data.
>
> Here is one method.
>
> 1. Sketch your search results page.
> 2. Each result is a document in Solr.
> 3. Each displayed item is a stored field in Solr.
> 4. Each searched item is an indexed field in Solr.
>
> It may help to think of this as a big flat materialized view in your DBMS.
>
> wunder
> Search Guy, Chegg.com
>
> On Mar 6, 2012, at 10:56 PM, Abhishek tiwari wrote:
> > Thanks for replying. [...]
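Wunder's "materialized view" method maps directly onto a flat schema. A sketch of what that could look like for these entities (the field names are illustrative, not from the thread):

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="entity_type" type="string" indexed="true" stored="true"/> <!-- establishment | event | movie -->
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="description" type="text_general" indexed="true" stored="true"/>
    <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="start_date" type="date" indexed="true" stored="true"/> <!-- events and movies only -->
    <field name="end_date" type="date" indexed="true" stored="true"/>

Each document carries only the fields that apply to its entity, and expired events/movies can be removed with a delete-by-query on end_date.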
Re: schema design help
My page layout is as follows:

*All tab*: contains all entities (Establishment/Event/Movie)
Establishment tab: contains Establishment search results
Event tab: contains Event search results
Movie tab: contains Movie search results

Please suggest how to design my schema.

On Thu, Mar 8, 2012 at 10:21 AM, Walter Underwood wrote:
> You should create multiple cores when each core is an independent search.
> If you have three separate search pages, you may want three separate cores.
>
> wunder
> Search Guy, Chegg.com
>
> On Mar 7, 2012, at 8:48 PM, Abhishek tiwari wrote:
> > Please suggest when one should create multiple cores? [...]
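With a single flat core and an entity_type discriminator field (as sketched earlier, and purely illustrative), the tab layout above maps to one query per tab that differs only in its filter:

    All tab:            q=<user query>
    Establishment tab:  q=<user query>&fq=entity_type:establishment
    Event tab:          q=<user query>&fq=entity_type:event
    Movie tab:          q=<user query>&fq=entity_type:movie

Since fq filters are cached separately from the main query, the per-tab restriction is cheap.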
Re: schema design help
Gora, we do not have the related-search case you mentioned (*will a search on an Establishment also require results from Movie, such as what movies are showing at the establishment*). Establishment does not require Movie results; each entity has its own separate search.

On Thu, Mar 8, 2012 at 10:49 AM, Gora Mohanty wrote:
> On 8 March 2012 10:40, Abhishek tiwari wrote:
> > My page layout is as follows:
> > *All tab*: contains all entities (Establishment/Event/Movie) [...]
> > Please suggest how to design my schema.
>
> You will need to think more about your search requirements, and
> provide more details. E.g., will a search on an Establishment
> also require results from Movie, such as what movies are showing
> at the establishment? Similarly, will results from an Event search
> require a list of Movies showing at the events? As Solr is not an
> RDBMS, if you need such correlated data, you should typically use
> a single, flat index, rather than multiple cores.
>
> IMHO, a multi-core setup would be unusual for what you are
> trying to do. However, this is difficult to say for sure without an
> insight into your search requirements.
>
> Regards,
> Gora
Re: schema design help
Hi Gora,

Thanks. One more concern: though Establishments, Events and Movies are not related to each other, I would have to make 3 search queries to their independent cores and club the data for display. Will that affect my relevancy? For example, there is a Movie with the title "Striker" and an Establishment with the title "Striker". So which one is better:
- 3 queries to independent cores, clubbing the data afterwards
- a single query to one core which contains all the data

Thanks
Abhishek

On Thu, Mar 8, 2012 at 11:07 AM, Gora Mohanty wrote:
> On 8 March 2012 11:05, Abhishek tiwari wrote:
> > Gora, we do not have the related-search case you mentioned (*will a search
> > on an Establishment also require results from Movie, such as what movies
> > are showing at the establishment*).
> >
> > Establishment does not require Movie results; each entity has its own
> > separate search.
>
> In that case, multiple cores should be OK.
>
> Regards,
> Gora
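One point that bears on the choice: scores from separate cores are not directly comparable, since each core computes relevancy against its own term statistics. On a single combined core, field collapsing can even return the per-entity buckets in one request; a sketch using the illustrative entity_type field from earlier:

    q=striker&group=true&group.field=entity_type&group.limit=10

Each group can then feed one tab, with every score computed against the same index statistics.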
query help
Hi,
I have a multivalued field and I want to sort the docs by the position at which a particular value, e.g. 'B1', was added. How should I query? ad_text is the multivalued field. The documents look like:

doc1: ad_text = [B1, B2, B3]
doc2: ad_text = [B2, B1, B3]
doc3: ad_text = [B1, B2, B3]
doc4: ad_text = [B3, B2, B1]
Re: query help
a) No, I do not want to sort the content within a document; I want to sort the documents.

b) As I explained, I have a result set (documents) and each document contains a field "*ad_text*" (along with other fields) which is multivalued, storing some tags, say "B1, B2, B3"; but the order of the tags differs per doc, say (B1, B2, B3) *for doc1*, (B3, B1, B2) *for doc2*, (B1, B3, B2) *for doc3*, (B2, B3, B1) *for doc4*.

If I search for B1, the results should come in the order doc1, doc3, doc2, doc4 (as B1 is the first value of the multivalued field in doc1 and doc3, the 2nd value in doc2, and the 3rd in doc4). If I search for B2, the results should come in the order doc4, doc1, doc3, doc2.

I do not know whether it is possible or not, but please suggest how it can be done.

On Thu, Mar 29, 2012 at 5:18 PM, Erick Erickson wrote:
> Hmmm, I don't quite get this. Are you saying that you want
> to sort the documents or sort the content within the document?
>
> Sorting documents (i.e. the results list) requires a single-valued
> field. So you'd have to, at index time, sort the entries.
>
> Sorting the content within the document is something you'd
> have to do when you index; Solr doesn't rearrange the
> contents of a document.
>
> If all you want to do is display the results within the document
> in order, your app can do that as it builds the display page.
>
> Best
> Erick
>
> On Wed, Mar 28, 2012 at 9:02 AM, Abhishek tiwari wrote:
> > Hi,
> > I have a multivalued field and I want to sort the docs by the position at
> > which a particular value, e.g. 'B1', was added. How should I query?
> > ad_text is the multivalued field. The documents look like:
> > doc1: ad_text = [B1, B2, B3]
> > doc2: ad_text = [B2, B1, B3]
> > doc3: ad_text = [B1, B2, B3]
> > doc4: ad_text = [B3, B2, B1]
Re: query help
Can I achieve this with the help of a boosting technique?

On Thu, Mar 29, 2012 at 10:42 PM, Erick Erickson wrote:
> Solr doesn't support sorting on multiValued fields, so I don't think this
> is possible OOB.
>
> I can't come up with a clever indexing solution that does this either,
> sorry.
>
> Best
> Erick
>
> On Thu, Mar 29, 2012 at 8:27 AM, Abhishek tiwari wrote:
> > a) No, I do not want to sort the content within a document; I want to
> > sort the documents. [...]
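One index-time workaround in the spirit of Erick's "sort the entries when you index" remark: when the tag vocabulary is small and known, store each tag's position in its own single-valued int field and sort on that. Field names here are illustrative. For doc2 = [B2, B1, B3] you would index:

    pos_B2 = 1, pos_B1 = 2, pos_B3 = 3

and a search for B1 becomes:

    q=ad_text:B1&sort=pos_B1 asc

This yields exactly the doc1, doc3, doc2, doc4 ordering asked for above, but it only scales to a small, fixed set of tags.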
Re: Error
I am using Solr version 3.4... please assist...

On Thu, Apr 12, 2012 at 8:41 PM, Erick Erickson wrote:
> Please review:
>
> http://wiki.apache.org/solr/UsingMailingLists
>
> You haven't said whether, for instance, you're using trunk, which
> is the only version that supports the "termfreq" function.
>
> Best
> Erick
>
> On Thu, Apr 12, 2012 at 4:08 AM, Abhishek tiwari wrote:
> > http://xyz.com:8080/newschema/mainsearch/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&sort=termfreq%28cuisine_priorities_list,%27Chinese%27%29%20desc
> >
> > Error : HTTP Status 400 - Missing sort order.
> > Why am I getting this error?
Searching .msg files
Hello Everyone,

In my company we store a lot of old emails (.msg files) in a database (done for the purposes of legal compliance). The users have been asking us for search functionality over the old emails. One of the primary requirements is that when people search, they should only be able to search their own emails (emails in which they were in the to, cc or bcc list).

How can Solr be used here? From what I know about this product, it only searches XML content, so I will have to extract the body of each email and convert it to XML, right? How will I limit the search results to only those emails where the searching user was in the to, cc or bcc list?

Please recommend an approach for providing a solution to our requirement.
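Two pointers for this kind of requirement, sketched under assumed core and field names. First, Solr is not limited to XML: the extracting request handler (Solr Cell, backed by Tika) can parse binary formats such as .msg directly, and literal.* parameters can capture the recipient lists as fields. Second, the per-user restriction is typically a filter query that the application appends server-side, so the user never controls it:

    # index an email, capturing recipients as fields
    curl "http://localhost:8983/solr/emails/update/extract?literal.id=msg-001&literal.to=alice@corp.com&literal.cc=bob@corp.com&commit=true" --data-binary @mail001.msg -H "Content-Type: application/octet-stream"

    # search as alice: the app adds the ACL filter, the user only supplies q
    curl "http://localhost:8983/solr/emails/select?q=quarterly+report&fq=to:alice@corp.com+OR+cc:alice@corp.com+OR+bcc:alice@corp.com"

In practice the to/cc/bcc values would come from the metadata Tika extracts from the .msg file rather than hand-typed literals; the sketch just shows the shape of the solution.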
Comparison of Solr with Sharepoint Search
Has anyone done a functionality comparison of Solr with Sharepoint/Fast Search? If yes, kindly share a few details here. Thanks for your help in advance! Regards, Abhishek.
Question on Tokenizing email address
Hello Everyone,

I have a field in my Solr schema which stores emails. The way I want the emails to be tokenized is like this: if the email address is abc@alpha-xyz.com, the user should be able to search on 1. abc@alpha-xyz.com (the whole address) 2. abc 3. def 4. alpha-xyz. Which tokenizer should I use?

Also, is there a feature like "Must Match" in Solr? In my schema there is a field called "from" which contains the email address of the person who sent an email. For this field I don't want any tokenization: when a user issues a search, the user's email ID must exactly match the "from" field value for that document/record to be returned. How can I do this?

Regards,
Abhishek
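A common pattern here, sketched with illustrative names: index the address twice, once through an analyzer that keeps the whole address and splits out its parts, and once untouched for exact matching. Solr's UAX29URLEmailTokenizer recognizes complete email addresses as single tokens, and a word-delimiter filter can then break out the parts (exactly which parts survive, e.g. whether alpha-xyz stays as one token, needs tuning against your data):

    <!-- tokenized view: whole address plus its parts -->
    <fieldType name="email_text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- exact view for the "from" field: no tokenization at all -->
    <field name="from" type="string" indexed="true" stored="true"/>

Querying the string-typed field (from:"user@example.com") then behaves like a must-match: only the full, exact value matches.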
regarding 'sharedlib' in solr
Hi,

I want to share a folder containing text files among different cores in Solr, so that if the folder is updated, the change is reflected in all the cores that reference that path. The problem I am facing: I am using sharedLib in solr.xml and specifying the default path there, and I am also updating the schema.xml of my core, but when I load the core it gives an 'unsafe loading' error and does not reload.

Please help me with this.

--
Thanks,
Abhishek Agarwal
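For reference, sharedLib is declared at the top level of solr.xml and points at a directory of jars that every core loads; a minimal sketch (the path is illustrative):

    <solr>
      <str name="sharedLib">/var/solr/sharedlib</str>
    </solr>

Note that sharedLib is intended for shared plugin jars, not shared data files. Recent Solr versions deliberately refuse to load resources from paths outside a core's instance dir or configset, which is the usual source of "unsafe loading" errors, so a folder of text files referenced from schema.xml generally needs to live inside (or be linked into) each core's conf directory.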
Re: Error:Missing Required Fields for Atomic Updates
The update handler expects all the required fields to be present, even for an atomic update request. See DocumentBuilder:
https://github.com/apache/lucene-solr/blob/branch_7_5/solr/core/src/java/org/apache/solr/update/DocumentBuilder.java

Hope this helps!

    // Now validate required fields or add default values
    // fields with default values are defacto 'required'
    // Note: We don't need to add default fields if this document is to be used for
    // in-place updates, since this validation and population of default fields would've
    // happened during the full indexing initially.
    if (!forInPlaceUpdate) {
      for (SchemaField field : schema.getRequiredFields()) {
        if (out.getField(field.getName()) == null) {
          if (field.getDefaultValue() != null) {
            addField(out, field, field.getDefaultValue(), false);
          } else {
            String msg = getID(doc, schema) + "missing required field: " + field.getName();
            throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, msg);
          }
        }
      }
    }

Cheers!
Abhishek

On Tue, Nov 20, 2018 at 11:47 AM Rahul Goswami wrote:
> What is the Router name for your collection? Is it "implicit" (you can
> know this from the "Overview" of your collection in the admin UI)? If yes,
> what is the router.field parameter the collection was created with?
>
> Rahul
>
> On Mon, Nov 19, 2018 at 11:19 PM Rajeswari Kolluri <
> rajeswari.koll...@oracle.com> wrote:
> > Hi Rahul
> >
> > Below is part of the schema; entityid is my unique id field. Getting the
> > exception "missing required field" for "category" during atomic updates.
> >
> > entityid
> > required="true" multiValued="false" />
> > required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > stored="true" required="false" multiValued="false" />
> > required="true" docValues="true" />
> > required="false" multiValued="true" />
> >
> > Thanks
> > Rajeswari
> >
> > -----Original Message-----
> > From: Rahul Goswami [mailto:rahul196...@gmail.com]
> > Sent: Tuesday, November 20, 2018 9:33 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Error:Missing Required Fields for Atomic Updates
> >
> > What's your update query?
> >
> > You need to provide the unique id field of the document you are updating.
> >
> > Rahul
> >
> > On Mon, Nov 19, 2018 at 10:58 PM Rajeswari Kolluri <
> > rajeswari.koll...@oracle.com> wrote:
> > > Hi,
> > >
> > > Using Solr 7.5.0. While performing atomic updates on a document on
> > > Solr Cloud using SolrJ, getting exceptions "Missing Required Field".
> > >
> > > Please let me know the solution; I would not want to update the
> > > required fields during atomic updates.
> > >
> > > Thanks
> > > Rajeswari
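For comparison, a minimal SolrJ 7.x atomic update (the collection name, ZooKeeper address and field values are illustrative): the unique key is mandatory, and each updated field carries a modifier map such as "set":

    import java.util.Collections;
    import java.util.Optional;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class AtomicUpdateExample {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:2181"), Optional.empty()).build()) {
          client.setDefaultCollection("mycollection");
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("entityid", "E-1001");                                 // unique key: always required
          doc.addField("category", Collections.singletonMap("set", "books")); // atomic 'set' on one field
          client.add(doc);
          client.commit();
        }
      }
    }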
Re: Reg:- Create Solr Core Using Command Line
Hello,

I followed the steps outlined in your mail and was able to get a running core up fine. The only thing I can think of in your case is the config directory not having all the files required for the Solr core to initialize. Can you check that you have all the Solr config files in the conf directory you specified on the command line (i.e., schema.xml, solrconfig.xml, and the various supporting files referred to from schema.xml)? Can you share the conf directory, if possible?

Cheers!
Abhishek

On Tue, Feb 6, 2018 at 9:30 AM, @Nandan@ wrote:
> Hi Sadiki,
> I checked the sample techproducts conf folder. Inside that folder there are
> numerous files, so again my question is how those files came to be.
> I want to create a core from scratch and want to create and check each and
> every config file myself; only then can I understand what and which files
> are needed by the different Solr search functions.
> I hope you can understand my query.
>
> Thanks
>
> On Tue, Feb 6, 2018 at 11:48 AM, Sadiki Latty wrote:
> > If I'm not mistaken the command requires that the books_data folder
> > already exists with a conf folder inside and the various required files
> > (solrconfig.xml, solr.xml, etc). To get an idea of what you should have in
> > your conf folder you can check out the included configsets
> > (sample_techproducts_configs for example). These configsets have the
> > required files and you can copy and modify to accommodate your own needs. I
> > am not 100% sure where to find them on a Windows installation but I believe
> > it would be C:\solr\server\configsets\ or another subfolder of the server
> > folder.
> >
> > -----Original Message-----
> > From: @Nandan@ [mailto:nandanpriyadarshi...@gmail.com]
> > Sent: Monday, February 5, 2018 9:46 PM
> > To: solr-user@lucene.apache.org
> > Subject: Reg:- Create Solr Core Using Command Line
> >
> > Hi,
> > This question might be very basic, but I need to clarify my basic
> > understanding.
> > I am using Solr Version 7.2.1.
> > I have one CSV file named books_data.csv which contains 2 records.
> > Now I want to create a solr core to start my basic search using the Solr UI.
> > Steps which I follow:
> > 1) Go to the bin directory and start solr
> > C:\solr\bin>solr start -p 8983
> > 2) books_data.csv is in the C:\solr location
> > 3) Now I try to create a solr core.
> > C:\solr\bin>solr create_core -c books_data -d C:\solr
> > Got Error: No conf subfolder or solrconfig.xml file present.
> > 4) Then I created the folder "books_data" in the C:\solr location, created a
> > conf subfolder under the books_data folder and put solrconfig.xml inside the
> > conf subfolder.
> > 5) Again I tried to execute:
> > C:\solr\bin>solr create_core -c books_data -d C:\solr\books_data
> > Got Error: core already exists.
> > When I checked the Solr Admin UI, it showed the error message SolrCore
> > Initialization Failures
> >
> > - *books_data:*
> > org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> > Could not load conf for core books_data: Error loading solr config from
> > C:\solr\bin\books_data\conf\solrconfig.xml
> >
> > Please tell me where am I doing wrong?
> > Thanks.
Re: Reg:- Create Solr Core Using Command Line
You can try using the post tool: https://lucene.apache.org/solr/guide/6_6/post-tool.html

bin/post -c films example/books_data.csv

Cheers!
Abhishek

On Tue, Feb 6, 2018 at 1:22 PM, @Nandan@ wrote:
> Hi,
> I created a core named "films". Now I am trying to insert my csv file with
> the step below:
> C:\solr>curl "http://localhost:8983/solr/films/update?commit=true" --data-binary @example/books_data.csv -H 'Content-type:application/csv'
> Got the below result:
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":279}}
>
> But in the Solr Admin UI I am not able to see any data.
> Please tell me where am I wrong?
> Thanks
>
> On Tue, Feb 6, 2018 at 1:42 PM, Shawn Heisey wrote:
> > On 2/5/2018 10:39 PM, Shawn Heisey wrote:
> >> In order for this solr script command to work, the argument to the -d
> >> option (which you have as C:\solr) would have to be a config directory,
> >> containing a minimum of solrconfig.xml and the schema.
> >
> > Replying to myself because I made an error here.
> >
> > The directory provided with -d needs to contain a "conf" subdirectory,
> > which in turn must contain the files that I mentioned.
> >
> > Thanks,
> > Shawn
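One thing worth checking in the curl invocation above: Windows cmd.exe does not treat single quotes as quoting characters, so -H 'Content-type:application/csv' may reach Solr mangled and the payload may not be parsed as CSV even though the response reports status 0. A sketch using double quotes throughout, plus a query to verify that documents actually landed:

    C:\solr>curl "http://localhost:8983/solr/films/update?commit=true" --data-binary @example/books_data.csv -H "Content-Type: application/csv"
    C:\solr>curl "http://localhost:8983/solr/films/select?q=*:*&rows=5"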
SOLR 7.x stable version
Hi All - I am using SOLR Cloud v6.5.0 and looking to upgrade to SOLR 7.x; any suggestions on which is the most stable version in the SOLR 7.x series? From my initial reading I see that until SOLR 7.2 there were issues with CDCR updates. Thank you for your suggestions.

Thanks,
Abhishek
Inconsistent recovery status of replicas
Hello guys,

I am using Solr Cloud 7.7 on Kubernetes. When adding a replica we sometimes see inconsistency: after a successful addition, nodes go into recovery status, and sometimes it takes 2-3 minutes to recover while sometimes it takes more than an hour. We are getting the error below. We have 4 shards, and each shard has around 7GB of data. Looking at system metrics we see that bandwidth usage between the leader and the new replica node is high. Do we have any way to rate-limit the bandwidth exchange, like the configuration we had for it in master-slave? maxMbpersec, something like that?

Error

> 2020-12-01 13:40:34.983 ERROR (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec s:shard2 r:core_node3956) [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 x:olxid-20200531_d6e431ec_shard2_replica_p3955] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr-olxid-statefulset-tlog-7.solr-olxid-statefulset-headless.relevance:8983/solr/olxid-20200531_d6e431ec_shard2_replica_t139
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:287)
> at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:215)
> at org.apache.solr.cloud.RecoveryStrategy.doReplicateOnlyRecovery(RecoveryStrategy.java:382)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:328)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
> at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
> at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
> at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
> at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
> at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
> at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) > at > org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) > at > org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) > at > org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) > at > org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) > at > org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) > at > org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120) > at > org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) > at > org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) > at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) > at > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) > at > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) >
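The master-slave knob being remembered here is most likely maxWriteMBPerSec on the replication handler, which throttles how fast a fetching node writes the copied index files. Since SolrCloud recovery for pull/tlog replicas goes through the same /replication machinery, it is worth trying there as well; a sketch (the exact placement should be confirmed against the 7.7 reference guide, treat this as an assumption):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="defaults">
        <str name="maxWriteMBPerSec">50</str>
      </lst>
    </requestHandler>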
Migrating from solr 7.7 to solr 8.6 issues
We are trying to migrate from Solr 7.7 to Solr 8.6 on Kubernetes, using zookeeper-3.4.13. Adding a replica to the cluster returns a 500 status code, while in the background the replica is sometimes added successfully and sometimes left as an inactive node. We are using http2 without SSL. Error: > { "responseHeader":{ "status":500, "QTime":307}, "failure":{ "solr-pklatest-statefulset-pull-0.solr-pklatest-statefulset-headless.relevance:8983_solr":"org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: null"}, "Operation addreplica caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: ADDREPLICA failed to create replica", "exception":{ "msg":"ADDREPLICA failed to create replica", "rspCode":500}, "error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException", "root-error-class","org.apache.solr.common.SolrException"], "msg":"ADDREPLICA failed to create replica", "trace":"org.apache.solr.common.SolrException: ADDREPLICA failed to create replica\n\tat org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:65)\n\tat org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:286)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:257)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:214)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:854)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:818)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:566)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:500)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThr
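When ADDREPLICA does enough work to outlive the HTTP request, the synchronous call can report a 500 even though the replica is eventually created, which matches the "sometimes added successfully" behaviour described above. One standard workaround is to run the operation asynchronously and poll for the result (collection and shard names are illustrative):

    curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&type=pull&async=add-pull-1"
    curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=add-pull-1"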
solrcloud with EKS kubernetes
Hello guys,

We are facing some issues (like timeouts etc.) which are very inconsistent. By any chance could this be related to EKS? We are using Solr 7.7 and zookeeper 3.4.13. Should we move to ECS?

Regards,
Abhishek
Re: solrcloud with EKS kubernetes
Hi Houston,

Sorry for the late reply. Each shard is around 9GB in size. Yes, we are providing enough resources to the pods. We are currently using c5.4xlarge; Xms and Xmx are 16GB, and the machine has 32 GB and 16 cores. No, I haven't run it outside Kubernetes, but I do have colleagues who ran 7.2 that way and didn't face any issue with it. The storage volume is gp2, 50GB. It's not the search queries where we are facing inconsistencies or timeouts; it seems some internal admin APIs sometimes have issues, so adding a new replica to a cluster sometimes results in inconsistencies, e.g. recovery taking more than an hour.

Regards,
Abhishek

On Thu, Dec 10, 2020 at 10:23 AM Houston Putman wrote:
> Hello Abhishek,
>
> It's really hard to provide any advice without knowing any information
> about your setup/usage.
>
> Are you giving your Solr pods enough resources on EKS?
> Have you run Solr in the same configuration outside of kubernetes in the
> past without timeouts?
> What type of storage volumes are you using to store your data?
> Are you using headless services to connect your Solr Nodes, or ingresses?
>
> If this is the first time that you are using this data + Solr
> configuration, maybe it's just that your data within Solr isn't optimized
> for the type of queries that you are doing.
> If you have run it successfully in the past outside of Kubernetes, then I
> would look at the resources that you are giving your pods and the storage
> volumes that you are using.
> If you are using Ingresses, that might be causing slow connections between
> nodes, or between your client and Solr.
>
> - Houston
>
> On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra wrote:
> > Hello guys, [...]
Custom cache for Solr Cloud mode
Hi,

I am trying to make use of the user-defined cache functionality to optimise a particular workflow. We are using Solr 7.4.

Step 1. I noticed that first we would have to add a custom cache entry in solrconfig.xml. What is its Config API alternative for SolrCloud? I couldn't find one at https://lucene.apache.org/solr/guide/7_4/config-api.html (or maybe I missed it). Could anyone point me to some link?

Step 2. To insert into the required cache, I can see there is a cacheInsert() method available on the SolrIndexSearcher class. I am not sure how to build an object of this class. I started with a CoreContainer object, which just needs SOLR_HOME for initialisation. From this I was trying to get SolrCore objects, and then build an object of SolrIndexSearcher from those SolrCore objects:

SolrIndexSearcher newSearcher = new SolrIndexSearcher(_core, _core.getNewIndexDir(), _core.getLatestSchema(), _core.getSolrConfig().indexConfig, "test query", false, _core.getDirectoryFactory());

But getAllCoreNames returns an empty list of SolrCore names, so it didn't work. Not sure what I am missing; any pointer would be greatly appreciated.

Regards,
Abhishek
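For step 2, the more usual route avoids constructing a SolrIndexSearcher by hand altogether: declare the cache in solrconfig.xml and use it from inside a Solr plugin, where the live searcher is handed to you per request. A sketch, with myUserCache and MyCacheComponent as illustrative names (and, as far as I can tell, the cache declaration has no dedicated Config API command in 7.4, so it goes in through the configset):

    <!-- solrconfig.xml, inside the <query> section -->
    <cache name="myUserCache" class="solr.LRUCache" size="4096" initialSize="1024"/>

    // A minimal SearchComponent that reads/writes the declared user cache
    import java.io.IOException;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;
    import org.apache.solr.search.SolrIndexSearcher;

    public class MyCacheComponent extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare for this sketch
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        SolrIndexSearcher searcher = rb.req.getSearcher(); // live searcher; no manual construction
        String key = rb.getQueryString();
        Object value = searcher.cacheLookup("myUserCache", key);
        if (value == null) {
          value = expensiveComputation(key);               // hypothetical placeholder
          searcher.cacheInsert("myUserCache", key, value);
        }
        rb.rsp.add("myCachedValue", value);
      }

      private Object expensiveComputation(String key) {
        return key == null ? 0 : key.length();
      }

      @Override
      public String getDescription() {
        return "demo of a user-defined cache";
      }
    }

The cache is per-searcher, so entries vanish on each commit unless a regenerator is configured for autowarming.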
Re: solrcloud with EKS kubernetes
Hi Jonathan,

Merry Christmas. Thanks for the suggestion. To manage IOPS, can we do something on the rate-limiting front?

Regards,
Abhishek

On Thu, Dec 17, 2020 at 5:07 AM Jonathan Tan wrote:
> Hi Abhishek,
>
> We're running Solr Cloud 8.6 on GKE.
> 3 node cluster, running 4 cpus (configured) and 8gb of min & max JVM
> configured, all with anti-affinity so they never exist on the same node.
> It's got 2 collections of ~13documents each, 6 shards, 3 replicas each;
> disk usage on each node is ~54gb (we've got all the shards replicated to
> all nodes).
>
> We're also using a 200gb zonal SSD, which *has* been necessary just so that
> we've got the right IOPS & bandwidth. (That's approximately 6000 IOPS for
> read & write each, and 96MB/s for read & write each.)
>
> Various lessons learnt...
> You definitely don't want them ever on the same kubernetes node. From a
> resilience perspective, yes, but also when one SOLR node gets busy, they
> tend to all get busy, so now you'll have resource contention. Recovery can
> also get very busy and resource intensive, and again, sitting on the same
> node is problematic. We also saw the need to move to SSDs because of how
> IOPS bound we were.
>
> Did I mention use SSDs? ;)
>
> Good luck!
>
> On Mon, Dec 14, 2020 at 5:34 PM Abhishek Mishra wrote:
> > Hi Houston,
> > Sorry for the late reply. [...]
How pull replica works
I want to know how a pull replica replicates from the leader in practice. Does it internally use an admin API to get data from the leader in batches?

Regards,
Abhishek
Re: How pull replica works
Thanks, Tomás. It was really helpful.

Regards,
Abhishek

On Thu, Jan 7, 2021 at 7:03 AM Tomás Fernández Löbbe wrote:
> Hi Abhishek,
> Pull replicas use the "/replication" endpoint to copy full segment
> files (sections of the index) from the leader. It works in a similar way to
> the legacy leader/follower replication. This[1] talk tries to explain the
> different replica types and how they work.
>
> HTH,
>
> Tomás
>
> [1] https://www.youtube.com/watch?v=C8C9GRTCSzY
>
> On Tue, Jan 5, 2021 at 10:29 PM Abhishek Mishra wrote:
> > I want to know how a pull replica replicates from the leader in practice.
> > Does it internally use an admin API to get data from the leader in batches?
> >
> > Regards,
> > Abhishek
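Since pull replicas reuse the /replication endpoint that Tomás mentions, the legacy replication commands are handy for observing what a pull replica is actually doing (the core name is illustrative):

    curl "http://localhost:8983/solr/mycoll_shard1_replica_p1/replication?command=details"
    curl "http://localhost:8983/solr/mycoll_shard1_replica_p1/replication?command=indexversion"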
Re: Solr background merge in case of pull replicas
Hi Kshitij,

Here is my guess: pull replicas replicate segments from the tlog replicas, so whenever a merge happens on a tlog replica it decreases the number of segments, which is a much bigger change than the usual case (i.e., adding one new segment). AFAIK adding/deleting segments is something of a stop-the-world moment. This could be the reason for the increase in response time.

Regards,
Abhishek

On Thu, Jan 7, 2021 at 12:43 PM kshitij tyagi wrote:
> Hi,
>
> I am not querying the tlog replicas; the Solr version is 8.6, with a setup of
> 2 tlog and 4 pull replicas.
>
> Why should pull replicas be affected during background segment merges?
>
> Regards,
> kshitij
>
> On Wed, Jan 6, 2021 at 9:48 PM Ritvik Sharma wrote:
> > Hi
> > It may be the cause of rebalancing, and querying is not available on
> > tlog at that moment.
> > You can check the tlog and pull logs when you are facing this issue.
> >
> > May I know which version of solr you are using? And what is the ratio of
> > tlog and pull nodes?
> >
> > On Wed, 6 Jan 2021 at 2:46 PM, kshitij tyagi wrote:
> > > Hi,
> > >
> > > I am having a tlog + pull replica solr cloud setup.
> > >
> > > 1. I am observing that whenever a background segment merge is triggered
> > > automatically, I see high response times on all of my solr nodes.
> > >
> > > As far as I know merges must be happening on the tlog replicas, hence the
> > > increased response time; I am not able to understand why my pull replicas
> > > are affected during background index merges.
> > >
> > > Can someone give some insights on this? What is affecting my pull replicas
> > > during index merges?
> > >
> > > Regards,
> > > kshitij
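If the merges themselves are the trigger, their size and frequency can be tuned on the indexing side via the merge policy in solrconfig.xml. A sketch with illustrative starting values, not recommendations:

    <indexConfig>
      <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
        <double name="maxMergedSegmentMB">5000</double>
      </mergePolicyFactory>
    </indexConfig>

Raising segmentsPerTier makes merges rarer but larger; lowering maxMergedSegmentMB caps how much data a single merge, and hence a single pull-replica fetch, can touch.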