Re: SOLR-769 clustering
Hi Staszek, thank you very much for your advice. My problem has been solved. The cause was in stoplabels.en: I didn't realise that a regular expression is required in order to filter out the words. I have added the regexp to my stoplabels.en and it works like a charm. -GC On Wed, Sep 9, 2009 at 3:34 AM, Stanislaw Osinski wrote: > Hi, > > It seems like the problem can be on two layers: 1) getting the right > contents of stop* files for Carrot2, 2) making sure Solr picks up the > changes. > > I tried your quick and dirty hack too. It didn't work either. Phrases like > > "Carbon Atoms in the Group" with "in" still appear in my clustering > labels. > > > > Here most probably layer 1) applies: if you add "in" to stopwords, the Lingo > algorithm (Carrot2's default) will still create labels with "in" inside, but > will not create labels starting / ending in "in". If you'd like to eliminate > "in" completely, you'd need to put an appropriate regexp in stoplabels.*. > > For more details, please see the Carrot2 manual: > > http://download.carrot2.org/head/manual/#section.advanced-topics.fine-tuning.stop-words > > http://download.carrot2.org/head/manual/#section.advanced-topics.fine-tuning.stop-regexps > > The easiest way to tune the stopwords and see their impact on clusters is to > use the Carrot2 Document Clustering Workbench (see > http://wiki.apache.org/solr/ClusteringComponent). > > > > What I did is: > > 1. use the "jar uf carrot2-mini.jar stoplabels.en" command to replace the > stoplabels.en file; > 2. apply the clustering patch and re-compile Solr with the new > carrot2-mini.jar; > 3. deploy the new apache-solr-1.4-dev.war to Tomcat. > > Once you make sure the changes to stopwords.* and stoplabels.* have the > desired effect on clusters, the above procedure should do the trick. You can > also put the modified files in WEB-INF/classes of the WAR, if that's any > easier. > > For your reference, I've updated > http://wiki.apache.org/solr/ClusteringComponent to contain a procedure > working with the Jetty starter distributed in Solr's examples folder. > > > > <searchComponent class="org.apache.solr.handler.clustering.ClusteringComponent" name="clustering"> > <lst name="engine"> > <str name="name">default</str> > <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str> > 20 > 0.150 > <str name="carrot.lingo.threshold.candidateClusterThreshold">0.775</str> > > Not really related to your issue, but the above file looks a little outdated -- the two parameters "carrot.lingo.threshold.clusterAssignment" and "carrot.lingo.threshold.candidateClusterThreshold" are not there anymore (but there are many others: http://download.carrot2.org/stable/manual/#section.component.lingo). For the most up-to-date examples, please see http://wiki.apache.org/solr/ClusteringComponent and solrconfig.xml in contrib\clustering\example\conf. > > Cheers, > > Staszek >
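For illustration, entries in stoplabels.* are Java regular expressions matched against whole candidate labels, which is why a plain word in stopwords.* only trims labels at their edges. A minimal sketch of what the added stoplabels.en entries might look like (the exact patterns are assumptions, not GC's actual file):

(?i).*\bin\b.*
(?i).*\bof the\b.*

The first pattern suppresses any label containing the standalone word "in"; the second does the same for labels containing "of the".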
Re: Results from 2 core
On Wed, Sep 9, 2009 at 8:58 AM, Mohamed Parvez wrote: > I have a multi-core Solr setup. > > Is it possible to return results from the second core if the search on the > first core does not return any results? > No, but you can make two queries. > > Or is it possible to return the results from both cores in > one response? > > Both cores have different schemas; one is getting its data from a > database, the other is getting the payload from the Nutch crawl. > > If the schemas are different, how can the same query work on both cores? -- Regards, Shalin Shekhar Mangar.
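For illustration, the two-query fallback amounts to two HTTP requests against the per-core URLs; a minimal sketch (host, core names and query are assumptions): first request http://localhost:8983/solr/core0/select?q=foo&rows=10 and, if its response reports numFound="0", issue the same query against http://localhost:8983/solr/core1/select?q=foo&rows=10 and use that result instead.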
Re: abortOnConfigurationError=false not taking effect in solr 1.3
On Mon, Sep 7, 2009 at 8:58 PM, djain101 wrote: > > Please suggest what is the right way to configure so that if one core fails > due to configuration errors, all other cores remain unaffected? > > Check your log files for more detailed information on what may be wrong. > If you want solr to continue after configuration errors, change: > <abortOnConfigurationError>false</abortOnConfigurationError> > in solr.xml > > java.lang.RuntimeException: java.io.IOException: Cannot create directory: > /usr/local/app/data/search/core09/index > That error suggests that you don't have a configuration error. The data directory you have given either does not exist or is read-only. It is a runtime error. -- Regards, Shalin Shekhar Mangar.
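For reference, the 1.3-era example solrconfig.xml carries the same flag as an element at the top of the file: <abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>. Setting it to false, or starting the JVM with -Dsolr.abortOnConfigurationError=false, lets Solr come up despite configuration errors; as noted above, though, the directory error quoted here is a runtime problem that this flag does not cover.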
Re: Very Urjent
On Wed, Sep 9, 2009 at 11:53 AM, dharhsana wrote: > > I am new to Solr. > My requirement is that I need an autocompletion text box in my blog > application, and I need to know how to implement it with Solr 1.4. > > I have gone through TermsComponent, but TermsComponent is not available in > the Solr 1.4 which I have downloaded. > > TermsComponent is definitely available in Solr 1.4, check again. > Can anyone please help out with how to do autosuggest using Solr 1.4, and > provide me the code along with schema.xml and solrconfig.xml, so that it > will > be useful for me to know how to configure it. > > See an alternative approach at http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ -- Regards, Shalin Shekhar Mangar.
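For reference, a minimal sketch of wiring TermsComponent into solrconfig.xml on a 1.4 build (the handler path and component name are arbitrary choices):

<searchComponent name="termsComponent" class="org.apache.solr.handler.component.TermsComponent"/>
<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>

A request like /terms?terms.fl=name&terms.prefix=so then returns indexed terms from the "name" field starting with "so".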
RE: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ?
One question on this: do you need to explicitly configure a 'dismax' query parser in solrconfig.xml to enable this, or is a query parser named 'dismax' available by default? Cheers, Gert. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, September 02, 2009 2:44 AM To: solr-user@lucene.apache.org Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ? : The wiki says "As of Solr 1.3, the DisMaxRequestHandler is simply the : standard request handler with the default query parser set to the : DisMax Query Parser (defType=dismax).". I just made a checkout of svn : and dismax doesn't seem to be the default as : that paragraph doesn't say that dismax is the "default handler" ... it says that using qt=dismax is the same as using qt=standard with the default query parser set to be the DisMaxQueryParser (using defType=dismax) so doing this replacement on any URL... qt=dismax => qt=standard&defType=dismax ...should produce identical results. : Secondly, I've patched solr with : http://issues.apache.org/jira/browse/SOLR-629 as I would like to have : fuzzy with dismax. I built it with "ant example". Now, behavior is : still the same, no fuzzy search with dismax (using the qt=dismax : parameter in the GET URL). Questions/discussion of uncommitted patches is best done in the Jira issue where you found the patch ... that way it helps other people evaluate the patch, and the author of the patch is more likely to see your feedback. -Hoss
Re: Very Urjent
Hi Shalin Shekhar Mangar, I got some code from this site: http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/ When I used that code in my project, I came to know that there is no TermsComponent jar or plugin. Is there any other way of doing autocompletion search without the terms component? If so, please tell me how to implement it. Waiting for your reply. Regards, Rekha.
Re: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ?
Hi Gert, &qt=dismax in the URL works with Solr 1.3 and 1.4 without further configuration. You are right, you should find a "dismax" query parser in solrconfig.xml by default. Erwin On Wed, Sep 9, 2009 at 7:49 AM, Villemos, Gert wrote: > One question on this: > > do you need to explicitly configure a 'dismax' query parser in > solrconfig.xml to enable this, or is a query parser named 'dismax' > available by default? > > Cheers, > Gert. > [...]
Where can I find solr1.4
Hi, Where can I find solr1.4.war? Thanks, Arun -Original Message- From: kaoul@gmail.com [mailto:kaoul@gmail.com] On Behalf Of Erwin Sent: Wednesday, September 09, 2009 2:25 PM To: solr-user@lucene.apache.org Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ? Hi Gert, &qt=dismax in the URL works with Solr 1.3 and 1.4 without further configuration. You are right, you should find a "dismax" query parser in solrconfig.xml by default. Erwin [...]
Re: Where can I find solr1.4
Hi, Just check out the trunk from svn. After that, the war file is at ./trunk/dist/apache-solr-1.4-dev.war On Wed, Sep 9, 2009 at 8:56 AM, Venkatesan A. wrote: > Hi > > Where can I find solr1.4.war? > > Thanks > Arun > [...]
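For illustration, the checkout-and-build steps in full; the repository URL and ant target reflect the layout of that era and should be treated as assumptions:

svn co http://svn.apache.org/repos/asf/lucene/solr/trunk solr-trunk
cd solr-trunk
ant dist
# the WAR then appears under dist/, e.g. dist/apache-solr-1.4-dev.war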
Re: Geographic clustering
Hi Joe, Thanks for the link, I'll check it out. I'm not sure it'll help in my situation though, since the clustering should happen at runtime due to faceted browsing (unless I'm mistaken about what the preprocessing does). More on my progress though: I thought some more about using a Hilbert curve mapping and it seems really well suited to what I want. I've just added a Hilbert field to my schema (Trie Integer field) with latitude and longitude at 15 bits of precision (I didn't use 16 bits, to avoid the sign bit), so I have a 30-bit number in said field. Getting facet counts for 0 to (2^30 - 1) should get me the entire map, while getting counts for 0 to (2^28 - 1), 2^28 to (2^29 - 1), 2^29 to (2^29 + 2^28 - 1) and (2^29 + 2^28) to (2^30 - 1) should give me counts for four equal quadrants, all the way down to 0 to 3, 4 to 7, 8 to 11 and so on up to (2^30 - 4) to (2^30 - 1), and of course faceting on every separate term. Of course, if you're zoomed in far enough to need such fine-grained clustering, you'll be looking at a small portion of the map and only a part of the whole range should be counted, but that should be doable by calculating the Hilbert numbers for the lower and upper bounds. The only problem is the location of the clusters: with this method I'll only have the Hilbert number and the number of items in each part of what is essentially a quadtree. But I suppose I can calculate the facet counts at one precision finer than the requested precision and use a weighted average of the four parts of each cluster; I'll have to see if that is accurate enough. Hopefully I'll have the time to complete this today or tomorrow. I'll report back if it has worked. Regards, gwk Joe Calderon wrote: There are clustering libraries like http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ that have bindings to Perl/Python; you can preprocess your results and create clusters for each zoom level. On Tue, Sep 8, 2009 at 8:08 AM, gwk wrote: Hi, I just completed a simple proof-of-concept clusterer component which naively clusters with a specified bounding box around each position, similar to what the JavaScript MarkerClusterer does. It's currently very slow as I loop over the entire docset and request the longitude and latitude of each document (not to mention that my unfamiliarity with Lucene/Solr isn't helping the implementation's performance any; most code is copied from grep-ing the Solr source). Clustering a set of about 80,000 documents takes about 5-6 seconds. I'm currently looking into storing the Hilbert curve mapping in Solr and clustering using facet counts on numerical ranges of that mapping, but I'm not sure it will pan out. Regards, gwk Grant Ingersoll wrote: Not directly related to geo clustering, but http://issues.apache.org/jira/browse/SOLR-769 is all about a pluggable interface to clustering implementations. It currently has Carrot2 implemented, but the APIs are marked as experimental. I would definitely be interested in hearing your experience with implementing your clustering algorithm in it. -Grant On Sep 8, 2009, at 4:00 AM, gwk wrote: Hi, I'm working on a search-on-map interface for our website. I've created a little proof of concept which uses the MarkerClusterer (http://code.google.com/p/gmaps-utility-library-dev/) which clusters the markers nicely. But because sending tens of thousands of markers over Ajax is not quite as fast as I would like it to be, I'd prefer to do the clustering on the server side.
I've considered a few options, like storing the Morton order and throwing away precision to cluster, assigning all locations to a grid position. Or simply clustering based on country/region/city depending on zoom level, by adding latitude and longitude fields for each zoom level (so that for smaller countries you have to be zoomed in further to get the next level of clustering). I was wondering if anybody else has worked on something similar and, if so, what their solutions are. Regards, gwk -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
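For illustration, a minimal self-contained sketch of the quantize-and-interleave step described above. It uses Morton (Z-order) interleaving instead of a true Hilbert mapping for brevity, and the coordinate scaling and depth handling are assumptions rather than gwk's actual code:

public class GeoMorton {
    // Quantize a coordinate from [min, max] to a 15-bit integer (0..32767).
    static int quantize(double v, double min, double max) {
        return (int) ((v - min) / (max - min) * ((1 << 15) - 1));
    }

    // Interleave two 15-bit values into a single 30-bit Morton code.
    static int interleave(int x, int y) {
        int code = 0;
        for (int i = 0; i < 15; i++) {
            code |= ((x >> i) & 1) << (2 * i);      // even bits: x
            code |= ((y >> i) & 1) << (2 * i + 1);  // odd bits: y
        }
        return code;
    }

    public static void main(String[] args) {
        int x = quantize(2.294, -180, 180);  // longitude
        int y = quantize(48.858, -90, 90);   // latitude
        int code = interleave(x, y);
        // The quadrant containing this point at quadtree depth d covers a
        // contiguous code range; facet on [lower, upper] to count its members.
        int d = 2;
        int shift = 30 - 2 * d;
        int lower = (code >> shift) << shift;
        int upper = lower + (1 << shift) - 1;
        System.out.println(code + " lies in quadrant range [" + lower + ", " + upper + "]");
    }
}

Each extra depth level consumes two more bits of the code, which matches the halving ranges (0 to 2^28 - 1 and so on) in the message above.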
Creating facet query using SolrJ
hello, I am using SolrJ to access Solr indexes. When constructing a query, I create a Lucene query and use query.toString() to create the SolrQuery. I am facing difficulty while creating a facet query for an individual field, as I could not find an easy and clean way of constructing a facet query with parameters specified at the field level. As I understand it, faceting parameters like limit, sort order etc. can be set on the SolrQuery object, but they are then used for all the facets in the query. I would like to provide these parameters separately for each field. I am currently building such queries in Java code using string appends, but it looks really bad and would be prone to breaking if the query syntax changes in future. Is there any better way of constructing such detailed facet queries, the way we build the main Solr search query? regards, aakash
RE: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ?
Sorry for being a bit dim, I don't understand this: looking at my default configuration for Solr, I have a request handler named 'dismax' and a request handler named 'standard' with default="true". I understand that I can configure the usage of these in the query using qt=dismax or qt=standard (... or no qt, as standard is set to default). And if I set the 'defType=dismax' flag in the standard request handler then I will use the dismax query parser by default. This far, so good. What I don't understand is whether a request handler and a query parser are the same thing, i.e. the configuration contains a REQUESTHANDLER with the name 'dismax', but does not contain a QUERYPARSER with the name 'dismax'. Where does the 'dismax' query parser come from? Do I have to configure it extra? Or is it there by default? Or does it come from the 'dismax' request handler? Gert. -Original Message- From: kaoul@gmail.com [mailto:kaoul@gmail.com] On Behalf Of Erwin Sent: Wednesday, September 09, 2009 10:55 AM To: solr-user@lucene.apache.org Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ? Hi Gert, &qt=dismax in the URL works with Solr 1.3 and 1.4 without further configuration. You are right, you should find a "dismax" query parser in solrconfig.xml by default. Erwin [...]
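For illustration, the answer in config terms: the dismax query parser is compiled into Solr and needs no configuration of its own, while the request handler named 'dismax' in the example solrconfig.xml is an ordinary handler whose defaults select that parser via defType. A minimal sketch (the qf field name is an assumption):

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text</str>
  </lst>
</requestHandler>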
Sort a Multivalue field
Hello friends, I have a problem. My search engine server has been running for many weeks. Now I get new XML, and one of the fields is multivalued. OK, I changed the schema.xml, set the field to multiValued, and it works: no errors during indexing. Now I go to the GUI and try to sort on this field, and bam, I can't sort: "it is impossible to sort a Tokenized field". Then I thought, OK, I'll do it with a copyField and sort on the copy. And voila, I don't get an error, but it doesn't really sort; I get output, but nothing changes between "desc" and "asc". What can I do to sort this field? I think that when I sort this field (only numbers), the document should appear multiple times in the output, like this... xml: <field name="aaa">1122</field> <field name="aaa">2211</field> <field name="aaa">3322</field> sorted on field aaa: *1122* 1134 1145 *2211* 2233 3355 3311 3312 *3322* ... I hope you have an idea; I am at the end of my own ideas. KingArtus
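For reference, sorting in Solr needs a single-valued, untokenized field, and a copyField from a multivalued field stays multivalued, which is most likely why the sort silently does nothing useful here. A minimal sketch of the usual workaround, with hypothetical field names and type names per the 1.4 example schema; the one representative value (e.g. the minimum) has to be chosen in your indexing code, since copyField would copy all values:

<field name="aaa" type="tint" indexed="true" stored="true" multiValued="true"/>
<field name="aaa_sort" type="tint" indexed="true" stored="false"/>
<!-- at indexing time, write min(aaa values) into aaa_sort for each document -->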
Catchall field and facet search
Hi Solr users, This is my first post on this list, so nice to meet you. I need to do something with Solr, but I have no idea how to achieve it. Let me describe my problem. I'm building an address search engine. In my Solr schema, I've got many fields like «country», «state», «town», «street». I want my users to search an address by location, so I've set up a catchall field containing a copy of all the other fields. This is my default search field. I want to propose a dynamic facet search: if a user searches for the term «USA», the facet.field used will be «state», but if he searches for «Chicago», facet.field will be «street». If a user is searching for an address in Chicago, it would be stupid to propose a facet search on the «country» field, wouldn't it? However, how can I know which field is matched? If the user searches for «France», how can I know if this is a country or a town? Does anybody have an idea? Best regards, Thibault.
query too long / has-many relation
Hi all, I am pretty fresh to Solr and I have encountered a problem. In short: is there a way to configure Solr to accept POST queries (instead of GET only)? Or: is there some other way to make Solr accept queries longer than 2,000 characters? (Up to 10,000 would be nice.) Longer version: I have a Solr 1.3 index (served by Tomcat) of People, containing id, name, address, description etc. This works fine. Now I want to store and retrieve Events (time, location, person), so each person has 0 or more events. As I understood it, there is no way to model a has-many relation in Solr (at least not between two structures with more than 1 property), so I decided to store the Events in a separate MySQL table. An example of a query I would like to do is: give me all people that will have an Event at location x in the coming month, that have a given term in their description. I do this in two steps now: first I query the MySQL table, then I build a Solr query with a big OR of all the ids. The problem is that this can generate long (too long) query strings. Thanks in advance, Cain Jones
Re: query too long / has-many relation
> Is there a way to configure Solr to accept POST queries (instead of GET > only)? > Or: is there some other way to make Solr accept queries longer than 2,000 > characters? (Up to 10,000 would be nice.) Solr accepts POST queries by default. I switched to POST for exactly the same reason. I use Solr 1.4 (trunk version) though. > I have a Solr 1.3 index (served by Tomcat) of People, containing id, name, > address, description etc. This works fine. > Now I want to store and retrieve Events (time, location, person), so each > person has 0 or more events. > As I understood it, there is no way to model a has-many relation in Solr (at > least not between two structures with more than 1 property), so I decided > to store the Events in a separate MySQL table. > An example of a query I would like to do is: give me all people that will > have an Event at location x in the coming month, that have a given term in their > description. > I do this in two steps now: first I query the MySQL table, then I build a > Solr query with a big OR of all the ids. > The problem is that this can generate long (too long) query strings. Another option would be to put all your event objects (time, location, person_id, description) into the Solr index (normalization). Then you can generate a Solr query "give me all events at location x in the coming month that have something in their description" and ask Solr to return facet values for the field person_id. Solr will return all distinct values of the field "person_id" that match the query, with counts. Then you can take the list of related person_ids and load all the persons from the MySQL database using a SQL "id IN (...)" clause.
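For illustration, a long query sent as a POST body with curl (host and params are assumptions):

curl http://localhost:8983/solr/select --data-urlencode 'q=id:(1 OR 2 OR 3)' --data-urlencode 'rows=100'

curl switches to POST automatically when a body is supplied, so nothing changes on the Solr side; only the URL-length limit goes away.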
Re: TermsComponent
Hi, I have a requirement for autocompletion search; I am using Solr 1.4. Could you please tell me how you got the terms component working with Solr 1.4? I couldn't find the terms component in the Solr 1.4 which I have downloaded; is there any other configuration that should be done? Do you have code for autocompletion? Please share it with me. Regards Rekha tbenge wrote: > > Hi, > > I was looking at TermsComponent in Solr 1.4 as a way of building an > autocomplete function. I have a prototype working but noticed that terms > that have whitespace in them when indexed are absent the whitespace when > returned from the TermsComponent. > > Any ideas on why that may be happening? Am I just missing a configuration > option? > > Thanks, > > Todd
Re: query too long / has-many relation
>> Is there a way to configure Solr to accept POST queries (instead of GET >> only?). >> Or: is there some other way to make Solr accept queries longer than 2,000 >> characters? (Up to 10,000 would be nice) > Solr accepts POST queries by default. I switched to POST for exactly > the same reason. I use Solr 1.4 ( trunk version ) though. Don't forget to increase maxBooleanClauses in solrconfig.xml http://wiki.apache.org/solr/SolrConfigXml#head-69ecb985108d73a2f659f2387d916064a2cf63d1
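For reference, the element in solrconfig.xml looks like <maxBooleanClauses>10240</maxBooleanClauses>; the default is 1024, and the value shown here is just an assumption sized for queries of the scale discussed above.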
Re: Very Urjent
On Wed, Sep 9, 2009 at 2:15 PM, dharhsana wrote: > > Hi Shalin Shekhar Mangar, > > I got some code from this site > > http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/ > > When I used that code in my project, I came to know that there is > no TermsComponent jar or plugin. > > TermsComponent exists in Solr 1.4. I am guessing that you are using 1.3. If you go to Solr's info page (through the admin dashboard), it will tell you the version you are using. > Is there any other way of doing autocompletion search without the terms > component? > > If so, please tell me how to implement it. > > I already gave you a link which describes an alternative way. Have a look. -- Regards, Shalin Shekhar Mangar.
Re: Catchall field and facet search
Hi, This is a bit tricky but I think you can achieve it as follows: 1. have a field called "location_facet" which holds the logical path of the location for each address (e.g. /Europe/England/London) 2. have another multi-valued field "location_search" that holds all the locations - your "catchall" field. 3. When the user searches for "England", perform the search on the "location_search" field. 4. Always facet on the "location_facet" field 5. When you get the response, drop the most common prefix from all the facet values. So for example if you search on "England", the returned facets might be: /Europe/England/London: 5, /Europe/England/Manchester: 6, /Europe/England/Liverpool: 3. After dropping the common prefix (which is /Europe/England): London: 5, Manchester: 6, Liverpool: 3. Note that theoretically (and perhaps even realistically) you might also have multiple prefixes (for example, in the US you can definitely have several cities with the same name in different states), in which case you'd probably want to group these results by the prefix (for the sake of the argument, let's assume there's an "England" state in the US :-)): under /Europe/England: London: 5, Manchester: 6, Liverpool: 3; under /North America/USA/England: AnotherCity: 10. On the client side, when the user clicks on one of the facet values, you should use the value path as a wildcard filter on the "location_facet" field. For example, if the user clicks on London (the city in England), then you should add the following filter: location_facet:/Europe/England/London/* This is a bit of manual work to do on the results, but I think it should work; maybe someone has a better idea on how to do it in a cleaner way. cheers, Uri thibault jouannic wrote: Hi Solr users, This is my first post on this list, so nice to meet you. [...]
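For illustration, the per-level facet request can also be expressed with facet.prefix instead of client-side post-processing; a minimal sketch using the field names above (other params elided): q=England&facet=true&facet.field=location_facet&facet.prefix=/Europe/England/ returns only the facet values below /Europe/England/.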
Re: TermsComponent
Hi, I tried setting the terms.raw param to true but didn't see any difference. I did a little more digging and it appears the text in the TermEnum is missing the whitespace inside Lucene, so I'm not sure if it's because of the way we're indexing the value or not. One thing I noticed is that we're indexing with Lucene 2.4 while Solr is using 2.9 rc2 in the nightly build. Any chance that could be causing the problem? Thanks, Todd On Sat, Sep 5, 2009 at 11:50 AM, Todd Benge wrote: > Thanks - I'll give it a try > > On 9/5/09, Yonik Seeley wrote: > > On Fri, Sep 4, 2009 at 5:46 PM, Todd Benge wrote: > >> I was looking at TermsComponent in Solr 1.4 as a way of building an > >> autocomplete function. I have a prototype working but noticed that terms > >> that have whitespace in them when indexed are absent the whitespace when > >> returned from the TermsComponent. > > > > It works for me with the example data: > > http://localhost:8983/solr/terms?terms.fl=manu_exact > > > > -Yonik > > http://www.lucidimagination.com > > > > -- > Sent from my mobile device >
Re: TermsComponent
Hi Rekha, Here's the link to the TermsComponent info: http://wiki.apache.org/solr/TermsComponent and another link Matt Weber did on autocompletion: http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/ We had to upgrade to the latest nightly to get the TermsComponent to work. Good Luck! Todd On Wed, Sep 9, 2009 at 5:17 AM, dharhsana wrote: > > Hi, > > I have a requirement for autocompletion search; I am using Solr 1.4. > [...]
slow response
Hi, I have 20 million docs in Solr. If a query returns more than 10,000 results, the response time is very, very long. How can I resolve such a problem? Can I slice my docs into pieces and let the query operate on one piece at a time, so the response time and response data are more manageable? Thanks. Elaine
Re: query too long / has-many relation
I had some trouble with maxBooleanClauses -- I had to set it to twice the size I would expect. But apart from that, everything works fine now (10,000 OR clauses takes 10 seconds). Thank you Alexey. On Wed, Sep 9, 2009 at 1:19 PM, Alexey Serba wrote: > >> Is there a way to configure Solr to accept POST queries (instead of GET > >> only)? > >> Or: is there some other way to make Solr accept queries longer than > 2,000 > >> characters? (Up to 10,000 would be nice.) > > Solr accepts POST queries by default. I switched to POST for exactly > > the same reason. I use Solr 1.4 (trunk version) though. > Don't forget to increase maxBooleanClauses in solrconfig.xml > > http://wiki.apache.org/solr/SolrConfigXml#head-69ecb985108d73a2f659f2387d916064a2cf63d1 >
Re: TermsComponent
How are you tokenizing/analyzing the field you are accessing? On Sep 9, 2009, at 8:49 AM, Todd Benge wrote: Hi Rekha, Here's the link to the TermsComponent info: http://wiki.apache.org/solr/TermsComponent and another link Matt Weber did on autocompletion: http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/ We had to upgrade to the latest nightly to get the TermsComponent to work. Good Luck! Todd [...] -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: slow response
Do you need 10K results at a time or are you just getting the top 10 or so in a set of 10K? Also, are you retrieving really large stored fields? If you add &debugQuery=true to your request, Solr will return timing information for the various components. On Sep 9, 2009, at 10:10 AM, Elaine Li wrote: Hi, I have 20 million docs on solr. If my query would return more than 10,000 results, the response time will be very very long. How to resolve such problem? Can I slice my docs into pieces and let the query operate within one piece at a time so the response time and response data will be more managable? Thanks. Elaine -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: abortOnConfigurationError=false not taking effect in solr 1.3
Yes, that runtime error occurred due to incorrect configuration. So, will such runtime errors in one core affect all the cores? Is there any way to avoid affecting all the other cores which are fine? Shalin Shekhar Mangar wrote: > > On Mon, Sep 7, 2009 at 8:58 PM, djain101 wrote: > >> Please suggest what is the right way to configure so that if one core >> fails >> due to configuration errors, all other cores remain unaffected? >> >> Check your log files for more detailed information on what may be wrong. >> >> If you want solr to continue after configuration errors, change: >> >> <abortOnConfigurationError>false</abortOnConfigurationError> >> >> in solr.xml >> >> java.lang.RuntimeException: java.io.IOException: Cannot create directory: >> /usr/local/app/data/search/core09/index >> > > That error suggests that you don't have a configuration error. The data > directory you have given either does not exist or is read-only. It is a > runtime error. > > -- > Regards, > Shalin Shekhar Mangar.
Re: slow response
There is a good article on how to scale a Lucene/Solr solution: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr Also, if you have heavy load on the server (a large number of concurrent requests) then I'd suggest considering loading the index into RAM. It worked well for me on a project with 140+ million documents and 30 concurrent user requests per second. If your index can be placed in RAM, you can reduce the architecture complexity. Alex Baranov On Wed, Sep 9, 2009 at 5:10 PM, Elaine Li wrote: > Hi, > > I have 20 million docs in Solr. If a query returns more than > 10,000 results, the response time is very, very long. How can I > resolve such a problem? Can I slice my docs into pieces and let the > query operate on one piece at a time, so the response time and > response data are more manageable? Thanks. > > Elaine >
Re: slow response
Just wondering, is there an easy way to load the whole index into ram? On Wed, Sep 9, 2009 at 4:22 PM, Alex Baranov wrote: > There is a good article on how to scale the Lucene/Solr solution: > > > http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr > > Also, if you have heavy load on the server (large amount of concurrent > requests) then I'd suggest to consider loading the index into RAM. It > worked > well for me on the project with 140+ million documents and 30 concurrent > user requests per second. If your index can be placed in RAM you can reduce > the architecture complexity. > > Alex Baranov > > On Wed, Sep 9, 2009 at 5:10 PM, Elaine Li > wrote: > > > Hi, > > > > I have 20 million docs on solr. If my query would return more than > > 10,000 results, the response time will be very very long. How to > > resolve such problem? Can I slice my docs into pieces and let the > > query operate within one piece at a time so the response time and > > response data will be more managable? Thanks. > > > > Elaine > > >
Re: slow response
I want to get the 10K results, not just the top 10. The fields are regular language sentences, they are not large. Is clustering the technique for what I am doing? On Wed, Sep 9, 2009 at 10:16 AM, Grant Ingersoll wrote: > Do you need 10K results at a time or are you just getting the top 10 or so > in a set of 10K? Also, are you retrieving really large stored fields? If > you add &debugQuery=true to your request, Solr will return timing > information for the various components. > > > On Sep 9, 2009, at 10:10 AM, Elaine Li wrote: > >> Hi, >> >> I have 20 million docs on solr. If my query would return more than >> 10,000 results, the response time will be very very long. How to >> resolve such problem? Can I slice my docs into pieces and let the >> query operate within one piece at a time so the response time and >> response data will be more managable? Thanks. >> >> Elaine > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > Solr/Lucene: > http://www.lucidimagination.com/search > >
Re: slow response
Please, take a look at http://issues.apache.org/jira/browse/SOLR-1379 Alex. On Wed, Sep 9, 2009 at 5:28 PM, Constantijn Visinescu wrote: > Just wondering, is there an easy way to load the whole index into ram? > > On Wed, Sep 9, 2009 at 4:22 PM, Alex Baranov >wrote: > > > There is a good article on how to scale the Lucene/Solr solution: > > > > > > > http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr > > > > Also, if you have heavy load on the server (large amount of concurrent > > requests) then I'd suggest to consider loading the index into RAM. It > > worked > > well for me on the project with 140+ million documents and 30 concurrent > > user requests per second. If your index can be placed in RAM you can > reduce > > the architecture complexity. > > > > Alex Baranov > > > > On Wed, Sep 9, 2009 at 5:10 PM, Elaine Li > > wrote: > > > > > Hi, > > > > > > I have 20 million docs on solr. If my query would return more than > > > 10,000 results, the response time will be very very long. How to > > > resolve such problem? Can I slice my docs into pieces and let the > > > query operate within one piece at a time so the response time and > > > response data will be more managable? Thanks. > > > > > > Elaine > > > > > >
Re: Creating facet query using SolrJ
> > When constructing query, I create a lucene query and use query.toString to > create SolrQuery. > Go through this thread: http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query I am facing difficulty while creating facet query for individual field, as I > could not find an easy and clean way of constructing facet query with > parameters specified at field level. > Per-field overrides for facet params using SolrJ are not supported yet. However, you can always use solrQuery.set("f.myField.facet.limit", 10) ... to pass field-specific facet params to the SolrServer. Cheers Avlesh On Wed, Sep 9, 2009 at 2:42 PM, Aakash Dharmadhikari wrote: > hello, > > I am using SolrJ to access Solr indexes. When constructing a query, I create > a Lucene query and use query.toString() to create the SolrQuery. > [...]
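For illustration, a minimal self-contained SolrJ sketch of the override approach above (the URL, field names and values are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetOverrideDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("ipod");
        q.setFacet(true);
        q.addFacetField("category", "brand");
        q.set("f.category.facet.limit", 5);   // per-field override of facet.limit
        q.set("f.brand.facet.limit", 20);
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getFacetFields());
    }
}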
Re: slow response
Hi Elaine, I think you need to provide us with some more information on what exactly you are trying to achieve. From your question I assumed you wanted paging (getting the first 10 results, then the next 10, etc.), but reading it again ("slice my docs into pieces"), I now think you might've meant that you only want to retrieve certain fields from each document. For that you can use the fl parameter (http://wiki.apache.org/solr/CommonQueryParameters#head-db2785986af2355759faaaca53dc8fd0b012d1ab). Hope this helps. Regards, gwk Elaine Li wrote: I want to get the 10K results, not just the top 10. The fields are regular language sentences, they are not large. Is clustering the technique for what I am doing? [...]
Re: query too long / has-many relation
> But apart from that, everything works fine now (10,000 OR clauses takes 10 > seconds). Not fast. I would recommend denormalizing your data: put everything into the Solr index and use Solr faceting http://wiki.apache.org/solr/SolrFacetingOverview to get the relevant persons (see my previous message).
Re: TermsComponent
It's set as Field.Store.YES, Field.Index.ANALYZED. On Wed, Sep 9, 2009 at 8:15 AM, Grant Ingersoll wrote: > How are you tokenizing/analyzing the field you are accessing? > > > On Sep 9, 2009, at 8:49 AM, Todd Benge wrote: > > Hi Rekha, >> >> Here's teh link to the TermsComponent info: >> >> http://wiki.apache.org/solr/TermsComponent >> >> and another link Matt Weber did on autocompletion: >> >> >> http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/ >> >> We had to upgrade to the latest nightly to get the TermsComponent to work. >> >> Good Luck! >> >> Todd >> >> On Wed, Sep 9, 2009 at 5:17 AM, dharhsana >> wrote: >> >> >>> Hi, >>> >>> I have a requirement on Autocompletion search , iam using solr 1.4. >>> >>> Could you please tell me how you worked on that Terms component using >>> solr >>> 1.4, >>> i could'nt find terms component in solr 1.4 which i have downloaded,is >>> there >>> anyother configuration should be done. >>> >>> Do you have code for autocompletion, please share wih me.. >>> >>> Regards >>> Rekha >>> >>> >>> >>> tbenge wrote: >>> Hi, I was looking at TermsComponent in Solr 1.4 as a way of building a autocomplete function. I have a prototype working but noticed that terms that have whitespace in them when indexed are absent the whitespace when returned from the TermsComponent. Any ideas on why that may be happening? Am I just missing a >>> configuration >>> option? Thanks, Todd >>> -- >>> View this message in context: >>> http://www.nabble.com/TermsComponent-tp25302503p25362829.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >>> > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > Solr/Lucene: > http://www.lucidimagination.com/search > >
Re: slow response
gwk, Sorry for the confusion. I am doing simple phrase searches among the sentences, which could be in English or another language. Each doc has only several id numbers and the sentence itself. I did not know about paging; it sounds like it is what I need. How do I achieve paging in Solr? I also need to store all the results into my own tables in JavaScript, to use for connecting with other applications. Elaine On Wed, Sep 9, 2009 at 10:37 AM, gwk wrote: > > Hi Elaine, > > I think you need to provide us with some more information on what exactly > you are trying to achieve. [...]
Re: slow response
Hi Elaine, You can page your resultset with the rows and start parameters (http://wiki.apache.org/solr/CommonQueryParameters). So for example to get the first 100 results one would use the parameters rows=100&start=0 and the second 100 results with rows=100&start=100 etc. etc. Regards, gwk Elaine Li wrote: gwk, Sorry for confusion. I am doing simple phrase search among the sentences which could be in english or other language. Each doc has only several id numbers and the sentence itself. I did not know about paging. Sounds like it is what I need. How to achieve paging from solr? I also need to store all the results into my own tables in javascript to use for connecting with other applications. Elaine On Wed, Sep 9, 2009 at 10:37 AM, gwk wrote: Hi Elaine, I think you need to provide us with some more information on what exactly you are trying to achieve. From your question I also assumed you wanted paging (getting the first 10 results, than the next 10 etc.) But reading it again, "slice my docs into pieces" I now think you might've meant that you only want to retrieve certain fields from each document. For that you can use the fl parameter (http://wiki.apache.org/solr/CommonQueryParameters#head-db2785986af2355759faaaca53dc8fd0b012d1ab). Hope this helps. Regards, gwk Elaine Li wrote: I want to get the 10K results, not just the top 10. The fields are regular language sentences, they are not large. Is clustering the technique for what I am doing? On Wed, Sep 9, 2009 at 10:16 AM, Grant Ingersoll wrote: Do you need 10K results at a time or are you just getting the top 10 or so in a set of 10K? Also, are you retrieving really large stored fields? If you add &debugQuery=true to your request, Solr will return timing information for the various components. On Sep 9, 2009, at 10:10 AM, Elaine Li wrote: Hi, I have 20 million docs on solr. If my query would return more than 10,000 results, the response time will be very very long. How to resolve such problem? Can I slice my docs into pieces and let the query operate within one piece at a time so the response time and response data will be more managable? Thanks. Elaine -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: slow response
gwk, thanks a lot. Elaine On Wed, Sep 9, 2009 at 11:14 AM, gwk wrote: > Hi Elaine, > > You can page your resultset with the rows and start parameters > (http://wiki.apache.org/solr/CommonQueryParameters). So for example to get > the first 100 results one would use the parameters rows=100&start=0 and the > second 100 results with rows=100&start=100 etc. etc. > > Regards, > > gwk > > Elaine Li wrote: >> >> gwk, >> >> Sorry for confusion. I am doing simple phrase search among the >> sentences which could be in english or other language. Each doc has >> only several id numbers and the sentence itself. >> >> I did not know about paging. Sounds like it is what I need. How to >> achieve paging from solr? >> >> I also need to store all the results into my own tables in javascript >> to use for connecting with other applications. >> >> Elaine >> >> On Wed, Sep 9, 2009 at 10:37 AM, gwk wrote: >> >>> >>> Hi Elaine, >>> >>> I think you need to provide us with some more information on what exactly >>> you are trying to achieve. From your question I also assumed you wanted >>> paging (getting the first 10 results, than the next 10 etc.) But reading >>> it >>> again, "slice my docs into pieces" I now think you might've meant that >>> you >>> only want to retrieve certain fields from each document. For that you can >>> use the fl parameter >>> >>> (http://wiki.apache.org/solr/CommonQueryParameters#head-db2785986af2355759faaaca53dc8fd0b012d1ab). >>> Hope this helps. >>> >>> Regards, >>> >>> gwk >>> >>> Elaine Li wrote: >>> I want to get the 10K results, not just the top 10. The fields are regular language sentences, they are not large. Is clustering the technique for what I am doing? On Wed, Sep 9, 2009 at 10:16 AM, Grant Ingersoll wrote: > > Do you need 10K results at a time or are you just getting the top 10 or > so > in a set of 10K? Also, are you retrieving really large stored fields? > If > you add &debugQuery=true to your request, Solr will return timing > information for the various components. > > > On Sep 9, 2009, at 10:10 AM, Elaine Li wrote: > > > >> >> Hi, >> >> I have 20 million docs on solr. If my query would return more than >> 10,000 results, the response time will be very very long. How to >> resolve such problem? Can I slice my docs into pieces and let the >> query operate within one piece at a time so the response time and >> response data will be more managable? Thanks. >> >> Elaine >> >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using > Solr/Lucene: > http://www.lucidimagination.com/search > > > > >>> >>> > >
Solr fitting in travel site context?
Hi all, I'm about to develop a travel website and am wondering if Solr might fit as the search solution. Being quite the opposite of a db guru and new to Solr, it's hard for me to judge whether for my use case a relational db should be used in favor of Solr (or a similar indexing server). Maybe some of you would share your opinion on this?

The products being searched for would be travel packages, that is: hotel room + flight combined into one product. I receive the products via a csv file, where each line defines a travel package with concrete departure/return, accommodation and price data. For example one csv row might represent: Hotel Foo in Paris, flight departing 10/10/09 from London, ending 10/20/09, mealplan Bar, pricing $300 ...while another one might look like: Hotel Foo in Paris, flight departing 10/10/09 from Amsterdam, ending 10/30/09, mealplan Eggs :), pricing $400

Now searches should show results in 2 steps: the first step showing results grouped by hotel (so no hotel appears twice) and the second one all date-airport-mealplan combinations for the hotel selected by the user in step 1.

From some first little tests, it seems to me as if I would at least need the collapse patch (SOLR-236) for step 1 above?! What do you think? Does Solr fit into this scenario? Thoughts?

Sorry for the lengthy post & thanks a lot for any pointers! Carsten
Re: Highlighting... is highlighting too many fields
Thanks Ahmet,

Your second suggestion about using the filter query works. Ideally I would like to be able to use the first solution with hl.requireFieldMatch=true, but I cannot seem to get it to work no matter what I do. I changed the query to just 'smith~' with hl.requireFieldMatch=true and I get results but no highlights :(

On Tue, Sep 8, 2009 at 12:12 PM, AHMET ARSLAN wrote:
> > I currently have highlighting working, but when I search for
> > Query: "smith~ category_id:(1 OR 2 OR 3)"
> > Results: "name: Mr. John Smith,
> > addresses: 1 Main St, NYC,
> > NY, 552666"
> >
> > Why does it show highlights on the addresses, when I
> > specifically sent in a
> > query for category_id? When I set
> > hl.requireFieldMatch and
> > hl.usePhraseHighlighter to true, I get 0 results
> > highlighted.
>
> Although hl.usePhraseHighlighter is about PhraseQuery (and SpanQuery),
> hl.requireFieldMatch=true should work for your case.
> When you set hl.requireFieldMatch to true, do you get results returned, but
> without highlights? If yes, I think your default operator is set to OR.
> Those results without highlights are coming from the category_id:(1 OR 2 OR 3)
> part of your query.
> Try "smith~ AND category_id:(1 OR 2 OR 3)" or alternatively you can use
> filter queries for structured fields (integer, string) like
> q=smith~&fq=category_id:(1 OR 2 OR 3)
>
> Hope this helps.
Re: Catchall field and facet search
Hi, Thank you for the answer. Very helpful. Regards, Thibault.

On Wed, 09 Sep 2009 13:36:02 +0200 Uri Boness wrote:
> Hi,
>
> This is a bit tricky but I think you can achieve it as follows:
>
> 1. Have a field called "location_facet" which holds the logical path of
> the location for each address (e.g. /Europe/England/London).
> 2. Have another multi-valued field "location_search" that holds all the
> locations - your "catchall" field.
> 3. When the user searches for "England", perform the search on the
> "location_search" field.
> 4. Always facet on the "location_facet" field.
> 5. When you get the response, drop the most common prefix from all the
> facet values, so for example if you search on "England":
>
> returned facets:
>
> /Europe/England/London..5
> /Europe/England/Manchester6
> /Europe/England/Liverpool...3
>
> after dropping the common prefix (which is /Europe/England):
>
> London5
> Manchester.6
> Liverpool3
>
> Note that theoretically (and perhaps even realistically) you might also
> have multiple prefixes (for example, in the US you can definitely have
> several cities with the same name in different states), in which case
> you'd probably want to group these results by the prefix:
>
> (for the sake of the argument, let's assume there's an "England" state
> in the US :-))
>
> /Europe/England
> London5
> Manchester..6
> Liverpool.3
>
> /North America/USA/England
> AnotherCity..10
>
> On the client side, when the user clicks on one of the facet values, you
> should use the value path as a wildcard filter on the "location_facet"
> field. For example, if the user clicks on London (the city in England),
> then you should add the following filter:
>
> location_facet:/Europe/England/London/*
>
> This is a bit of manual work to do on the results, but I think it should
> work - maybe someone has a better idea on how to do it in a cleaner way.
>
> cheers,
> Uri
>
> thibault jouannic wrote:
> > Hi Solr users,
> >
> > This is my first post on this list, so nice to meet you.
> >
> > I need to do something with solr, but I have no idea how to achieve this.
> > Let me describe my problem.
> >
> > I'm building an address search engine. In my Solr schema, I've got many
> > fields like «country», «state», «town», «street».
> >
> > I want my users to search an address by location, so I've set up a catchall
> > field containing a copy of all the other fields. This is my default search
> > field.
> >
> > I want to propose a dynamic facet search: if a user searches for the term
> > «USA», the used facet.field will be «state», but if he searches for
> > «Chicago», facet.field will be «street». If a user is searching for an
> > address in Chicago, it would be pointless to propose a facet search on the
> > «country» field, wouldn't it?
> >
> > However, how can I know which field is matched? If the user searches for
> > «France», how can I know if this is a country or a town?
> >
> > Does anybody have an idea?
> >
> > Best regards,
> > Thibault.
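For illustration, step 5 could be done client-side with a rough Java helper like the one below (hypothetical code; it assumes the facet values arrive as path strings such as /Europe/England/London):

import java.util.ArrayList;
import java.util.List;

public class FacetPathUtil {
    /** Drops the longest common path prefix (e.g. /Europe/England) from facet values. */
    public static List<String> dropCommonPrefix(List<String> paths) {
        if (paths.isEmpty()) return paths;
        String prefix = paths.get(0);
        for (String p : paths) {
            while (!p.startsWith(prefix)) {
                prefix = prefix.substring(0, prefix.lastIndexOf('/')); // back off one path component
            }
        }
        if (prefix.length() == 0) return paths; // nothing in common to drop
        List<String> trimmed = new ArrayList<String>();
        for (String p : paths) {
            // skip the prefix plus its trailing '/'; keep the value as-is if it IS the prefix
            trimmed.add(p.length() > prefix.length() ? p.substring(prefix.length() + 1) : p);
        }
        return trimmed;
    }
}

For the multiple-prefix case Uri describes, the same idea can be applied per group after first bucketing the values by their parent path.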
Re: Sort a Multivalue field
Unfortunately you can't sort on a multi-valued field. In order to sort on a field it must be indexed but not multi-valued. Have a look at the FieldOptions wiki page for a good description of what values to set for different use cases: http://wiki.apache.org/solr/FieldOptionsByUseCase

-Jay
www.lucidimagination.com

On Wed, Sep 9, 2009 at 2:37 AM, Jörg Agatz wrote:
> Hello friends,
>
> I have a problem...
>
> My search engine server has been running for many weeks.
> Now I get new XML, and one of the fields is multivalued.
>
> OK, I changed the schema.xml, set the field to multivalued, and it works :-) no error
> during indexing. Then I go to the GUI and want to sort this field, and BAM, I
> can't sort.
>
> "it is impossible to sort a Tokenized field"
>
> Then I thought, OK, I'll do it with a copyField and sort the copyField...
> and voilà, I don't get an error, but it doesn't really sort; I get an output,
> but no change with "desc" or "asc".
>
> What can I do to sort this field? I think, when I sort this field (only
> numbers), the doc should appear multiple times in the output, like this...
>
> xml:
> field aaa>1122
> field aaa>2211
> field aaa>3322
>
> sort field aaa
>
> *1122*
> 1134
> 1145
> *2211*
> 2233
> 3355
> 3311
> 3312
> *3322*
> ...
> ...
> ...
>
> I hope you have an idea; I am at the end of my ideas.
>
> KingArtus
multicore and ruby
Hi all, I'd like to start experimenting with multicore in a ruby on rails app. Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with solr and it doesn't appear to have direct support for multicore and I didn't have any luck googling around for it. We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked at rsolr very briefly and didn't see any reference to multicore there, either. I can certainly hack something together, but it seems like this is a common problem. How are others doing multicore from ruby? Thanks, Paul
Re: multicore and ruby
Paul, I've been working with rsolr in a Rails app. In terms of querying from multiple indices/cores within a multicore setup of Solr, I'm managing it all on the Rails side, aggregating results from multiple cores. In terms of core administration, I've been doing that all by hand as well. Greg

From: Paul Rosen
To: solr-user@lucene.apache.org
Sent: Wednesday, September 9, 2009 12:38:56 PM
Subject: multicore and ruby

Hi all,

I'd like to start experimenting with multicore in a ruby on rails app.

Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with solr and it doesn't appear to have direct support for multicore and I didn't have any luck googling around for it.

We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked at rsolr very briefly and didn't see any reference to multicore there, either.

I can certainly hack something together, but it seems like this is a common problem.

How are others doing multicore from ruby?

Thanks,
Paul
Re: multicore and ruby
Hey Paul, In rsolr, you could use the #request method to set a request handler path: solr.request('/core1/select', :q=>'*:*') Alternatively, (rsolr and solr-ruby) you could probably handle this by creating a new instance of a connection object per-core, and then have some kind of factory to return connection objects by a core-name? What kinds of things were you hoping to find when looking for multicore support in either solr-ruby or rsolr? Matt On Wed, Sep 9, 2009 at 12:38 PM, Paul Rosen wrote: > Hi all, > > I'd like to start experimenting with multicore in a ruby on rails app. > > Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with > solr and it doesn't appear to have direct support for multicore and I didn't > have any luck googling around for it. > > We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked > at rsolr very briefly and didn't see any reference to multicore there, > either. > > I can certainly hack something together, but it seems like this is a common > problem. > > How are others doing multicore from ruby? > > Thanks, > Paul >
Re: Highlighting... is highlighting too many fields
--- On Wed, 9/9/09, John Eberly wrote:
> From: John Eberly
> Subject: Re: Highlighting... is highlighting too many fields
> To: solr-user@lucene.apache.org
> Date: Wednesday, September 9, 2009, 7:12 PM
> Thanks Ahmet,
>
> Your second suggestion about using the filter query
> works. Ideally I would
> like to be able to use the first solution with
> hl.requireFieldMatch=true,
> but I cannot seem to get it to work no matter what I do.
>
> I changed the query to just 'smith~' and
> hl.requireFieldMatch=true and I get
> results but no highlights :(

What is your defaultSearchField defined in schema.xml? On what field are you highlighting (hl.fl=?)? If the query 'smith~' with hl.requireFieldMatch=true isn't returning highlights, it seems that your default search field and hl.fl are different. You can try

?q=sameField:smith~&hl.requireFieldMatch=true&hl.fl=sameField

It should return highlights if sameField is stored="true".
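For completeness, the same field-matched highlighting request could be built with SolrJ along these lines (a sketch; the field name "name" is only an example):

import org.apache.solr.client.solrj.SolrQuery;

public class HighlightQueryExample {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("name:smith~"); // qualify the clause with the field
        q.setHighlight(true);                       // hl=true
        q.addHighlightField("name");                // hl.fl=name
        q.set("hl.requireFieldMatch", true);        // only highlight fields the query matched
        return q;
    }
}

The key point stands either way: hl.fl must name the same (stored) field the query clause actually matched, or requireFieldMatch suppresses the snippets.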
Can't delete with a fq?
I'm trying to delete using SolJ's "deleteByQuery", but it doesn't like it that I've added an "fq" parameter. Here's what I see in the logs: Sep 9, 2009 1:46:13 PM org.apache.solr.common.SolrException log SEVERE: org.apache.lucene.queryParser.ParseException: Cannot parse 'url:http\:\/\/xcski\.com\/pharma\/&fq=category:pharma': Encountered ":" at line 1, column 46. Was expecting one of: ... ... ... "+" ... "-" ... "(" ... "*" ... "^" ... ... ... ... ... ... "[" ... "{" ... ... at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173) at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:75) at org.apache.solr.search.QueryParsing.parseQuery(QueryParsing.java:64) ... Should I rewrite that query to be "url:http:... AND category:pharma"? -- http://www.linkedin.com/in/paultomblin
Re: Can't delete with a fq?
--- On Wed, 9/9/09, Paul Tomblin wrote: > From: Paul Tomblin > Subject: Can't delete with a fq? > To: solr-user@lucene.apache.org > Date: Wednesday, September 9, 2009, 8:51 PM > I'm trying to delete using SolJ's > "deleteByQuery", but it doesn't like > it that I've added an "fq" parameter. Here's what I > see in the logs: > > Sep 9, 2009 1:46:13 PM org.apache.solr.common.SolrException > log > SEVERE: org.apache.lucene.queryParser.ParseException: > Cannot parse > 'url:http\:\/\/xcski\.com\/pharma\/&fq=category:pharma': > Should I rewrite that query to be "url:http:... AND > category:pharma"? Yes, because url:http\:\/\/xcski\.com\/pharma\/&fq=category:pharma is not a valid query. > > > -- > http://www.linkedin.com/in/paultomblin >
Re: TermsComponent
And what Analyzer are you using? I'm guessing that your words are being split up during analysis, which is why you aren't seeing whitespace. If you want to keep the whitespace, you will need to use the String field type or possibly the KeywordAnalyzer.

-Grant

On Sep 9, 2009, at 11:06 AM, Todd Benge wrote:

It's set as Field.Store.YES, Field.Index.ANALYZED.

On Wed, Sep 9, 2009 at 8:15 AM, Grant Ingersoll wrote:

How are you tokenizing/analyzing the field you are accessing?

On Sep 9, 2009, at 8:49 AM, Todd Benge wrote:

Hi Rekha,

Here's the link to the TermsComponent info: http://wiki.apache.org/solr/TermsComponent

and another link Matt Weber did on autocompletion: http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/

We had to upgrade to the latest nightly to get the TermsComponent to work.

Good luck!

Todd

On Wed, Sep 9, 2009 at 5:17 AM, dharhsana wrote:

Hi,

I have a requirement for autocompletion search; I am using Solr 1.4.

Could you please tell me how you got the TermsComponent working with Solr 1.4? I couldn't find the terms component in the Solr 1.4 I downloaded; is there any other configuration that should be done?

Do you have code for autocompletion? Please share it with me.

Regards
Rekha

tbenge wrote:

Hi,

I was looking at TermsComponent in Solr 1.4 as a way of building an autocomplete function. I have a prototype working but noticed that terms that have whitespace in them when indexed are absent the whitespace when returned from the TermsComponent.

Any ideas on why that may be happening? Am I just missing a configuration option?

Thanks,

Todd

--
View this message in context: http://www.nabble.com/TermsComponent-tp25302503p25362829.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
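To illustrate Grant's point at the Lucene level, a minimal sketch (assuming the Lucene 2.x Field API that the Field.Store.YES / Field.Index.ANALYZED call above comes from):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class KeepWhitespaceExample {
    public static Document makeDoc() {
        Document doc = new Document();
        // ANALYZED runs the value through the analyzer, so "John Smith"
        // becomes the two terms "john" and "smith"; NOT_ANALYZED indexes
        // the whole value as a single term, whitespace intact.
        doc.add(new Field("name", "John Smith", Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }
}

In Solr schema terms the equivalent is declaring the field with the string type instead of a tokenized text type.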
Re: Can't delete with a fq?
On Wed, Sep 9, 2009 at 2:07 PM, AHMET ARSLAN wrote: > --- On Wed, 9/9/09, Paul Tomblin wrote: >> SEVERE: org.apache.lucene.queryParser.ParseException: >> Cannot parse >> 'url:http\:\/\/xcski\.com\/pharma\/&fq=category:pharma': > >> Should I rewrite that query to be "url:http:... AND >> category:pharma"? > Yes, because url:http\:\/\/xcski\.com\/pharma\/&fq=category:pharma is not a > valid query. >> It works perfectly well as a query: http://localhost:8080/solrChunk/nutch/select/?q=url:http\:\/\/xcski\.com\/pharma\/&fq=category:pharma retrieved all the documents I wanted to delete. -- http://www.linkedin.com/in/paultomblin
Re: multicore and ruby
Hi Matt,

> What kinds of things were you hoping to find when looking for multicore support in either solr-ruby or rsolr?

I have a couple of uses for it:

1) Search and merge the results from multiple indexes:

http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=

I assume the above would return documents from both cores. How are the relevancy scores managed? Would the documents be merged together?

(The reason I want two indexes here, while in every respect they look to the end user like one index, is that one index is huge and changes rarely, and the other is small and changes more often, so I'd like the commits on that one to take a reasonable amount of time. Also, it makes managing the large index better, because I don't have to worry about the small index's changes.)

2) Do automated tests during reindexing: After reindexing to core1, I'll query both core0 and core1 separately and compare the results to be sure only what I intended to change was changed. We can even create an interface so that an authorized user can switch indexes in their session to test the changes out in an otherwise completely live environment.

3) There are a few more similar uses that might be coming, but I think the main point is to be able to query one or the other core, or both, or possibly a third one in the future.

Thanks,
Paul
Re: Can't delete with a fq?
> It works perfectly well as a query:
>
> http://localhost:8080/solrChunk/nutch/select/?q=url:http\:\/\/xcski\.com\/pharma\/&fq=category:pharma
>
> retrieved all the documents I wanted to delete.

I mean it is not a string that the QueryParser can parse into a Lucene Query. fq is a request parameter, not part of the query syntax, so it cannot appear inside the string that deleteByQuery parses. deleteByQuery("category:pharma AND url:http...") is the way to go.
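In SolrJ that could look like the following sketch (the core URL is taken from the query above; error handling omitted):

import java.io.IOException;
import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteExample {
    public static void main(String[] args) throws MalformedURLException, SolrServerException, IOException {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solrChunk/nutch");
        // fq is a request parameter, not query syntax, so fold the filter
        // into the one query string with AND instead.
        server.deleteByQuery("url:http\\:\\/\\/xcski\\.com\\/pharma\\/ AND category:pharma");
        server.commit();
    }
}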
OutOfMemory issue after upgrade to 1.3 solr
Our slave servers are having issues with the following error after we upgraded to Solr 1.3. Any suggestions? Thanks, Francis

INFO: [] webapp=/solr path=/select/ params={q=(type:artist+AND+alphaArtistSort:"forever+in+terror")} hits=1 status=0 QTime=1
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 14140776, Num elements: 3535189
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 442984, Num elements: 55371
Pagination with solr json data
Hi, What is the best way to do pagination? I searched around and only found some YUI utilities that can do this, but their examples don't closely match the pattern I have in mind. I would like a pretty plain display, something like the search results from Google. Thanks. Elaine
about SOLR-1395 integration with katta
Hi, It is really exciting to see this integration coming out. May I ask what changes I need to make to be able to deploy a Solr index on katta servers? Are there any tutorials? thanks zhong
Re: multicore and ruby
With solr-ruby, simply put the core name in the URL of the Solr::Connection... solr = Solr::Connection.new('http://localhost:8983/solr/core_name') Erik On Sep 9, 2009, at 6:38 PM, Paul Rosen wrote: Hi all, I'd like to start experimenting with multicore in a ruby on rails app. Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with solr and it doesn't appear to have direct support for multicore and I didn't have any luck googling around for it. We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked at rsolr very briefly and didn't see any reference to multicore there, either. I can certainly hack something together, but it seems like this is a common problem. How are others doing multicore from ruby? Thanks, Paul
help with solr.PatternTokenizerFactory
hello *, I'm not sure what I'm doing wrong. I have this field defined in schema.xml; using admin/analysis.jsp it's working as expected, but when I try to update via the csv handler I get:

Error 500 org.apache.solr.analysis.PatternTokenizerFactory$1 cannot be cast to org.apache.lucene.analysis.Tokenizer

java.lang.ClassCastException: org.apache.solr.analysis.PatternTokenizerFactory$1 cannot be cast to org.apache.lucene.analysis.Tokenizer
at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69)
at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74)
...

I'm using a nightly of Solr 1.4.

thanks much,
--joe
Re: multicore and ruby
Hi Erik, Yes, I've been doing that in my tests, but I also have the case of wanting to do a search over all the cores using the shards syntax. I was thinking that the following wouldn't work: solr = Solr::Connection.new('http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1') because it has a "?" in it. Erik Hatcher wrote: With solr-ruby, simply put the core name in the URL of the Solr::Connection... solr = Solr::Connection.new('http://localhost:8983/solr/core_name') Erik On Sep 9, 2009, at 6:38 PM, Paul Rosen wrote: Hi all, I'd like to start experimenting with multicore in a ruby on rails app. Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with solr and it doesn't appear to have direct support for multicore and I didn't have any luck googling around for it. We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked at rsolr very briefly and didn't see any reference to multicore there, either. I can certainly hack something together, but it seems like this is a common problem. How are others doing multicore from ruby? Thanks, Paul
solr 1.3 and multicore data directory
Hi All,

I'm trying to set up Solr 1.3 to use multicore but I'm getting some puzzling results. My solr.xml file is:

<solr persistent="false">
  <cores adminPath="/admin/cores">
    <core name="resources" instanceDir="solr/resources/" dataDir="solr/resources/data/" />
    <core name="exhibits" instanceDir="solr/exhibits/" dataDir="solr/exhibits/data/" />
    <core name="reindex_resources" instanceDir="solr/reindex_resources/" dataDir="solr/reindex_resources/data/" />
  </cores>
</solr>

When I start up solr, everything looks normal until I get this line in the log:

INFO: [resources] Opening new SolrCore at solr/resources/, dataDir=./solr/data/

And a new folder is created ./solr/data/index with a blank index. And, of course, any queries go to that blank index and not to one of my cores.

Actually, what I'd really like is to have my directory structure look like this (some items removed for brevity):

solr_1.3
  lib
  solr
    solr.xml
    bin
    conf
    data
      resources
        index
      exhibits
        index
      reindex_resources
        index
  start.jar

And have all the cores share everything except an index.

How would I set that up?

Are there differences between 1.3 and 1.4 in this respect?

Thanks,
Paul
Re: multicore and ruby
The Connection is not for parameters, merely the base URL to the Solr server (or core, which is effectively a Solr "server"). As of solr-ruby 0.0.6, the shards parameter is supported for the Solr::Request::Standard and Dismax request objects, so you'd simply specify :shards=>"" for those queries. Also note that you can specify the shards in solrconfig.xml for the request handler mapping(s) and avoid having to send it from the client (depends on your needs whether that makes sense or not). Erik On Sep 9, 2009, at 10:17 PM, Paul Rosen wrote: Hi Erik, Yes, I've been doing that in my tests, but I also have the case of wanting to do a search over all the cores using the shards syntax. I was thinking that the following wouldn't work: solr = Solr::Connection.new('http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1') because it has a "?" in it. Erik Hatcher wrote: With solr-ruby, simply put the core name in the URL of the Solr::Connection... solr = Solr::Connection.new('http://localhost:8983/solr/core_name') Erik On Sep 9, 2009, at 6:38 PM, Paul Rosen wrote: Hi all, I'd like to start experimenting with multicore in a ruby on rails app. Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with solr and it doesn't appear to have direct support for multicore and I didn't have any luck googling around for it. We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked at rsolr very briefly and didn't see any reference to multicore there, either. I can certainly hack something together, but it seems like this is a common problem. How are others doing multicore from ruby? Thanks, Paul
Re: multicore and ruby
Yep same thing in rsolr and just use the :shards param. It'll return whatever solr returns. Matt On Wed, Sep 9, 2009 at 4:17 PM, Paul Rosen wrote: > Hi Erik, > > Yes, I've been doing that in my tests, but I also have the case of wanting > to do a search over all the cores using the shards syntax. I was thinking > that the following wouldn't work: > > > solr = Solr::Connection.new(' > http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1 > ') > > because it has a "?" in it. > > > Erik Hatcher wrote: > >> With solr-ruby, simply put the core name in the URL of the >> Solr::Connection... >> >> solr = Solr::Connection.new('http://localhost:8983/solr/core_name') >> >>Erik >> >> >> On Sep 9, 2009, at 6:38 PM, Paul Rosen wrote: >> >> Hi all, >>> >>> I'd like to start experimenting with multicore in a ruby on rails app. >>> >>> Right now, the app is using the solr-ruby-rails-0.0.5 to communicate with >>> solr and it doesn't appear to have direct support for multicore and I didn't >>> have any luck googling around for it. >>> >>> We aren't necessarily wedded to using solr-ruby-rails-0.0.5, but I looked >>> at rsolr very briefly and didn't see any reference to multicore there, >>> either. >>> >>> I can certainly hack something together, but it seems like this is a >>> common problem. >>> >>> How are others doing multicore from ruby? >>> >>> Thanks, >>> Paul >>> >> >> >
Nonsensical Solr Relevancy Score
I have done a search on the word "blue" in our index. The debugQuery output shows some extremely strange scoring. Somehow product 1 gets a higher score with only 1 match on the word blue, while product 2 gets a lower score with the same field match AND an additional field match. Can someone please help me understand why such an obviously more relevant product is given a lower score?

Product 1:

2.3623571 = (MATCH) sum of:
  0.26248413 = (MATCH) max plus 0.5 times others of:
    0.26248413 = (MATCH) weight(productNameSearch:blue in 112779), product of:
      0.032673787 = queryWeight(productNameSearch:blue), product of:
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        1.0 = fieldNorm(field=productNameSearch, doc=112779)
  2.099873 = (MATCH) max plus 0.5 times others of:
    2.099873 = (MATCH) weight(productNameSearch:blue^8.0 in 112779), product of:
      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
        8.0 = boost
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      8.033478 = (MATCH) fieldWeight(productNameSearch:blue in 112779), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        1.0 = fieldNorm(field=productNameSearch, doc=112779)

Product 2:

1.9483687 = (MATCH) sum of:
  0.63594794 = (MATCH) max plus 0.5 times others of:
    0.16405259 = (MATCH) weight(productNameSearch:blue in 8142), product of:
      0.032673787 = queryWeight(productNameSearch:blue), product of:
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.625 = fieldNorm(field=productNameSearch, doc=8142)
    0.55392164 = (MATCH) weight(color:blue^10.0 in 8142), product of:
      0.15009704 = queryWeight(color:blue^10.0), product of:
        10.0 = boost
        3.6904235 = idf(docFreq=9309, numDocs=136731)
        0.0040672035 = queryNorm
      3.6904235 = (MATCH) fieldWeight(color:blue in 8142), product of:
        1.0 = tf(termFreq(color:blue)=1)
        3.6904235 = idf(docFreq=9309, numDocs=136731)
        1.0 = fieldNorm(field=color, doc=8142)
  1.3124207 = (MATCH) max plus 0.5 times others of:
    1.3124207 = (MATCH) weight(productNameSearch:blue^8.0 in 8142), product of:
      0.2613903 = queryWeight(productNameSearch:blue^8.0), product of:
        8.0 = boost
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.0040672035 = queryNorm
      5.0209236 = (MATCH) fieldWeight(productNameSearch:blue in 8142), product of:
        1.0 = tf(termFreq(productNameSearch:blue)=1)
        8.033478 = idf(docFreq=120, numDocs=136731)
        0.625 = fieldNorm(field=productNameSearch, doc=8142)

--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562
Re: about SOLR-1395 integration with katta
Hi Zhong, It's a very new patch. I'll update the issue as we start the wiki page. I've been working on indexing in Hadoop in conjunction with Katta, which is different (it sounds) than your use case where you have prebuilt indexes you simply want to distributed using Katta? -J On Wed, Sep 9, 2009 at 12:33 PM, Zhenyu Zhong wrote: > Hi, > > It is really exciting to see this integration coming out. > May I ask how I need to make changes to be able to deploy Solr index on > katta servers? > Are there any tutorials? > > thanks > zhong >
Re: TermsComponent
We're using the StandardAnalyzer but I'm fairly certain that's not the issue. In fact, there doesn't appear to be any issue with Lucene or Solr. There are many instances of data in which users have removed the whitespace, so those values have a high frequency, which means they bubble to the top of the sort. The result is that a search for a name shows a first and last name without the whitespace.

One thing I've noticed is that since TermsComponent works on a single Term, there doesn't seem to be a way to query against a phrase. The same example as above applies: if you're querying for a name, it'd be preferred to get multi-term responses back if a first name matches.

Any suggestions?

Thanks for all the help. It's much appreciated.

Todd

On Wed, Sep 9, 2009 at 12:11 PM, Grant Ingersoll wrote:
> And what Analyzer are you using? I'm guessing that your words are being
> split up during analysis, which is why you aren't seeing whitespace. If you
> want to keep the whitespace, you will need to use the String field type or
> possibly the KeywordAnalyzer.
>
> -Grant
>
> On Sep 9, 2009, at 11:06 AM, Todd Benge wrote:
>
>> It's set as Field.Store.YES, Field.Index.ANALYZED.
>>
>> On Wed, Sep 9, 2009 at 8:15 AM, Grant Ingersoll wrote:
>>
>>> How are you tokenizing/analyzing the field you are accessing?
>>>
>>> On Sep 9, 2009, at 8:49 AM, Todd Benge wrote:
>>>
>>>> Hi Rekha,
>>>>
>>>> Here's the link to the TermsComponent info:
>>>> http://wiki.apache.org/solr/TermsComponent
>>>>
>>>> and another link Matt Weber did on autocompletion:
>>>> http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/
>>>>
>>>> We had to upgrade to the latest nightly to get the TermsComponent to work.
>>>>
>>>> Good luck!
>>>>
>>>> Todd
>>>>
>>>> On Wed, Sep 9, 2009 at 5:17 AM, dharhsana wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a requirement for autocompletion search; I am using Solr 1.4.
>>>>>
>>>>> Could you please tell me how you got the TermsComponent working with Solr 1.4?
>>>>> I couldn't find the terms component in the Solr 1.4 I downloaded; is there
>>>>> any other configuration that should be done?
>>>>>
>>>>> Do you have code for autocompletion? Please share it with me.
>>>>>
>>>>> Regards
>>>>> Rekha
>>>>>
>>>>> tbenge wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was looking at TermsComponent in Solr 1.4 as a way of building an
>>>>>> autocomplete function. I have a prototype working but noticed that terms
>>>>>> that have whitespace in them when indexed are absent the whitespace when
>>>>>> returned from the TermsComponent.
>>>>>>
>>>>>> Any ideas on why that may be happening? Am I just missing a configuration
>>>>>> option?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Todd
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/TermsComponent-tp25302503p25362829.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
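A sketch of how such a terms request could be issued from SolrJ, assuming a /terms request handler wired to TermsComponent as on the wiki page linked above (the parameter names are the documented terms.* ones; the field name is illustrative):

import org.apache.solr.client.solrj.SolrQuery;

public class TermsPrefixExample {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery();
        q.setQueryType("/terms");          // route to the handler that runs TermsComponent
        q.set("terms", true);
        q.set("terms.fl", "name");         // field whose indexed terms to enumerate
        q.set("terms.prefix", "john s");   // with an unanalyzed field this can match "john smith"
        return q;
    }
}

Note that terms.prefix matches against terms exactly as they were indexed, so whether "john s" finds "john smith" depends on the field keeping whole values (string type / KeywordAnalyzer) and on how case was handled at index time.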
Re: help with solr.PatternTokenizerFactory
Hi Joe,

I think you've come across this issue: https://issues.apache.org/jira/browse/SOLR-1377

Is your nightly the latest? If not, try the latest one.

Koji

Joe Calderon wrote:

hello *, I'm not sure what I'm doing wrong. I have this field defined in schema.xml; using admin/analysis.jsp it's working as expected, but when I try to update via the csv handler I get Error 500 org.apache.solr.analysis.PatternTokenizerFactory$1 cannot be cast to org.apache.lucene.analysis.Tokenizer java.lang.ClassCastException: org.apache.solr.analysis.PatternTokenizerFactory$1 cannot be cast to org.apache.lucene.analysis.Tokenizer at org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69) at org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74) ... I'm using a nightly of Solr 1.4. thanks much, --joe
Solr SVN build problem
Hi, I am building Solr from source. While building it from source I am getting the following error:

generate-maven-artifacts:
    [mkdir] Created dir: c:\Downloads\solr_trunk\build\maven
    [mkdir] Created dir: c:\Downloads\solr_trunk\dist\maven
     [copy] Copying 1 file to c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven

BUILD FAILED
c:\Downloads\solr_trunk\build.xml:741: The following error occurred while executing this line:
c:\Downloads\solr_trunk\common-build.xml:261: Failed to copy c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template to c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template due to java.io.FileNotFoundException c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template (The filename, directory name, or volume label syntax is incorrect)

Regards, Allahbaksh
Query regarding incremental index replication
Hi, Currently we are using Solr 1.3 and we have the following requirement. As we need to process very high volumes of documents (on the order of 400 GB per day), we are planning to separate the indexer(s) and searcher(s) so that there won't be a performance hit. Our idea is to have a set of servers used only as indexers for index creation, and then every 5 mins or so the index will be copied to the searchers (a set of Solr servers used only for querying). For this we tried to use snapshooter, rsync, etc. But the problem with this approach is that the same index is present on both the indexer and searcher, and hence occupies a large amount of disk space.

What we need is a mechanism where the indexer contains only the index for the past 5 mins (the last indexing cycle before the snapshooter is run) and the searcher has the accumulated (total) index, i.e. every 5 mins we should be able to move the entire index from indexer to searcher, and so on.

The above scenario is slightly different from a master/slave implementation, as on the master we want only the latest (WIP) index and the slave should contain the entire index.

Appreciate it if anyone can throw some light on how to achieve this.

Thanks,
sS
Extract info from parent node during data import
Hello,

I am using SOLR 1.4 (from a nightly build) and its URLDataSource in conjunction with the XPathEntityProcessor. I have successfully imported XML content, but I think I may have found a limitation when it comes to the commonField attribute in the DataImportHandler. Before writing my own parser to read in a whole XML document, I thought I'd post the question here (since I got some great advice last time).

The bulk of my content is contained within each <item> tag. However, each item has a parent called <category> and each category has a name which I would like to import. In my forEach loop I specify /document/category/item as the collection of items I am interested in. Is there any way to extract an element from underneath a parent node? To be a bit more specific (see the example xml below), I would like to index the following:

- category: Category 1; id: 1; author: Author 1
- category: Category 1; id: 2; author: Author 2
- category: Category 2; id: 3; author: Author 3
- category: Category 2; id: 4; author: Author 4

Any ideas on how I can get to a parent node from within a child during data import? If it can't be done, what do you suggest would be the best way so I can keep using the DataImportHandler... would XSLT be a good idea to 'flatten out' the structure a bit?

Thanks

This is what my XML document looks like:

<document>
  <category>
    <name>Category 1</name>
    <item>
      <id>1</id>
      <author>Author 1</author>
    </item>
    <item>
      <id>2</id>
      <author>Author 2</author>
    </item>
  </category>
  <category>
    <name>Category 2</name>
    <item>
      <id>3</id>
      <author>Author 3</author>
    </item>
    <item>
      <id>4</id>
      <author>Author 4</author>
    </item>
  </category>
</document>

And this is what my dataConfig looks like:

<dataConfig>
  <dataSource type="URLDataSource" name="dataSource" />
  <document>
    <entity name="item"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category/item"
            transformer="DateFormatTransformer"
            stream="true"
            dataSource="dataSource">
      <field column="category" xpath="/document/category/name" commonField="true" />
      <field column="id" xpath="/document/category/item/id" />
      <field column="author" xpath="/document/category/item/author" />
    </entity>
  </document>
</dataConfig>

This is how I have specified my schema:

<field name="id" type="string" indexed="true" stored="true" />
<field name="category" type="text" indexed="true" stored="true" />
<field name="author" type="text" indexed="true" stored="true" />

<uniqueKey>id</uniqueKey>
<defaultSearchField>id</defaultSearchField>
Re: Field Collapsing (was Re: Schema for group/child entity setup)
> > The patch which will be committed soon will add this functionality.

Where can I follow the progress of this patch?

On Mon, Sep 7, 2009 at 3:38 PM, Uri Boness wrote:
> >> Great. Nice site and very similar to my requirements.
>
> thanks.
>
> >> So, right now, you get all field values by default?
>
> Right now, no field values are returned for the collapsed documents. The
> patch which will be committed soon will add this functionality.
>
> R. Tan wrote:
>> Great. Nice site and very similar to my requirements.
>>
>>> There's work on the patch that is being done now which will enable you to
>>> ask for specific field values of the collapsed documents using a dedicated
>>> request parameter.
>>
>> So, right now, you get all field values by default?
>>
>> On Sun, Sep 6, 2009 at 3:58 AM, Uri Boness wrote:
>>
>>> You can check out http://www.ilocal.nl. If you search for a bank in
>>> Amsterdam then you'll see that a lot of the results are collapsed. For this
>>> we used an older version of this patch (which works on 1.3) but a lot has
>>> changed since then. We're currently using this patch on another project, but
>>> it's not live yet.
>>>
>>> Uri
>>>
>>> R. Tan wrote:
>>>> Thanks Uri. Your personal suggestion is appreciated and I think I'll
>>>> follow your advice. We're still early in development and 1.4 would be a
>>>> good choice. I hope I can get field collapsing to work with my
>>>> requirements. Do you know any live site using field collapsing already?
>>>>
>>>> On Sat, Sep 5, 2009 at 5:57 PM, Uri Boness wrote:
>>>>> There's work on the patch that is being done now which will enable you to
>>>>> ask for specific field values of the collapsed documents using a dedicated
>>>>> request parameter. This work is not committed yet to the latest patch, but
>>>>> will be very soon. There is of course a drawback to that as well, the
>>>>> collapsed documents set can be very large (depends on your data of course)
>>>>> in which case the returned result which includes the fields values can be
>>>>> rather large, which will impact performance, this is why this feature will
>>>>> be enabled only if you specify this extra parameter - by default no field
>>>>> values will be returned.
>>>>>
>>>>> AFAIK, the latest patch should work fine with the latest build. Martijn
>>>>> (which is the main maintainer of this patch) tries to keep it up to date
>>>>> with the latest builds. But I guess the safest way is to work with the
>>>>> nightly build of the same date as the latest patch (though I would give it a
>>>>> try first with the latest build).
>>>>>
>>>>> BTW, it's not an official suggestion from the Solr development team, but
>>>>> if you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I
>>>>> would go for the later. 1.4 is supposed to be released in the upcoming week
>>>>> or two and it bring loads of bug fixes, enhancements and extra
>>>>> functionality. But again, this is my personal suggestion.
>>>>>
>>>>> cheers,
>>>>> Uri
>>>>>
>>>>> R. Tan wrote:
>>>>>> Okay. Thanks for giving an insight on how it works in general. Without
>>>>>> trying it myself, are the field values for the collapsed ones also part
>>>>>> of the results data?
>>>>>> What is the latest build that is safe to use on a production environment?
>>>>>> I'd probably go for that and use field collapsing.
>>>>>>
>>>>>> Thank you very much.
>> >> >> On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness wrote: >> >> >> >> >> >> >> >>> The collapsed documents are represented by one "master" document >>> which >>> can >>> be part of the normal search result (the doc list), so pagination >>> just >>> works >>> as expected, meaning taking only the returned documents in account >>> (ignoring >>> the collapsed ones). As for the scoring, the "master" document is >>> actually >>> the document with the highest score in the collapsed group. >>> >>> As for Solr 1.3 compatibility... well... it's very hart to tell. All >>> latest >>> patch are certainly *not* 1.3 compatible (I think they're also >>> depending >>> on >>> some changes in lucene which are not available for solr 1.3). I guess >>> you'll >>> have to try some of the old patches, but I'm not sure about their >>> stability. >>> >>> cheers, >>> Uri >>> >>> >>> R. Tan wrote: >>> >>> >>> >>> >>> >>> >>> Thanks Uri. How does paging and scoring work when using field collapsing? What patch works with 1.3? Is it production ready? R On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness wrote: >>
Re: Extract info from parent node during data import
Try this: add two xpaths in your forEach,

forEach="/document/category/item | /document/category/name"

and add a field as follows:

<field column="category" xpath="/document/category/name" commonField="true" />

Please try it out and let me know.

On Thu, Sep 10, 2009 at 7:30 AM, venn hardy wrote:
> Hello,
>
> I am using SOLR 1.4 (from a nightly build) and its URLDataSource in conjunction
> with the XPathEntityProcessor. I have successfully imported XML content, but
> I think I may have found a limitation when it comes to the commonField
> attribute in the DataImportHandler.
>
> The bulk of my content is contained within each <item> tag. However, each
> item has a parent called <category> and each category has a name which I
> would like to import. In my forEach loop I specify /document/category/item
> as the collection of items I am interested in. Is there any way to extract
> an element from underneath a parent node? I would like to index the following:
>
> - category: Category 1; id: 1; author: Author 1
> - category: Category 1; id: 2; author: Author 2
> - category: Category 2; id: 3; author: Author 3
> - category: Category 2; id: 4; author: Author 4

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Creating facet query using SolrJ
thanks avlesh,

solrQuery.set("f.myField.facet.limit", 10) ... this is how I ended up doing it, and it works perfectly for me. It just didn't look good among all the neat Solr API calls :), as my complete query construction logic i

regards,
aakash

2009/9/9 Avlesh Singh
> > When constructing the query, I create a Lucene query and use query.toString to
> > create the SolrQuery.
>
> Go through this thread -
> http://www.lucidimagination.com/search/document/f4d91628ced293bf/lucene_query_to_solr_query
>
> > I am facing difficulty while creating a facet query for individual fields, as I
> > could not find an easy and clean way of constructing a facet query with
> > parameters specified at the field level.
>
> Per-field overrides for facet params using SolrJ are not supported yet.
> However, you can always use
> solrQuery.set("f.myField.facet.limit", 10) ...
> to pass field-specific facet params to the SolrServer.
>
> Cheers
> Avlesh
>
> On Wed, Sep 9, 2009 at 2:42 PM, Aakash Dharmadhikari wrote:
> > hello,
> >
> > I am using SolrJ to access solr indexes. When constructing the query, I create
> > a Lucene query and use query.toString to create the SolrQuery.
> >
> > I am facing difficulty while creating a facet query for individual fields, as
> > I could not find an easy and clean way of constructing a facet query with
> > parameters specified at the field level.
> >
> > As I understand, the faceting parameters like limit, sort order etc. can
> > be set on the SolrQuery object, but they are used for all the facets in the
> > query. I would like to provide these parameters separately for each field. I am
> > currently building such a query in Java code using string appends. But it
> > looks really bad, and would be prone to breaking if the query syntax changes
> > in the future.
> >
> > Is there a better way of constructing such detailed facet queries, the
> > way we build the main solr search query?
> >
> > regards,
> > aakash
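Put together, a per-field override via raw parameters might look like this sketch (the field names are hypothetical):

import org.apache.solr.client.solrj.SolrQuery;

public class PerFieldFacetExample {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("category", "brand"); // hypothetical facet fields
        q.setFacetLimit(20);                  // global default: facet.limit=20
        // Per-field overrides use the f.<field>.facet.* convention:
        q.set("f.category.facet.limit", 10);
        q.set("f.brand.facet.sort", "lex");
        return q;
    }
}

It is not as tidy as dedicated setters, but the f.<field>.facet.* parameters survive Solr upgrades better than hand-appended query strings.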
Re: Field Collapsing (was Re: Schema for group/child entity setup)
I just noticed this and it reminded me of an issue I've had with collapsed faceting with an older version of the patch in Solr 1.3. Would it be possible, if we can get the terms for all the collapsed documents on a field, to then facet each collapsed document on the unique terms it has collectively?

What I mean is, for example:

Doc 1, 2, 3 collapse together on some other field.
Doc 1 is the "main document" and has the "colors" blue and red.
Doc 2 has red.
Doc 3 has green.

For the purposes of faceting, it would be ideal in our case for faceting on color to count one each for blue, red, and green on this document (the user drills down on this value to yet another collapsed set). Right now, when you facet after collapse you just get blue and red (green is dropped because it collapses out). To the user it makes the counts seem inaccurate, like they're missing something. Instead we facet before collapsing and get an "inflated" value (which ticks 2 for red - but when you drill down, you still only get 1 because Doc 1 and Doc 2 collapse together again). Either way it's not ideal.

At the time (many months ago) there was no way to account for this, but it sounds like this patch could make it possible, maybe.

Thanks!

--
Steve

On Sep 5, 2009, at 5:57 AM, Uri Boness wrote:

There's work on the patch that is being done now which will enable you to ask for specific field values of the collapsed documents using a dedicated request parameter. This work is not committed yet to the latest patch, but will be very soon. There is of course a drawback to that as well: the collapsed documents set can be very large (depending on your data), in which case the returned result, which includes the field values, can be rather large, which will impact performance. This is why this feature will be enabled only if you specify this extra parameter - by default no field values will be returned.

AFAIK, the latest patch should work fine with the latest build. Martijn (who is the main maintainer of this patch) tries to keep it up to date with the latest builds. But I guess the safest way is to work with the nightly build of the same date as the latest patch (though I would give it a try first with the latest build).

BTW, it's not an official suggestion from the Solr development team, but if you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I would go for the latter. 1.4 is supposed to be released in the upcoming week or two and it brings loads of bug fixes, enhancements and extra functionality. But again, this is my personal suggestion.

cheers,
Uri
Re: solr 1.3 and multicore data directory
The dataDir attribute is a Solr 1.4 feature.

On Thu, Sep 10, 2009 at 1:57 AM, Paul Rosen wrote:
> Hi All,
>
> I'm trying to set up Solr 1.3 to use multicore but I'm getting some puzzling
> results. My solr.xml file is:
>
> <solr persistent="false">
>   <cores adminPath="/admin/cores">
>     <core name="resources" instanceDir="solr/resources/" dataDir="solr/resources/data/" />
>     <core name="exhibits" instanceDir="solr/exhibits/" dataDir="solr/exhibits/data/" />
>     <core name="reindex_resources" instanceDir="solr/reindex_resources/" dataDir="solr/reindex_resources/data/" />
>   </cores>
> </solr>
>
> When I start up solr, everything looks normal until I get this line in the log:
>
> INFO: [resources] Opening new SolrCore at solr/resources/, dataDir=./solr/data/
>
> And a new folder is created ./solr/data/index with a blank index. And, of
> course, any queries go to that blank index and not to one of my cores.
>
> Actually, what I'd really like is to have my directory structure look like
> this (some items removed for brevity):
>
> solr_1.3
>   lib
>   solr
>     solr.xml
>     bin
>     conf
>     data
>       resources
>         index
>       exhibits
>         index
>       reindex_resources
>         index
>   start.jar
>
> And have all the cores share everything except an index.
>
> How would I set that up?
>
> Are there differences between 1.3 and 1.4 in this respect?
>
> Thanks,
> Paul

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: OutOfMemory error on solr 1.3
Just wondering, how much memory are you giving your JVM?

On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin wrote:
> I am getting OutOfMemory errors on our slave servers. I would like to know if
> someone has had the same issue and has a solution for this.
>
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@96cd2ffc:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 441216, Num elements: 55150
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@519116e0:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@74dc52fa:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@d0dd3e28:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@b6dfa5bc:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@482b13ef:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@2309438c:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@277bd48c:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204
> Exception in thread "[ACTIVE] ExecuteThread: '7' for queue: 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> Exception in thread "[ACTIVE] ExecuteThread: '11' for queue: 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@41405463:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 751552, Num elements: 187884
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5096, Num elements: 2539
> java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5400, Num elements: 2690
>
> deployment service message for request id "-1" from server "AdminServer".
> Exception is: "java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object > size: 4368, Num elements: 2174 > SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: > 14140768, Num elements: 3535188 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@8dbcc7ab:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890 > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5320, > Num elements: 2649 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@4d0c6fc5:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 751560, Num elements: 187885 > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 16400, > Num elements: 8192 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@fb6bac19:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 14140904, Num elements: 3535222 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@536d7b1b:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 14140904, Num elements: 3535222 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@6a1ef00a:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 751864, Num elements: 187961 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@298f2d9c:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 5398568, Num elements: 1349637 > SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object si
Re: about SOLR-1395 integration with katta
Jason, Thanks for the reply.

In general, I would like to use katta to handle the management overhead, such as single points of failure, as well as distributed index deployment. At the same time, I still want to use the nice search features provided by Solr.

Basically, I would like to try both on the indexing part:
1. Using Hadoop to launch MR jobs to build the index, then deploying the index to katta.
2. Using the new patch SOLR-1395. Based on my understanding, it seems to support index building with Hadoop. I assume the index would have all the necessary information, such as the Solr index schema, so that I can still use the nice search features provided by Solr.

On the search part, I would like to try distributed search on a Solr index deployed on katta, if that is possible.

I would very much appreciate it if you could share some thoughts with me.

thanks
zhong

On Wed, Sep 9, 2009 at 6:06 PM, Jason Rutherglen wrote:
> Hi Zhong,
>
> It's a very new patch. I'll update the issue as we start the
> wiki page.
>
> I've been working on indexing in Hadoop in conjunction with
> Katta, which is different (it sounds) than your use case where
> you have prebuilt indexes you simply want to distribute using
> Katta?
>
> -J
>
> On Wed, Sep 9, 2009 at 12:33 PM, Zhenyu Zhong wrote:
> > Hi,
> >
> > It is really exciting to see this integration coming out.
> > May I ask what changes I need to make to be able to deploy a Solr index on
> > katta servers?
> > Are there any tutorials?
> >
> > thanks
> > zhong
RE: OutOfMemory error on solr 1.3
Xms is 1.5GB, Xmx is 1.5GB and Xns is 128MB. Physical memory is 4GB. We are
running JRockit version 1.5.0_15 on WebLogic 10.

./java -version
java version "1.5.0_15"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04)
BEA JRockit(R) (build R27.6.0-50_o-100423-1.5.0_15-20080626-2104-linux-x86_64, compiled mode)

4 S root 7532 7487 8 75 0 - 804721 184466 05:10 ? 00:07:18
/opt/bea/jrmc-3.0.3-1.5.0/bin/java -Xms1536m -Xmx1536m -Xns:128m -Xgc:gencon
-Djavelin.jsp.el.elcache=4096 -Dsolr.solr.home=/opt/apache-solr-1.3.0/example/solr

Francis

-----Original Message-----
From: Constantijn Visinescu [mailto:baeli...@gmail.com]
Sent: Wednesday, September 09, 2009 11:35 PM
To: solr-user@lucene.apache.org
Subject: Re: OutOfMemory error on solr 1.3

Just wondering, how much memory are you giving your JVM ?

On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin wrote:
>
> I am hitting OutOfMemory errors on our slave servers, and I would like to
> know whether anyone has seen the same issue and found a solution.
> [...]
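With the heap capped at 1.5GB, an option besides raising -Xmx is to drop the large autowarm counts and instead warm each new searcher with a handful of explicit queries via the stock QuerySenderListener, so warming cost stays fixed regardless of how full the caches are at commit time. A sketch for solrconfig.xml; the query strings below are placeholders:

<!-- Hypothetical solrconfig.xml excerpt: run a few known-hot queries
     against each new searcher instead of replaying thousands of cached
     results. Query values are placeholders. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some popular query</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
    <lst>
      <str name="q">another common query</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>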