RE: Filter by relevance
I have a dismax query where I check for values in 3 fields against documents in the index - a title, a list of keyword tags and then the full-text of the document. I usually get lots of results and I can see that the first results are OK - it's giving precedence to titles and tag matches, as my dismax boosts on title and keywords (normal boost and phrase boost). After say 20/30 good results I start to get matches based upon just the full-text, so these are less relevant. I am also facet.counting on my keyword tags (and presenting them in the results as a way of filtering) and, as you can imagine, the counts are high because of the number of overall results. I want to somehow make the facet counts more associated with the higher-relevancy results. My options as I see them are: 1) exclude full-text from the dismax altogether; 2) configure the dismax normal boost on full-text to zero, but the phrase boost to something higher (the aim here is to only really get a hit on the full-text if my search term is found as a phrase in the full-text); 3) limit my results by relevancy or number of results. If I do (3) above, will the facet counts respect the lower number of results - this is the overall aim really. Thank You, Jason.

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wed 03/11/2010 23:15 To: solr-user@lucene.apache.org Subject: Re: Filter by relevance

Be aware, though, that relevance isn't absolute, it's only interesting #within# a query. And it's then normed between 0 and 1. So picking "a certain value" is rarely doing what you think it will. Limiting to the top N docs is usually more reasonable. But this may be an XY problem. What is it you're trying to accomplish? Perhaps if you state the problem, some other suggestions may be in the offing. Best, Erick

On Wed, Nov 3, 2010 at 4:48 PM, Jason Brown wrote: > Is it possible to filter my search results by relevance? For example, > anything below a certain value shouldn't be returned? > > I also retrieve facet counts in my search queries, so it would be useful if > the facet counts also respected the filter on the relevance. > > Thank You. > > Jason.
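For illustration, option (2) maps directly onto dismax parameters. A minimal SolrJ sketch follows; the field names (title, keywords, fulltext), the boost values and the Solr URL are assumptions for illustration, not taken from the actual schema:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DismaxBoostSketch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("solar panels");
        q.set("defType", "dismax");
        // option (2): full-text matches count for (almost) nothing on their own...
        q.set("qf", "title^10 keywords^5 fulltext^0");
        // ...but a phrase hit in the full-text is boosted
        q.set("pf", "title^10 keywords^5 fulltext^5");
        q.setRows(20);                // option (3): only display the top N hits
        q.setFacet(true);
        q.addFacetField("keywords");  // facet counts for the keyword tags
        QueryResponse rsp = server.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}

Note that facet counts are computed over every document that matches the query, not just the rows returned, so option (3) by itself will not lower them; only restricting what can match at all (options 1 or 2, or an extra filter query) will.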
mergeFactor questions
Hi all, Having read the SolrPerformanceFactors wiki page [1], I'd still need a couple of clarifications about mergeFactor (I am using version 1.4.1), so if anyone can help it would be nice.

- Is mergeFactor a one-time configuration setting that is considered only when creating the index for the first time, or can it be adjusted later even with some docs inside the index? e.g. I have mF set to 10, then I realize I want quicker searches and I set it to 2 so that at the next optimize/commit I will have no more than 2 segments. My understanding is that one can adjust mF over time, is that right?

- In a replicated environment does it make sense to define different mergeFactors on master and slave? I'd say no, since it influences the number of segments created, that being a concern of whoever actually indexes documents (the master), not of who receives (segments of) the index, but please correct me if I am wrong.

Thanks for your help, Regards, Tommaso

[1] : http://wiki.apache.org/solr/SolrPerformanceFactors
Re: Negative or zero value for fieldNorm
Hi, I've worked around the issue by setting omitNorms=true on the title field. Now all fieldNorm values are 1.0f and therefore do not mess up my scores anymore. This, of course, is hardly a solution even though i currently do not use index-time boosts on any field. The question remains, why does the title field return a fieldNorm=0 for many queries? And a subquestion, does the luke request handler return boost values for documents? I know i get boost values for fields but i haven't seen boost values for documents. Cheers, On Wednesday 03 November 2010 20:44:48 Markus Jelsma wrote: > > Regarding "Negative or zero value for fieldNorm", I don't see any > > negative fieldNorms here... just very small positive ones? > > Of course, you're right. The E-# got twisted in my mind and became > negative. Silly me. > > > Anyway the fieldNorm is the product of the lengthNorm and the > > index-time boost of the field (which is itself the product of the > > index time boost on the document and the index time boost of all > > instances of that field). Index time boosts default to "1" though, so > > they have no effect unless something has explicitly set a boost. > > I've just checked docs 7 and 1462 (resp. first and second in debug output > below) with Luke. The title and content fields have no index time boosts, > thus defaulting to 1.0f which is fine. > > Then, why does doc 7 have a fieldNorm of 0.0 on title (and so setting a 0.0 > score on the doc in the result set) and does doc 1462 have a very very > small fieldNorm? > > debugOutput for doc 7: > 0.0 = fieldNorm(field=title, doc=7) > > Luke on the title field of doc 7. > 1.0 > > Thanks for your reply! > > > -Yonik > > http://www.lucidimagination.com > > > > > > > > On Wed, Nov 3, 2010 at 2:30 PM, Markus Jelsma > > > > wrote: > > > Hi all, > > > > > > I've got some puzzling issue here. During tests i noticed a document at > > > the bottom of the results where it should not be. I query using DisMax > > > on title and content field and have a boost on title using qf. Out of > > > 30 results, only two documents also have the term in the title. > > > > > > Using debugQuery and fl=*,score i quickly noticed large negative > > > maxScore of the complete resultset and a portion of the resultset > > > where scores sum up to zero because of a product with 0 (fieldNorm). 
> > > > > > See below for debug output for a result with score = 0: > > > > > > 0.0 = (MATCH) sum of: > > > 0.0 = (MATCH) max of: > > >0.0 = (MATCH) weight(content:kunstgrasveld in 7), product of: > > > 0.75658196 = queryWeight(content:kunstgrasveld), product of: > > >6.6516633 = idf(docFreq=33, maxDocs=9682) > > >0.113743275 = queryNorm > > > > > > 0.0 = (MATCH) fieldWeight(content:kunstgrasveld in 7), product of: > > >2.236068 = tf(termFreq(content:kunstgrasveld)=5) > > >6.6516633 = idf(docFreq=33, maxDocs=9682) > > >0.0 = fieldNorm(field=content, doc=7) > > > > > >0.0 = (MATCH) fieldWeight(title:kunstgrasveld in 7), product of: > > > 1.0 = tf(termFreq(title:kunstgrasveld)=1) > > > 8.791729 = idf(docFreq=3, maxDocs=9682) > > > 0.0 = fieldNorm(field=title, doc=7) > > > > > > And one with a negative score: > > > > > > 3.0716116E-4 = (MATCH) sum of: > > > 3.0716116E-4 = (MATCH) max of: > > >3.0716116E-4 = (MATCH) weight(content:kunstgrasveld in 1462), > > >product > > > > > > of: 0.75658196 = queryWeight(content:kunstgrasveld), product of: > > > 6.6516633 = idf(docFreq=33, maxDocs=9682) > > > > > >0.113743275 = queryNorm > > > > > > 4.059853E-4 = (MATCH) fieldWeight(content:kunstgrasveld in 1462), > > > > > > product of: > > >1.0 = tf(termFreq(content:kunstgrasveld)=1) > > >6.6516633 = idf(docFreq=33, maxDocs=9682) > > >6.1035156E-5 = fieldNorm(field=content, doc=1462) > > > > > > There are no funky issues with term analysis for the text fieldType, in > > > fact, the term passes through unchanged. I don't do omitNorms, i store > > > termVectors etc. > > > > > > Because fieldNorm = fieldBoost / sqrt(numTermsForField) i suspect my > > > input from Nutch is messed up. A fieldNorm can never be =< 0 for a > > > normal positive boost and field boosts should not be zero or negative > > > (correct me if i'm wrong). But, since i can't yet figure out what field > > > boosts Nutch sends to me i thought i'd drop by on this mailing list > > > first. > > > > > > There are quite a few query terms that return with zero or negative > > > scores and many that behave as i expect. I find it also a bit hard to > > > comprehend why the docs with negative score rank higher in the result > > > set than documents with zero score. Sorting defaults to score DESC, > > > but this is perhaps another issue. > > > > > > Anyway, the test runs on a Solr 1.4.1 instance with Java 6 under the > > > hood. Help or directions are appreciated =) > > > > > > Cheers, > > > > > > -- >
ContentStreamDataSource
Hi! I am trying to get the ContentStreamDataSource to work properly, but there are not many examples out there. What I have done is that I have made a copy of my HttpDataSource config and replaced the HttpDataSource dataSource with a ContentStreamDataSource. If I understand everything correctly I should be able to use the same URL syntax as with HttpDataSource and supply the XML file as post data. I have tried to post data - both as binary, file and string to the URL, but nothing happens.

This is the log file:
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
VARNING: Unable to read: datapush.properties
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:0.0
2010-nov-04 12:32:17 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/datapush params={clean=false&entity=suLIBRIS&command=full-import} status=0 QTime=0

What am I doing wrong?

Regards
Theodor Tolstoy
Developer Stockholm university library
querying multiple fields as one
Hi all, having two fields named 'type' and 'cat' with identical type and options, but different values recorded, would it be possible to query them as if they were one field? For instance q=type:electronics cat:electronics should return the same results as q=common:electronics. I know I could do it by defining a third field 'common' with copyFields from 'type' and 'cat' to 'common', but this wouldn't be feasible if you already have lots of documents in your index and don't want to reindex everything, would it? Any suggestions? Thanks in advance, Tommaso
Re: querying multiple fields as one
On Thu, Nov 4, 2010 at 8:21 AM, Tommaso Teofili wrote: > Hi all, > having two fields named 'type' and 'cat' with identical type and options, > but different values recorded, would it be possible to query them as they > were one field? > For instance > q=type:electronics cat:electronics > should return same results as > q=common:electronics > I know I could make it defining a third field 'common' with copyFields from > 'type' and 'cat' to 'common' but this wouldn't be feasible if you've > already > lots of documents in your index and don't want to reindex everything, isn't > it? > Any suggestions? > Thanks in advance, > Tommaso > Tommaso, If re-indexing is not feasible/preferred, you might try looking into creating a dismax handler that should give you what you're looking for in your query: http://wiki.apache.org/solr/DisMaxQParserPlugin. The same solrconfig.xml that comes with SOLR has a dismax parser that you can modify to your needs. - Ken Stanley
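As a rough per-request illustration of that suggestion (the core URL is a placeholder and no boosts are applied), dismax can be asked to treat the two fields as one without touching solrconfig.xml:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class TwoFieldsAsOne {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("electronics"); // plain term, no field prefix
        q.set("defType", "dismax");
        q.set("qf", "type cat");                    // search both fields as if they were one
        System.out.println(server.query(q).getResults().getNumFound() + " hits");
    }
}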
Re: querying multiple fields as one
Ken's suggestion to look at dismax is a good one, but I have a question q=type:electronics cat:electronics should do what you want assuming your default operator is OR. Is it failing? Or is the real question how you can do this automatically? I'd expect the ranking to be a bit different, but I'm guessing that's not a big issue Best Erick On Thu, Nov 4, 2010 at 8:21 AM, Tommaso Teofili wrote: > Hi all, > having two fields named 'type' and 'cat' with identical type and options, > but different values recorded, would it be possible to query them as they > were one field? > For instance > q=type:electronics cat:electronics > should return same results as > q=common:electronics > I know I could make it defining a third field 'common' with copyFields from > 'type' and 'cat' to 'common' but this wouldn't be feasible if you've > already > lots of documents in your index and don't want to reindex everything, isn't > it? > Any suggestions? > Thanks in advance, > Tommaso >
Re: ContentStreamDataSource
for contentstreamdatasource to work you must post the stream in the request

On Thu, Nov 4, 2010 at 8:13 AM, Theodor Tolstoy wrote: > Hi! > I am trying to get the ContentStreamDataSource to work properly , but there > are not many examples out there. > > What I have done is that I have made a copy of my HttpDataSource config > and replaced the HttpDataSource dataSource with a ContentStreamDataSource. > If I understand everything correctly I should be able to use the same URL > syntax as with HttpDataSource and supply the XML file as post data. > > I have tried to post data - both as binary, file and string to the URL, but > nothing happens. > > > This is the log file: > 2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.DataImporter > doFullImport > INFO: Starting Full Import > 2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.SolrWriter > readIndexerProperties > VARNING: Unable to read: datapush.properties > 2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.DocBuilder execute > INFO: Time taken = 0:0:0.0 > 2010-nov-04 12:32:17 org.apache.solr.core.SolrCore execute > INFO: [] webapp=/solr path=/datapush > params={clean=false&entity=suLIBRIS&command=full-import} status=0 QTime=0 > > > What am I doing wrong? > > Regards > Theodor Tolstoy > Developer Stockholm university library > > -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: mergeFactor questions
On 11/4/2010 3:27 AM, Tommaso Teofili wrote:
> - Is mergeFactor a one time configuration setting that is considered only when creating the index for the first time or can it be adjusted later even with some docs inside the index? e.g. I have mF to 10 then I realize I want quicker searches and I set it to 2 so that at the next optimize/commit I will have no more than 2 segments. My understanding is that one can adjust mF over time, is it right?

The mergeFactor is applied anytime documents are added to the index, not just when it is built for the first time. You can adjust it later, and reload the core or restart Solr. It will apply to any additional indexing from that point forward.

With a mergeFactor of 10, having 21 segments (and more) temporarily on the disk at the same time is reasonably possible. I know this applies if you are doing a continuous large insert, not sure if you are doing several small inserts separately. These segments are:

* The small segment that is being built right now.
* The previous 10 small segments.
* The merged segment being created from those above.
* The previous 9 merged segments.

If it takes a really long time to merge the last 10 small segments and then merge the 10 large segments into an even larger segment, you can end up with even more small segments from your continuous insert. If it should take long enough that you actually get 10 more new small segments, the large merge will pause while it completes the small merge. I saw this happen recently when I decided to see what happens if I built a single shard from our entire database. It took a really long time, partly from that super-merge and the optimize that happened later, and took up 85GB of disk space.

I'm not really sure what happens if you have this continue beyond a single super-merge like I have mentioned.

> - In a replicated environment does it make sense to define different mergeFactors on master and slave? I'd say no since it influences the number of segments created, that being a concern of who actually index documents (the master) not of who receives (segments of) index, but please correct me if I am wrong.

Because it only applies when indexes are being built, it has no meaning on a slave, which as you said, just copies the data from the master.

Shawn
Re: Negative or zero value for fieldNorm
On Thu, Nov 4, 2010 at 8:04 AM, Markus Jelsma wrote: > The question remains, why does the title field return a fieldNorm=0 for many > queries? Because the index-time boost was set to 0 when the doc was indexed. I can't say how that happened... look to your indexing code. > And a subquestion, does the luke request handler return boost values > for documents? I know i get boost values for fields but i haven't seen boost > values for documents. The doc boost is just multiplied into each field boost and doesn't have a separate representation in the index. -Yonik http://www.lucidimagination.com
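To make the arithmetic concrete (the numbers are illustrative, not taken from the index in question): with the default similarity, fieldNorm = lengthNorm * docBoost * fieldBoost and lengthNorm = 1 / sqrt(numTermsInField). A 4-term title with all boosts at 1.0 gives fieldNorm = (1 / sqrt(4)) * 1.0 * 1.0 = 0.5, while the same title indexed with a document or field boost of 0 gives 0.5 * 0 = 0.0, which is exactly the fieldNorm(field=title, doc=7) = 0.0 seen in the debug output. Norms are also quantized to a single byte when stored, so decoded values come back as coarse steps (6.1035156E-5 is one such representable step) rather than exact products.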
Re: Problem escaping question marks
Have you tried encoding it with %3F? firstname:*%3F* On 2010-11-04, at 1:44 AM, Stephen Powis wrote: > I'm having difficulty properly escaping ? in my search queries. It seems as > tho it matches any character. > > Some info, a simplified schema and query to explain the issue I'm having. > I'm currently running solr1.4.1 > > Schema: > > > required="false" /> > > I want to return any first name with a Question Mark in it > Query: first_name: *\?* > > Returns all documents with any character in it. > > Can anyone lend a hand? > Thanks! > Stephen
Re: Problem escaping question marks
On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis wrote: > I want to return any first name with a Question Mark in it > Query: first_name: *\?* >

There is no way to escape the metacharacters * or ? for a wildcard query (regardless of queryparser, even if you write your own). See https://issues.apache.org/jira/browse/LUCENE-588 It's something we could fix, but in all honesty it seems one reason it isn't fixed is because the bug is so old, yet there hasn't really been any indication of demand for such a thing...
Re: Negative or zero value for fieldNorm
I've done some testing with the example docs and it behaves similarly when there is a zero doc boost. Luke, however, does not show me the index-time boosts. Both document and field boosts are not visible in Luke's output. I've changed the doc boost and field boosts for the mp500.xml document but all I ever see returned is boost=1.0. Is this correct? Anyway, I'm looking at Nutch now for reasons why it sends a zero boost on a document.

On Thursday 04 November 2010 14:16:22 Yonik Seeley wrote: > On Thu, Nov 4, 2010 at 8:04 AM, Markus Jelsma > > wrote: > > The question remains, why does the title field return a fieldNorm=0 for > > many queries? > > Because the index-time boost was set to 0 when the doc was indexed. I > can't say how that happened... look to your indexing code. > > > And a subquestion, does the luke request handler return boost values > > for documents? I know i get boost values for fields but i haven't seen > > boost values for documents. > > The doc boost is just multiplied into each field boost and doesn't > have a separate representation in the index. > > -Yonik > http://www.lucidimagination.com

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
Re: Negative or zero value for fieldNorm
On Thu, Nov 4, 2010 at 9:51 AM, Markus Jelsma wrote: > I've done some testing with the example docs and it behaves similar when there > is a zero doc boost. Luke, however, does not show me the index-time boosts. Remember that the norm is a product of the length norm and the index time boost... it's recorded as a single number in the index. > Bost document and field boosts are not visible in Luke's output. I've changed > doc boost and field boosts for the mp500.xml document but all i ever see > returned is boost=1.0. Is this correct? Perhaps you still have omitNorms=true for the field you are querying? -Yonik http://www.lucidimagination.com
Re: Negative or zero value for fieldNorm
On Thursday 04 November 2010 15:12:23 Yonik Seeley wrote: > On Thu, Nov 4, 2010 at 9:51 AM, Markus Jelsma > > wrote: > > I've done some testing with the example docs and it behaves similar when > > there is a zero doc boost. Luke, however, does not show me the > > index-time boosts. > > Remember that the norm is a product of the length norm and the index > time boost... it's recorded as a single number in the index. Yes. > > Bost document and field boosts are not visible in Luke's output. I've > > changed doc boost and field boosts for the mp500.xml document but all i > > ever see returned is boost=1.0. Is this correct? > > Perhaps you still have omitNorms=true for the field you are querying? The example schema does not have omitNorms=true on the name, cat or features field. > > -Yonik > http://www.lucidimagination.com -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
Updating Solr index - DIH delta vs. task queues
Hi, I have data stored in a database that is being updated constantly. I need to find a way to update the Solr index as data in the database is being updated. There seem to be 2 main schools of thought on this: 1) DIH delta - query the database for all records that have a timestamp later than the last_index_time, and import those records for indexing to Solr. 2) Task queue - every time a record is updated in the database, throw a task to a queue to index that record to Solr. Just want to know what the pros and cons of each approach are and what your experience is. For someone starting new, what would be your recommendation? Thanks, Andy
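For reference, approach (2) usually boils down to a small worker that the queue consumer calls for each changed row. A hedged SolrJ sketch, with field names and the Solr URL made up for illustration:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexOnUpdateWorker {
    private final SolrServer server;

    public IndexOnUpdateWorker(String solrUrl) throws Exception {
        this.server = new CommonsHttpSolrServer(solrUrl);
    }

    // called by the queue consumer whenever a record changes in the database
    public void index(String id, String title, String body) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("title", title);
        doc.addField("body", body);
        server.add(doc);
        // the commit policy is the real trade-off: committing per document is expensive,
        // so most setups commit periodically or rely on autoCommit in solrconfig.xml
    }
}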
Re: Updating Solr index - DIH delta vs. task queues
I'm in the same scenario, so this answer would be helpful too.. I'm adding... 3) Web Service - Request a webservice for all the new data that has been updated (can this be done? On Thu, Nov 4, 2010 at 2:38 PM, Andy wrote: > Hi, > I have data stored in a database that is being updated constantly. I need > to find a way to update Solr index as data in the database is being updated. > There seems to be 2 main schools of thoughts on this: > 1) DIH delta - query the database for all records that have a timestamp > later than the last_index_time. Import those records for indexing to Solr > 2) Task queue - every time a record is updated in the database, throw a > task to a queue to index that record to Solr > Just want to know what are the pros and cons of each approach and what is > your experience. For someone starting new, what'd be your recommendation? > ThanksAndy > > > -- __ Ezequiel. Http://www.ironicnet.com
Re: Optimize Index
On 11/4/2010 7:22 AM, stockiii wrote:
> how can i start an optimize by using DIH, but NOT after an delta- or full-import ?

I'm not aware of a way to do this with DIH, though there might be something I've missed. You can do it with an HTTP POST. Here's how to do it with curl:

/usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
-H "Content-Type: text/xml" \
--data-binary '<optimize/>'

Shawn
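For completeness, the SolrJ equivalent when the optimize is triggered from application code (the URL is a placeholder):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeNow {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // same effect as POSTing <optimize/> to /update; waits for the new searcher by default
        server.optimize();
    }
}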
Using setStart in solrj
Hi all- First, thanks to all the folks who have helped me so far getting the hang of Solr; I promise to give back when I think my contributions will be useful :) I am at the point where I'm trying to return results back from a search in a war file, using Java with solrj. On the result page of the website I'd want to limit the actual results to probably around 20 or so, with the usual "next/prev page" paradigm. The issue I've been wrestling with is keeping the SolrQuery object around so that I don't need to transmit the entire thing back to the client, especially if they search for something like "truck", which could return a lot of results. I was thinking that one solution would be to do a "query.setRows(20);" for the query, then return the results back with some sort of an identifier so that on subsequent queries, I could also include "query.setStart(someCounter + 1);" to get the next set of 20. In theory, that would work at the cost of having to re-execute the query. I've been looking for information about setStart() and haven't found much more than Javadoc that says "sets the starting row for the result set". My question is, how do I know what the starting row is? Maybe, based on the search parameters, it will always return the results in an implicit order in which case is it just like executing a fixed query in a database and then grabbing the next 20 rows from the result set? Because the user would be pressing the prev/next buttons, even though the query is being re-executed, the parameters would not be changing. That's the theory, anyway. It seems excessive to keep executing the same query over and over again just because the user wants to see the next set of results, especially if the original SolrQuery object has them all, but maybe that's just what needs to be done, given the stateless nature of the web. Any info on this method/strategy would be most appreciated. Thanks, Ron
Re: Optimize Index
For what it's worth, the Solr class instructor at the Lucene Revolution conference recommended *against* optimizing, and instead suggested to just let the merge factor do its job.

On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey wrote: > On 11/4/2010 7:22 AM, stockiii wrote: > >> how can i start an optimize by using DIH, but NOT after an delta- or >> full-import ? >> > > I'm not aware of a way to do this with DIH, though there might be something > I'm not aware of. You can do it with an HTTP POST. Here's how to do it > with curl: > > /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \ > -H "Content-Type: text/xml" \ > --data-binary '<optimize/>' > > Shawn > >
Re: Optimize Index
Huh? That's something new for me. Optimize removes documents that have been flagged for deletion. For relevancy it's important those are removed because document frequencies are not updated for deletes. Did I miss something?

> For what it's worth, the Solr class instructor at the Lucene Revolution > conference recommended *against* optimizing, and instead suggested to just > let the merge factor do it's job. > > On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey wrote: > > On 11/4/2010 7:22 AM, stockiii wrote: > >> how can i start an optimize by using DIH, but NOT after an delta- or > >> full-import ? > > > > I'm not aware of a way to do this with DIH, though there might be > > something I'm not aware of. You can do it with an HTTP POST. Here's > > how to do it with curl: > > > > /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \ > > -H "Content-Type: text/xml" \ > > --data-binary '<optimize/>' > > > > Shawn
Does DataImportHandler support Digest authentication
I need to connect to a RETS api through a http url. But the REST service uses digest authentication. Can I use DataImportHandler to pass the credentials for digest authentication? Thanks
Re: Does DataImportHandler support Digest authentication
I mean to say RESTful Apis.
Re: Testing/packaging question
Hi, I'm now trying to export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml" and restarting tomcat (v6 package from ubuntu maverick) via sudo /etc/init.d/tomcat6 restart but solr still doesn't seem to find that schema.xml, as it complains about unknown fields when running the tests that require that schema.xml Can someone please tell me what I'm doing wrong -- and what I should be doing? TIA again, Bernhard Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter: > Hi, > > I'm pretty much of a Solr newbie currently packaging solrpy for Debian; > see > http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ > > In order to run solrpy's supplied tests at build time, I'd need Solr to > know about the schema.xml that comes with the tests. > Can anyone tell me how do that properly? I'd basically need Solr to > temporarily recognize that schema.xml without permanently installing it > -- is there any way to do this, eg via environment variables? > > TIA > Bernhard Reiter
Re: Problem escaping question marks
Looking at the JIRA issue, looks like there's been a new patch related to this. This is good news! We've re-written a portion of our web app to use Solr instead of mysql. This part of our app allows clients to construct rules to match data within their account, and automatically apply actions to those matched data points. So far our testing and then rollout has been smooth, until we encountered the above rule/query. I guess I assumed since these metacharacters were escaped that they would be parsed correctly under any type of query. What is the likelihood of this being included in the next release/bug fix version of Solr? Are there docs available online with basic information about rolling our own build of Solr that includes this patch? I appreciate your help! Thanks! Stephen On Thu, Nov 4, 2010 at 9:26 AM, Robert Muir wrote: > On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis > wrote: > > I want to return any first name with a Question Mark in it > > Query: first_name: *\?* > > > > There is no way to escape the metacharacters * or ? for a wildcard > query (regardless of queryparser, even if you write your own). > See https://issues.apache.org/jira/browse/LUCENE-588 > > Its something we could fix, but in all honesty it seems one reason it > isn't fixed is because the bug is so old, yet there hasn't really been > any indication of demand for such a thing... >
RE: Testing/packaging question
I believe it should point to the directory above, where conf and lib are located (though I have a multi-core setup). Mine is set to: /usr/local/jboss-5.1.0.GA/server/solr/solr_data/ And in solr_data the solr.xml defines the two cores, but in each core directory, is a conf, data, and lib directory, which contains the schema.xml. -Original Message- From: Bernhard Reiter [mailto:ock...@raz.or.at] Sent: Thursday, November 04, 2010 3:49 PM To: solr-user@lucene.apache.org Subject: Re: Testing/packaging question Hi, I'm now trying to export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml" and restarting tomcat (v6 package from ubuntu maverick) via sudo /etc/init.d/tomcat6 restart but solr still doesn't seem to find that schema.xml, as it complains about unknown fields when running the tests that require that schema.xml Can someone please tell me what I'm doing wrong -- and what I should be doing? TIA again, Bernhard Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter: > Hi, > > I'm pretty much of a Solr newbie currently packaging solrpy for Debian; > see > http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ > > In order to run solrpy's supplied tests at build time, I'd need Solr to > know about the schema.xml that comes with the tests. > Can anyone tell me how do that properly? I'd basically need Solr to > temporarily recognize that schema.xml without permanently installing it > -- is there any way to do this, eg via environment variables? > > TIA > Bernhard Reiter
Re: Deletes writing bytes len 0, corrupting the index
I'm still seeing this error after downloading the latest 2.9 branch version, compiling, copying to Solr 1.4 and deploying. Basically as mentioned, the .del files are of zero length... Hmm... On Wed, Oct 13, 2010 at 1:33 PM, Jason Rutherglen wrote: > Thanks Robert, that Jira issue aptly describes what I'm seeing, I think. > > On Wed, Oct 13, 2010 at 10:22 AM, Robert Muir wrote: >> if you are going to fill up your disk space all the time with solr >> 1.4.1, I suggest replacing the lucene jars with lucene jars from >> 2.9-branch >> (http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/). >> >> then you get the fix for https://issues.apache.org/jira/browse/LUCENE-2593 >> too. >> >> On Wed, Oct 13, 2010 at 11:37 AM, Jason Rutherglen >> wrote: >>> We have unit tests for running out of disk space? However we have >>> Tomcat logs that fill up quickly and starve Solr 1.4.1 of space. The >>> main segments are probably not corrupted, however routinely now, there >>> are deletes files of length 0. >>> >>> 0 2010-10-12 18:35 _cc_8.del >>> >>> Which is fundamental index corruption, though less extreme. Are we >>> testing for this? >>> >> >
Re: Problem escaping question marks
Wildcard queries, especially a wildcard query with a wildcard both _before_ and _after_, are going to be fairly slow for Solr to process, anyhow. (In fact, for some reason I thought wildcards weren't even supported both before and after, just one or the other). Still, it's a bug in lucene, it ought not to do that, true. But there may be a better design to handle your actual use case with much better performance anyhow. Based around doing something at indexing time to tokenize in a different field on individual letters (if perhaps you frequently want to search on arbitrary individual characters), or to simply index a "1" or "0" in a field depending on whether it includes a question mark if you specifically want to search all the time on question marks and don't care about other letters. Or some kind of more complex ngram'ing, if you want to be able to search on all sorts of sub-strings, efficiently. The trade-off will be disk space for performance... but if you start to have a lot of records, that wildcard-on-both-sides thing will have unacceptable performance, I predict. Jonathan Stephen Powis wrote: Looking at the JIRA issue, looks like there's been a new patch related to this. This is good news! We've re-written a portion of our web app to use Solr instead of mysql. This part of our app allows clients to construct rules to match data within their account, and automatically apply actions to those matched data points. So far our testing and then rollout has been smooth, until we encountered the above rule/query. I guess I assumed since these metacharacters were escaped that they would be parsed correctly under any type of query. What is the likelihood of this being included in the next release/bug fix version of Solr? Are there docs available online with basic information about rolling our own build of Solr that includes this patch? I appreciate your help! Thanks! Stephen On Thu, Nov 4, 2010 at 9:26 AM, Robert Muir wrote: On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis wrote: I want to return any first name with a Question Mark in it Query: first_name: *\?* There is no way to escape the metacharacters * or ? for a wildcard query (regardless of queryparser, even if you write your own). See https://issues.apache.org/jira/browse/LUCENE-588 Its something we could fix, but in all honesty it seems one reason it isn't fixed is because the bug is so old, yet there hasn't really been any indication of demand for such a thing...
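A sketch of the "index a flag" idea under assumed names (the schema would need a matching boolean field, here called first_name_has_qmark); the double-wildcard query then becomes a cheap filter:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FlagFieldSketch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        String firstName = "Jo?n";
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("first_name", firstName);
        // computed once at index time instead of a leading+trailing wildcard at query time
        doc.addField("first_name_has_qmark", firstName.contains("?"));
        server.add(doc);
        server.commit();

        // querying is then just: fq=first_name_has_qmark:true
    }
}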
RE: Testing/packaging question
You need to either add that to catalina.sh or create a setenv.sh in the CATALINA_HOME/bin directory. Then you can restart tomcat. So, setenv.sh would contain the following:

export JAVA_HOME="/path/to/jre"
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml"

If you were setting the export in your own environment and then issuing the restart, tomcat was not picking up your local environment because it's running as root. You don't want to change root's environment. You could also create a context.xml in your CATALINA_HOME/conf/Catalina/localhost. You should be able to find those instructions on/through the Solr FAQ. Hope this helps.

From: Bernhard Reiter [ock...@raz.or.at] Sent: Thursday, November 04, 2010 4:49 PM To: solr-user@lucene.apache.org Subject: Re: Testing/packaging question Hi, I'm now trying to export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml" and restarting tomcat (v6 package from ubuntu maverick) via sudo /etc/init.d/tomcat6 restart but solr still doesn't seem to find that schema.xml, as it complains about unknown fields when running the tests that require that schema.xml Can someone please tell me what I'm doing wrong -- and what I should be doing? TIA again, Bernhard Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter: > Hi, > > I'm pretty much of a Solr newbie currently packaging solrpy for Debian; > see > http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ > > In order to run solrpy's supplied tests at build time, I'd need Solr to > know about the schema.xml that comes with the tests. > Can anyone tell me how do that properly? I'd basically need Solr to > temporarily recognize that schema.xml without permanently installing it > -- is there any way to do this, eg via environment variables? > > TIA > Bernhard Reiter
Re: Using setStart in solrj
Hi Ron,

> how do I know what the starting row

Always 0.

> especially if the original SolrQuery object has them all

That's the point. Solr will normally cache it for you. This is your friend in solrconfig.xml: <queryResultWindowSize>40</queryResultWindowSize>

Just try it first with http to get an impression of what start is good for: it just sets the starting doc for the current query. E.g. you have a very complicated query a la select?q=xy&param1=...&param2=...&paramN=...&rows=20&start=0 and the next *page* would be select?q=xy&param1=...&param2=...&paramN=...&rows=20&start=20 (newStart = oldStart + rows). (To get the next page you'll need to keep the params either in the session or 'encoded' within the url.) Just try and ask if you need more info :-)

Regards, Peter.

Hi all- First, thanks to all the folks to have helped me so far getting the hang of Solr; I promise to give back when I think my contributions will be useful :) I am at the point where I'm trying to return results back from a search in a war file, using Java with solrj. On the result page of the website I'd want to limit the actual results to probably around 20 or so, with the usual "next/prev page" paradigm. The issue I've been wrestling with is keeping the SolrQuery object around so that I don't need to transmit the entire thing back to the client, especially if they search for something like "truck", which could return a lot of results. I was thinking that one solution would be to do a "query.setRows(20);" for the query, then return the results back with some sort of an identifier so that on subsequent queries, I could also include "query.setStart(someCounter + 1);" to get the next set of 20. In theory, that would work at the cost of having to re-execute the query. I've been looking for information about setStart() and haven't found much more than Javadoc that says "sets the starting row for the result set". My question is, how do I know what the starting row is? Maybe, based on the search parameters, it will always return the results in an implicit order in which case is it just like executing a fixed query in a database and then grabbing the next 20 rows from the result set? Because the user would be pressing the prev/next buttons, even though the query is being re-executed, the parameters would not be changing. That's the theory, anyway. It seems excessive to keep executing the same query over and over again just because the user wants to see the next set of results, especially if the original SolrQuery object has them all, but maybe that's just what needs to be done, given the stateless nature of the web. Any info on this method/strategy would be most appreciated. Thanks, Ron

-- http://jetwick.com twitter search prototype
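The same paging pattern expressed with SolrJ, roughly (the query text, page size and URL are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PagingSketch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        int rows = 20;

        for (int page = 0; page < 3; page++) {
            SolrQuery q = new SolrQuery("truck");
            q.setRows(rows);
            q.setStart(page * rows); // newStart = oldStart + rows, exactly as in the URLs above
            QueryResponse rsp = server.query(q);
            System.out.println("page " + page + ": " + rsp.getResults().size()
                    + " of " + rsp.getResults().getNumFound() + " total hits");
        }
    }
}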
Re: Optimize Index
What you can try is maxSegments=2 or more as a 'partial' optimize:

"If the index is so large that optimizes are taking longer than desired or using more disk space during optimization than you can spare, consider adding the maxSegments parameter to the optimize command. In the XML message, this would be an attribute; the URL form and SolrJ have the corresponding option too. By default this parameter is 1 since an optimize results in a single Lucene "segment". By setting it larger than 1 but less than the mergeFactor, you permit partial optimization to no more than this many segments. Of course the index won't be fully optimized and therefore searches will be slower."

from http://wiki.apache.org/solr/PacktBook2009 (I only found that link; there must be sth. on the real wiki for the maxSegments parameter ...)

> Hello. My Index have ~30 Million documents and a optimize=true is very heavy. it takes long time ... how can i start an optimize by using DIH, but NOT after an delta- or full-import ? i set my index to compound-index. thx

-- http://jetwick.com twitter search prototype
RE: Testing/packaging question
The thing is, I only have a schema.xml -- no data, no lib directories. See the tests subdirectory in the solrpy package: http://pypi.python.org/packages/source/s/solrpy/solrpy-0.9.3.tar.gz Bernhard Am Donnerstag, den 04.11.2010, 15:59 -0500 schrieb Olson, Ron: > I believe it should point to the directory above, where conf and lib are > located (though I have a multi-core setup). > > Mine is set to: > > /usr/local/jboss-5.1.0.GA/server/solr/solr_data/ > > And in solr_data the solr.xml defines the two cores, but in each core > directory, is a conf, data, and lib directory, which contains the schema.xml. > > > > -Original Message- > From: Bernhard Reiter [mailto:ock...@raz.or.at] > Sent: Thursday, November 04, 2010 3:49 PM > To: solr-user@lucene.apache.org > Subject: Re: Testing/packaging question > > Hi, > > I'm now trying to > > export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml" > > and restarting tomcat (v6 package from ubuntu maverick) via > > sudo /etc/init.d/tomcat6 restart > > but solr still doesn't seem to find that schema.xml, as it complains > about unknown fields when running the tests that require that schema.xml > > Can someone please tell me what I'm doing wrong -- and what I should be > doing? > > TIA again, > Bernhard > > Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter: > > Hi, > > > > I'm pretty much of a Solr newbie currently packaging solrpy for Debian; > > see > > http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ > > > > In order to run solrpy's supplied tests at build time, I'd need Solr to > > know about the schema.xml that comes with the tests. > > Can anyone tell me how do that properly? I'd basically need Solr to > > temporarily recognize that schema.xml without permanently installing it > > -- is there any way to do this, eg via environment variables? > > > > TIA > > Bernhard Reiter > > > > > DISCLAIMER: This electronic message, including any attachments, files or > documents, is intended only for the addressee and may contain CONFIDENTIAL, > PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended > recipient, you are hereby notified that any use, disclosure, copying or > distribution of this message or any of the information included in or with it > is unauthorized and strictly prohibited. If you have received this message > in error, please notify the sender immediately by reply e-mail and > permanently delete and destroy this message and its attachments, along with > any copies thereof. This message does not create any contractual obligation > on behalf of the sender or Law Bulletin Publishing Company. > Thank you.
Re: Problem escaping question marks
On Thu, Nov 4, 2010 at 4:58 PM, Stephen Powis wrote: > What is the likelihood of this being included in the next release/bug fix > version of Solr? In this case, not likely. It will have to wait for Solr 4.0 > Are there docs available online with basic information > about rolling our own build of Solr that includes this patch? you can checkout trunk with 'svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk' and apply the patch with 'patch -p0 < foo.patch'
RE: Testing/packaging question
Thanks for your instructions. Unfortunately, I need to do all that as part of my package's (python-solrpy) build procedure, so I can't change any global configuration, such as in the catalina subdirectories. I've already sensed that restarting tomcat is also just too system-invasive and would include changing its (system-wide) configuration. Are there any other ways to use solr for running the tests from http://pypi.python.org/packages/source/s/solrpy/solrpy-0.9.3.tar.gz without having to change any system configuration? Maybe via a user Tomcat instance such as provided by the tomcat6-user debian package? Thanks for your help! Bernhard Am Donnerstag, den 04.11.2010, 16:15 -0500 schrieb Turner, Robbin J: > You need to either add that to catalina.sh or create a setenv.sh in the > CATALINA_HOME/bin directory. Then you can restart tomcat. > > So, setenv.sh would contain the following: > >export JAVA_HOME="/path/to/jre" >export JAVA_OPTS="="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml" > > If you were setting the export in your own environment and then issuing the > restart, tomcat was not picking up your local environment because it's > running as root. You don't want to change root's environment. > > You could also, create a context.xml in you > CATALINA_HOME/conf/CATALINA/localhost. You should be able to find those > instruction on/through the Solr FAQ. > > Hope this helps. > > From: Bernhard Reiter [ock...@raz.or.at] > Sent: Thursday, November 04, 2010 4:49 PM > To: solr-user@lucene.apache.org > Subject: Re: Testing/packaging question > > Hi, > > I'm now trying to > > export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml" > > and restarting tomcat (v6 package from ubuntu maverick) via > > sudo /etc/init.d/tomcat6 restart > > but solr still doesn't seem to find that schema.xml, as it complains > about unknown fields when running the tests that require that schema.xml > > Can someone please tell me what I'm doing wrong -- and what I should be > doing? > > TIA again, > Bernhard > > Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter: > > Hi, > > > > I'm pretty much of a Solr newbie currently packaging solrpy for Debian; > > see > > http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ > > > > In order to run solrpy's supplied tests at build time, I'd need Solr to > > know about the schema.xml that comes with the tests. > > Can anyone tell me how do that properly? I'd basically need Solr to > > temporarily recognize that schema.xml without permanently installing it > > -- is there any way to do this, eg via environment variables? > > > > TIA > > Bernhard Reiter
Re: querying multiple fields as one
Hi Erick 2010/11/4 Erick Erickson > Ken's suggestion to look at dismax is a good one, but I have > a question > q=type:electronics cat:electronics > > should do what you want assuming your default operator > is OR. correct > Is it failing? Or is the real question how you can > do this automatically? > No failing, just looking for how to do such "expansion" of fields automatically (with fields in OR but that's not an issue I think) > > I'd expect the ranking to be a bit different, but I'm guessing > that's not a big issue > right, no problem if the scoring isn't exactly the same. Thanks, Tommaso > > Best > Erick > > On Thu, Nov 4, 2010 at 8:21 AM, Tommaso Teofili > wrote: > > > Hi all, > > having two fields named 'type' and 'cat' with identical type and options, > > but different values recorded, would it be possible to query them as they > > were one field? > > For instance > > q=type:electronics cat:electronics > > should return same results as > > q=common:electronics > > I know I could make it defining a third field 'common' with copyFields > from > > 'type' and 'cat' to 'common' but this wouldn't be feasible if you've > > already > > lots of documents in your index and don't want to reindex everything, > isn't > > it? > > Any suggestions? > > Thanks in advance, > > Tommaso > > >
RE: Testing/packaging question
You can setup your own tomcat instance which would contain just configurations you need. You won't even have to recreate all the tomcat configuration and binaries, just the ones that were not defaults. So, if you lookup multiple tomcat configuration instance (google it), and then you'll have a set of directories. You'll need to have your own startup script that points to your configurations. You can use the current startup script as a model, then in your build procedures (I've done all this with a script) have this added to the system so you can preform restart. You'd have to have a couple of other environment variables set: export CATALINA_BASE=/path/to/your/tomcat/instance/conf/files export CATALINA_HOME=/path/to/default/installation/bin/files export SOLR_HOME=/path/to/solr/dataNconf Good luck From: Bernhard Reiter [ock...@raz.or.at] Sent: Thursday, November 04, 2010 5:49 PM To: solr-user@lucene.apache.org Subject: RE: Testing/packaging question Thanks for your instructions. Unfortunately, I need to do all that as part of my package's (python-solrpy) build procedure, so I can't change any global configuration, such as in the catalina subdirectories. I've already sensed that restarting tomcat is also just too system-invasive and would include changing its (system-wide) configuration. Are there any other ways to use solr for running the tests from http://pypi.python.org/packages/source/s/solrpy/solrpy-0.9.3.tar.gz without having to change any system configuration? Maybe via a user Tomcat instance such as provided by the tomcat6-user debian package? Thanks for your help! Bernhard Am Donnerstag, den 04.11.2010, 16:15 -0500 schrieb Turner, Robbin J: > You need to either add that to catalina.sh or create a setenv.sh in the > CATALINA_HOME/bin directory. Then you can restart tomcat. > > So, setenv.sh would contain the following: > >export JAVA_HOME="/path/to/jre" >export JAVA_OPTS="="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml" > > If you were setting the export in your own environment and then issuing the > restart, tomcat was not picking up your local environment because it's > running as root. You don't want to change root's environment. > > You could also, create a context.xml in you > CATALINA_HOME/conf/CATALINA/localhost. You should be able to find those > instruction on/through the Solr FAQ. > > Hope this helps. > > From: Bernhard Reiter [ock...@raz.or.at] > Sent: Thursday, November 04, 2010 4:49 PM > To: solr-user@lucene.apache.org > Subject: Re: Testing/packaging question > > Hi, > > I'm now trying to > > export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml" > > and restarting tomcat (v6 package from ubuntu maverick) via > > sudo /etc/init.d/tomcat6 restart > > but solr still doesn't seem to find that schema.xml, as it complains > about unknown fields when running the tests that require that schema.xml > > Can someone please tell me what I'm doing wrong -- and what I should be > doing? > > TIA again, > Bernhard > > Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter: > > Hi, > > > > I'm pretty much of a Solr newbie currently packaging solrpy for Debian; > > see > > http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ > > > > In order to run solrpy's supplied tests at build time, I'd need Solr to > > know about the schema.xml that comes with the tests. > > Can anyone tell me how do that properly? 
I'd basically need Solr to > > temporarily recognize that schema.xml without permanently installing it > > -- is there any way to do this, eg via environment variables? > > > > TIA > > Bernhard Reiter
Re: querying multiple fields as one
Tommaso Teofili wrote:
> No failing, just looking for how to do such "expansion" of fields automatically (with fields in OR but that's not an issue I think)

The dismax query parser does exactly that.
Re: mergeFactor questions
Thanks so much Shawn, I am in a scenario with many inserts while searching, each consisting of ~ 500documents, I will monitor the number of segments taking your considerations in mind :-) Regards, Tommaso 2010/11/4 Shawn Heisey > On 11/4/2010 3:27 AM, Tommaso Teofili wrote: > >>- Is mergeFactor a one time configuration setting that is considered >> only >> >>when creating the index for the first time or can it be adjusted later >> even >>with some docs inside the index? e.g. I have mF to 10 then I realize I >> want >>quicker searches and I set it to 2 so that at the next optimize/commit >> I >>will have no more than 2 segments. My understanding is that one can >> adjust >>mF over time, is it right? >> > > The mergeFactor is applied anytime documents are added to the index, not > just when it is built for the first time. You can adjust it later, and > reload the core or restart Solr. It will apply to any additional indexing > from that point forward. > > With a mergeFactor of 10, having 21 segments (and more) temporarily on the > disk at the same time is reasonably possible. I know this applies if you > are doing a continuous large insert, not sure if you are doing several small > inserts separately. These segments are: > > * The small segment that is being built right now. > * The previous 10 small segments. > * The merged segment being created from those above. > * The previous 9 merged segments. > > If it takes a really long time to merge the last 10 small segments and then > merge the 10 large segments into an even larger segment, you can end up with > even more small segments from your continuous insert. If it should take > long enough that you actually get 10 more new small segments, the large > merge will pause while it completes the small merge. I saw this happen > recently when I decided to see what happens if I built a single shard from > our entire database. It took a really long time, partly from that > super-merge and the optimize that happened later, and took up 85GB of disk > space. > > I'm not really sure what happens if you have this continue beyond a single > super-merge like I have mentioned. > > - In a replicated environment does it make sense to define different >> >>mergeFactors on master and slave? I'd say no since it influences the >> number >>of segments created, that being a concern of who actually index >> documents >>(the master) not of who receives (segments of) index, but please >> correct me >>if I am wrong. >> > > Because it only applies when indexes are being built, it has no meaning on > a slave, which as you said, just copies the data from the master. > > Shawn > >
RE: Does Solr support Natural Language Search
Hi Jayant, I think you mean NL search as opposed to Boolean search: the ability to return ranked results from queries based on non-required term matches. Right? If that is what you meant, then the answer is: "Yes!". If not, then you should rephrase your question. Otherwise, the answer could eventually be: "Maybe!!!". YMMV, TMR. Steve > -Original Message- > From: jayant [mailto:jayan...@hotmail.com] > Sent: Wednesday, November 03, 2010 11:49 PM > To: solr-user@lucene.apache.org > Subject: Does Solr support Natural Language Search > > > Does Solr support Natural Language Search? I did not find any thing about > this in the reference manual. Please let me know. > Thanks. > -- > View this message in context: http://lucene.472066.n3.nabble.com/Does- > Solr-support-Natural-Language-Search-tp1839262p1839262.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Testing/packaging question
Hi, don't know if the python package provides one but solrj offers to start solr embedded (EmbeddedSolrServer), and setting up a different schema + config is possible. For this see: https://karussell.wordpress.com/2010/06/10/how-to-test-apache-solrj/ If you need an 'external solr' (via jetty and java -jar start.jar) while tests are running, see this: http://java.dzone.com/articles/getting-know-solr

Regards, Peter.

> Hi, I'm pretty much of a Solr newbie currently packaging solrpy for Debian; see http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/ In order to run solrpy's supplied tests at build time, I'd need Solr to know about the schema.xml that comes with the tests. Can anyone tell me how do that properly? I'd basically need Solr to temporarily recognize that schema.xml without permanently installing it -- is there any way to do this, eg via environment variables? TIA Bernhard Reiter
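For the Java side, the embedded setup looks roughly like this on Solr 1.4 (paths are placeholders; the solr home just needs a conf/ directory holding the test schema.xml and a solrconfig.xml):

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedSolrForTests {
    public static void main(String[] args) throws Exception {
        // point Solr at a throwaway home directory containing the test config
        System.setProperty("solr.solr.home", "/tmp/test-solr-home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");

        // ... run the tests against 'server' here ...

        container.shutdown();
    }
}

Since solrpy talks to Solr over HTTP, the embedded route only helps Java-side tests; for the Debian build the jetty-based 'java -jar start.jar' approach from the second link is probably the closer fit.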
Dataimporthandler crashed raidcontroller
Hi all, we had a severe problem with the RAID controller on one of our servers today while importing a table with ~8 million rows into a Solr index. After importing about 4 million documents, the server shut down and failed to restart due to a corrupt RAID disk. The Solr data import was the only heavy process running on that machine during the crash. Has anyone experienced HDD/RAID-related problems while indexing large SQL databases into Solr? thanks! -robert
Re: Optimize Index
No, you didn't miss anything. The comment at Lucene Revolution was more along the lines that optimize didn't actually improve much #absent# deletes. Plus, on a corpus of significant size, the doc frequencies won't change that much from deleting documents, but that's a case-by-case thing.
Best
Erick

On Thu, Nov 4, 2010 at 4:31 PM, Markus Jelsma wrote:
> Huh? That's something new for me. Optimize removes documents that have been
> flagged for deletion. For relevancy it's important those are removed because
> document frequencies are not updated for deletes.
>
> Did I miss something?
>
>> For what it's worth, the Solr class instructor at the Lucene Revolution
>> conference recommended *against* optimizing, and instead suggested to just
>> let the merge factor do its job.
>>
>> On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey wrote:
>>> On 11/4/2010 7:22 AM, stockiii wrote:
>>>> how can i start an optimize by using DIH, but NOT after a delta- or
>>>> full-import?
>>>
>>> I'm not aware of a way to do this with DIH, though there might be
>>> something I'm not aware of. You can do it with an HTTP POST. Here's
>>> how to do it with curl:
>>>
>>> /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
>>>   -H "Content-Type: text/xml" \
>>>   --data-binary '<optimize/>'
>>>
>>> Shawn
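If the import is driven from Java anyway, the same decoupled optimize can be issued through SolrJ instead of curl (just a sketch; the URL is a placeholder for your core):

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class OptimizeAfterImport {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://HOST:PORT/solr/CORE");
      // Equivalent to POSTing <optimize/> to /update, independent of any DIH run.
      server.optimize();
    }
  }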
Re: Dataimporthandler crashed raidcontroller
I experienced similar problems. In our case it was because we didn't perform load stress tests properly before going to production. Nothing lasts forever: replace the controller, change hardware vendor, keep the temperature inside the rack low.
Thanks

--Original Message--
From: Robert Gründler
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Subject: Dataimporthandler crashed raidcontroller
Sent: Nov 4, 2010 7:21 PM

Hi all, we had a severe problem with the RAID controller on one of our servers today while importing a table with ~8 million rows into a Solr index. After importing about 4 million documents, the server shut down and failed to restart due to a corrupt RAID disk. The Solr data import was the only heavy process running on that machine during the crash. Has anyone experienced HDD/RAID-related problems while indexing large SQL databases into Solr? thanks! -robert

Sent on the TELUS Mobility network with BlackBerry
Re: Does Solr support Natural Language Search
I don't think current Lucene will offer what you want now. There are two main tasks in a search process.

One is "understanding" the user's intention. Because natural language understanding is difficult, current information retrieval systems "force" users to input some terms to express their needs. But terms are ambiguous, e.g. "apple" may mean a fruit or an electronics company, so users are asked to input more terms to disambiguate, e.g. "apple fruit" suggests the user wants the fruit. There are many things that help detect the user's intent -- query expansion ("Searches related to" in Google), suggestions as the user types, and so on. The ultimate goal is understanding intention by analyzing the user's natural language.

The other is "understanding" documents. Current models such as VSM don't understand documents; they just treat documents as collections of words. When a user inputs a word, the system returns documents containing that word (tf); of course idf is also taken into consideration. But that's far from understanding. This is why keyword stuffing exists: because the machine doesn't really understand the document, it can't judge whether the document is good or bad, or how well it matches the query. So PageRank and other external signals are used to relieve this problem, but they can't fully solve it. Fully understanding documents needs more advanced NLP techniques, but I don't think they will reach human intelligence in the near future, although I am an NLPer. Another road is humans helping machines "understand" -- what is called web 2.0, social networks, the semantic web ... But that's not an easy task either.

2010/11/4 jayant :
>
> Does Solr support Natural Language Search? I did not find anything about
> this in the reference manual. Please let me know.
> Thanks.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Does-Solr-support-Natural-Language-Search-tp1839262p1839262.html
> Sent from the Solr - User mailing list archive at Nabble.com.
How to Facet on a price range
I am able to facet on a particular field because that field is indexed. But I am not sure how to facet on a price range when I have the exact price in the 'price' field. Can anyone help here? Thanks
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-Facet-on-a-price-range-tp1846392p1846392.html
Sent from the Solr - User mailing list archive at Nabble.com.
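As far as I know, on 1.4.x the usual way is to add one facet.query per bucket, each with a range query on the price field (assuming here that 'price' is a numeric field in the schema and that these bucket edges are only examples):

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
      &facet.query=price:[* TO 50]
      &facet.query=price:[50 TO 100]
      &facet.query=price:[100 TO *]

The counts come back under facet_queries in the response. Note that [a TO b] range queries are inclusive, so a document priced at exactly 50 falls into both of the first two buckets; shift the bounds (e.g. 50.01) if that matters for your UI.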
Re:Re: Re:Re: problem of solr replication's speed
Some torment later, I found the reason for Solr replication's slow speed. It's not Solr's problem, it's Jetty's. I used to embed Jetty 7 in my app, but when I noticed Solr's demo uses Jetty 6, I tried Jetty 6 in my app and was happy to get the fast speed. Actually, I also tried running Solr's demo on Jetty 7 with the default conf, and the replication speed was slow there too. I don't know why the default Jetty 7 server is so slow. I want to find the reason; maybe I will ask the Jetty mailing list or continue reading the code.

At 2010-11-02 07:28:54, "Lance Norskog" wrote:
>This is the time to replicate and open the new index, right? Opening a
>new index can take a lot of time. How many autowarmers and queries are
>there in the caches? Opening a new index re-runs all of the queries in
>all of the caches.
>
>2010/11/1 kafka0102 :
>> I suspected my app has some sleeping op every 1s, so
>> I changed ReplicationHandler.PACKET_SZ to 1024 * 1024 * 10; // 10MB
>>
>> and the log result looks like this:
>> [2010-11-01 17:49:29][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3184
>> [2010-11-01 17:49:32][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3426
>> [2010-11-01 17:49:36][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3359
>> [2010-11-01 17:49:39][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3166
>> [2010-11-01 17:49:42][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3513
>> [2010-11-01 17:49:46][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3140
>> [2010-11-01 17:49:50][INFO][pool-6-thread-1][SnapPuller.java(1038)]readFully10485760 cost 3471
>>
>> That means it's still as slow as before. What's wrong with my env?
>>
>> At 2010-11-01 17:30:32, kafka0102 wrote:
>> I hacked SnapPuller to log the cost, and the log looks like this:
>> [2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 979
>> [2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 4
>> [2010-11-01 17:21:19][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 4
>> [2010-11-01 17:21:20][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 980
>> [2010-11-01 17:21:20][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 4
>> [2010-11-01 17:21:20][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 5
>> [2010-11-01 17:21:21][INFO][pool-6-thread-1][SnapPuller.java(1037)]readFully1048576 cost 979
>>
>> It's saying it costs about 1000ms to transfer 1M of data every couple of chunks. I
>> used Jetty as the server and embedded Solr in my app. I'm so confused. What have I
>> done wrong?
>>
>> At 2010-11-01 10:12:38, "Lance Norskog" wrote:
>>
>>>If you are copying from an indexer while you are indexing new content,
>>>this would cause contention for the disk head. Does indexing slow down
>>>during this period?
>>>
>>>Lance
>>>
>>>2010/10/31 Peter Karich :
>>>> we have an identical-sized index and it takes ~5 minutes
>>>>> It takes about one hour to replicate a 6G index for Solr in my env. But my
>>>>> network can transfer files at about 10-20M/s using scp. So Solr's http
>>>>> replication is too slow; is that normal or am I doing something wrong?
>>>
>>>--
>>>Lance Norskog
>>>goks...@gmail.com
>>
>>
>
>
>--
>Lance Norskog
>goks...@gmail.com