Query results vs. facets results
Hello, I am new to Solr and I am running some tests with our data in Solr. I am using version 3.6 and the data is imported from a DB2 database using Solr's DIH. We have defined a single entity in the db-data-config.xml, which is an equivalent of the following query:

query="SELECT C.NAME, F.CITY FROM NAME_CONNECTIONS AS C JOIN NAME_DETAILS AS F ON C.DETAILS_NAME = F.NAME"

The ID in NAME_CONNECTIONS is not unique, so it might appear multiple times. For the unique ID in the schema, we are using a solr.UUIDField. Running

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true

yields 134 as a result, which is exactly what we expect. On the other hand, running

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true

yields 103. I would expect to have the same number (134) in this facet result as in the previous filter result. Could you please let me know why these two results are different?

Thank you,
Tudor

-- View this message in context: http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR 4 Alpha Out Of Mem Err
Do you have the following hard autoCommit in your config (as the stock server does)?

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

This is now fairly important since Solr now tracks information on every uncommitted document added. At some point we should probably hardcode some mechanism based on number of documents or time.

-Yonik
http://lucidimagination.com
Re: Groups count in distributed grouping is wrong in some case
Hi, I'm using SOLR 4.x from trunk. This was the version from 2012-07-10, so it is one of the latest versions. I searched the mailing list and JIRA but found only this: https://issues.apache.org/jira/browse/SOLR-3436 It was committed to trunk in May, so my version of SOLR has this fix. But the problem still exists.

Cheers
Agnieszka

2012/7/15 Erick Erickson
> what version of Solr are you using? There's been quite a bit of work
> on this lately,
> I'm not even sure how much has made it into 3.6. You might try searching the
> JIRA list, Martijn van Groningen has done a bunch of work lately, look for
> his name. Fortunately, it's not likely to get a bunch of false hits ..
>
> Best
> Erick
>
> On Fri, Jul 13, 2012 at 7:50 AM, Agnieszka Kukałowicz
> wrote:
> > Hi,
> >
> > I have a problem with facet counts in distributed grouping. It appears only
> > when I make a query that returns almost all of the documents.
> >
> > My SOLR implementation has 4 shards and my queries look like:
> >
> > http://host:port/select?q=*:*&shards=shard1,shard2,shard3,shard4&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > With a query like the above I get strange counts for field category1.
> > The counts for the values are very big:
> > 9659
> > 7015
> > 5676
> > 1180
> > 1105
> > 979
> > 770
> > 701
> > 612
> > 422
> > 358
> >
> > When I narrow the results by adding fq=category1:"val1", etc. to the query,
> > I get different counts than the category1 facet shows for the first few values:
> >
> > fq=category1:"val1" - counts: 22
> > fq=category1:"val2" - counts: 22
> > fq=category1:"val3" - counts: 21
> > fq=category1:"val4" - counts: 19
> > fq=category1:"val5" - counts: 19
> > fq=category1:"val6" - counts: 20
> > fq=category1:"val7" - counts: 20
> > fq=category1:"val8" - counts: 25
> > fq=category1:"val9" - counts: 422
> > fq=category1:"val10" - counts: 358
> >
> > From val9 on, the counts are OK.
> >
> > First I thought that for some values in the "category1" facet the group count
> > does not work and it returns counts of all documents, not grouped by field id.
> > But the number of all documents matching the query fq=category1:"val1" is
> > 45468, so the numbers are not the same.
> >
> > I checked the queries on each shard for val1 and the results are:
> >
> > shard1:
> > query:
> > http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > 11
> >
> > query:
> > http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:"val1"
> >
> > shard 2:
> > query:
> > http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > there is no value "val1" in the category1 facet.
> >
> > query:
> > http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:"val1"
> >
> > 7
> >
> > shard3:
> > query:
> > http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
> >
> > there is no value val1 in the category1 facet
> >
> > query:
> > http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:"val1"
> >
> > 4
> >
> > So it looks like the detailed query with fq=category1:"val1" returns the relevant
> > results, but Solr has a problem with facet counts when one of the shards
> > does not return a facet value (in this scenario "val1") that exists on
> > other shards.
> >
> > I checked the shards for "val10" and I got:
> >
> > shard1: count for val10 - 142
> > shard2: count for val10 - 131
> > shard3: count for val10 - 149
> > sum of counts 422 - OK.
> >
> > I'm not sure how to resolve this situation. The counts for val1 to
> > val9 are certainly wrong, and they should not be at the top of the category1
> > facet, because this is very confusing. Do you have any idea how to fix this
> > problem?
> >
> > Best regards
> > Agnieszka
>
Lost answers?
Dear Solr Users, I have a Solr 3.6 + Tomcat setup and a program that sends 4 HTTP requests at the same time. I must do 1902 requests. I did several tests, but each time it loses some requests: sometimes I get 1856 docs, 1895 docs, or 1900 docs, but never 1902 docs. With Jetty, I always get 1902 docs. As it's a dev environment, I'm the only one testing it. Is it a problem for Tomcat 6 to handle 4 requests at the same time? Thanks for your info, Bruno
Re: Facet on all the dynamic fields with *_s feature
The answer appears to be "No", but it's good to hear people express an interest in proposed features.

-- Jack Krupansky

-----Original Message-----
From: Rajani Maski
Sent: Sunday, July 15, 2012 12:02 AM
To: solr-user@lucene.apache.org
Subject: Facet on all the dynamic fields with *_s feature

Hi All, Is this issue fixed in solr 3.6 or 4.0: faceting on all dynamic fields with facet.field=*_s

Link: https://issues.apache.org/jira/browse/SOLR-247

If it is not fixed, any suggestion on how I could achieve this? My requirement is just the same as this one: http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none

Regards
Rajani
Solr - Spatial Search for Specific Areas on Map
Hi, I am new to Solr Spatial Search and would like to understand if Solr can be used successfully for very large data sets, in the range of 4 billion records. I need to search some filtered data based on a region - maybe a set of lat/lons or a polygon area. Is that possible in Solr? How fast is it with such a data size? Will it be able to handle a load of 10k req/sec? If so, how? Do you think Solr can beat the performance of PostGIS? As I am about to choose the right technology for my new project, I need some expert comments from the community. Regards Sam -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SOLR 4 Alpha Out Of Mem Err
> Do you have the following hard autoCommit in your config (as the stock
> server does)?
>
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>

I have tried with and without that setting. When I described running with auto commit, that setting is what I mean. I have varied the time in the range 10,000-60,000 msec. I have tried this setting with and without soft commit in the server config file. I have also tried without this setting, instead specifying the commit-within time in the solrj client's add method. In both of these cases, the client seems to overrun the server, and the server runs out of memory. One clarification I should make is that after the server runs out of memory, the solrj client does NOT receive an error. However, the documents indexed do not reliably appear in queries. Approach #3 is to remove the autocommit from the server config and issue the add method without commit-within, but issue commits from the solrj client with waitFlush and waitSearcher set to true. In case #3, I do not see the out of memory in the server. However, document indexing rates are restricted to about 1,000 per second.

Nick

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 5:15 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

Do you have the following hard autoCommit in your config (as the stock server does)?

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

This is now fairly important since Solr now tracks information on every uncommitted document added. At some point we should probably hardcode some mechanism based on number of documents or time.

-Yonik
http://lucidimagination.com
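For context, a minimal SolrJ sketch of the commit-within variant Nick describes might look like the following; the URL, field names, and the 10,000 ms window are illustrative assumptions (the window is taken from the range he mentions), not his actual code:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitWithinSketch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");            // placeholder field values
            doc.addField("text", "hello world");
            // ask the server to commit this document within 10 seconds,
            // instead of issuing an explicit commit from the client
            server.add(doc, 10000);
        }
    }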
Re: Query results vs. facets results
q and fq queries don't necessarily run through the same query parser, see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting

So try adding &debugQuery=on to both queries you submitted. My guess is that if you look at the parsed queries, you'll see something that explains your differences. If not, paste the results back and we can take a look. BTW, ignore all the "explain" bits for now; the important bit is the parsed form of q and fq in your queries.

Best
Erick

On Sat, Jul 14, 2012 at 5:11 AM, tudor wrote:
> Hello,
>
> I am new to Solr and I am running some tests with our data in Solr. We are
> using version 3.6 and the data is imported from a DB2 database using Solr's
> DIH. We have defined a single entity in the db-data-config.xml, which is an
> equivalent of the following query:
>
> query="
> SELECT C.NAME,
>        F.CITY
> FROM NAME_CONNECTIONS AS C
> JOIN NAME_DETAILS AS F
> ON C.DETAILS_NAME = F.NAME"
>
> This might lead to some names appearing multiple times in the result set.
> This is OK.
>
> For the unique ID in the schema, we are using a solr.UUIDField:
>
> <field name="ID" type="uuid" indexed="true" stored="true" default="NEW"/>
>
> All the searchable fields are declared as indexed and stored.
>
> I am aware of the fact that this is a very crude configuration, but for the
> tests that I am running it is fine.
>
> The problem that I have is the different result counts that I receive when I
> do equivalent queries for searching and faceting. For example, running the
> following query
>
> http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=100&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.ngroups=true&group.truncate=true
>
> yields
>
> 134
>
> as a result, which is exactly what we expect.
>
> On the other hand, running
>
> http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true
>
> yields
>
> 103
>
> I would expect to have the same number (134) in this facet result as well.
> Could you please let me know why these two results are different?
>
> Thank you,
> Tudor
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3994988.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?
Anything currently in the trunk will most probably be in the BETA and in the eventual release. So I'd go with the trunk code. It'll always be closer to the actual release than ALPHA or BETA. I know there've been some changes recently around exactly this, the "collection" name. In fact there's a discussion about rearranging the whole example directory.

Best
Erick

On Sat, Jul 14, 2012 at 9:54 PM, Roman Chyla wrote:
> Hi,
>
> Is it intentional that the ALPHA release has a different folder structure
> as opposed to the trunk?
>
> eg. the collection1 folder is missing in the ALPHA, but present in branch_4x
> and trunk
>
> lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
> 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
> lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl
>
> This has consequences for development - e.g. solr testcases do not expect
> that the collection1 is there for ALPHA.
>
> In general, what is your advice for developers who are upgrading from solr
> 3.x to solr 4.x? What codebase should we follow to minimize the pain of
> porting to the next BETA and stable releases?
>
> Thanks!
>
> roman
Re: Metadata and FullText, indexed at different times - looking for best approach
You've got a couple of choices. There's a new patch in town, https://issues.apache.org/jira/browse/SOLR-139, that allows you to update individual fields in a doc if (and only if) all the fields in the original document were stored (actually, all the non-copy fields). So if you're storing (stored="true") all your metadata information, you can just update the document when the text becomes available, assuming you know the uniqueKey when you update. Under the covers, this will find the old document, get all its fields, add the new fields to it, and re-index the whole thing.

Otherwise, your fallback idea is a good one.

Best
Erick

On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch wrote:
> Hello,
>
> I have a database of metadata and I can inject it into SOLR with DIH
> just fine. But then, I also have the documents to extract full text
> from that I want to add to the same records as additional fields. I
> think DIH allows running Tika at ingestion time, but I may not have
> the full-text files at that point (they could arrive days later). I
> can match the file to the metadata by a file name matching a field
> name.
>
> What is the best approach to do that staggered indexing with minimum
> custom code? I guess my fallback position is a custom full-text
> indexer agent that re-adds the metadata fields when the file is being
> indexed. Is there anything better?
>
> I am a newbie using v4.0alpha of SOLR (and loving it).
>
> Thank you,
> Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
> book)
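To make Erick's description concrete, here is a hedged SolrJ sketch of such a field update; the core URL and the field names ("id", "fulltext") are hypothetical, and it assumes a build that actually contains the SOLR-139 work:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class FieldUpdateSketch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");   // the uniqueKey of the existing document
            // "set" replaces the value of one field; the other stored fields
            // are read back from the old document and re-indexed under the covers
            Map<String, Object> setOp = new HashMap<String, Object>();
            setOp.put("set", "full text that arrived days later...");
            doc.addField("fulltext", setOp);
            server.add(doc);
            server.commit();
        }
    }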
Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?
"Anything currently in the trunk ..." I think you mean "Anything in the 4x branch", since "trunk" is 5x by definition. But I'd agree that taking a nightly build or building from the 4x branch is likely to be a better bet than the "old" Alpha. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Sunday, July 15, 2012 11:02 AM To: solr-user@lucene.apache.org Subject: Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade? Anything currently in the trunk will most probably be in the BETA and in the eventual release. So I'd go with the trunk code. It'll always be closer to the actual release than ALPHA or BETA I know there've been some changes recently around, exactly the "collection" name. In fact there's a discussion about rearranging the whole example directory Best Erick On Sat, Jul 14, 2012 at 9:54 PM, Roman Chyla wrote: Hi, Is it intentional that the ALPHA release has a different folder structure as opposed to the trunk? eg. collection1 folder is missing in the ALPHA, but present in branch_4x and trunk lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl This has consequences for development - e.g. solr testcases do not expect that the collection1 is there for ALPHA. In general, what is your advice for developers who are upgrading from solr 3.x to solr 4.x? What codebase should we follow to minimize the pain of porting to the next BETA and stable releases? Thanks! roman
Re: SOLR 4 Alpha Out Of Mem Err
Maybe your rate of update is so high that the commit never gets a chance to run. So maybe all these uncommitted updates are buffered up and using excess memory. Try explicit commits from SolrJ, but less frequently. Or maybe if you just pause your updates periodically (every 30 seconds or so) the auto-commit would get a chance to occur, although I have no idea how long a pause might be needed.

-- Jack Krupansky

-----Original Message-----
From: Nick Koton
Sent: Sunday, July 15, 2012 10:52 AM
To: solr-user@lucene.apache.org ; yo...@lucidimagination.com
Subject: RE: SOLR 4 Alpha Out Of Mem Err

> Do you have the following hard autoCommit in your config (as the stock
> server does)?
>
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>

I have tried with and without that setting. When I described running with auto commit, that setting is what I mean. I have varied the time in the range 10,000-60,000 msec. I have tried this setting with and without soft commit in the server config file. I have also tried without this setting, instead specifying the commit-within time in the solrj client's add method. In both of these cases, the client seems to overrun the server, and the server runs out of memory. One clarification I should make is that after the server gets out of memory, the solrj client does NOT receive an error. However, the documents indexed do not reliably appear in queries. Approach #3 is to remove the autocommit from the server config and issue the add method without commit-within, but issue commits from the solrj client with waitFlush and waitSearcher set to true. In case #3, I do not see the out of memory in the server. However, document indexing rates are restricted to about 1,000 per second.

Nick

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 5:15 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

Do you have the following hard autoCommit in your config (as the stock server does)?

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

This is now fairly important since Solr now tracks information on every uncommitted document added. At some point we should probably hardcode some mechanism based on number of documents or time.

-Yonik
http://lucidimagination.com
Re: SOLR 4 Alpha Out Of Mem Err
On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton wrote:
>> Do you have the following hard autoCommit in your config (as the stock
>> server does)?
>>
>> <autoCommit>
>>   <maxTime>15000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>
> I have tried with and without that setting. When I described running with
> auto commit, that setting is what I mean.

OK cool. You should be able to run the stock server (i.e. with this autocommit) and blast in updates all day long - it looks like you have more than enough memory. If you can't, we need to fix something. You shouldn't need explicit commits unless you want the docs to be searchable at that point.

> Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that? How many threads at once are sending docs to solr? Is each request a single doc or multiple?

-Yonik
http://lucidimagination.com
Re: SOLR 4 Alpha Out Of Mem Err
On Sun, Jul 15, 2012 at 12:52 PM, Jack Krupansky wrote: > Maybe your rate of update is so high that the commit never gets a chance to > run. I don't believe that is possible. If it is, it should be fixed. -Yonik http://lucidimagination.com
Re: SOLR 4 Alpha Out Of Mem Err
Agreed. That's why I say "maybe". Clearly something sounds amiss here. -- Jack Krupansky -Original Message- From: Yonik Seeley Sent: Sunday, July 15, 2012 12:06 PM To: solr-user@lucene.apache.org Subject: Re: SOLR 4 Alpha Out Of Mem Err On Sun, Jul 15, 2012 at 12:52 PM, Jack Krupansky wrote: Maybe your rate of update is so high that the commit never gets a chance to run. I don't believe that is possible. If it is, it should be fixed. -Yonik http://lucidimagination.com
Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?
The beta will have files that were in solr/conf and solr/data in solr/collection1/conf|data instead.

What Solr test cases are you referring to? The only ones that should care about this would have to be looking at the file system. If that is the case, simply update the path. The built-in tests had to be adjusted for this as well.

The problem with having the default core use /solr as a conf dir is that if you create another core, where does it logically go? The default collection is called collection1, so now its conf and data live in a folder called collection1. A new SolrCore called newsarticles would have its conf and data in /solr/newsarticles.

There are still going to be some bumps as you move from alpha to beta to release if you are depending on very specific file system locations - however, they should be small bumps that are easily handled.

Just send an email to the user list if you'd like some help with anything in particular.

In this case, I'd update what you have to look at /solr/collection1 rather than simply /solr. It's still the default core, so simple URLs without the core name will still work. It won't affect HTTP communication, just file system location.

On Jul 14, 2012, at 9:54 PM, Roman Chyla wrote:

> Hi,
>
> Is it intentional that the ALPHA release has a different folder structure
> as opposed to the trunk?
>
> eg. the collection1 folder is missing in the ALPHA, but present in branch_4x
> and trunk
>
> lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
> 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
> lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl
>
> This has consequences for development - e.g. solr testcases do not expect
> that the collection1 is there for ALPHA.
>
> In general, what is your advice for developers who are upgrading from solr
> 3.x to solr 4.x? What codebase should we follow to minimize the pain of
> porting to the next BETA and stable releases?
>
> Thanks!
>
> roman

- Mark Miller
lucidimagination.com
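To make the layout concrete, this is roughly the solr home Mark describes, with his hypothetical newsarticles core alongside the default:

    solr/
      collection1/      (the default core)
        conf/
        data/
      newsarticles/     (a second, hypothetical core)
        conf/
        data/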
Re: Index version on slave incrementing to higher than master
Erick, Thank you. I think originally my thought was that if I kept my slave configuration really close to my master config, it would be very easy to promote a slave to a master (and vice versa) if necessary. But I think you are correct that ripping out of the slave config anything that would modify an index in any way makes sense. I will give this a try very soon. Thanks again.

Andy

On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson wrote:
> Gotta admit it's a bit puzzling, and surely you want to move to the 3x
> versions ..
>
> But at a guess, things might be getting confused on the slaves given
> you have a merge policy on them. There's no reason to have any
> policies on the slaves; slaves should just be about copying the files
> from the master. All the policies, commits, and optimizes should be done on
> the master. About all the slave does is copy the current state of the index
> from the master.
>
> So I'd try removing everything but the replication from the slaves, including
> any autocommit stuff, and just let replication do its thing.
>
> And I'd replicate after the optimize if you keep the optimize going. You should
> end up with one segment in the index after that, on both the master and slave.
> You can't get any more merged than that.
>
> Of course you'll also copy the _entire_ index every time after you've
> optimized...
>
> Best
> Erick
>
> On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff
> wrote:
> > Hi,
> >
> > I am running solr 1.4.0+ds1-1ubuntu1. I have a master server that has a
> > number of solr instances running on it (150 or so), and nightly most of
> > them have documents written to them. The script that does these writes
> > (adds) does a commit and an optimize on the indexes when it's entirely
> > finished updating them, then initiates replication on the slave per
> > instance. In this configuration, the index versions between master and
> > slave remain in synch.
> >
> > The optimize portion, which, again, happens nightly, is taking a lot of
> > time and I think it's unnecessary. I was hoping to stop doing this explicit
> > optimize, and to let my merge policy handle that. However, if I don't do an
> > optimize, and only do a commit before initiating slave replication, some
> > hours later the slave is, for reasons that are unclear to me, incrementing
> > its index version to 1 higher than the master.
> >
> > I am not really sure I understand the logs, but it looks like the
> > incremented index version is the result of an optimize on the slave, but I
> > am never issuing any commands against the slave aside from initiating
> > replication, and I don't think there's anything in my solr configuration
> > that would be initiating this. I do have autoCommit on with maxDocs of
> > 1000, but since I am initiating slave replication after doing a commit on
> > the master, I don't think there would ever be any uncommitted documents on
> > the slave. I do have a merge policy configured, but it's not clear to me
> > that it has anything to do with this. And if it did, I'd expect to see
> > similar behavior on the master (right?).
> >
> > I have included a snippet from my slave logs that shows this issue. In this
> > snippet, index version 1286065171264 is what the master has,
> > and 1286065171265 is what the slave increments itself to, which is then out
> > of synch with the master in terms of version numbers. Nothing that I know
> > of is issuing any commands to the slave at this time. If I understand these
> > logs (I might not), it looks like something issued an optimize that took
> > 1023720ms? Any ideas?
> >
> > Thanks in advance.
> >
> > Andy
> >
> > Jul 12, 2012 12:21:14 PM org.apache.solr.update.SolrIndexWriter close
> > FINE: Closing Writer DirectUpdateHandler2
> > Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy onCommit
> > INFO: SolrDeletionPolicy.onCommit: commits:num=2
> > commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h8,version=1286065171264,generation=620,filenames=[_h6.fnm, _h5.nrm, segments_h8, _h4.nrm, _h5.tii, _h4.tii, _h5.tis, _h4.tis, _h4.fdx, _h5.fnm, _h6.tii, _h4.fdt, _h5.fdt, _h5.fdx, _h5.frq, _h4.fnm, _h6.frq, _h6.tis, _h4.prx, _h4.frq, _h6.nrm, _h5.prx, _h6.prx, _h6.fdt, _h6.fdx]
> > commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h9,version=1286065171265,generation=621,filenames=[_h7.tis, _h7.fdx, _h7.fnm, _h7.fdt, _h7.prx, segments_h9, _h7.nrm, _h7.tii, _h7.frq]
> > Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
> > INFO: newest commit = 1286065171265
> > Jul 12, 2012 12:21:14 PM org.apache.solr.search.SolrIndexSearcher <init>
> > INFO: Opening Searcher@4ac62082 main
> > Jul 12, 2012 12:21:14 PM org.apache.solr.update.DirectUpdateHandler2 commit
> > INFO: end_commit_flush
> > Jul 12, 2012 12:21:14 PM org.apache.solr.search.SolrIndexSearcher warm
> > INFO: autowarming Searcher@4ac62082 main from Searcher@48d901f7 mai
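For reference, a stripped-down slave solrconfig.xml of the kind Erick recommends would contain little beyond the replication handler; the master URL and poll interval below are placeholder values, not Andy's actual settings:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <!-- placeholder host and core name -->
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:05:00</str>
      </lst>
    </requestHandler>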
Re: Lost answers?
I forgot: I do the request on the uniqueKey field, so each request gets one document.

On 15/07/2012 14:11, Bruno Mannina wrote:
> Dear Solr Users, I have a Solr 3.6 + Tomcat setup and a program that sends 4 HTTP
> requests at the same time. I must do 1902 requests. I did several tests, but each
> time it loses some requests: sometimes I get 1856 docs, 1895 docs, or 1900 docs,
> but never 1902 docs. With Jetty, I always get 1902 docs. As it's a dev environment,
> I'm the only one testing it. Is it a problem for Tomcat 6 to handle 4 requests at
> the same time? Thanks for your info, Bruno
RE: SOLR 4 Alpha Out Of Mem Err
>> Solrj multi-threaded client sends several 1,000 docs/sec
>
> Can you expand on that? How many threads at once are sending docs to solr?
> Is each request a single doc or multiple?

I realize, after the fact, that my solrj client is much like org.apache.solr.client.solrj.LargeVolumeTestBase. The number of threads is configurable at run time, as are the various commit parameters. Most of the tests have been in the 4-16 thread range. Most of my testing has been with the single-document SolrServer::add(SolrInputDocument doc) method. When I realized what LargeVolumeTestBase is doing, I converted my program to use the SolrServer::add(Collection<SolrInputDocument> docs) method with 100 documents in each add batch. Unfortunately, the out of memory errors still occur without client-side commits.

If you agree my three approaches to committing are logical, would it be useful for me to try to reproduce this with the "example" schema in a small cloud configuration using LargeVolumeTestBase or the like? It will take me a couple of days to work it in. Or perhaps this sort of test is already run?

Best
Nick

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 11:05 AM
To: Nick Koton
Cc: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton wrote:
>> Do you have the following hard autoCommit in your config (as the stock
>> server does)?
>>
>> <autoCommit>
>>   <maxTime>15000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>
> I have tried with and without that setting. When I described running
> with auto commit, that setting is what I mean.

OK cool. You should be able to run the stock server (i.e. with this autocommit) and blast in updates all day long - it looks like you have more than enough memory. If you can't, we need to fix something. You shouldn't need explicit commits unless you want the docs to be searchable at that point.

> Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that? How many threads at once are sending docs to solr? Is each request a single doc or multiple?

-Yonik
http://lucidimagination.com
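A hedged sketch of the batched add Nick describes (the batch size of 100 is from his message; the URL, field names, and document loop are illustrative assumptions):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchedAddSketch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 10000; i++) {        // hypothetical document source
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("text", "body of document " + i);
                batch.add(doc);
                if (batch.size() == 100) {           // send 100 docs per add, as described
                    server.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) server.add(batch); // flush the remainder
            // rely on server-side autoCommit rather than an explicit client commit
        }
    }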
JRockit with SOLR3.4/3.5
We used JRockit with SOLR 1.4 because the default JVM had memory issues (not only was it consuming more memory, it also didn't restrict itself to the max memory allocated to Tomcat; JRockit did restrict itself to the max). However, JRockit gives an error when used with SOLR 3.4/3.5. Any ideas why?
Re: Query results vs. facets results
Hi Erick, Thanks for the reply. The query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on

yields this in the debug section:

<str name="rawquerystring">CITY:MILTON</str>
<str name="querystring">CITY:MILTON</str>
<str name="parsedquery">CITY:MILTON</str>
<str name="parsedquery_toString">CITY:MILTON</str>
<str name="QParser">LuceneQParser</str>

In the explain section, there is no information about grouping.

Second query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields this in the debug section:

<str name="rawquerystring">*</str>
<str name="querystring">*</str>
<str name="parsedquery">ID:*</str>
<str name="parsedquery_toString">ID:*</str>
<str name="QParser">LuceneQParser</str>

To be honest, these do not tell me too much. I would like to see some information about the grouping, since I believe this is where I am missing something.

In the meantime, I have combined the two queries above, hoping to make some sense out of the results. The following query filters all the entries with the city name "MILTON" and groups together the ones with the same ID. Also, the query facets the entries on city, grouping the ones with the same ID, so the result numbers refer to numbers of groups.

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields the same (for me perplexing) results:

<int name="matches">284</int>
<int name="ngroups">134</int> (i.e.: fq says: 134 groups with CITY:MILTON)
...
<int name="MILTON">103</int> (i.e.: faceted search says: 103 groups with CITY:MILTON)

I really believe that these different results have something to do with the grouping that Solr makes, but I do not know how to dig into this.

Thank you and best regards,
Tudor

-- View this message in context: http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995150.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: JRockit with SOLR3.4/3.5
Hello, Salman, It would probably be helpful if you included the text/stack trace of the error you're encountering, plus any other pertinent system information you can think of. One thing to remember is that the memory usage you tune with Xmx is only the maximum size of the heap; there are other types of memory usage by the JVM that don't fall under that (permgen space, memory-mapped files, etc.).

Michael Della Bitta
Appinions, Inc. -- Where Influence Isn't a Game. http://www.appinions.com

On Sun, Jul 15, 2012 at 3:19 PM, Salman Akram wrote:
> We used JRockit with SOLR 1.4 because the default JVM had memory issues (not only
> was it consuming more memory, it also didn't restrict itself to the max memory
> allocated to Tomcat; JRockit did restrict itself to the max). However, JRockit
> gives an error when used with SOLR 3.4/3.5. Any ideas why?
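To illustrate Michael's point that Xmx only bounds the heap: on a HotSpot JVM the permanent generation, for instance, is sized by a separate flag. The values below are arbitrary examples, not recommendations:

    JAVA_OPTS="-Xmx2048m -XX:MaxPermSize=256m"    (heap max vs. permgen max)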
Re: SOLR 4 Alpha Out Of Mem Err
"unable to create new native thread" That suggests you're running out of threads, not RAM. Possibly you're using a multithreaded collector, and it's pushing you over the top of how many threads your OS lets a single process allocate? Or somehow the thread stack size is set too high? More here: http://stackoverflow.com/questions/763579/how-many-threads-can-a-java-vm-support Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Sun, Jul 15, 2012 at 2:45 PM, Nick Koton wrote: >>> Solrj multi-threaded client sends several 1,000 docs/sec > >>Can you expand on that? How many threads at once are sending docs to solr? > Is each request a single doc or multiple? > I realize, after the fact, that my solrj client is much like > org.apache.solr.client.solrj.LargeVolumeTestBase. The number of threads is > configurable at run time as are the various commit parameters. Most of the > test have been in the 4-16 threads range. Most of my testing has been with > the single document SolrServer::add(SolrInputDocument doc )method. When I > realized what LargeVolumeTestBase is doing, I converted my program to use > the SolrServer::add(Collection docs) method with 100 > documents in each add batch. Unfortunately, the out of memory errors still > occur without client side commits. > > If you agree my three approaches to committing are logical, would it be > useful for me to try to reproduce this with "example" schema in a small > cloud configuration using LargeVolumeTestBase or the like? It will take me > a couple days to work it in. Or perhaps this sort of test is already run? > > Best > Nick > > -Original Message- > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley > Sent: Sunday, July 15, 2012 11:05 AM > To: Nick Koton > Cc: solr-user@lucene.apache.org > Subject: Re: SOLR 4 Alpha Out Of Mem Err > > On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton wrote: >>> Do you have the following hard autoCommit in your config (as the >>> stock >> server does)? >>> >>> 15000 >>> false >>> >> >> I have tried with and without that setting. When I described running >> with auto commit, that setting is what I mean. > > OK cool. You should be able to run the stock server (i.e. with this > autocommit) and blast in updates all day long - it looks like you have more > than enough memory. If you can't, we need to fix something. You shouldn't > need explicit commits unless you want the docs to be searchable at that > point. > >> Solrj multi-threaded client sends several 1,000 docs/sec > > Can you expand on that? How many threads at once are sending docs to solr? > Is each request a single doc or multiple? > > -Yonik > http://lucidimagination.com >
Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?
I am using AbstractSolrTestCase (which in turn uses solr.util.TestHarness) as a basis for unit tests, but the solr installation is outside of my source tree and I don't want to duplicate it just to change a few lines (and with the new solr 4.0 I hope I can get the test-framework in a jar file; previously that wasn't possible). So in essence, I have to deal with the expected folder structure for all my unit tests. The way I make the configuration visible outside the solr standard paths is to get the classloader and add folders to it; this way I can test extensions for solr without duplicating the whole configuration. But I should mimic the folder structure to be compatible. Thanks all for your help, it is much appreciated.

roman

On Sun, Jul 15, 2012 at 1:46 PM, Mark Miller wrote:
> The beta will have files that were in solr/conf and solr/data in
> solr/collection1/conf|data instead.
>
> What Solr test cases are you referring to? The only ones that should care
> about this would have to be looking at the file system. If that is the case,
> simply update the path. The built-in tests had to be adjusted for this as well.
>
> The problem with having the default core use /solr as a conf dir is that if
> you create another core, where does it logically go? The default collection
> is called collection1, so now its conf and data live in a folder called
> collection1. A new SolrCore called newsarticles would have its conf and data
> in /solr/newsarticles.
>
> There are still going to be some bumps as you move from alpha to beta to
> release if you are depending on very specific file system locations -
> however, they should be small bumps that are easily handled.
>
> Just send an email to the user list if you'd like some help with anything in
> particular.
>
> In this case, I'd update what you have to look at /solr/collection1 rather
> than simply /solr. It's still the default core, so simple URLs without the
> core name will still work. It won't affect HTTP communication, just file
> system location.
>
> On Jul 14, 2012, at 9:54 PM, Roman Chyla wrote:
>
>> Hi,
>>
>> Is it intentional that the ALPHA release has a different folder structure
>> as opposed to the trunk?
>>
>> eg. the collection1 folder is missing in the ALPHA, but present in branch_4x
>> and trunk
>>
>> lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
>> 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
>> lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl
>>
>> This has consequences for development - e.g. solr testcases do not expect
>> that the collection1 is there for ALPHA.
>>
>> In general, what is your advice for developers who are upgrading from solr
>> 3.x to solr 4.x? What codebase should we follow to minimize the pain of
>> porting to the next BETA and stable releases?
>>
>> Thanks!
>>
>> roman
>
> - Mark Miller
> lucidimagination.com
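A hedged sketch of the classloader trick Roman mentions, in plain Java with a hypothetical path; this uses generic JDK API, not a Solr-specific one:

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;

    public class ExtraConfSketch {
        public static void main(String[] args) throws Exception {
            // make an out-of-tree conf folder visible on the classpath (path hypothetical)
            URL extraConf = new File("/opt/solr-extensions/conf").toURI().toURL();
            ClassLoader parent = Thread.currentThread().getContextClassLoader();
            URLClassLoader loader = new URLClassLoader(new URL[] { extraConf }, parent);
            Thread.currentThread().setContextClassLoader(loader);
            // resources in that folder can now be resolved by name, e.g.:
            URL found = loader.getResource("schema.xml");
            System.out.println("resolved: " + found);
        }
    }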
are stopwords indexed?
Hi all, are stopwords from the stopwords.txt config file supposed to be indexed? I would say no, but this is the situation I am observing on my Solr instance:

* I have a bunch of stopwords in stopwords.txt
* my fields are of fieldType "text" from the example schema.xml [fieldType definition snipped]
* searching for a stopword through Solr always gives zero results
* inspecting the index with LuCLI http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html shows that all stopwords are in my index. Note that I query LuCLI specifying the field, i.e. with "myFieldName:and" and not just with the stopword "and".

Is this normal? Are stopwords indexed?

Cheers,
Giovanni
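For reference, a stop-filtered "text" type along the lines of the example schema looks roughly like the sketch below (illustrative, not necessarily Giovanni's exact config). Note that stopwords stay out of the index only if StopFilterFactory is present in the index-time analyzer:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- removes stopwords.txt entries before they reach the index -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>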
Re: Solr - Spatial Search for Specif Areas on Map
Sam, These are big numbers you are throwing around, especially the query volume. How big are these records that you have 4 billion of -- or put another way, how much space would they take up in a pure form like CSV? And should I assume the searches you are doing are more than geospatial? In any case, a Solr solution here is going to involve many machines.

The biggest number you propose is 10k queries per second, which is hard to imagine. I've seen some say Solr 4 might handle 100M records per shard, although there is a good deal of variability -- as usual, YMMV. But let's go with that for this paper-napkin calculation. You would need 40 shards of 100M documents each to get to 4000M (4B) documents. That is a lot of shards, but people have done it, I believe. This scales out to your document collection but not up to your query volume, which is extremely high. I have some old benchmarks suggesting ~10ms for spatial queries with SOLR-2155, which was rolled into the spatial code in Lucene 4 (Solr adapters are on the way). But for full query overhead, and for a safer estimate, let's say 50ms. So perhaps you might get 20 concurrent queries per second (which seems high, but we'll go with it). But you require 10k/sec(!), so this means you need 500 times the 20qps, which means 500 *times* the base hardware to support the 40 shards I mentioned before. In other words, the 4B documents need to be replicated 500 times to support 10k/second queries. So theoretically, we're talking 500 clusters, each cluster having 40 shards -- at ~4 shards/machine this is 10 machines per cluster: 5,000 machines in total. Wow. Doesn't seem realistic. If you have a reference to some system or person's experience with any system that can do this, Solr or not, then please share.

If you or anyone were to attempt to see if Solr scales for their needs, a good approach is to consider just one non-replicated shard, or even better a handful that would all exist on one machine. Optimize it as much as you can. Then see how much data you can put on this machine and with what query volume. From this point, it's basic math to see how many more such machines are required to scale out to your data size and up to your query volume.

Care to explain why so much data needs to be searched at such a volume? Maybe you work for Google ;-)

To your question on scalability vs PostGIS, I think Solr shines in its ability to scale out if you have the resources to do it.

~ David Smiley

- Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995197.html Sent from the Solr - User mailing list archive at Nabble.com.
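Writing out David's paper-napkin arithmetic, using only the figures he assumes above:

    4,000,000,000 docs / 100,000,000 docs per shard = 40 shards per copy of the index
    1000 ms / 50 ms per query                       = ~20 queries/sec per cluster
    10,000 qps / 20 qps per cluster                 = 500 replicated clusters
    40 shards / ~4 shards per machine               = 10 machines per cluster
    500 clusters * 10 machines                      = 5,000 machines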
Wildcard query vs facet.prefix for autocomplete?
I'm about to implement an autocomplete mechanism for my search box. I've read about some of the common approaches, but I have a question about wildcard query vs facet.prefix.

Say I want autocomplete for a title: 'Shadows of the Damned'. I want this to appear as a suggestion if I type 'sha' or 'dam' or 'the'. I don't care that it won't appear if I type 'hadows'. While indexing, I'd use a whitespace tokenizer and a lowercase filter to store that title in the index. Now I'm considering two approaches for 'dam' typed in the search box:

1) q=title:dam*
2) q=*:*&facet=on&facet.field=title&facet.prefix=dam

So, any reason I should favour one over the other? Is speed a factor? The index has around 200,000 items.

-- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH - incorrect datasource being picked up by XPathEntityProcessor
Thanks Gora, I tried that but it didn't help. Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-incorrect-datasource-being-picked-up-by-XPathEntityProcessor-tp3994802p3995211.html Sent from the Solr - User mailing list archive at Nabble.com.