Faceting and Grouping Performance Degradation in Solr 5
I was recently attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had to abort because average response times degraded relative to a baseline volume performance test. The affected queries involved faceting (both the enum method and the default) and grouping. There is a critical bug, https://issues.apache.org/jira/browse/SOLR-8096, currently open, which I gather is the cause of the slower response times. One concern I have is that discussions around the issue suggest indexing with docValues, which alleviated the problem in at least one reported case. However, indexing with docValues did not improve performance in my case. Can someone please confirm or correct my understanding that this issue has no path forward at this time, and specifically that it is already known that docValues does not necessarily solve it? Thanks in advance!
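For reference, the docValues suggestion from the JIRA discussion amounts to a schema change along these lines (field and type names here are placeholders, and a full re-index is required afterward):

    <field name="category" type="string" indexed="true" stored="true"
           multiValued="true" docValues="true"/>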
Indexing a (File attached to a document)
Hi, If I index a document with a file attachment in Solr, can I also view the data of that attachment when querying that document? Please help me with this. Thanks & Regards, Vidya Nadella
Re: Faceting and Grouping Performance Degradation in Solr 5
Does anyone know the answer to this?

On Wed, May 4, 2016 at 2:19 PM, Solr User wrote:
> [original question snipped; see above]
Re: Faceting and Grouping Performance Degradation in Solr 5
Joel,

Thank you for taking the time to respond to my question. I tried the JSON Facet API for one query that uses facet.method=enum (since this one has a ton of unique values and performed better with enum), but it was way slower than even the slower Solr 5 times. I did not try the new API with the non-enum queries, though, so I will give that a go. It looks like Solr 5.5.1 also has facet.method=uif, which will be interesting to try. If these do not prove helpful, it looks like I will need to wait for SOLR-8096 to be resolved before upgrading.

Thanks also for your comment on top_fc for the CollapsingQParser. I use collapse/expand for some queries but traditional grouping for others due to performance. It will be interesting to see if those grouping queries perform better now using CollapsingQParser with top_fc.

On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein wrote:
> Yes, SOLR-8096 is the issue here.
>
> I don't believe indexing with docValues is going to help too much with
> this. The enum slowness may not be related, but I'm not positive about
> that.
>
> The major slowdowns are likely due to the removal of the top-level
> FieldCache from general use and the removal of the FieldValuesCache, which
> was used for multi-value field faceting.
>
> The JSON facet API covers all the functionality in the traditional
> faceting, and it has been developed to be very performant.
>
> You may also want to see if Collapse/Expand can meet your application's
> needs rather than Grouping. It allows you to specify using a top-level
> FieldCache if performance is a blocker without it.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> [earlier messages snipped]
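For anyone following this thread, the two alternatives discussed above look roughly like this; collection and field names are placeholders. Traditional faceting with the uif method:

    curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&facet=true&facet.field=cat&facet.method=uif'

The equivalent request through the JSON Facet API:

    curl 'http://localhost:8983/solr/mycollection/select' -d 'q=*:*&rows=0&json.facet={cats:{type:terms,field:cat,limit:10}}'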
Re: Indexing a (File attached to a document)
Hi, I am using the MapReduceIndexerTool to index data from HDFS, using Morphlines as the ETL tool, and specifying the data paths as XPaths in the morphline file. Sorry for the delay.
Re: Faceting and Grouping Performance Degradation in Solr 5
Thanks again for your work on honoring the facet.method. I have an observation that I would like to share and get your feedback on if possible.

I performance tested Solr 5.5.2 with various facet queries, and the only way I get results comparable to Solr 4.8.1 is when I expungeDeletes. Is it possible that Solr 5 is not ignoring deletes as efficiently as Solr 4? Here are the details.

Scenario #1: Using facet.method=uif with faceting on several multi-valued fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
5.5.2 (without deletes): 125 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2: Using facet.method=enum with faceting on several multi-valued fields. These fields are different from Scenario #1 and perform much better with enum, hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
5.5.2 (without deletes): 42 ms
5.5.2 (1 segment without deletes): 34 ms

On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <abenede...@apache.org> wrote:
> Interesting developments:
>
> https://issues.apache.org/jira/browse/SOLR-9176
>
> I think we found why term enum seems slower in recent Solr!
> In our case it is likely to be related to the commit I mention in the Jira.
> Have a check, Joel!
>
> On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <abenede...@apache.org> wrote:
>
> > I am investigating this scenario right now.
> > I can confirm that the enum slowness is in Solr 6.0 as well.
> > And I agree with Joel, it seems to be unrelated to the famous faceting
> > regression :(
> >
> > Furthermore, with the legacy facet approach, if you set docValues for the
> > field you are not going to be able to try the enum approach anymore.
> >
> > org/apache/solr/request/SimpleFacets.java:448
> >
> > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> >   // only fc can handle docvalues types
> >   method = FacetMethod.FC;
> > }
> >
> > I got really horrible regressions simply using term enum in both Solr 4
> > and Solr 6.
> >
> > And even the most optimized fcs approach with docValues and
> > facet.threads=nCore does not perform as well as the simple enum in Solr 4;
> > e.g., for some sample queries I have 40 ms vs 160 ms and similar.
> > I think we should open an issue if we can confirm it is not related to
> > the other one.
> > A lot of people will continue using the legacy approach for a while...
> >
> > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein wrote:
> >
> >> The enum slowness is interesting. It would appear on the surface to not be
> >> related to the FieldCache issue. I don't think the main emphasis of the
> >> JSON facet API has been the enum approach. You may find using the JSON
> >> facet API and eliminating the use of enum meets your performance needs.
> >>
> >> With the CollapsingQParserPlugin, top_fc is definitely faster during
> >> queries. The tradeoff is slower warming times and increased memory usage if
> >> the collapse fields are used in faceting, as faceting will load the field
> >> into a different cache.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> [earlier messages snipped]
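For anyone repeating this comparison, an expungeDeletes pass can be issued as an option on a commit, e.g. (URL is a placeholder):

    curl 'http://localhost:8983/solr/mycollection/update?commit=true&expungeDeletes=true'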
Re: Faceting and Grouping Performance Degradation in Solr 5
Further testing indicates that any performance difference is not due to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes, and the times appear to converge on an optimized index. Below are the details. I am not sure what else to make of this at this point, other than moving forward with an upgrade, with an optimized index wherever possible.

Scenario #1: Using facet.method=uif with faceting on several multi-valued fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
4.8.1 (without deletes): 104 ms
5.5.2 (without deletes): 125 ms
4.8.1 (1 segment without deletes): 55 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2: Using facet.method=enum with faceting on several multi-valued fields. These fields are different from Scenario #1 and perform much better with enum, hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
4.8.1 (without deletes): 35 ms
5.5.2 (without deletes): 42 ms
4.8.1 (1 segment without deletes): 28 ms
5.5.2 (1 segment without deletes): 34 ms

On Tue, Sep 27, 2016 at 3:45 AM, Alessandro Benedetti wrote:
> Hi!
> At the time we didn't investigate the deletion implication at all.
> This can be interesting.
> If you proceed with your investigations and discover what changed in the
> deletion approach, I would be more than happy to help!
>
> Cheers
>
> [earlier messages snipped]
Re: Faceting and Grouping Performance Degradation in Solr 5
Certainly. And I would of course welcome anyone else to test this for themselves, especially with facet.method=uif, to see if that has indeed bridged the gap between Solr 4 and Solr 5. I would be very happy if my testing turned out to be invalid due to variance, a problem in process, etc. One thing I was pondering is whether I should force merge the index to a certain number of segments, because indexing yields a random number of segments and deletions. The only thing stopping me short of doing that were observations of longer Solr 4 times even with more deletions and a similar number of segments.

We use Soasta as our testing tool. Before testing, load is sent for 10-15 minutes to make sure any Solr caches have stabilized. Then the test is run for 30 minutes of steady volume, with Scenario #1 tested at 15 req/sec and Scenario #2 tested at 100 req/sec. Each request is different, with input being pulled from data files. The requests are repeatable test to test.

The numbers posted above are average response times as reported by Soasta. However, the respective time differences are supported by Splunk, which indexes the Solr logs, and Dynatrace, which is instrumented on one of the JVMs.

The versions are deployed to the same machines, thereby overlaying the previous installation. Going from Solr 4 to Solr 5, full indexing is run with the same input data. Being in SolrCloud mode, the full indexing consists of indexing all documents and then deleting any that were not touched. Going from Solr 5 back to Solr 4, the snapshot is restored, since Solr 4 will not load a Solr 5 index. Testing Solr 4 after reverting yields the same results as the previous Solr 4 test.

On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen wrote:
> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> > Further testing indicates that any performance difference is not due
> > to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> > deletes.
>
> Sanity check: Could you describe how you test?
>
> * How many queries do you issue for each test?
> * Is each query a new one, or do you re-use the same query?
> * Do you discard the first X calls?
> * Are the numbers averages, medians, or something third?
> * What do you do about disk cache?
> * Are both Solrs on the same machine?
> * Do they use the same index?
> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>
> - Toke Eskildsen, State and University Library, Denmark
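The force merge pondered above would be a standard optimize call with a segment cap, along these lines (URL and segment count are placeholders):

    curl 'http://localhost:8983/solr/mycollection/update?optimize=true&maxSegments=8'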
Re: Faceting and Grouping Performance Degradation in Solr 5
I plan to re-test this in a separate environment that I have more control over and will share the results when I can.

On Wed, Sep 28, 2016 at 3:37 PM, Solr User wrote:
> [previous messages snipped; see above]
Re: Faceting and Grouping Performance Degradation in Solr 5
Below is some further testing. This was done in an environment that had no other queries or updates during testing. We ran through several scenarios; the results are in the table below. The times are average times in milliseconds. Same test methodology as above, except there was a 5-minute warmup and a 15-minute test.

Note that both the segment and deletion counts were recorded from only 1 out of 2 of the shards, so we cannot try to extrapolate a function between them and the outcome. In other words, just view them as "non-optimized" versus "optimized" and "has deletions" versus "no deletions". The only exceptions are that the 0-delete cases were true for both shards, and the 1-segment and 8-segment cases were true for both shards. A few of the tests were repeated as well.

The only conclusion that I could draw is that the number of segments and the number of deletes appear to greatly influence the response times, at least more than any difference in Solr version. There also appears to be some external contributor to variance, maybe network, etc.

Thoughts?

Date        Solr    Deleted Docs  Segments  facet.method=uif  Scenario #1  Scenario #2
9/29/2016   5.5.2   57873         34        YES               198          92
9/29/2016   5.5.2   57873         34        YES               210          88
9/29/2016   4.8.1   17695         18        N/A               145          59
9/30/2016   4.8.1   8593          27        N/A               186          62
9/30/2016   4.8.1   694           27        N/A               190          58
9/30/2016   5.5.2   593           34        YES               208          72
9/30/2016   5.5.2   694           34        YES               209          70
9/30/2016   5.5.2   57873         34        NO                210          77
9/30/2016   5.5.2   57873         34        NO                206          74
9/30/2016   5.5.2   57873         8         NO                109          68
9/30/2016   5.5.2   57873         8         YES               142          73
9/30/2016   5.5.2   0             1         YES               73           63
9/30/2016   5.5.2   0             1         NO                70           61
10/3/2016   4.8.1   0             8         N/A               160          66
10/3/2016   4.8.1   0             8         N/A               109          54
10/3/2016   4.8.1   0             1         N/A               83           52
10/3/2016   4.8.1   0             1         N/A               85           51

On Wed, Sep 28, 2016 at 4:44 PM, Solr User wrote:
> [previous messages snipped; see above]
ClassNotFoundException with Custom ZkACLProvider
This is mostly just an FYI regarding future work on issues like SOLR-8792.

I wanted admin update but world read on ZK, since I do not have anything sensitive from a read perspective in the Solr data and did not want to force all SolrCloud clients to implement authentication just for reads. So, I extended DefaultZkACLProvider and implemented a replacement for VMParamsAllAndReadonlyDigestZkACLProvider.

My custom code is loaded from the sharedLib in solr.xml. However, there is a temporary ZK lookup to read solr.xml (and the chroot), which is obviously done before loading sharedLib. Therefore, I am faced with a ClassNotFoundException. This has no negative effect on the ACL functionality, just the annoying stack trace in the logs. I do not want to package this custom code with the Solr code, and I do not want to package it along with Solr dependencies in the Jetty lib/ext.

So, I am planning to live with the stack trace and just wanted to share this for any future work on the dynamic solr.xml and chroot lookups, or in case I am missing some work-around.

Thanks!
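For anyone attempting the same customization, the override described above would look roughly like the following sketch. This is not the exact code from this setup; the system property names are assumptions borrowed from the VM-params provider, and error handling is minimal:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.common.cloud.DefaultZkACLProvider;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.data.ACL;
    import org.apache.zookeeper.data.Id;
    import org.apache.zookeeper.server.auth.DigestAuthenticationProvider;

    public class AllAndWorldReadableZkACLProvider extends DefaultZkACLProvider {
      @Override
      protected List<ACL> createGlobalACLsToAdd() {
        try {
          List<ACL> acls = new ArrayList<>();
          // full rights for the digest-authenticated admin user
          String digest = DigestAuthenticationProvider.generateDigest(
              System.getProperty("zkDigestUsername") + ":"
                  + System.getProperty("zkDigestPassword"));
          acls.add(new ACL(ZooDefs.Perms.ALL, new Id("digest", digest)));
          // world gets read-only access, so clients need no credentials
          acls.add(new ACL(ZooDefs.Perms.READ, ZooDefs.Ids.ANYONE_ID_UNSAFE));
          return acls;
        } catch (java.security.NoSuchAlgorithmException e) {
          throw new RuntimeException(e);
        }
      }
    }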
Re: ClassNotFoundException with Custom ZkACLProvider
For those interested, I ended up bundling the customized ACL provider with the solr.war. I could not stomach looking at the stack trace in the logs.

On Mon, Nov 7, 2016 at 4:47 PM, Solr User wrote:
> [original message snipped; see above]
Re: Work-around for "indexed without position data"
Sorry for the delay. I was able to reproduce this easily with my setup, but reproducing it on a Solr example proved challenging. Hopefully the work that I did to find the situation in which this is produced will help in resolving the problem. The driving factor appears to be how updates are sent to Solr. When sending batches of updates with commits, the problem is reproduced. If the commit is held until after all updates are sent, then no problem is produced. This leads me to believe that this issue has something to do with overlapping commits or index merges. This was reproducible regardless of running classic or managed schema, and regardless of running a Solr core or SolrCloud.

There are not many steps to reproduce this, but you will need a way to send these updates. I have included inline create.sh and create.pl scripts to generate the data and send the updates. You can index a lastModified field or something to convince yourself that everything has been re-indexed; I left that out to keep the steps lean. Also, this test uses commit statements from the client sending the updates for simplicity, even though that is not a good practice. My normal setup uses SolrJ with commitWithin to let Solr manage when the commits take place, but the same error is produced either way.

*STEPS TO REPRODUCE*

1. Install Solr 5.5.3 and change to that working directory
2. bin/solr -e techproducts
3. bin/solr stop [Why these next 3 steps? They start the index completely fresh, without the 32 example documents, as opposed to using a delete query. The documents are not posted after the core is detected the second time.]
4. rm -rf ./example/techproducts/solr/techproducts/data/
5. bin/solr -e techproducts
6. ./create.sh
7. curl -X POST -H 'Content-type:application/json' --data-binary '{ "replace-field":{ "name":"cat", "type":"text_en_splitting", "indexed":true, "multiValued":true, "stored":true } }' http://localhost:8983/solr/techproducts/schema
8. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22 [error]
9. ./create.sh
10. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22 [error even though all documents have been re-indexed]

*create.sh*

#!/bin/bash
for i in {1..100}; do
  echo "$i"
  ./create.pl $i > ./create.xml$i
  curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" --data-binary @./create.xml$i
done

*create.pl*

#!/usr/bin/perl
my $S = $ARGV[0];
my $I = 100;
my $N = $S*$I + $I;
my $i;
print "<add>\n";
for($i=$S*$I; $i<$N; $i++) {
  # one document per line: id SP<i>, cat field containing "hard drive <i>"
  print "<doc><field name=\"id\">SP${i}</field><field name=\"cat\">hard drive ${i}</field></doc>\n";
}
print "</add>\n";

On Fri, May 26, 2017 at 2:14 AM, Rick Leir wrote:
> Can you reproduce this error? What are the steps you take to reproduce it?
> (Simple is better.)
>
> cheers -- Rick
>
> [original message snipped; see above]
Anonymous Read?
Is it possible to set up Solr security to allow anonymous queries (/select etc.) but restrict access to other permissions, as described in https://lucidworks.com/2015/08/17/securing-solr-basic-auth-permission-rules/ ?
Re: Anonymous Read?
Thanks! The null role value did the trick. I tried this with the predefined permissions and it worked as well. Thanks again!

On Tue, Jun 6, 2017 at 2:08 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov> wrote:
> We usually end security.json with the permissions
>
>    { "name":"open_select",
>      "path":"/select/*",
>      "role":null},
>    { "name":"all-admin",
>      "collection":null,
>      "path":"/*",
>      "role":"allgen"},
>    { "name":"all-core-handlers",
>      "path":"/*",
>      "role":"allgen"}]
> } }
>
> ...and then assign the "allgen" role to all users.
>
> This allows a select without a login & password, but requires a login &
> password for anything else (including the front page of the GUI).
>
> [original question snipped; see above]
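For reference, a complete security.json combining that approach with the stock example credentials would look something like the sketch below. The hash shown is the well-known "solr"/"SolrRocks" example, a placeholder to replace; blockUnknown must stay false so anonymous requests reach the authorization rules:

    {
      "authentication": {
        "blockUnknown": false,
        "class": "solr.BasicAuthPlugin",
        "credentials": {
          "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
        }
      },
      "authorization": {
        "class": "solr.RuleBasedAuthorizationPlugin",
        "permissions": [
          { "name": "open_select", "path": "/select/*", "role": null },
          { "name": "all-admin", "collection": null, "path": "/*", "role": "allgen" },
          { "name": "all-core-handlers", "path": "/*", "role": "allgen" }
        ],
        "user-role": { "solr": "allgen" }
      }
    }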
Re: Work-around for "indexed without position data"
Not sure if it helps beyond the steps to reproduce that I supplied above, but I also see that "Omit Term Frequencies & Positions" is still set on the field according to the LukeRequestHandler: ITS--OF--

On Mon, Jun 5, 2017 at 1:18 PM, Solr User wrote:
> [steps to reproduce snipped; see above]
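The flag string above comes from the Luke handler, which can be queried per field to check what the index actually recorded for a field, e.g. (core name is a placeholder):

    curl 'http://localhost:8983/solr/techproducts/admin/luke?fl=cat&wt=json'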
Re: Faceting and Grouping Performance Degradation in Solr 5
I am pleased to report that we are in production on Solr 5.5.3, with performance comparable to Solr 4.8.1, through leveraging facet.method=uif as well as https://issues.apache.org/jira/browse/SOLR-9176. Thanks to everyone who worked on these!

On Mon, Oct 3, 2016 at 3:55 PM, Solr User wrote:
> [test results snipped; see the table above]
Work-around for "indexed without position data"
This is in regard to changing a field type from string to text_en_splitting, re-indexing all documents, even optimizing to give the index a chance to merge segments and rewrite itself entirely, and then getting this error when running a phrase query: java.lang.IllegalStateException: field "blah" was indexed without position data; cannot run PhraseQuery. I have encountered this issue before and have always done one of the following as a work-around: 1. Instead of changing the field type on an existing field, just create a new field and retire the old one. 2. Delete the index directory and start from scratch. These work-arounds are not always ideal. Does anyone know what is holding onto that old field type definition? What thinks it is still a string? Every document has been re-indexed, and I am sure of this because I have a time stamp indexed. Is there any other way to get this to work? For what it is worth, I am running this in SolrCloud mode, but I remember seeing this issue before SolrCloud was released as well.
does shards.tolerant deal with this scenario?
Hi all, I have some questions re shards.tolerant=true and timeAllowed=xxx.

I have seen situations where shards.tolerant=true works: if one of the shards specified in a query is dead, shards.tolerant seems to work and I get results from the non-dead shards. However, if one of the shards goes down during the execution of a query, I have to wait for the primary searcher (the Solr sending the request to the shards) to time out, which can take minutes; i.e., shards.tolerant doesn't seem to work.

Question 1: Is timeAllowed "shard-aware"? I.e., in a sharded query, does this param get used by all the shards specified, or does it only get used by the primary searcher?

Question 2: Since shards.tolerant=true is not helping when a shard goes down during query execution, is there any other way to deal with this? If timeAllowed is shard-aware, I would think that I could use it and the primary searcher would then wait xxx milliseconds and return whatever the other shards had sent back. Is that correct?

Thanks in advance.
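For concreteness, the combination being asked about would be passed like this (host names and shard list are placeholders):

    http://primary:8983/solr/mycore/select?q=*:*&shards=host1:8983/solr/mycore,host2:8983/solr/mycore&shards.tolerant=true&timeAllowed=1000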
how do I get search for "fort st john" to match "ft saint john"
I have been using Solr for a while but started running across situations where synonyms are required. The example I have is a group of city names that look like "Fort Saint John" (a city), in a text field. Users may want to search for "Ft St John" or "Fort St John" or "Ft Saint John", however.

My attempted solution was to create a type that uses SynonymFilterFactory and a text file of city-based synonyms like this:

saint,st,ste
fort,ft

This doesn't work, however, and I am not sure I understand why. Any help appreciated. Thx.

P.S. I am using Solr 4.6.1; the field type definition from my schema is sketched below.
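(The field type definition originally attached here was stripped by the archive. Based on the analysis steps described later in the thread — lowercasing, non-alpha stripping, whitespace collapsing, trimming, then synonyms over an untokenized phrase — it was presumably along these lines; the tokenizer pattern in particular is a guess:)

    <fieldType name="city_phrase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-z ]" replacement=" " replace="all"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt"
                ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>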
Re: how do I get search for "fort st john" to match "ft saint john"
Yes, and I can see that (as expected) per the field type:

1. the indexed value is lowercased
2. stripped of non-alpha characters
3. multiple consecutive whitespace is removed
4. trimmed
5. run through the SynonymFilterFactory, where:
   a. the indexed value of "Marina/Former Fort Ord" is "marina former fort ord"
   b. the search value of "Marina/Former Ft Ord" is "marina former ft ord"

This I already knew. My question wasn't "why" they don't match; it is: how do I get a search for "fort st john" to match "ft saint john"? I.e., is there a way to index/search that would allow the search to match? The SynonymFilterFactory during indexing does not create a matching term for "marina former ft ord", which I think it would do if the indexed value was a word instead of a phrase (i.e., "fort" vs "Marina/Former Fort Ord"). (Note that my terms/understanding of how this works may be incorrect, hence my request for assistance/understanding.)
Re: how do I get search for "fort st john" to match "ft saint john"
Hi Eric. Sorry, been away.

The city_index_synonyms.txt file is pretty small, as it contains just these two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't used either. My understanding is that Solr would create the appropriate synonym entries in the index and so treat "fort" and "ft" as equal.

If you take a simple one-line schema (that uses the type definition from my original email) and index "fort saint john", does it work for you? I.e., does it return results if you search for "ft st john" and "ft saint john" and "fort st john"? My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work for all/some words in a phrase.
Re: how do I get search for "fort st john" to match "ft saint john"
Hi Eric. No, that doesn't fix the problem either (I have tested this previously and did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design, since I want the user to search by phrase), the phrase "marina former fort ord" (for example) does not get turned into four tokens ("marina", "former", "fort" and "ord"), and so the SynonymFilterFactory does not create synonyms for them (by design).

The original question remains: is there a tokenizer/plugin that will allow me to apply synonyms to words inside an unbroken phrase?

Note: the reason I don't want to tokenize the data by whitespace is that it would cause way too many results to be returned if I, for example, search on "new" or "st". However, I still want to be able to include "fort saint john" in the results if the user searches for "ft st john" or "fort st john" or ...
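One standard way around this, sketched below, is to keep the phrase field as-is and add a whitespace-tokenized copy of it purely for word-level synonym matching. Field and type names here are made up, and as noted later in the thread, adding fields was not an option in this particular legacy system:

    <fieldType name="city_words" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt"
                ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>

    <field name="city" type="city_phrase" indexed="true" stored="true"/>
    <field name="city_text" type="city_words" indexed="true" stored="false"/>
    <copyField source="city" dest="city_text"/>

A query could then search both fields, e.g. q=city:"ft st john" OR city_text:(ft st john), keeping phrase precision while still matching synonyms.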
Re: how do I get search for "fort st john" to match "ft saint john"
Thanks guys. Unfortunately the Solr that contains this schema/data is in a legacy system that requires the fields to not be changed. We will, hopefully in the near future, be able to look at redesigning the schema. Alternatively, I could look at boning up on Java (which I haven't used in a long time) and see if I can write a subword synonym plugin of some sort to perform this type of synonym matching. Thanks anyhow.
Re: Solr3.4 on tomcat 7.0.23 - hung with error "threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed"
Were you able to resolve this issue, and if so, how? I am encountering the same issue in a couple of Solr versions (including 4.0 and 4.5).
is it possible to consolidate filterquery cache strings
Let's say I have a largish set of data (120M docs) and that I am partitioning my data by groups of states (using the state codes).

Someone suggested that I could define the warming queries in my solrconfig.xml using the following format (one q with many fq values):

*:* State:AL State:AK ... State:WY

Would that work, and if so, how would I know that the cache is being hit? Or do I need to use the following traditional syntax instead (one q/fq pair per state)?

*:* State:AL *:* State:AK ... *:* State:WY

Any help appreciated.
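(The XML wrapping these warming queries was stripped by the archive; inside a newSearcher/firstSearcher listener, the two formats being compared would presumably look like this:)

A single warming query carrying every filter:

    <lst><str name="q">*:*</str>
         <str name="fq">State:AL</str>
         <str name="fq">State:AK</str>
         ...
         <str name="fq">State:WY</str></lst>

One warming query per state:

    <lst><str name="q">*:*</str><str name="fq">State:AL</str></lst>
    <lst><str name="q">*:*</str><str name="fq">State:AK</str></lst>
    ...
    <lst><str name="q">*:*</str><str name="fq">State:WY</str></lst>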
Re: is it possible to consolidate filterquery cache strings
note: by partitioning I mean that I have sharded the 120M docs into 9 Solr partitions (each on a separate server)
Re: is it possible to consolidate filterquery cache strings
would not breaking the FQs out by state be faster for warming up the fq caches?
Are there any Java versions we should avoid with Solr
We are currently using the Oracle Java 1.7.0_11 (23.6-b04) JDK with our Solr 4.6.1 setup. I was looking at upgrading to a more recent version but am wondering: are there any versions to avoid? The reason I ask is that I see some versions that have GC issues, but am not sure how/if Solr is affected by them.

7u40 has bugs with "New minimum young generation size is not properly checked by the JVM" and with "Irregular crash or corrupt term vectors in the Lucene libraries".
7u51 has a bug with "Memory leak when GCNotifier uses create_from_platform_dependent_str()".
how do I stop queries from being logged in two different log files in Tomcat
Hi all. We have a number of Solr 1.4x and Solr 4.x installations running on Tomcat. We are trying to standardize the content of our log files so that we can automate log analysis; we don't want to use log4j at this time.

In our Solr 1.4x installations, the following conf\logging.properties file correctly logs queries only to our localhost_access_log.xxx.txt files, and Tomcat-type messages to our catalina.xxx.log files. However, in our Solr 4.x installations, we are seeing Solr queries being logged in both our localhost_access_log.xxx.txt files and our catalina.xxx.log files. We don't want the Solr queries logged in the catalina.xxx.log files, since that more than doubles the amount of logging being done and doubles the disk space requirement (which can be huge).

Is there a way to configure logging, without using log4j (for now), to only log Solr queries to the localhost_access_log.xxx.txt files? I have looked at various Tomcat logging info and don't see how to do it. Any help appreciated.

handlers = 1catalina.org.apache.juli.FileHandler, 2localhost.org.apache.juli.FileHandler, 3manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

.handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

# Handler specific properties.
# Describes specific configuration info for Handlers.
1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
1catalina.org.apache.juli.FileHandler.prefix = catalina.
2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
2localhost.org.apache.juli.FileHandler.prefix = localhost.
3manager.org.apache.juli.FileHandler.level = FINE
3manager.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
3manager.org.apache.juli.FileHandler.prefix = manager.
java.util.logging.ConsoleHandler.level = WARNING
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

# Facility specific properties.
# Provides extra control for each logger.
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers = 2localhost.org.apache.juli.FileHandler
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers = 3manager.org.apache.juli.FileHandler

# For example, set the org.apache.catalina.util.LifecycleBase logger to log
# each component that extends LifecycleBase changing state:
#org.apache.catalina.util.LifecycleBase.level = FINE
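One JULI-level approach, assuming the query lines are emitted by org.apache.solr.core.SolrCore at INFO (a sketch, not a tested config for this exact setup), is to raise that logger's level in logging.properties so the queries stop reaching catalina.xxx.log while the AccessLogValve keeps writing localhost_access_log.xxx.txt:

    # suppress Solr's per-request INFO lines in catalina.*.log
    org.apache.solr.core.SolrCore.level = WARNING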
Re: how do I stop queries from being logged in two different log files in Tomcat
Awesome, Mike. That does exactly what I want. Many thanks.
confused about how to set a solr query timeout when using tomcat
I inherited a set of some old 1.4x Solrs running under Tomcat 6 / Java 6. While I will eventually upgrade them to a more recent Solr/Tomcat/Java, I am unable to do so in the near term.

One of my priority fixes, though, is to implement some sort of timeout for Solr queries that exceed 1000 ms (or so); i.e., if a query takes longer than that, I want to abort it (returning nothing or an error or whatever) so that Solr can process other queries. While we have optimized our queries for an average 50 ms response time, we do occasionally see some that can run between 10 and 100 seconds.

I know that this version of Solr itself doesn't have a built-in timeout mechanism, which leaves me with figuring out what to do (it seems to me that I have to figure out how to get Tomcat to time out the queries somehow). Note that I DID google until my fingers hurt and have not been able to find clear (at least not clear to me) instructions on how to do so.

Details:
1. The setup uses the DataImportHandler to update Solr, and updates occur often and can be quite large; we use batchSize="1" and autoCommit="true", with doc size being around 1400 to 1600 bytes. I don't want the timeout to kill the imports, of course.
2. I tried adding a timeout param to the Tomcat configuration, but it doesn't work.

Any thoughts? Can anyone point me in the right direction on how to implement this? Any help appreciated. Thx in advance.
Re: confused about how to set a solr query timeout when using tomcat
Millions of documents per shard, with a number of shards; ~40 GB index folder size; 12 GB of heap on a 16 GB machine (this old Solr doesn't use O/S memory space like 4.x does). The servers are hosted internally, and are powerful.

Understood. As mentioned, we tuned the bulk of our queries to run very quickly (50 ms or less), but we do occasionally see queries (i.e., internal ones for statistics/tests) that can be excessively long running. Basically, we want to be able to enforce how long those long-running queries are allowed to run.
RE: confused about how to set a solr query timeout when using tomcat
yes, my understanding and concern as well was that Solr queries continue to run on the Solr server even after the connection is broken. I was hoping I had overlooked or missed something in the Solr or Tomcat documentation that might do the job; it is unfortunate. if anyone else can think of something, let me know. -- View this message in context: http://lucene.472066.n3.nabble.com/confused-about-how-to-set-a-solr-query-timeout-when-using-tomcat-tp4171363p4171379.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
.schema.IndexSchema - [coreA] Schema name=Helios
1863 [main] INFO org.apache.solr.servlet.SolrDispatchFilter - user.dir=C:\SOLR\helios-4.10.2\Instance\Master
1864 [main] INFO org.apache.solr.servlet.SolrDispatchFilter - SolrDispatchFilter.init() done
1885 [main] INFO org.eclipse.jetty.server.AbstractConnector - Started SocketConnector@0.0.0.0:8086
9895 [qtp618640318-19] INFO org.apache.solr.servlet.SolrDispatchFilter - [admin] webapp=null path=/admin/cores params={indexInfo=false&_=1418236560709&wt=json} status=0 QTime=17
9931 [qtp618640318-19] INFO org.apache.solr.servlet.SolrDispatchFilter - [admin] webapp=null path=/admin/info/system params={_=1418236560885&wt=json} status=0 QTime=2

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
definitely puzzling. I am running this on my local box (i.e. using http://localhost:8086/solr) and it is the only running instance of any Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173618.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
log tab shows "No Events available" no errors at all in the CMD console my test version hasnt got any logging changes that are already in the default solr 4.10.2 package some kind of warning or error message would have been helpful... -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173627.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
my apologies for the lack of clarity. our internal name for the project to upgrade Solr from 4.0 to 4.10.2 is "helios", and so we named our test folder "heliosearch". I was not even aware of the github project Heliosearch, and nothing we are doing is related to it.

to simplify things for this post, we pared things down so that we have one Solr instance but two cores; coreX contains the collection1 files/folders as per the downloaded Solr 4.10.2 package, while coreA uses the same collection1 files/folders but with schema.xml and solrconfig.xml changes to meet our needs.

file- and foldername-wise, here is what we did:
1. C:\SOLR\solr-4.10.2.zip\solr-4.10.2\example renamed to C:\SOLR\helios-4.10.2\Master
2. renamed example\solr\collection1 to example\solr\coreX; no files modified here
3. copied example\solr\coreX to example\solr\coreA
4. modified the coreA schema to match our current production schema; i.e. our field names, etc.
5. modified the coreA solrconfig.xml to meet our needs (see below)

here are the solrconfig.xml changes we made to coreA (the element snippets themselves were stripped by the archive):
1.
2. 4
3. false
4. false
5. commented out the autoCommit section
6. commented out the autoSoftCommit section
7. commented out the section
8. 4
9.
10. contains geocluster
11. commented out these sections:

here are the schema.xml changes we made to our copy of the downloaded Solr 4.10.2 package (aside from replacing the example fields provided in the downloaded Solr 4.10.2):
1.
2. removed the example fields provided in the downloaded Solr 4.10.2
3. deleted various types we don't use in our current schemas
4. added fieldtypes that are in our current Solr 4.0 instances
5. added various fieldtypes that are in our current Solr 4.0 instances
6. re-added the "text" field as apparently required:

also note that we are using Java "1.7.0_67" and jetty-8.1.10.v20130312. all in all, I don't see anything that we have done that would keep the cores from being discovered. hope that helps.

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173831.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
small correction; coreX (the one with the unmodified schema.xml and solrconfig.xml) IS seen by Solr and appears on the Solr admin page, but coreA (which has our modified schema and solrconfig) is found by Solr but is not shown in the Solr admin page:

1494 [main] INFO org.apache.solr.core.CoresLocator - Looking for core definitions underneath C:\SOLR\helios-4.10.2\Master\solr
1502 [main] INFO org.apache.solr.core.CoresLocator - Found core coreA in C:\SOLR\helios-4.10.2\Master\solr\coreA\
1502 [main] INFO org.apache.solr.core.CoresLocator - Found core coreX in C:\SOLR\helios-4.10.2\Master\solr\coreX\
1503 [main] INFO org.apache.solr.core.CoresLocator - Found 2 core definitions

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173832.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
yes, I have triple-checked the schema and solrconfig XML; various tools have indicated the XML is valid. no missing types or dupes, and I have not disabled the admin handler. as mentioned in my most recent response, I can see the coreX core (the renamed and unmodified collection1 core from the downloaded package) and query it with no issues, but coreA (which has our specific schema and solrconfig changes) is not showing in the admin interface and cannot be queried (I get a 404). both cores are located in the same solr folder. appreciate the suggestions; looks like I will need to gradually move my schema and core changes towards the collection1 content and see where things start working; will take a while...sigh. will let you know what I find out. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173839.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
Chris, will get the schema and solrconfig ready for uploading. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4173840.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
I did find out the cause of my problems. Turns out the problem wasn't due to the solrconfig.xml file; it was in the schema.xml file. I spent a fair bit of time making my solrconfig closer to the default solrconfig.xml in the Solr download; when that didn't get rid of the error I went back to the only other file we had that was different. Turns out the line that was causing the problem was the middle line in this location_rpt fieldtype definition (the definition itself was stripped by the archive): the spatialContextFactory line caused the core to not load even though no error/warning messages were shown. I missed that extra line somehow; mea culpa. Anyhow, I really appreciate the responses/help I got on this issue. many thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4174118.html Sent from the Solr - User mailing list archive at Nabble.com.
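For later readers, the stripped definition was most likely of the general shape below (a sketch; the attribute values are illustrative, not the poster's actual ones). The spatialContextFactory attribute pulls in the JTS-backed spatial context, and if the JTS jar is missing from the classpath the core can fail to load:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>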
what does this "write.lock does not exist" mean??
I looked for messages on the following error but don't see anything in Nabble. Does anyone know what this error means and how to correct it??

SEVERE: java.lang.IllegalArgumentException: /var/apache/my-solr-slave/solr/coreA/data/index/write.lock does not exist

I also occasionally see error messages about specific index files, such as this:

SEVERE: null:java.lang.IllegalArgumentException: /var/apache/my_solr-slave/solr/coreA/data/index/_md39_1.del does not exist

I am using Solr 4.0.0, with Java 1.7.0_11-b21 and Tomcat 7.0.34, running on a 12 GB CentOS box; we have a master/slave setup with multiple slave searchers per indexer. any thoughts on this would be appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/what-does-this-write-lock-does-not-exist-mean-tp4175291.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.10.2 "Found core" but I get "No cores available" in dashboard page
interesting. unfortunately, it's time to take a break, so I will have to deal with this in the new year. Merry Christmas, and thanks for all the time and effort you guys put in answering all of our questions. It is much appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-2-Found-core-but-I-get-No-cores-available-in-dashboard-page-tp4173602p4175423.html Sent from the Solr - User mailing list archive at Nabble.com.
Getting a word count frequency out of a page field
SOLR reports the term occurrence for terms over all the documents. I am having trouble making a query that returns the term occurrence in a specific page field called documentPageId. I don't know how to issue a proper SOLR query that returns a word count for a paragraph of text such as the term "amplifier" for a field. For some reason it only returns. The things I've tried only return a count for 1 occurrence of the term even though I see the term in the paragraph more than just once.

I've tried faceting on the field, "contents":

http://localhost:8983/solr/select?indent=on&q=*:*&wt=standard&facet=on&facet.field=documentPageId&facet.query=amplifier&facet.sort=lex&facet.missing=on&facet.method=count

[response markup stripped by the archive: a top-level count of 21, a count of 1 for each documentPageId facet value, and 0 for the missing bucket]

In schema.xml and solrconfig.xml (element markup likewise stripped), the relevant fields are: filewrapper, caseNumber, pageNumber, documentId, documentPageId, contents.

Thanks in advance,
Re: Getting a word count frequency out of a page field
See comments inline below.

On Sun, Jan 22, 2012 at 8:27 PM, Erick Erickson wrote:
> Faceting won't work at all. Its function is to return the count
> of the *documents* that a value occurs in, so that's no good
> for your use case.
>
> "I don't know how to issue a proper SOLR query that returns a word count for
> a paragraph of text such as the term "amplifier" for a field. For some
> reason it only returns."
>
> This is really unclear. Are you asking for the word counts of a paragraph
> that contains "amplifier"? The number of times "amplifier" appears in
> a paragraph? In a document?

I'm looking for the number of times the word or term appears in a paragraph that I'm indexing as the field name "contents". I'm storing and indexing the field name "contents" that contains multiple occurrences of the term/word. However, when I query for that term it only reports that the word/term appeared only once in the field name "contents".

> And why do you want this information anyway? It might be an XY problem.

I want to be able to search for word frequency for a page in a document that has many pages, so I can report to the user that the term/word occurred on page 1 "10" times. The user can click on the result and go right to the page where the word/term appeared most frequently.

What do you mean an XY problem?

> Best
> Erick
>
> On Fri, Jan 20, 2012 at 1:06 PM, solr user wrote:
> > [original question quoted in full; the facet response and config markup were stripped by the archive]
Re: Getting a word count frequency out of a page field
Thanks for the article. I am indexing each page of a document as if it were a document. I think the answer is to configure SOLR to use the TermVectorComponent: http://wiki.apache.org/solr/TermVectorComponent I have not tried it yet, but someone on the StackExchange forum told me to try this one.

-Melanie

On Sun, Jan 22, 2012 at 8:56 PM, Erick Erickson wrote:
> Here's Hoss' XY problem writeup:
> http://people.apache.org/~hossman/#xyproblem
> but this doesn't appear to be that.
>
> There's no way out of the box that I know of to do what you want. It starts
> with the fact that Solr has no clue what a page is in the first place. Or
> a paragraph. Or a sentence. So you're really on your own here
> Solr only knows about *documents*. If each document is a page,
> you can do some stuff with term frequencies etc. But for a larger
> document you'll be getting into some pretty low-level analysis
> of the data to accomplish this.
>
> Sorry I can't be more help.
> Erick
>
> [earlier messages quoted in full; trimmed]
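For later readers, wiring up the TermVectorComponent the poster mentions looks roughly like the sketch below (the standard setup from the wiki, not the poster's verified config; field and handler names are illustrative):

<!-- schema.xml: term vectors must be stored on the field -->
<field name="contents" type="text" indexed="true" stored="true" termVectors="true"/>

<!-- solrconfig.xml: register the component and a handler that uses it -->
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<requestHandler name="/tvrh" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

A query like /tvrh?q=documentPageId:49667.3&tv.tf=true&tv.fl=contents then returns per-term frequencies for the matched page-documents, after reindexing with term vectors enabled.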
Limiting term frequency in a document to a specific term
What is the proper query URL to limit the term frequency to just one term in a document? Below is an example query to search for the term frequency in a document, but it is returning the frequency for all the terms:

http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents

I would like to be able to limit the query to just one term that I know occurs in the document. The documentation for Term Frequency says to specify the following: f.fieldName.tv.tf - Turns on Term Frequency for the fieldName specified. This is in the wiki documentation: http://wiki.apache.org/solr/TermVectorComponent

I tried various combinations of the above for the term amplifier in the URL but I could not get it to work. I would appreciate the appropriate syntax for a specific term such as amplifier.
Re: Limiting term frequency in a document to a specific term
With the Solr search relevancy functions, I get a ParseException: unknown function ttf in FunctionQuery.

http://localhost:8983/solr/select/?fl=score,documentPageId&defType=func&q=ttf(contents,amplifiers)

where contents is a field name, and amplifiers is text in the field. Just curious why I get a parse exception for the above syntax.

On Monday, January 23, 2012, Ahmet Arslan wrote:
>> Below is an example query to search for the term frequency in a document,
>> but it is returning the frequency for all the terms.
>>
>> http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents
>>
>> I would like to be able to limit the query to just one term that I know
>> occurs in the document.
>
> I don't fully follow but http://wiki.apache.org/solr/FunctionQuery#tf may be what you want?
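A likely explanation, assuming the poster was on a pre-4.0 Solr: if memory serves, the relevance functions tf, ttf, and termfreq were only added in Solr 4.0 (trunk at the time of this thread), so earlier versions reject them with exactly this unknown-function ParseException. On 4.0+, a sketch of the per-document frequency query would be:

http://localhost:8983/solr/select?q=documentPageId:49667.3&fl=documentPageId,freq:termfreq(contents,'amplifier')

where termfreq(field,term) returns the raw count of the term in that field for each matched document.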
can't use strdist as functionquery?
I want to sort my results by how closely a given resultset field matches a given string. For example, say I am searching for a given product, and the product can be found in many cities including "seattle". I want to sort the results so that results from the city of "seattle" are at the top, and all other results below that.

I thought that I could do so by using strdist as a functionquery (I am using Solr 1.4 so I can't directly sort on strdist) but am having problems with the syntax of the query, because functionqueries require double quotes and so does strdist. My current query, which fails with an NPE, looks something like this:

http://localhost:8080/solr/select?q=(product:"foo") _val_:"strdist("seattle",city,edit)"&sort=score%20asc&fl=product, city, score

I have tried various types of URL encoding (i.e. using %22 instead of double quotes in the strdist function), but no success. Any ideas?? Is there a better way to accomplish this sorting?? -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1023390.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't use strdist in sorting either?
I tried [snippet stripped by the archive]. I also noticed that I am unable to sort by the strdist function:

http://localhost:8080/solr/select?q=*:*&sort=strdist("seattle",city,edit)%20desc

Am I using strdist incorrectly? The version of Solr I am using is $Id: CHANGES.txt 903398 2010-01-26 20:21:09Z hossman $. I know it isn't the latest version, but I am constrained by needing to keep to a minimum the number of changes between our current version and the version that accomplishes the task mentioned previously (essentially a binary sort that separates results where the city matches a given criterion from those that don't). Appreciate any help or advice someone can offer on this. -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1032200.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't use strdist in sorting either?
forgot to mention:

1. yes, I upgraded to a version that allows sorting by functions (thx Grant for the work done on this feature, very cool)
2. when I try to sort by strdist, it doesn't seem to do any sorting; I get the same results if I sort asc or desc, if I change the static string value, if I change the third argument, etc.

-- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1032231.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't use strdist in sorting either?
finally figured out that I can simply escape the quotation marks in the query URL using backslashes to use strdist as a functionquery (sorry all, that should have been a no-brainer):

http://10.0.11.54:8994/solr/select?q=(*:*)^0%20_val_:"strdist(\"phoenix\",city,edit)"&fl=score,*&sort=score%20desc

however, sorting by the score in this query doesn't work (i.e. the same problem as when sorting by the strdist function - results don't change when I go from asc to desc or vice-versa). -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1057056.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't use strdist in sorting either?
issue resolved. I should have read the documentation with more care: "Calculate the distance between two strings". my city field was a tokenized text field, so changing it to the string type got things working. sorry all -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1058059.html Sent from the Solr - User mailing list archive at Nabble.com.
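For later readers, the fix amounts to a one-line schema change (field attributes here are illustrative), since strdist compares the whole indexed value rather than individual tokens:

<field name="city" type="string" indexed="true" stored="true"/>

after which the field must be reindexed for the new type to take effect.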
how to update solr to older 1.5 builds instead of to trunk
please excuse this newbie question, but: I want to upgrade Solr to a newer version, but not to the latest version in the trunk (because there are so many changes that I would have to test against, and modify my custom classes for, and behavior changes, and deal with the Lucene index change, etc.).

My thought was to look at versions that are post 903398 2010-01-26 20:21:09Z but pre the change in the Lucene index, eventually picking the version that has the features I want but with as few other changes as feasible. I know I could probably apply a bunch of patches, but some of the patches seem to rely on other patches which rely on other patches which rely on ... It just seems easier to pick the version that has just the features/patches I want.

I have no trouble seeing/using the trunk at http://svn.apache.org/repos/asf/lucene/dev/trunk/ but it only seems to have builds 984777 thru 984832. So where would I find significantly older builds (i.e. like the one I am currently using - 903398)? I tried using svn on repository http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev/ but get a "Repository moved permanently to '/viewc/lucene/solr/branches/branch-1.5-dev/'" message.

Any help would be great -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-update-solr-to-older-1-5-builds-instead-of-to-trunk-tp1113863p1113863.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to update solr to older 1.5 builds instead of to trunk
Thanks Yonik, but http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/CHANGES.txt says that the Lucene index has changed:

"Upgrading from Solr 1.4
* The Lucene index format has changed and as a result, once you upgrade, previous versions of Solr will no longer be able to read your indices. In a master/slave configuration, all searchers/slaves should be upgraded before the master. If the master were to be updated first, the older searchers would not be able to read the new index format."

not to mention that regression testing is a pain. Is there any way to get a set of builds with versions prior to 3.x?? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-update-solr-to-older-1-5-builds-instead-of-to-trunk-tp1113863p1114353.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to update solr to older 1.5 builds instead of to trunk
no, once upgraded I wouldn't need to have an older Solr read the indexes. misunderstood the note. thx -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-update-solr-to-older-1-5-builds-instead-of-to-trunk-tp1113863p1115694.html Sent from the Solr - User mailing list archive at Nabble.com.
possible bug in sorting by Function?
I was looking at the ability to sort by function that was added to Solr. For the most part it seems to work. However, Solr doesn't seem to like to sort by certain functions. For example, this sum works:

http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(1,Latitude,Longitude,sum(Latitude,Longitude)) asc

but this hsin doesn't work:

http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(3959,rad(47.544594),rad(-122.38723),rad(Latitude),rad(Longitude))

and gives me a "Must declare sort field or function" error, pointing to a line in QueryParsing.java. Note that I did apply the SOLR-1297-2.patch supplied by Koji Sekiguchi but it didn't seem to help. I am using Solr 903398 2010-01-26 20:21:09Z. Any suggestions appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1118235.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: possible bug in sorting by Function?
small typo in last email: the second sum should have been hsin, but I notice that the problem also occurs when I leave it as sum. -- View this message in context: http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1118260.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: possible bug in sorting by Function?
problem could be related to some oddity in sum()?? some more examples (note: Latitude and Longitude are fields of type=double):

works:
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(1,1.0))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(Latitude,Latitude)%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(rad(Latitude))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(Latitude,1))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(Latitude,1.0))%20asc

fails:
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(Latitude,1),sum(Latitude,1))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(sum(Latitude,1.0),sum(Latitude,1.0))%20asc
http://10.0.11.54:8994/solr/select?q=*:*&sort=sum(rad(Latitude),rad(Latitude))%20asc

-- View this message in context: http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1120017.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: possible bug in sorting by Function?
issue resolved. the problem was that solr.war was silently not being overwritten by the new version. will try to spend more time debugging before posting. -- View this message in context: http://lucene.472066.n3.nabble.com/possible-bug-in-sorting-by-Function-tp1118235p1121349.html Sent from the Solr - User mailing list archive at Nabble.com.
what would cause large numbers of executeWithRetry INFO messages?
I see a large number (~1000) of the following executeWithRetry messages in my Apache Catalina log files every day (see the snippet below). They seem to appear at random intervals. Since they are not flagged as errors or warnings, I have been ignoring them for now. However, I started wondering if the "INFO" level is a red herring and there might be an actual problem somewhere. Does anyone know what would cause this type of message? Are they normal? I have not seen anything in my google searches for Solr that contains this message.

Details:
1. my CPU usage seems fine, as does my heap; we have lots of CPU capacity and heap space
2. the log is from a searcher, but I know that the intervals do not correspond to replication (every 15 min on the hour)
3. the INFO lines appear in all searcher logs (we have a number of searchers)
4. the data is around 10m records per searcher and occupies around 14 GB
5. I am not noticing any problems performing queries on the Solr (so no trace info to give you); performance and queries seem fine

Log snippet:
Sep 10, 2010 2:17:59 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 10, 2010 2:18:20 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.admin.inf failed to respond
Sep 10, 2010 2:18:20 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Sep 10, 2010 2:18:20 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.

any info appreciated. thx -- View this message in context: http://lucene.472066.n3.nabble.com/what-would-cause-large-numbers-of-executeWithRetry-INFO-messages-tp1453417p1453417.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: WELCOME to solr-user@lucene.apache.org
Hi,

I have a question about boosting. I have the following fields in my schema.xml:
1. title
2. description
3. ISBN
etc.

I want to boost the field title. I tried index-time boosting but it did not work. I also tried query-time boosting but with no luck. Can someone help me on how to implement boosting on a specific field like title?

Thanks,
Solr User

On Thu, Nov 11, 2010 at 10:26 AM, wrote:
> [ezmlm subscription-confirmation auto-reply quoted in full; trimmed]
Boosting
Hi,

I have a question about boosting. I have the following fields in my schema.xml:
1. title
2. description
3. ISBN
etc.

I want to boost the field title. I tried index-time boosting but it did not work. I also tried query-time boosting but with no luck. Can someone help me on how to implement boosting on a specific field like title?

Thanks,
Solr User
Re: WELCOME to solr-user@lucene.apache.org
Eric,

Thank you so much for the reply, and apologies for not providing all the details.

The field definitions and copy fields in my schema.xml copy all searchable fields into a catch-all field named searchFields [the field markup itself was stripped by the archive].

Before creating the indexes I feed an XML file to the Solr job to create the index files. I added a boost attribute to the title field before creating indexes, e.g. boost="10.0" on the title element [the two sample documents, "Each Little Bird That Sings" and "Baby Bear's Chairs", were likewise reduced to bare text values by the archive].

I am trying to boost the title field so that the search results bring the actual title match as the first item in the results. Adding the boost attribute to the title field and index-time boosting did not change the search results. I tried query-time boosting also, as mentioned below, but no luck:

/select?q=Each+Little+Bird+That+Sings&title^9&fl=score

Any help to fix this issue would be really helpful.

Thanks,
Solr User

On Thu, Nov 11, 2010 at 10:32 AM, Solr User wrote:
> Hi,
>
> I have a question about boosting. I have the following fields in my schema.xml:
> 1. title
> 2. description
> 3. ISBN
> etc.
>
> I want to boost the field title. I tried index time boosting but it did not work. I also tried Query time boosting but with no luck.
>
> Can someone help me on how to implement boosting on a specific field like title?
>
> Thanks,
> Solr User
Re: WELCOME to solr-user@lucene.apache.org
Ahmet,

Thanks for the reply.

select/?q=built+to+last&defType=dismax&qf=searchFields^0.2+title^20&debugQuery=on

For some reason, if I use the title field in my query I don't get any results. I am copying all searchable fields into the searchFields field, so I am able to search only in the searchFields field and not in any other fields.

I request you all to clarify if anything is wrong with my schema.xml; the schema.xml is at the bottom of this email. I am not able to get the boosting working on the title field. Please help me here too.

Thanks,
Solr User

On Thu, Nov 11, 2010 at 5:11 PM, Ahmet Arslan wrote:
> There are several mistakes in your approach:
>
> copyField just copies data. Index time boost is not copied.
>
> There is no such boosting syntax. /select?q=Each&title^9&fl=score
>
> You are searching on your default field.
>
> This is not your cause of your problem but omitNorms="true" disables index
> time boosts.
>
> http://wiki.apache.org/solr/DisMaxQParserPlugin can satisfy your need.
>
> [earlier message quoted in full; schema field definitions and sample documents stripped by the archive]
Re: WELCOME to solr-user@lucene.apache.org
Ahmet,

Thanks for the reply.

In the production system we are using /spell/?q=built+to+last so that we can check the spelling. We are not using /select?q=built+to+last. Can I use dismax with /spell?

I understood from your reply that I need to change my schema.xml and modify the field types. Do I still need to use the searchFields field, and what do I need to specify in the defaultSearchField tag? searchFields is one of the field names that we provided.

Thanks,
Solr User

On Fri, Nov 12, 2010 at 10:26 AM, Ahmet Arslan wrote:
> > select/?q=built+to+last&defType=dismax&qf=searchFields^0.2+title^20&debugQuery=on
> >
> > For some reason if I use title field in my query I don't get any results.
> >
> > I am copying all searchable fields into searchFields field. So I am able to
> > search only in the searchFields field not in any other fields.
> >
> > I request you all to clarify if anything wrong with my schema.xml. The
> > schema.xml is at the bottom of this email.
> >
> > I am not able to get the boosting working on the title field. Please help me here too.
>
> Change type of your title field. It is string now. Make it solr.TextField.
> Actually you dont need cath-all copy field with dismax.
> Just change their types string to text and append them qf= parameter.
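For later readers: yes, any solr.SearchHandler instance can take dismax defaults, so a /spell handler can be switched to dismax. A minimal sketch in solrconfig.xml, assuming the spellcheck component is registered under the name spellcheck (the qf boosts are the poster's; the rest is illustrative):

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">searchFields^0.2 title^20</str>
    <str name="spellcheck">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>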
is there a way to prevent abusing rows parameter
silly question: is there any configuration value I can set to prevent someone from entering a bad value for the rows parameter? i.e. to prevent something like "&rows=1" from crashing my servers? the server I am looking at is a Solr v3.6 -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-prevent-abusing-rows-parameter-tp4021467.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: is there a way to prevent abusing rows parameter
Thanks guys. This is a problem with the front end not validating requests. I was hoping there might be a simple config value I could enter/change, rather than going through the long process of migrating a proper fix all the way up to our production servers. Looks like not, but thx. -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-prevent-abusing-rows-parameter-tp4021467p4021892.html Sent from the Solr - User mailing list archive at Nabble.com.
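For later readers, there is in fact a server-side knob worth knowing, assuming requests go through a named handler: parameters placed in a handler's invariants list override whatever the client sends, so rows can be pinned in solrconfig.xml. A sketch (handler name and cap are illustrative):

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="invariants">
    <!-- client-supplied rows values are ignored in favor of this cap -->
    <int name="rows">100</int>
  </lst>
</requestHandler>

The trade-off is that legitimate requests for more rows are also clamped, since invariants cannot be overridden per request.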
upgrading from 4.0 to 4.1 causes "CorruptIndexException: checksum mismatch in segments file"
hi all. I have been working on moving us from 4.0 to a newer build of 4.1. I am seeing a "CorruptIndexException: checksum mismatch in segments file" error when I try to use the existing index files. I did see something in the build log for #119 re "LUCENE-4446" that mentions "flip file formats to point to 4.1 format".

Do I just need to reindex, or is this some other issue (i.e. do I need to configure something differently)? Or should I move back a few builds?

note, we are currently using:
solr-spec 4.0.0.2012.04.05.15.05.52
solr-impl 4.0-SNAPSHOT 1310094M - - 2012-04-05 15:05:52
lucene-spec 4.0-SNAPSHOT
lucene-impl 4.0-SNAPSHOT 1309921 - - 2012-04-05 10:25:27

and are considering moving to:
solr-spec 4.1.0.2012.11.03.18.08.42
solr-impl 4.1-2012-11-03_18-05-49 1405392 - hudson - 2012-11-03 18:08:42
lucene-spec 4.1-2012-11-03_18-05-49
lucene-impl 4.1-2012-11-03_18-05-49 1405392 - hudson - 2012-11-03 18:06:50
(aka apache-solr-4.1-2012-11-03_18-05-49)

-- View this message in context: http://lucene.472066.n3.nabble.com/upgrading-from-4-0-to-4-1-causes-CorruptIndexException-checksum-mismatch-in-segments-file-tp4021913.html Sent from the Solr - User mailing list archive at Nabble.com.
spatial searches and geo-json data
hi all. I have a large amount of spatial data in GeoJSON format that I get from MSSQL Server. I want to be able to index that data and am trying to figure out how to convert it into WKT format, since Solr only accepts WKT. is anyone aware of any Solr module, T-SQL code, or C# code that would help me with the conversion? -- View this message in context: http://lucene.472066.n3.nabble.com/spatial-searches-and-geo-json-data-tp4026140.html Sent from the Solr - User mailing list archive at Nabble.com.
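No converter surfaced in this thread; for later readers, the mapping itself is mechanical for simple geometries. A minimal sketch in Java (an illustration, not a vetted library: it handles only Point and Polygon GeoJSON geometries and assumes the org.json parser; a JTS-based reader would be more robust):

import org.json.JSONArray;
import org.json.JSONObject;

public class GeoJsonToWkt {
    // convert a single GeoJSON geometry object to a WKT string
    public static String toWkt(String geoJson) {
        JSONObject g = new JSONObject(geoJson);
        String type = g.getString("type");
        JSONArray c = g.getJSONArray("coordinates");
        if (type.equals("Point")) {
            // GeoJSON stores [lon, lat]; WKT wants "x y" which is also lon lat
            return "POINT (" + c.getDouble(0) + " " + c.getDouble(1) + ")";
        }
        if (type.equals("Polygon")) {
            StringBuilder sb = new StringBuilder("POLYGON (");
            for (int r = 0; r < c.length(); r++) { // outer ring, then holes
                JSONArray ring = c.getJSONArray(r);
                sb.append(r > 0 ? ", (" : "(");
                for (int i = 0; i < ring.length(); i++) {
                    JSONArray pt = ring.getJSONArray(i);
                    if (i > 0) sb.append(", ");
                    sb.append(pt.getDouble(0)).append(' ').append(pt.getDouble(1));
                }
                sb.append(')');
            }
            return sb.append(')').toString();
        }
        throw new IllegalArgumentException("Unhandled GeoJSON type: " + type);
    }
}

For example, toWkt("{\"type\":\"Point\",\"coordinates\":[-122.38,47.54]}") yields POINT (-122.38 47.54), which can then be indexed into the spatial field.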
what is difference between 4.1 and 5.x
just curious as to what the difference is between 4.1 and 5.0, i.e. is 4.1 a maintenance branch for what is currently 4.0, or are they very different designs/architectures? -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-difference-between-4-1-and-5-x-tp4032064.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud on multiple appservers
Does anyone have a blog or wiki with detailed step-by-step instructions on setting up SolrCloud on multiple JBoss instances? Thanks in advance,
Using Customized sorting in Solr
Hi,

We are planning to move the search of one of our listing-based portals to the Solr/Lucene search server from the Sphinx search server, but we are facing a challenge in porting the customized sorting used in our portal. We only have the last 60 days of data live. The algorithm is as follows:

1. Put all listings into 54 buckets (date buckets for 60 days), i.e. buckets of 7 days, 1 day, 1 day, ...
2. For each date bucket we make 2 buckets (paid / free bucket)
3. For each paid / free bucket, cycle the advertisers on a uniqueness basis, i.e. inside a bucket the ordering should be the 1st listing of each advertiser, the 2nd listing of each advertiser, and so on; in other words, within a *sub-bucket* the second listing of an advertiser will be displayed only after the first listing of all advertisers has been displayed.

For taking care of points 1 and 2 we have created a field named bucket_index at indexing time and get the results sorted by this index, but we are not able to find a way to create a sort field at index time, or to think of a sort function, for point 3. Please suggest if there is a way to do so in Solr.

Tia,

BC Rathore
Re: Using Customized sorting in Solr
Jan,

Thanks for the response.

I thought of using it, but it would be suboptimal in the scenario I have. I guess I have to explain the scenario better; let me try again:

1. I have importance-based buckets in the system, implemented using a variable named bucket_count with integer values 0,1,2,3, and I have to show results in order of bucket_count, i.e. results from the 0th bucket at the top, then results from the 1st bucket, and so on. That is done by doing an asc sort on this variable.
2. Now *within these buckets* I need to ensure that the 1st listing of every advertiser comes at the top, then the 2nd listing from every advertiser, and so on.

Now if I go with grouping on advertiserId and use group.offset, then I probably also need to do additive filtering on bucket_count. To explain it better, the pseudo-algorithm would be:

1. query Solr with group.offset 0 and bucket count 0
2. if results are more than zero in step 1, then increase the group offset and follow step 1 again
3. else increase the bucket count with group offset zero and start from step 1.

With this logic, in the worst case I need to query Solr (number of importance buckets) * (max number of listings by an advertiser) times, which could be a very high number of Solr queries for a single user query. Please suggest a more optimal way if there is one. I am also open to making modifications in the Solr/Lucene code if needed.

Regards,
BC Rathore

On Fri, Apr 27, 2012 at 4:09 AM, Jan Høydahl wrote:
> Hi,
>
> How about trying grouping with paging?
> First you do
> group=true&group.field=advertiserId&group.limit=1&group.offset=0&group.main=true&sort=something&group.sort=how-much-paid desc
>
> That gives you one listing per advertiser, sorted the way you like.
> Then to grab the next batch of ads, you go group.offset=1 etc etc.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> [original message quoted in full; trimmed]
Re: Using Customized sorting in Solr
Hi,

Any suggestions? Am I trying to do too much with Solr? Is there any other search engine which should be used here? I am looking into the Solr codebase and planning to modify QueryComponent; will this be the right approach?

Regards,
Shivam

On Fri, Apr 27, 2012 at 10:48 AM, solr user wrote:
> [previous message and Jan's reply quoted in full; trimmed]
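No solution was posted in this thread; one index-time approach worth sketching (purely an illustration, not from the thread): precompute a per-advertiser ordinal while building the feed, then let Solr do a plain two-key sort. The Listing type and field names below are hypothetical:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AdvertiserRank {
    // hypothetical listing record: advertiserId plus the rank we will index
    static class Listing {
        String advertiserId;
        int advertiserRank; // indexed as an int field, e.g. advertiser_rank
    }

    // listings must already be ordered the way they should cycle inside a bucket
    static void assignRanks(List<Listing> listingsInBucket) {
        Map<String, Integer> seen = new HashMap<>();
        for (Listing l : listingsInBucket) {
            // 1 for an advertiser's first listing in this bucket, 2 for the second, ...
            l.advertiserRank = seen.merge(l.advertiserId, 1, Integer::sum);
        }
    }
}

Querying then reduces to a single request, e.g. sort=bucket_index asc, advertiser_rank asc, at the cost of recomputing ranks whenever listings enter or leave a bucket.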
Dismax - Boosting
Hi,

Currently we are using the StandardRequestHandler, and the configuration in SolrConfig.xml is as below [markup stripped by the archive; it set echoParams to explicit].

We would like to switch to the DisMax request handler; the configuration in SolrConfig.xml is the stock example (reconstructed here from the surviving values):

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4</str>
    <str name="pf">text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9</str>
    <str name="bf">popularity^0.5 recip(price,1,1000,1000)^0.3</str>
    <str name="fl">id,name,price,score</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">text features name</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.text.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

Questions:
1. Do we need to change the above DisMax handler configuration as per our requirements, or leave it as it is? What changes?
2. Do we need to make DisMax the default request handler? Do I need to add the attribute default="true" to the tag?
3. I read in the documentation that the default search handler and DisMax are the same except that to use the DisMaxQueryParser you add defType=dismax to the query string. Is there anything else we need to do?

We are basically moving to the dismax handler and trying to understand what changes we need to make to SolrConfig.xml. I understood what changes need to be made to schema.xml in a different thread on this forum.

Thanks,
Solr User
Re: Dismax - Boosting
Ahmet,

Thanks for the reply; it was very helpful.

The query that I used before changing to dismax was:

/solr/tradecore/spell/?q=curious&wt=json&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true

The above query used to properly return all the data related to facets, the data itself, and also any suggestions related to spelling mistakes.

The configuration after modifying for dismax is as below.

Schema.xml: [field definitions stripped by the archive]

SolrConfig.xml (handler defaults; markup stripped, surviving values shown): defType=dismax, echoParams=explicit, qf=title^9.0 subtitle^3.0 author^1.0 desc shortdesc imprint category isbn13 isbn10 format series season bisacsub award, fl=*

The query that I used after changing to dismax is:

solr/tradecore/select/?q=curious&wt=json&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true

The following are the issues that I am having after modifying to dismax:
1. Facet data is not coming back correctly; a lot of extra data is coming. Why, and how do I fix it?
2. How do I use the spell checker request handler along with dismax?

Thanks,
Murali

On Mon, Nov 15, 2010 at 5:38 PM, Ahmet Arslan wrote:
> > 1. Do we need to change the above DisMax handler configuration as per our
> > requirements? Or Leave it as it is? What changes?
>
> Yes, you need to edit it. At least field names. Does your schema has a
> field named sku?
>
> > 2. Do we need make DisMax as a default request handler? Do I need to add
> > attribute default="true" to the tag?
>
> If you are going to always use it, why not, change it by adding
> default="true". By doing so you need to add qt parameter in every request.
> But don't forget to delete other default="true". There can be only one
> default="true" :)
>
> > 3. I read in the documentation that Default Search Handler
> > and DisMax are the same except that to use DisMaxQueryParser add
> > defType=dismax in the query string. Is there anything else do we need to do?
>
> Above dismax config contains default parameter list. So you don't need to
> add &defType=dismax&qf=title^1.0 text^1.5 ... etc. to the query string.
>
> > We are basically moving on to dismax handler and trying to understand what
> > changes we need to make to SolrConfig.xml.
>
> As you can see in default solrconfig.xml, you can register multiple
> instances of solr.SearchHandler with different default parameter list and
> name. default="true" one is executed by default.
>
> And this can be helpful deciding about dismax params: qf,pf,ps,ps,mm etc
> http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
Re: Dismax - Boosting
Ahmet, I modified the schema as follows (added more fields for faceting): [field definitions not preserved in the archive]

I also added copy fields as below: [copyField definitions not preserved in the archive]

With the above changes I am not getting any facet data back. Why is the facet data not returning, and what mistake did I make in the schema?

Thanks,
Solr User

On Wed, Nov 17, 2010 at 6:42 PM, Ahmet Arslan wrote:
> Wow, you facet on many fields:
> author,pubyear,format,series,season,imprint,category,award,age,reading,grade,price
>
> The fields you facet on should be of an untokenized type: string, int, tint, date,
> etc.
>
> The fields you want full-text search on, e.g. the ones you specify in the qf and pf
> parameters, should be of a text type
> (title subtitle author desc shortdesc imprint category isbn13 isbn10 format
> series season bisacsub award).
>
> If you have common fields, for example category, you need two copies of that field:
> one string and one text, so that you can both full-text search and facet on it.
> Use copyField for this.
>
> Example document:
> category: electronic devices
>
> The query electronic will return it, and the facet on category_string will be
> displayed as:
>
> electronic devices (1)
>
> not:
>
> electronic (1)
> devices (1)
>
> --- On Wed, 11/17/10, Solr User wrote:
>
> > From: Solr User
> > Subject: Re: Dismax - Boosting
> > To: solr-user@lucene.apache.org
> > Date: Wednesday, November 17, 2010, 11:31 PM
> >
> > [quoted message snipped; the schema field definitions in it were not preserved in the archive]
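A minimal sketch of the string/text pairing Ahmet describes, using category as the example (the _facet suffix is illustrative; any distinct name works):

<field name="category" type="text" indexed="true" stored="true"/>
<field name="category_facet" type="string" indexed="true" stored="false"/>
<copyField source="category" dest="category_facet"/>

Full-text search (qf) then targets category, while facet.field=category_facet yields whole values such as "electronic devices (1)" instead of per-token counts.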
Re: Dismax - Boosting
Hi Ahmet, Below is my previous configuration, which used to work correctly:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">searchFields</str>
    <str name="spellcheckIndexDir">/solr/qa/tradedata/spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

We used to search in only one field, "searchFields", but with dismax we are searching in several fields:

title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint category isbn13 isbn10 format series season bisacsub award

Do we need to modify the above configuration to include all of the above fields? Please give me an example.

In the past we used to query twice: first to get the suggestions, and then again using the first suggestion to show the data. Is there a way to do it in one step?

Thanks,
Murali

On Wed, Nov 17, 2010 at 7:00 PM, Ahmet Arslan wrote:
> > 2. How do I use the spell checker request handler along with
> > dismax?
>
> Just append this at the end of the dismax request handler definition:
>
> <arr name="last-components">
>   <str>spellcheck</str>
> </arr>
Re: Dismax - Boosting
Hi Ahmet, In the past we used /spell, and if there was no match we used to get a list of suggestions and then make another call with the first suggestion to get search results. After that we show the user both the suggestions for the spelling mistake and the results of the first suggestion. I think the plug-in at the URL you provided will help with doing that.

Is there a way in Solr to directly get the spelling suggestions as well as the data for the first suggestion at the same time?

For example: if the search keyword is mooon (typed by mistake instead of moon) then we need all the suggestions, like: Did you mean: moon, mo, mooing, moonen, soon, mood, moose, moore, spoon, moons? and also the search results for the first suggestion, moon.

Thanks,
Solr User

On Fri, Nov 19, 2010 at 6:41 PM, Ahmet Arslan wrote:
> > Below is my previous configuration, which used to work
> > correctly:
> >
> > [spellcheck component configuration quoted from the previous message]
> >
> > We used to search in only one field, "searchFields", but with
> > dismax we are searching in several fields:
> >
> > title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint category isbn13
> > isbn10 format series season bisacsub award
> >
> > Do we need to modify the above configuration to include all
> > of the above fields? Please give me an example.
>
> Searching and spell checking are independent. For example, you can search on
> 10 fields and create suggestions from 2 fields. The spell checker accepts one
> field in its configuration, so you need to populate this field with
> copyField, using the fields on which you want spell checking. The type of
> this field should be textSpell in your case. You can use the above config.
>
> > In the past we used to query twice: first to get the
> > suggestions, and then again using the first suggestion to show the data.
> >
> > Is there a way to do it in one step?
>
> Are you talking about queries that return 0 numFound? Re-executing the
> search as described here:
> http://sematext.com/products/dym-researcher/index.html
>
> Not out-of-the-box.
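For what it is worth, the stock SpellCheckComponent gets close: with spellcheck.collate enabled it returns, in the same response, the original query rewritten with the top suggestion for each misspelled term. Fetching results for that rewritten query still takes a second request, which matches Ahmet's "not out-of-the-box". A sketch of the request, using the handler path and core from earlier in this thread (spellcheck.count and spellcheck.collate are standard component parameters):

/solr/tradecore/select/?q=mooon&spellcheck=true&spellcheck.count=10&spellcheck.collate=true

The response then carries both the suggestion list (moon, mood, moose, ...) and a collation string that the application can submit as the follow-up query.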
Special Characters
Hi, I am searching for j.r.r. tolkien and getting results back, but if I search for jrr I am not getting any results. I am also not getting any results when searching for jrr tolkien. I am using AND as the default operator.

The search should work for both j.r.r. tolkien and jrr tolkien.

What configuration changes do I need to make so that special characters like hyphen (-) and period (.) are ignored while indexing? Or any other suggestions?

Thanks,
Solr User
Re: Special Characters
Hi Erick, I use Solr version 1.4.0, and below is my schema.xml: [field type definition not preserved in the archive]

It creates three tokens, so j r r tolkien works fine but jrr tolkien does not. I will read about PatternReplaceCharFilterFactory and try it. Please let me know if I need to do anything differently.

Thanks,
Solr User

On Mon, Nov 22, 2010 at 8:19 AM, Erick Erickson wrote:
> What version of Solr are you using? You can think about
> PatternReplaceCharFilterFactory if you're using the right
> version of Solr.
>
> But you have other problems than that. Let's claim you
> get the periods removed. Do you tokenize three tokens or
> one? I.e. jrr or j r r? In the latter case your search still won't
> match.
>
> Best
> Erick
>
> On Mon, Nov 22, 2010 at 7:45 AM, Solr User wrote:
>
> > Hi,
> >
> > I am searching for j.r.r. tolkien and getting results back, but if I search
> > for jrr I am not getting any results. I am also not getting any results when
> > searching for jrr tolkien. I am using AND as the default operator.
> >
> > The search should work for both j.r.r. tolkien and jrr tolkien.
> >
> > What configuration changes do I need to make so that special characters like
> > hyphen (-) and period (.) are ignored while indexing? Or any other suggestions?
> >
> > Thanks,
> > Solr User
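For reference, a hedged sketch of the kind of fieldType Erick is hinting at. The type name text_author is illustrative, not from the thread; the charFilter runs before tokenization, so both j.r.r. tolkien and jrr tolkien reduce to the tokens jrr and tolkien at index and query time:

<fieldType name="text_author" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- strip periods and hyphens before tokenizing, so "j.r.r." becomes "jrr" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[.-]" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Note that stripping hyphens also fuses hyphenated words into a single token, which may or may not be desirable, and a full reindex is needed before existing documents pick up the new analysis.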
Facet - Range Query issue
Hi, I am having an issue with querying and using facets. This was working fine earlier:

/spell/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true&debugQuery=on

After modifying to use the dismax handler with the new schema, the query below does not work:

/select/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true&debugQuery=on

The debug output shows:

rawquerystring: (sun) AND (pubyear:[1991 TO 2011])
querystring: (sun) AND (pubyear:[1991 TO 2011])
parsedquery: +((+DisjunctionMaxQuery((series:sun | desc:sun | bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun | subtitle:sun^3.0 | isbn13:sun)) +DisjunctionMaxQuery((series:"pubyear 1991" | desc:"pubyear 1991" | bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" | shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear 1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 | isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" | subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991")) DisjunctionMaxQuery((series:2011 | desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 | isbn13:2011)))~1) ()
parsedquery_toString: +((+(series:sun | desc:sun | bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun | subtitle:sun^3.0 | isbn13:sun) +(series:"pubyear 1991" | desc:"pubyear 1991" | bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" | shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear 1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 | isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" | subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991") (series:2011 | desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 | isbn13:2011))~1) ()
QParser: DisMaxQParser

Basically we are trying to pass the query string along with a facet field and a range. Is there a syntax issue? Please help; this is urgent as I am stuck.

Thanks,
Solr user
Re: Facet - Range Query issue
Erick, I solved the issue by adding the fq parameter to the query. Thank you so much for your reply.

Thanks,
Murali

On Mon, Nov 22, 2010 at 1:51 PM, Erick Erickson wrote:
> Well, without seeing the changes you made to the schema, it's hard to tell
> much.
> Also, could you define "not work"? What, exactly, fails to do what you
> expect?
>
> But the first question I have is "did you reindex after changing your
> schema?".
>
> And have you checked your index to verify that there are values in the fields
> you changed?
>
> Best
> Erick
>
> On Mon, Nov 22, 2010 at 1:42 PM, Solr User wrote:
>
> > Hi,
> >
> > I am having an issue with querying and using facets.
> >
> > This was working fine earlier:
> >
> > /spell/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true&debugQuery=on
> >
> > After modifying to use the dismax handler with the new schema, the query below
> > does not work:
> >
> > /select/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true&debugQuery=on
> >
> > [parsed-query debug output snipped; see the previous message in this thread]
> >
> > Basically we are trying to pass the query string along with a facet field
> > and a range. Is there a syntax issue? Please help; this is urgent as I am stuck.
> >
> > Thanks,
> > Solr user
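For readers who hit the same problem: dismax does not understand Lucene query syntax, so the range clause in q was analyzed as the plain terms "pubyear 1991" and 2011 (visible in the parsed query above). A sketch of the corrected request with the range moved into an fq filter (field names as used in this thread; the spaces in the fq value would be URL-encoded in practice):

/select/?q=sun&fq=pubyear:[1991 TO 2011]&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true

The fq clause is parsed by the standard Lucene query parser regardless of defType, which is why moving the range there resolves the issue.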
How to get all the search results?
Hi, First off, thanks to the group for guiding me in moving from the default search handler to dismax.

I have a question about getting all the search results. In the past, with the default search handler, I got all the search results (8000) if I passed q=* as the search string, but with dismax I get only 16 results instead of 8000.

How do I get all the search results using dismax? Do I need to configure anything to make * (asterisk) work?

Thanks,
Solr User
Re: How to get all the search results?
Hi, I tried *:* using dismax and I get no results. Is there a way that I can get all the search results using dismax? Thanks, Murali On Mon, Dec 6, 2010 at 11:17 AM, Savvas-Andreas Moysidis < savvas.andreas.moysi...@googlemail.com> wrote: > Hello, > > shouldn't that query syntax be *:* ? > > Regards, > -- Savvas. > > On 6 December 2010 16:10, Solr User wrote: > > > Hi, > > > > First off thanks to the group for guiding me to move from default search > > handler to dismax. > > > > I have a question related to getting all the search results. In the past > > with the default search handler I was getting all the search results > (8000) > > if I pass q=* as search string but with dismax I was getting only 16 > > results > > instead of 8000 results. > > > > How to get all the search results using dismax? Do I need to configure > > anything to make * (asterisk) work? > > > > Thanks, > > Solr User > > >
Re: How to get all the search results?
Hi Shawn, Yes, you did. I tried it and it did not work, so I asked the same question again. Now I understand; I tried it directly on the Solr admin and I got all the search results. I will implement the same on the website. Thank you so much, Shawn.

On Mon, Dec 13, 2010 at 5:16 PM, Shawn Heisey wrote:
> On 12/13/2010 9:59 AM, Solr User wrote:
>
>> Hi,
>>
>> I tried *:* using dismax and I get no results.
>>
>> Is there a way that I can get all the search results using dismax?
>
> For dismax, use q= or simply leave the q parameter off the URL entirely.
> It appears that you need to have q.alt set to *:* for this to work. It
> would be a good idea to include this in your handler definition:
>
> <str name="q.alt">*:*</str>
>
> Two people (myself and Peter Karich) gave this answer on this thread last
> week, within 15 minutes of the time your original question was posted.
> Here's the entire thread on nabble:
>
> http://lucene.472066.n3.nabble.com/How-to-get-all-the-search-results-td2028233.html
>
> Shawn
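A minimal sketch of a handler definition with q.alt in place (the handler name and the default="true" placement are illustrative; q.alt only takes effect when the q parameter is absent or empty):

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

With this in place, a request such as /select/?rows=10 with no q parameter at all matches every document in the index.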
Re: what would cause large numbers of executeWithRetry INFO messages?
Sorry, never did find a solution to that. If you do happen to figure it out, please post a reply to this thread. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/what-would-cause-large-numbers-of-executeWithRetry-INFO-messages-tp1453417p2281087.html Sent from the Solr - User mailing list archive at Nabble.com.
Out of memory while creating indexes
Hi All, I am trying to create indexes from a 400MB XML file using the following command, and I am running into an out-of-memory exception:

$JAVA_HOME/bin/java -Xms768m -Xmx1024m -Durl=http://$SOLR_HOST:$SOLR_PORT/solr/customercarecore/update -jar $SOLRBASEDIR/dataconvertor/common/lib/post.jar $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml

I am planning to bump up the memory and try again. Did anyone run into a similar issue? Any inputs would be very helpful for resolving the out-of-memory exception. I was able to create indexes with a small file but not with the large file. I am not using SolrJ.

Thanks,
Solr User
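A sketch of the same invocation with a larger heap, reusing the environment variables from the post above (the 2048m figure is just an illustration; the OutOfMemoryError could equally be thrown by the Solr server's JVM, so its heap is worth checking too):

$JAVA_HOME/bin/java -Xms1024m -Xmx2048m \
  -Durl=http://$SOLR_HOST:$SOLR_PORT/solr/customercarecore/update \
  -jar $SOLRBASEDIR/dataconvertor/common/lib/post.jar \
  $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml

Splitting the 400MB file into several smaller <add> files and posting them one at a time is another common workaround.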
Terms Component - solr-1.4.0
Hi All, I am using Solr 1.4.0 with dismax as the request handler. I have the following in my solrconfig.xml inside the dismax request handler tag:

<arr name="last-components">
  <str>spellcheck</str>
</arr>

The above configuration helps find terms when there are spelling issues. I tried configuring the terms component with no luck.

May I know how to configure the terms component with dismax? Or do I need to call the terms component directly to get auto-suggestions?

Thank you so much in advance.

Regards,
Solr User
Re: Terms Component - solr-1.4.0
Hi All, Please help me with implementing TermsComponent in my current Solr solution.

Regards,
Solr User

On Tue, May 17, 2011 at 4:12 PM, Solr User wrote:
> Hi All,
>
> I am using Solr 1.4.0 with dismax as the request handler. I have the following in
> my solrconfig.xml inside the dismax request handler tag:
>
> <arr name="last-components">
>   <str>spellcheck</str>
> </arr>
>
> The above configuration helps find terms when there are spelling issues. I tried
> configuring the terms component with no luck.
>
> May I know how to configure the terms component with dismax? Or do I need to
> call the terms component directly to get auto-suggestions?
>
> Thank you so much in advance.
>
> Regards,
> Solr User
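TermsComponent is normally wired into its own request handler rather than appended to the dismax handler's last-components. A sketch along the lines of the Solr 1.4 example solrconfig.xml (the handler name /terms is conventional; the field name title below is illustrative):

<searchComponent name="termsComponent" class="org.apache.solr.handler.component.TermsComponent"/>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>

Auto-suggestions are then fetched with a direct call, separate from the dismax search request, e.g.:

/terms?terms.fl=title&terms.prefix=cu&terms.limit=10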
question(s) re lucene spatial toolkit aka LSP aka spatial4j
hopefully someone is using the lucene spatial toolkit aka LSP aka spatial4j, and can answer this question. we are using this spatial tool for doing searches. overall, it seems to work very well; however, finding documentation is difficult. I have a couple of questions:

1. I have a geohash field in my solr schema that contains indexed geographic polygon data. I want to find all docs where that polygon intersects a given lat/long. I was experimenting with returning distance in the result set and with sorting by distance, and found that the following query works. However, I don't know what distance means in the query, i.e. is it the distance from the point to the polygon centroid, to the closest outer edge of the polygon, or is it a useless random value? Does anyone know?

http://solrserver:solrport/solr/core0/select?q=*:*&fq={!v=$geoq%20cache=false}&geoq=wkt_search:%22Intersects(Circle(-97.057%2047.924%20d=0.01))%22&sort=query($geoq)+asc&fl=catchment_wkt1_trimmed,school_name,latitude,longitude,dist:query($geoq,-1),loc_city,loc_state

2. some of the polygons, being geographic representations, are very big (i.e. state/province polygons). when solr starts processing a spatial query (like the one above), I can see ("INFO: Building Cache [xx]") that it fills some sort of in-memory cache (org.apache.lucene.spatial.strategy.util.ShapeFieldCache) with the indexed polygon data. We are encountering Java OOM issues when this occurs (even when we boosted the memory to 7GB). I know that some of the polygons can have more than 2300 points, but heavy trimming isn't really an option due to level-of-detail issues. Can we control this caching, or the indexing of the polygons, in any way to reduce the memory requirements? -- View this message in context: http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question(s) re lucene spatial toolkit aka LSP aka spatial4j
Thanks David. No worries about the delay; I am always happy and appreciative when someone responds. I don't understand what you mean by "All center points get cached into memory upon first use in a score" in question 2 about the Java OOM errors I am seeing. The Solr instance I have set up for testing has around 200K docs, with one WKT field per doc (indexed, stored, and multiValued). I did a count of the number of points that get indexed in Solr (computed in MS SQL by counting the number of points (using STNumPoints) for each geometry (using STNumGeometries) in the WKT data I am indexing), and I have around 35M points total. If only the center points for 190K docs get cached, wouldn't that easily fit in 7GB of heap? Even if Solr were caching all 35M points, that still doesn't sound like 7GB worth of data. -- View this message in context: http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757p4000268.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question(s) re lucene spatial toolkit aka LSP aka spatial4j
Thanks David. You are a life saver. I didn't know how the cache got triggered, and "needScore=false" now allows some of my problem queries to finally work, well within 2GB of memory. Will look at your other suggestion when I can. MANY thanks again. -- View this message in context: http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757p4000286.html Sent from the Solr - User mailing list archive at Nabble.com.
"Intersects" spatial query returns polygons it shouldn't
[The prose of this message was not preserved in the archive; what remains is the tail of a large WKT polygon (hundreds of coordinate pairs near 45.2 N, -93.26 W) that was used in the Intersects query.] -- View this message in context: http://lucene.472066.n3.nabble.com/Intersects-spatial-query-returns-polygons-it-shouldn-t-tp4008646.html Sent from the Solr - User mailing list archive at Nabble.com.
question about schemas
I just started using Solr, and I am trying to figure out how to set up my schema. I know that Solr doesn't have JOINs, so I am having some difficulty figuring out how I would set up a schema for the following fictional situation. For example, let us say that:

- I have 1+ customers, each having some specific info (StoreId, Name, Phone, Address, City, State, Zip, etc.)
- Each customer has a subset of the 100+ products I am looking to track, each product having some specific info (ProductId, Name, Width, Height, Depth, Weight, Density, etc.)
- I want to be able to search by the product info but have facets return the number of customers, rather than the number of products, that meet my criteria
- I want to display (and sort) customers based on my product search

In relational databases, I would simply create two tables (customer and product) and JOIN them. I could then craft a SQL query to count the number of distinct StoreId values in the result (something like facets). In Solr, however, there are no joins. As far as I can tell, my options are to:

- create two Solr instances, one with customer info and one with product info; I would search the product Solr instance to identify the StoreId values returned, and then use that info to search the customer Solr instance to get the customer info. The problem with this is that the second query could have ten thousand ANDs (one for each StoreId returned by the first query)
- create a single Solr instance that contains a denormalized version of the data where each doc contains both the customer info and the product info for a given product. The problem with this is that my facets would return the number of products, not the number of customers
- create a single Solr instance that contains a denormalized version of the data where each doc contains the customer info and info for ALL products that the customer might have (likely done via dynamic fields). The problem with this is that my schema would be a bit messy and my queries could have hundreds of ANDs and ORs (one AND for each product field, and one OR for each product); for example, q=((Width1:50 AND Density1:7) OR (Width2:50 AND Density2:7) OR …)

Does anyone have any advice on this? Are there other schemas that might work? Hopefully the example makes sense. -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26600956.html Sent from the Solr - User mailing list archive at Nabble.com.
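One way to sketch the second (fully denormalized, one doc per customer-product pair) option so that facets still reveal customers: facet on StoreId and count the buckets rather than the documents. Field names here are the fictional ones from the post above, and the spaces would be URL-encoded:

/select?q=Width:50 AND Density:7&rows=0&facet=true&facet.field=StoreId&facet.limit=-1&facet.mincount=1

Each facet bucket is one matching customer, so the number of buckets, rather than the per-bucket counts, gives the customer total; the counts themselves remain per-product.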
RE: question about schemas
cbennett wrote:
>
> Solr supports multi value fields so you could store one document per
> customer and have multi value fields for the product information.
>
> Colin.

Quoted from: http://old.nabble.com/question-about-schemas-tp26600956p26608618.html

Thanks Colin. From the online docs, there doesn't seem to be a way to directly map a multivalued field value in one field to the multivalued field value in another field (i.e. the first value in myMultiValueProductId wouldn't necessarily match the first value in myMultiValueDensity or in myMultiValueWeight). Is there a technique to do this? -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26611715.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about schemas
Lance Norskog-2 wrote: > > But, in general, this is a "shopping cart" database and Solr/Lucene may > not be the best fit for this problem. > True, every tool has strengths and weaknesses. Given how powerful Solr appears to be, I would be surprised if I was not able to handle this use case. Lance Norskog-2 wrote: > > You can make a separate facet field which contains a range of "buckets": > 10, 20, 50, or 100 means that the field has a value 0-10, 11-20, 21-50, or > 51-100. You could use a separate filter query with values for these > buckets. Filter queries are very fast in Solr 1.4 and this would limit > your range query execution to documents which match the buckets. > Thank you for this suggestion. I will look into this. -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26636155.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about schemas
Lance Norskog-2 wrote:
>
> You can make a separate facet field which contains a range of "buckets":
> 10, 20, 50, or 100 means that the field has a value 0-10, 11-20, 21-50, or
> 51-100. You could use a separate filter query with values for these
> buckets. Filter queries are very fast in Solr 1.4 and this would limit
> your range query execution to documents which match the buckets.
>

Lance, I am afraid that I do not see how to use this suggestion. Which of the three (four?) suggested schemas would I be using? How would these range facets prevent the potential issues I found, such as getting product facets instead of customer facets, or having very large numbers of ANDs and ORs, and so forth? -- View this message in context: http://old.nabble.com/question-about-schemas-tp26600956p26679922.html Sent from the Solr - User mailing list archive at Nabble.com.