Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
If the ping request handler is taking too long, and the server is not recovering automatically, there is not much you can do automatically on that server. You have to intervene manually and restart Solr on that node.

First of all: the ping is just an internal check. If it takes too long to respond, the requester (i.e. the script calling it) should stop the request and mark that node as problematic. If there are, for example, memory problems, every subsequent request will only make the problem worse, and Solr cannot recover from that.

> On 5 Aug 2019, at 06:15, dinesh naik wrote:
>
> Thanks John, Erick and Furkan.
>
> I have already defined the ping request handler in solrconfig.xml as below:
>
>   <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>     <lst name="invariants">
>       <str name="qt">/select</str>
>       <str name="q">_root_:abc</str>
>     </lst>
>   </requestHandler>
>
> My question is regarding the custom query being used. Here I am querying
> for the field _root_, which is available in all of my clusters and defined as a
> string field. The query _root_:abc might not get me any match
> (I am ok with not finding any matches; the query should just not take
> 10-15 seconds to respond).
>
> If the response comes within 1 second, then the core recovery issue is
> solved, hence I need your suggestion on whether using the _root_ field in the
> custom query is fine.
>
> On Mon, Aug 5, 2019 at 2:49 AM Furkan KAMACI wrote:
>
>> Hi,
>>
>> You can change the invariants, i.e. *qt* and *q*, of a *PingRequestHandler*:
>>
>>   <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>>     <lst name="invariants">
>>       <str name="qt">/search</str>
>>       <str name="q">some test query</str>
>>     </lst>
>>   </requestHandler>
>>
>> Check the documentation for more info:
>>
>> https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/handler/PingRequestHandler.html
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>> On Sat, Aug 3, 2019 at 4:17 PM Erick Erickson wrote:
>>
>>> You can also (I think) explicitly define the ping request handler in
>>> solrconfig.xml to do something else.
>>>
>>>> On Aug 2, 2019, at 9:50 AM, Jörn Franke wrote:
>>>>
>>>> Not sure if this is possible, but why not create a query handler in Solr
>>>> with any custom query and use that as a ping replacement?
>>>>
>>>>> On 02.08.2019 at 15:48, dinesh naik wrote:
>>>>>
>>>>> Hi all,
>>>>> I have a few clusters with huge data sets, and whenever a node goes down it is
>>>>> not able to recover due to the reasons below:
>>>>>
>>>>> 1. The ping request handler is taking more than 10-15 seconds to respond. The
>>>>> ping request handler, however, expects a response in less than 1 second
>>>>> and fails the recovery request if it is not answered within that time.
>>>>> Therefore recoveries never start.
>>>>>
>>>>> 2. The soft commit interval is very low, i.e. 5 sec. This is a business
>>>>> requirement, so not much can be done here.
>>>>>
>>>>> As the standard/default admin/ping request handler uses *:* queries,
>>>>> the response time is much higher, and I am looking for an option to change
>>>>> it so that the ping handler returns results within a few milliseconds.
>>>>> Here is an example of the standard query time:
>>>>>
>>>>> snip---
>>>>> curl "http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false&debug=timing"
>>>>> {
>>>>>   "responseHeader":{
>>>>>     "zkConnected":true,
>>>>>     "status":0,
>>>>>     "QTime":16620,
>>>>>     "params":{
>>>>>       "q":"*:*", "distrib":"false", "debug":"timing",
>>>>>       "indent":"on", "rows":"0", "wt":"json"}},
>>>>>   "response":{"numFound":1329638799,"start":0,"docs":[]
>>>>>   },
>>>>>   "debug":{
>>>>>     "timing":{
>>>>>       "time":16620.0,
>>>>>       "prepare":{
>>>>>         "time":0.0,
>>>>>         "query":{"time":0.0}, "facet":{"time":0.0}, "facet_module":{"time":0.0},
>>>>>         "mlt":{"time":0.0}, "highlight":{"time":0.0}, "stats":{"time":0.0},
>>>>>         "expand":{"time":0.0}, "terms":{"time":0.0},
>>>>>         "block-expensive-queries":{"time":0.0}, "slow-query-logger":{"time":0.0},
>>>>>         "debug":{"time":0.0}},
>>>>>       "process":{
>>>>>         "time":16619.0,
>>>>>         "query":{"time":16619.0}, "facet":{"time":0.0}, "facet_module":{"time":0.0},
>>>>>         "mlt":{"time":0.0}, "highlight":{"time":0.0}, "stats":{"time":0.0},
>>>>>         "expand":{"time":0.0}, "terms":{"time":0.0},
>>>>>         "block-expensive-queries":{"time":0.0}, "slow-query-logger":{"time":0.0},
>>>>>         "debug":{"time":0.0}
>>>>> snap
>>>>>
>>>>> Can we use the query _root_:abc in the ping request handler? I tried this query
>>>>> and it returns results within a few milliseconds, and the nodes are also
>>>>> able to recover with
Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
On 8/4/2019 10:15 PM, dinesh naik wrote:
> My question is regarding the custom query being used. Here i am querying
> for field _root_ which is available in all of my cluster and defined as a
> string field. The result for _root_:abc might not get me any match as
> well(i am ok with not finding any matches, the query should not be taking
> 10-15 seconds for getting the response).

Typically the *:* query is the fastest option. It is special syntax that means "all documents" and it usually executes very quickly. It will be faster than querying for a value in a specific field, which is what you have defined currently.

I will typically add a "rows" parameter to the ping handler with a value of 1, so Solr will not be retrieving a large amount of data. If you are running Solr in cloud mode, you should experiment with setting the distrib parameter to false, which will hopefully limit the query to the receiving node only.

Erick has already mentioned GC pauses as a potential problem. With a 10-15 second response time, I think that has high potential to be the underlying cause.

The response you included at the beginning of the thread indicates there are 1.3 billion documents, which is going to require a fair amount of heap memory. If seeing such long ping times with a *:* query is something that happens frequently, your heap may be too small, which will cause frequent full garbage collections.

The very low autoSoftCommit time can contribute to system load. I think it's very likely, especially with such a large index, that in many cases those automatic commits are taking far longer than 5 seconds to complete. If that's the case, you're not achieving a 5 second visibility interval and you are putting a lot of load on Solr, so I would consider increasing it.

Thanks,
Shawn
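For reference, a minimal sketch of a ping handler definition along the lines described above. The query value and handler name are illustrative assumptions, not taken from the thread, and whether the target handler honors every invariant is worth verifying on your version:

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <!-- cheap query; rows keeps the response small, distrib keeps it local -->
      <str name="q">*:*</str>
      <str name="rows">1</str>
      <str name="distrib">false</str>
    </lst>
  </requestHandler>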
Difference between search results from Solr 5 and 8
Hi all,

We upgraded our Solr cluster from 5 to 8 and I've found a difference in search results.

Previously we had the default operator setting in schema.xml, which stopped working in Solr 8, so we moved it to solrconfig.xml as: AND

Now, this search gives 0 results while previously it worked fine and returned 2 records:

  [ path=select parameters={fq: ["type:Member"], sort: "score desc", q: "u...@gmail.com ad...@yahoo.com", fl: "* score", qf: "email_words_ngram", defType: "edismax", mm: 1, start: 0, rows: 20} ]

At the same time, the docs say that terms without an explicit "+" or "-" are considered optional, and results for both terms should be returned.

This search works:

  [ path=select parameters={fq: ["type:Member"], sort: "score desc", q: "u...@gmail.com OR ad...@yahoo.com", fl: "* score", qf: "email_words_ngram", defType: "edismax", mm: 1, start: 0, rows: 20} ]

I need help figuring out what's wrong with our configuration and how to handle this properly.

Thank you,
Alexander
Re: Difference between search results from Solr 5 and 8
On 8/5/2019 7:34 AM, Alexander Sherbakov wrote:
> Which stopped working in Solr 8, so we moved this to solrconfig.xml as: AND
>
> Now, this search gives 0 results while previously it worked fine and returned 2 records:
>
> [ path=select parameters={fq: ["type:Member"], sort: "score desc", q: "u...@gmail.com ad...@yahoo.com", fl: "* score", qf: "email_words_ngram", defType: "edismax", mm: 1, start: 0, rows: 20} ]
>
> At the same time the docs say that terms without explicit "+" or "-" are considered as optional and results of both terms should be returned.

Untagged clauses are indeed optional -- if you leave the default operator at "OR". You've set it to "AND", which means that effectively any query clause without a +/- or a boolean operator has an implicit + -- it will be required.

The behavior in Solr 5 should be the same with a default operator of AND, unless you were perhaps running into a bug there. Or maybe everything was not entirely the same before.

Thanks,
Shawn
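A hedged sketch of how those parameters might be set as defaults on the /select handler in solrconfig.xml, leaving the default operator at OR so bare clauses stay optional. The handler name is assumed, qf and mm mirror the request shown above, and the exact interplay of mm and q.op has shifted between versions, so treat this as illustrative:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <!-- with q.op=OR, bare clauses stay optional; q.op=AND makes them implicitly required -->
      <str name="q.op">OR</str>
      <str name="mm">1</str>
      <str name="qf">email_words_ngram</str>
    </lst>
  </requestHandler>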
Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
Hi Nicolas,

Restarting the node is not helping; the node keeps trying to recover and always fails. Here is the log:

2019-07-31 06:10:08.049 INFO (coreZkRegister-1-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.c.ZkController Core needs to recover:parts_shard30_replica_n2697
2019-07-31 06:10:08.050 INFO (updateExecutor-3-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.u.DefaultSolrCoreState Running recovery
2019-07-31 06:10:08.056 INFO (recoveryExecutor-4-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.c.RecoveryStrategy Starting recovery process. recoveringAfterStartup=true
2019-07-31 06:10:08.261 INFO (recoveryExecutor-4-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.c.RecoveryStrategy startupVersions size=49956 range=[1640550593276674048 to 1640542396328443904]
2019-07-31 06:10:08.328 INFO (qtp689401025-58) o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/key params={omitHeader=true&wt=json} status=0 QTime=0
2019-07-31 06:10:09.276 INFO (recoveryExecutor-4-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.c.RecoveryStrategy Failed to connect leader http://hostname:8983/solr on recovery, try again

The ping request is issued by Solr itself and not via some script, so there is no way to stop it.

Code where the timeout is hardcoded to 1 second:

    try (HttpSolrClient httpSolrClient = new HttpSolrClient.Builder(leaderReplica.getCoreUrl())
        .withSocketTimeout(1000)
        .withConnectionTimeout(1000)
        .withHttpClient(cc.getUpdateShardHandler().getRecoveryOnlyHttpClient())
        .build()) {
      SolrPingResponse resp = httpSolrClient.ping();
      return leaderReplica;
    } catch (IOException e) {
      log.info("Failed to connect leader {} on recovery, try again", leaderReplica.getBaseUrl());
      Thread.sleep(500);
    } catch (Exception e) {
      if (e.getCause() instanceof IOException) {
        log.info("Failed to connect leader {} on recovery, try again", leaderReplica.getBaseUrl());
        Thread.sleep(500);
      } else {
        return leaderReplica;
      }
    }

On Mon, Aug 5, 2019 at 1:19 PM Nicolas Franck wrote:

> If the ping request handler is taking too long,
> and the server is not recovering automatically,
> there is not much you can do automatically on that server.
> You have to intervene manually, and restart Solr on that node.
>
> First of all: the ping is just an internal check. If it takes too long
> to respond, the requester (i.e. the script calling it) should stop
> the request, and mark that node as problematic. If there are
> for example memory problems, every subsequent request will only make
> the problem worse, and Solr cannot recover from that.
>
>> On 5 Aug 2019, at 06:15, dinesh naik wrote:
>>
>> Thanks John, Erick and Furkan.
>>
>> I have already defined the ping request handler in solrconfig.xml as below:
>>
>>   <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>>     <lst name="invariants">
>>       <str name="qt">/select</str>
>>       <str name="q">_root_:abc</str>
>>     </lst>
>>   </requestHandler>
>>
>> My question is regarding the custom query being used. Here I am querying
>> for the field _root_, which is available in all of my clusters and defined as a
>> string field.
>> The query _root_:abc might not get me any match
>> (I am ok with not finding any matches; the query should just not take
>> 10-15 seconds to respond).
>>
>> If the response comes within 1 second, then the core recovery issue is
>> solved, hence I need your suggestion on whether using the _root_ field in the
>> custom query is fine.
>>
>> On Mon, Aug 5, 2019 at 2:49 AM Furkan KAMACI wrote:
>>
>>> Hi,
>>>
>>> You can change the invariants, i.e. *qt* and *q*, of a *PingRequestHandler*:
>>>
>>>   <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>>>     <lst name="invariants">
>>>       <str name="qt">/search</str>
>>>       <str name="q">some test query</str>
>>>     </lst>
>>>   </requestHandler>
>>>
>>> Check the documentation for more info:
>>>
>>> https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/handler/PingRequestHandler.html
>>>
>>> Kind Regards,
>>> Furkan KAMACI
>>>
>>> On Sat, Aug 3, 2019 at 4:17 PM Erick Erickson wrote:
>>>
>>>> You can also (I think) explicitly define the ping request handler in
>>>> solrconfig.xml to do something else.
>>>>
>>>>> On Aug 2, 2019, at 9:50 AM, Jörn Franke wrote:
>>>>>
>>>>> Not sure if this is possible, but why not create a query handler in Solr
>>>>> with any custom query and use that as a ping replacement?
>>>>>
>>>>>> On 02.08.2019 at 15:48, dinesh naik wrote:
>>>>>>
>>>>>> Hi all,
>>>>>> I have a few clusters with huge data sets, and whenever a node goes down it is
>>>>>> n
Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
Hi Shawn,

Yes, I am running Solr in cloud mode, and even after adding the params rows=0 and distrib=false, the query response is more than 15 sec due to the more-than-a-billion document set. Also, the soft commit setting cannot be changed to a higher value due to a requirement from the business team.

http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false
always takes more than 10 sec.

Here are the Java heap and G1GC settings I have:

/usr/java/default/bin/java -server -Xmx31g -Xms31g -XX:+UseG1GC
-XX:MaxGCPauseMillis=250 -XX:ConcGCThreads=5
-XX:ParallelGCThreads=10 -XX:+UseLargePages -XX:+AggressiveOpts
-XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
-XX:InitiatingHeapOccupancyPercent=50 -XX:G1ReservePercent=18
-XX:MaxNewSize=6G -XX:PrintFLSStatistics=1
-XX:+PrintPromotionFailure -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/solr7/logs/heapdump
-verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime

The JVM heap has never crossed 20GB in my setup; also, young-generation G1GC pauses are well within milliseconds (in the range of 25-200 ms).

On Mon, Aug 5, 2019 at 6:37 PM Shawn Heisey wrote:

> On 8/4/2019 10:15 PM, dinesh naik wrote:
>> My question is regarding the custom query being used. Here i am querying
>> for field _root_ which is available in all of my cluster and defined as a
>> string field. The result for _root_:abc might not get me any match as
>> well(i am ok with not finding any matches, the query should not be taking
>> 10-15 seconds for getting the response).
>
> Typically the *:* query is the fastest option. It is special syntax
> that means "all documents" and it usually executes very quickly. It
> will be faster than querying for a value in a specific field, which is
> what you have defined currently.
>
> I will typically add a "rows" parameter to the ping handler with a value
> of 1, so Solr will not be retrieving a large amount of data. If you are
> running Solr in cloud mode, you should experiment with setting the
> distrib parameter to false, which will hopefully limit the query to the
> receiving node only.
>
> Erick has already mentioned GC pauses as a potential problem. With a
> 10-15 second response time, I think that has high potential to be the
> underlying cause.
>
> The response you included at the beginning of the thread indicates there
> are 1.3 billion documents, which is going to require a fair amount of
> heap memory. If seeing such long ping times with a *:* query is
> something that happens frequently, your heap may be too small, which
> will cause frequent full garbage collections.
>
> The very low autoSoftCommit time can contribute to system load. I think
> it's very likely, especially with such a large index, that in many cases
> those automatic commits are taking far longer than 5 seconds to
> complete. If that's the case, you're not achieving a 5 second
> visibility interval and you are putting a lot of load on Solr, so I
> would consider increasing it.
>
> Thanks,
> Shawn

--
Best Regards,
Dinesh Naik
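For readers following along, the soft commit interval discussed here lives in the updateHandler section of solrconfig.xml. A minimal sketch, where the 5000 ms soft commit mirrors the interval mentioned in the thread and the hard commit value is an assumption:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- hard commit; value assumed, not taken from the thread -->
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <!-- the 5-second visibility interval under discussion -->
      <maxTime>5000</maxTime>
    </autoSoftCommit>
  </updateHandler>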
Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
How much total physical memory on your machine? Lucene holds a lot of the index in MMapDirectory space. My starting point is to allocate no more than 50% of my physical memory to the Java heap. You’re allocating 31G, if you don’t have at _least_ 64G on these machines you’re probably swapping. See: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Best, Erick > On Aug 5, 2019, at 10:58 AM, dinesh naik wrote: > > Hi Shawn, > yes i am running solr in cloud mode and Even after adding the params row=0 > and distrib=false, the query response is more than 15 sec due to more than > a billion doc set. > Also the soft commit setting can not be changed to a higher no. due to > requirement from business team. > > http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false > takes more than 10 sec always. > > Here are the java heap and G1GC setting i have , > > /usr/java/default/bin/java -server -Xmx31g -Xms31g -XX:+UseG1GC > -XX:MaxGCPauseMillis=250 -XX:ConcGCThreads=5 > -XX:ParallelGCThreads=10 -XX:+UseLargePages -XX:+AggressiveOpts > -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled > -XX:InitiatingHeapOccupancyPercent=50 -XX:G1ReservePercent=18 > -XX:MaxNewSize=6G -XX:PrintFLSStatistics=1 > -XX:+PrintPromotionFailure -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=/solr7/logs/heapdump > -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps > -XX:+PrintGCTimeStamps > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime > > JVM heap has never crossed 20GB in my setup , also Young G1GC timing is > well within milli seconds (in range of 25-200 ms). > > On Mon, Aug 5, 2019 at 6:37 PM Shawn Heisey wrote: > >> On 8/4/2019 10:15 PM, dinesh naik wrote: >>> My question is regarding the custom query being used. Here i am querying >>> for field _root_ which is available in all of my cluster and defined as a >>> string field. The result for _root_:abc might not get me any match as >>> well(i am ok with not finding any matches, the query should not be taking >>> 10-15 seconds for getting the response). >> >> Typically the *:* query is the fastest option. It is special syntax >> that means "all documents" and it usually executes very quickly. It >> will be faster than querying for a value in a specific field, which is >> what you have defined currently. >> >> I will typically add a "rows" parameter to the ping handler with a value >> of 1, so Solr will not be retrieving a large amount of data. If you are >> running Solr in cloud mode, you should experiment with setting the >> distrib parameter to false, which will hopefully limit the query to the >> receiving node only. >> >> Erick has already mentioned GC pauses as a potential problem. With a >> 10-15 second response time, I think that has high potential to be the >> underlying cause. >> >> The response you included at the beginning of the thread indicates there >> are 1.3 billion documents, which is going to require a fair amount of >> heap memory. If seeing such long ping times with a *:* query is >> something that happens frequently, your heap may be too small, which >> will cause frequent full garbage collections. >> >> The very low autoSoftCommit time can contribute to system load. I think >> it's very likely, especially with such a large index, that in many cases >> those automatic commits are taking far longer than 5 seconds to >> complete. 
If that's the case, you're not achieving a 5 second >> visibility interval and you are putting a lot of load on Solr, so I >> would consider increasing it. >> >> Thanks, >> Shawn >> > > > -- > Best Regards, > Dinesh Naik
Re: Solr 7.6.0: PingRequestHandler - Changing the default query (*:*)
Hi Erick, Each vm has 128GB of physical memory. On Mon, Aug 5, 2019, 8:38 PM Erick Erickson wrote: > How much total physical memory on your machine? Lucene holds a lot of the > index in MMapDirectory space. My starting point is to allocate no more than > 50% of my physical memory to the Java heap. You’re allocating 31G, if you > don’t > have at _least_ 64G on these machines you’re probably swapping. > > See: > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html > > Best, > Erick > > > > On Aug 5, 2019, at 10:58 AM, dinesh naik > wrote: > > > > Hi Shawn, > > yes i am running solr in cloud mode and Even after adding the params > row=0 > > and distrib=false, the query response is more than 15 sec due to more > than > > a billion doc set. > > Also the soft commit setting can not be changed to a higher no. due to > > requirement from business team. > > > > > http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false > > takes more than 10 sec always. > > > > Here are the java heap and G1GC setting i have , > > > > /usr/java/default/bin/java -server -Xmx31g -Xms31g -XX:+UseG1GC > > -XX:MaxGCPauseMillis=250 -XX:ConcGCThreads=5 > > -XX:ParallelGCThreads=10 -XX:+UseLargePages -XX:+AggressiveOpts > > -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled > > -XX:InitiatingHeapOccupancyPercent=50 -XX:G1ReservePercent=18 > > -XX:MaxNewSize=6G -XX:PrintFLSStatistics=1 > > -XX:+PrintPromotionFailure -XX:+HeapDumpOnOutOfMemoryError > > -XX:HeapDumpPath=/solr7/logs/heapdump > > -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps > > -XX:+PrintGCTimeStamps > > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime > > > > JVM heap has never crossed 20GB in my setup , also Young G1GC timing is > > well within milli seconds (in range of 25-200 ms). > > > > On Mon, Aug 5, 2019 at 6:37 PM Shawn Heisey wrote: > > > >> On 8/4/2019 10:15 PM, dinesh naik wrote: > >>> My question is regarding the custom query being used. Here i am > querying > >>> for field _root_ which is available in all of my cluster and defined > as a > >>> string field. The result for _root_:abc might not get me any match as > >>> well(i am ok with not finding any matches, the query should not be > taking > >>> 10-15 seconds for getting the response). > >> > >> Typically the *:* query is the fastest option. It is special syntax > >> that means "all documents" and it usually executes very quickly. It > >> will be faster than querying for a value in a specific field, which is > >> what you have defined currently. > >> > >> I will typically add a "rows" parameter to the ping handler with a value > >> of 1, so Solr will not be retrieving a large amount of data. If you are > >> running Solr in cloud mode, you should experiment with setting the > >> distrib parameter to false, which will hopefully limit the query to the > >> receiving node only. > >> > >> Erick has already mentioned GC pauses as a potential problem. With a > >> 10-15 second response time, I think that has high potential to be the > >> underlying cause. > >> > >> The response you included at the beginning of the thread indicates there > >> are 1.3 billion documents, which is going to require a fair amount of > >> heap memory. If seeing such long ping times with a *:* query is > >> something that happens frequently, your heap may be too small, which > >> will cause frequent full garbage collections. > >> > >> The very low autoSoftCommit time can contribute to system load. 
I think > >> it's very likely, especially with such a large index, that in many cases > >> those automatic commits are taking far longer than 5 seconds to > >> complete. If that's the case, you're not achieving a 5 second > >> visibility interval and you are putting a lot of load on Solr, so I > >> would consider increasing it. > >> > >> Thanks, > >> Shawn > >> > > > > > > -- > > Best Regards, > > Dinesh Naik > >
SOLR 8.1.1 index on pdate field included in search results
I am migrating from SOLR 4.10.2 to 8.1.1. For some reason, in the 8.1.1 core, a pdate index named IDX_ExpirationDate is appearing as a field in the search result documents. I have several other indexes that are defined and (correctly) do not appear in the results, but the index I am having trouble with is the only one based on a pdate.

Here is a sample 8.1.1 response that demonstrates the issue:

  "response":{"numFound":58871,"start":0,"docs":[
    {
      "id":"1",
      "ExpirationDate":"2018-01-26T00:00:00Z",
      "_version_":1641033044033798170,
      "IDX_ExpirationDate":["2018-01-26T00:00:00Z"]},
    {
      "id":"2",
      "ExpirationDate":"2018-02-20T00:00:00Z",
      "_version_":1641032965380112384,
      "IDX_ExpirationDate":["2018-02-20T00:00:00Z"]},

ExpirationDate is supposed to be there, but IDX_ExpirationDate should not.

I know that I can probably keep using date, but it is deprecated, and part of the reason for upgrading to 8.1.1 is to use the latest non-deprecated stuff ;-)

I have an index named IDX_ExpirationDate based on a field called ExpirationDate that was a date field in 4.10.2; in the 8.1.1 core, I have this configured as a pdate.
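The actual field and type definitions did not survive the archive. A hedged reconstruction of what they may have looked like; every attribute value here is an assumption, apart from the fact that IDX_ExpirationDate is returned as an array above, which suggests multiValued="true":

  <!-- 4.10.2 schema.xml, assumed -->
  <fieldType name="date" class="solr.TrieDateField" precisionStep="6"/>
  <field name="ExpirationDate" type="date" indexed="true" stored="true"/>
  <field name="IDX_ExpirationDate" type="date" indexed="true" stored="false" multiValued="true"/>
  <copyField source="ExpirationDate" dest="IDX_ExpirationDate"/>

  <!-- 8.1.1 managed-schema, assumed -->
  <fieldType name="pdate" class="solr.DatePointField"/>
  <field name="IDX_ExpirationDate" type="pdate" indexed="true" stored="false" multiValued="true"/>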
Re: SOLR 8.1.1 index on pdate field included in search results
On 8/5/2019 10:37 AM, Hodder, Rick wrote:
> ExpirationDate is supposed to be there, but IDX_ExpirationDate should not.
> I know that I can probably keep using date, but it is deprecated, and part of
> the reason for upgrading to 8.1.1 is to use the latest non-deprecated stuff ;-)

The DatePointField class defaults to docValues="true" and useDocValuesAsStored="true". Unless those parameters are changed, if the field is defined for a document, it will typically be in search results.

https://lucene.apache.org/solr/guide/6_6/docvalues.html#DocValues-RetrievingDocValuesDuringSearch

Thanks,
Shawn
RE: SOLR 8.1.1 index on pdate field included in search results
Hi Shawn,

> The DatePointField class defaults to docValues="true" and
> useDocValuesAsStored="true". Unless those parameters are changed,
> if the field is defined for a document, it will typically be in search results.

Just checking: I'm fine with ExpirationDate appearing in the results; it's the index IDX_ExpirationDate that I don't want in the results.

So you are saying that I should add docValues="false" or useDocValuesAsStored="false" to the indexed-but-not-stored field?

I have other IDX_ fields defined that are not pdate and they don't appear in results; that's what's confusing me, for example:

Thanks,
Rick
RE: SOLR 8.1.1 index on pdate field included in search results
You are right of course, Shawn. I added useDocValuesAsStored="false" to the IDX_ExpirationDate field definition, and it no longer shows up.

Thanks,
Rick

-----Original Message-----
From: Hodder, Rick
Sent: Monday, August 05, 2019 2:02 PM
To: solr-user@lucene.apache.org
Subject: RE: SOLR 8.1.1 index on pdate field included in search results

Hi Shawn,

> The DatePointField class defaults to docValues="true" and
> useDocValuesAsStored="true". Unless those parameters are changed, if the
> field is defined for a document, it will typically be in search results.

Just checking: I'm fine with ExpirationDate appearing in the results; it's the index IDX_ExpirationDate that I don't want in the results.

So you are saying that I should add docValues="false" or useDocValuesAsStored="false" to the indexed-but-not-stored field?

I have other IDX_ fields defined that are not pdate and they don't appear in results; that's what's confusing me, for example:

Thanks,
Rick
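For completeness, a sketch of the adjusted field definition; apart from useDocValuesAsStored="false", the attribute values are assumptions:

  <!-- docValues remain enabled for sorting/faceting, but are no longer returned as stored values -->
  <field name="IDX_ExpirationDate" type="pdate" indexed="true" stored="false"
         multiValued="true" useDocValuesAsStored="false"/>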
Re: NRT for new items in index
On 2019/08/03 18:00:28, Furkan KAMACI wrote:
> Hi,
>
> First of all, could you check here:
> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> to better understand hard commits, soft commits and transaction logs to
> achieve NRT search.
>
> Kind Regards,
> Furkan KAMACI
>
> On Wed, Jul 31, 2019 at 3:47 PM profiuser wrote:
>
>> Hi,
>>
>> we have about 400 000 000 items in a Solr collection.
>> We have set the auto commit property for this collection to 15 minutes.
>> It is a big collection and we are using some caches etc. Therefore we have a
>> big autocommit value.
>>
>> This has the disadvantage that we don't have NRT searches.
>>
>> We would like to have NRT at least for searching the newly added items.
>>
>> We read about the new functionality "category routed aliases" in Solr version 8.1.
>>
>> And we got an idea: we could add a routing field to our collection schema.
>> At indexing time we check whether the item is new; for a new item we set the
>> routing field to "new", and for an item older than some time period we set the
>> value to "old".
>> We would have one category routed alias, routedCollection, and there would
>> be 2 collections, old and new.
>>
>> If we index a new item, the router chooses the new collection and the item is
>> inserted into it. After some period we reindex the item and decide that it is
>> old, so we set the routing field to "old". The router decides to update (insert)
>> the item into the old collection. We expected that Solr would automatically
>> check uniqueness across all routed collections, and that if Solr found the item
>> in another collection, it would automatically be deleted. But it is not !!!
>>
>> Is this expected behaviour?
>>
>> Could this functionality be used for the issue we have? Or could someone
>> suggest another solution which ensures that we have all new items ready for
>> NRT searches?
>>
>> Thanks for your help
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Hi,

we know this page, and we understand how commits and transaction logs work, but as I said we have a very big index ;-) Therefore we cannot commit too often. We must cache data for fast search, and if we commit too often, the caches get thrown away.

Now we have only one server, and we are preparing a new solution with SolrCloud, where we would have several servers. We have limited resources and we cannot afford to have, for example, 20 Solr servers, which I believe is a standard solution for big indexes.

Therefore we are searching for some compromise between price and performance and are thinking about having more collections: one collection would be a daily feed (a small index) which we could commit every few seconds, and these collections would be merged under the main collection alias.

Do you have another idea?

Best
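For illustration, the daily-feed idea could be wired up with a standard (non-routed) alias via the Collections API; the host, collection and alias names below are made up:

  curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=items&collections=items_main,items_daily"

Queries against the alias "items" would then span both collections. Note that, as documented for standard aliases, update requests sent to a multi-collection alias go to the first collection listed, so indexing would need to target the daily collection explicitly (or it would need to be listed first).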
Re: NRT for new items in index
Do you have some more information on index and size? Do you have to store everything in the index? Can you store some data (blobs etc.) outside?

I think you are generally right with your solution, but also be aware that it is sometimes cheaper to have several servers instead of keeping an engineer busy for some months to find a solution. I don't say this is the case with your solution, and I am also not a fan of throwing hardware at a problem, but an engineer (even if it affects him/herself) should always make that decision. That does not necessarily mean that the engineer loses a job - one can implement other valuable features for a customer.

> On 06.08.2019 at 08:21, Updates Profimedia wrote:
>
> On 2019/08/03 18:00:28, Furkan KAMACI wrote:
>> Hi,
>>
>> First of all, could you check here:
>> https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> to better understand hard commits, soft commits and transaction logs to
>> achieve NRT search.
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>>> On Wed, Jul 31, 2019 at 3:47 PM profiuser wrote:
>>>
>>> Hi,
>>>
>>> we have about 400 000 000 items in a Solr collection.
>>> We have set the auto commit property for this collection to 15 minutes.
>>> It is a big collection and we are using some caches etc. Therefore we have a
>>> big autocommit value.
>>>
>>> This has the disadvantage that we don't have NRT searches.
>>>
>>> We would like to have NRT at least for searching the newly added items.
>>>
>>> We read about the new functionality "category routed aliases" in Solr version 8.1.
>>>
>>> And we got an idea: we could add a routing field to our collection schema.
>>> At indexing time we check whether the item is new; for a new item we set the
>>> routing field to "new", and for an item older than some time period we set the
>>> value to "old".
>>> We would have one category routed alias, routedCollection, and there would
>>> be 2 collections, old and new.
>>>
>>> If we index a new item, the router chooses the new collection and the item is
>>> inserted into it. After some period we reindex the item and decide that it is
>>> old, so we set the routing field to "old". The router decides to update (insert)
>>> the item into the old collection. We expected that Solr would automatically
>>> check uniqueness across all routed collections, and that if Solr found the item
>>> in another collection, it would automatically be deleted. But it is not !!!
>>>
>>> Is this expected behaviour?
>>>
>>> Could this functionality be used for the issue we have? Or could someone
>>> suggest another solution which ensures that we have all new items ready for
>>> NRT searches?
>>>
>>> Thanks for your help
>>>
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
> Hi,
>
> we know this page, and we understand how commits and transaction logs work,
> but as I said we have a very big index ;-) Therefore we cannot commit too often.
> We must cache data for fast search, and if we commit too often, the caches get
> thrown away.
>
> Now we have only one server, and we are preparing a new solution with SolrCloud,
> where we would have several servers. We have limited resources and we cannot
> afford to have, for example, 20 Solr servers, which I believe is a standard
> solution for big indexes.
>
> Therefore we are searching for some compromise between price and performance
> and are thinking about having more collections: one collection would be a daily
> feed (a small index) which we could commit every few seconds, and these
> collections would be merged under the main collection alias.
>
> Do you have another idea?
>
> Best