RE: Time-out errors while indexing (Solr 7.7.1)
Hi Erick, Toke,

Can you please look at the details shared in my earlier email and respond with your suggestions/feedback?

Thanks & Regards,
Vinodh

From: Kommu, Vinodh K.
Sent: Monday, July 6, 2020 4:58 PM
To: solr-user@lucene.apache.org
Subject: RE: Time-out errors while indexing (Solr 7.7.1)

Thanks Erick & Toke for your response on this. Just wanted to correct a few things here about the number of docs:

Total number of documents in the entire cluster (all collections) = 6393876826 (6.3B)
Total number of documents in the 2 bigger collections (3749389864 & 1780147848) = 5529537712 (5.5B)
Total number of documents in the remaining collections = 864339114 (864M)

So all collections together do not hold 13B docs. As the numbers above show, the biggest collection in the cluster holds close to 3.7B docs and the second biggest holds up to 1.7B docs, whereas the remaining 20 collections hold only 864M docs, which brings the total in the cluster to 6.3B docs.

On the hardware side, the cluster sits on 6 Solr VMs. Each VM has 170G total memory (with 2 Solr instances running per VM) and 16 vCPUs, and each Solr JVM runs with a 31G heap. The remaining memory is left for the OS disk cache and other OS operations. vm.swappiness on each VM is set to 0, so swap will never be used.

Each collection is created using the rule-based replica placement API with 6 shards and a replication factor of 3.

One other observation concerns core placement. As mentioned above, we create collections using rule-based replica placement, i.e. a rule ensuring that no two replicas of the same shard sit on the same VM, using the following command:

curl -s -k user:password "https://localhost:22010/solr/admin/collections?action=CREATE&name=$SOLR_COLLECTION&numShards=${SHARDS_NO?}&replicationFactor=${REPLICATION_FACTOR?}&maxShardsPerNode=${MAX_SHARDS_PER_NODE?}&collection.configName=$SOLR_COLLECTION&rule=shard:*,replica:<2,host:*"

Variable values in the above command:
SOLR_COLLECTION = collection name
SHARDS_NO = 6
REPLICATION_FACTOR = 3
MAX_SHARDS_PER_NODE = computed from the number of Solr VMs, nodes per VM and total number of replicas, i.e. total number of replicas / number of VMs. In this cluster that is 18/6 = 3 max shards per machine.

Ideally this should create 3 cores per VM for each collection, but as the snippet below shows, each VM ended up with 2, 3 or 4 cores per collection. VM2 and VM6 apparently have more cores than the other VMs, so I presume this could be one reason they see more IO operations than the remaining 4 VMs. That said, I believe Solr also considers other factors, such as free disk on each VM, when placing replicas for a new collection, correct? If so, is this replica placement across the VMs fine? If not, what is needed to correct it? Can an additional core of ~210G create noticeably more disk IO? If yes, would moving that additional core to a VM with fewer cores make any difference (i.e. ensuring each VM has at most 3 shards)?

We have also been noticing a significant surge in IO operations at the storage level. I am wondering whether an IOPS limit on the storage could make Solr starve for IO, or whether it is the other way around, i.e. Solr issuing more read/write operations and pushing the storage IOPS to its upper limit?
VM1:
176G node1/solr/Collection2_shard5_replica_n30
176G node2/solr/Collection2_shard2_replica_n24
176G node2/solr/Collection2_shard3_replica_n2
177G node1/solr/Collection2_shard6_replica_n10
208G node1/solr/Collection1_shard5_replica_n18
208G node2/solr/Collection1_shard2_replica_n1
1.1T total

VM2:
176G node2/solr/Collection2_shard4_replica_n16
176G node2/solr/Collection2_shard6_replica_n34
177G node1/solr/Collection2_shard5_replica_n6
207G node2/solr/Collection1_shard6_replica_n10
208G node1/solr/Collection1_shard1_replica_n32
208G node2/solr/Collection1_shard5_replica_n30
210G node1/solr/Collection1_shard3_replica_n14
1.4T total

VM3:
175G node2/solr/Collection2_shard2_replica_n12
177G node1/solr/Collection2_shard1_replica_n20
208G node1/solr/Collection1_shard1_replica_n8
208G node2/solr/Collection1_shard2_replica_n12
209G node1/solr/Collection1_shard4_replica_n28
976G total

VM4:
176G node1/solr/Collection2_shard4_replica_n28
177G node1/solr/Collection2_shard1_replica_n8
207G node2/solr/Collection1_shard6_replica_n22
208G node1/solr/Collection1_shard5_replica_n6
210G node1/solr/Collection1_shard3_replica_n26
975G total

VM5:
176G node2/solr/Collection2_shard3_replica_n14
177G node1/solr/Collection2_shard5_replica_n18
177G node2/solr/Collection2_shard1_replica_n32
208G node1/solr/Collection1_shard2_replica_n24
210G node1/solr/Collection1_shard
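Regarding moving the extra core to a lighter VM: on Solr 7.x the Collections API provides a MOVEREPLICA command for this. A rough sketch only, reusing the port and credential placeholders from the CREATE call above; the collection, replica and target node names below are hypothetical and would have to be taken from your own CLUSTERSTATUS output:

# inspect current placement to find the replica name (core_nodeNN) and the target node
curl -s -k -u user:password "https://localhost:22010/solr/admin/collections?action=CLUSTERSTATUS&collection=Collection1"

# move one replica from an overloaded VM to a VM with fewer cores (names are placeholders)
curl -s -k -u user:password "https://localhost:22010/solr/admin/collections?action=MOVEREPLICA&collection=Collection1&replica=core_node14&targetNode=vm4-host:22010_solr"

The move first copies the replica to the target node and then deletes the source copy, so expect some extra disk and network IO while it runs.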
Re: Replica goes into recovery mode in Solr 6.1.0
Is anyone looking at my issue? Please guide me.

Regards,
Vishal Patel

From: vishal patel
Sent: Monday, July 6, 2020 7:11 PM
To: solr-user@lucene.apache.org
Subject: Replica goes into recovery mode in Solr 6.1.0

I am using Solr version 6.1.0, Java 8 and G1GC in production. We have 2 shards and each shard has 1 replica. We have 3 collections. We do not use any caches; they are disabled in solrconfig.xml. Search and update requests come in frequently on our live platform.

*Our commit configuration in solrconfig is below:
60 2 false ${solr.autoSoftCommit.maxTime:-1}

*We use Near Real Time searching, so we set the following in solr.in.cmd:
set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100

*Our collection details are below:

Collection    Shard1 (docs / GB)    Shard1 Replica (docs / GB)    Shard2 (docs / GB)    Shard2 Replica (docs / GB)
collection1   26913364 / 201        26913379 / 202                26913380 / 198        26913379 / 198
collection2   13934360 / 310        13934367 / 310                13934368 / 219        13934367 / 219
collection3   351539689 / 73.5      351540040 / 73.5              351540136 / 75.2      351539722 / 75.2

*My server configurations are below:

                                           Server1              Server2
CPU                                        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 MHz, 10 Core(s), 20 Logical Processor(s) (same on both servers)
HardDisk (GB)                              3845 (3.84 TB)       3485 (3.48 TB)
Total memory (GB)                          320                  320
Shard1 allocated memory (GB)               55                   -
Shard2 Replica allocated memory (GB)       55                   -
Shard2 allocated memory (GB)               -                    55
Shard1 Replica allocated memory (GB)       -                    55
Other applications allocated memory (GB)   60                   22
Number of other applications               11                   7

Sometimes one of the replicas goes into recovery mode. Why does a replica go into recovery? Due to heavy search, heavy update/insert, or long GC pause times? If it is one of these, what should we change in the configuration? Should we add shards to address the recovery issue?

Regards,
Vishal Patel
LTR feature computation caching
Hi, I am adding a few features to my LTR model that reuse the same value across features. For example, I have features that compare different similarities of each document with the input text: "token1 token2 token3 token4"

My features are:
- No of common terms
- No of common terms / Term count in document
- Term count in document - No of common terms
- 4 - No of common terms
- Boolean feature: Is no of common terms == 3

As you can see, "No of common terms" is recomputed for each feature. The feature cache caches values per feature and isn't helpful here. Is there any way to compute "No of common terms" only once per document and share it across all features for that document?
Max number of documents in update request
Hi, Could someone help me with the best way to go about determining the maximum number of docs I can send in a single update call to Solr in a master / slave architecture. Thanks!
Re: Max number of documents in update request
As many as you can send before blowing up. Really, the question is not answerable. 1K docs? 1G docs? 1 field or 500? And I don’t think it’s a good use of time to pursue much. See: https://lucidworks.com/post/really-batch-updates-solr-2/ If you’re looking at trying to maximize throughput, adding client threads that send Solr documents is a better approach. All that said, I usually just pick 1,000 and don’t worry about it. Best, Erick > On Jul 7, 2020, at 8:59 AM, Sidharth Negi wrote: > > Hi, > > Could someone help me with the best way to go about determining the maximum > number of docs I can send in a single update call to Solr in a master / > slave architecture. > > Thanks!
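For what it's worth, a batch is just one /update request carrying many documents. A minimal sketch with made-up field names, assuming a collection called "mycollection" and the default JSON update handler; the batch size (here 2 docs) would be whatever you settle on, e.g. 1,000:

# send a batch of documents in a single request; run several such clients in parallel for throughput
curl -s -X POST 'http://localhost:8983/solr/mycollection/update?commit=false' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {"id": "doc-1", "title_s": "first document"},
    {"id": "doc-2", "title_s": "second document"}
  ]'

Leave commits to the server-side autoCommit/autoSoftCommit settings rather than committing per batch.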
Re: Null pointer exception in QueryComponent.MergeDds method
8.3.1 the field "id" is for nested document. On Mon, Jul 6, 2020 at 4:17 PM Mikhail Khludnev wrote: > Hi, > What's the version? What's uniqueKey? is it stored? what's fl param? > > On Mon, Jul 6, 2020 at 5:12 PM Jae Joo wrote: > > > I am seeing the nullPointerException in the list below and I am > > looking for how to fix the exception. > > > > Thanks, > > > > > > NamedList sortFieldValues = > > (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values")); > > if (sortFieldValues.size()==0 && // we bypass merging this response > > only if it's partial itself > > thisResponseIsPartial) { // but not the previous > one!! > > continue; //fsv timeout yields empty sort_vlaues > > } > > > > > > > > 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]] > > o.a.s.h.RequestHandlerBase java.lang.NullPointerException > > at > > > > > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914) > > at > > > > > org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613) > > at > > > > > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592) > > at > > > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431) > > at > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576) > > at > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) > > at > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578) > > at > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419) > > at > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351) > > at > > > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > > at > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > > at > > > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > > at > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > at > > > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > > at > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) > > at > > > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) > > at > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > > > > > -- > Sincerely yours > Mikhail Khludnev >
Re: Max number of documents in update request
Agreed, I do something between 20 and 1000. If the master node is not handling any search traffic, use twice as many client threads as there are CPUs in the node. That should get you close to 100% CPU utilization. One thread will be waiting while a batch is being processed and another thread will be sending the next batch so there is no pause in processing. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 7, 2020, at 6:12 AM, Erick Erickson wrote: > > As many as you can send before blowing up. > > Really, the question is not answerable. 1K docs? 1G docs? 1 field or 500? > > And I don’t think it’s a good use of time to pursue much. See: > > https://lucidworks.com/post/really-batch-updates-solr-2/ > > If you’re looking at trying to maximize throughput, adding > client threads that send Solr documents is a better approach. > > All that said, I usually just pick 1,000 and don’t worry about it. > > Best, > Erick > >> On Jul 7, 2020, at 8:59 AM, Sidharth Negi wrote: >> >> Hi, >> >> Could someone help me with the best way to go about determining the maximum >> number of docs I can send in a single update call to Solr in a master / >> slave architecture. >> >> Thanks! >
Re: Replica goes into recovery mode in Solr 6.1.0
This isn’t a support list, so nobody looks at issues. We do try to help. It looks like you have 1 TB of index on a system with 320 GB of RAM. I don’t know what "Shard1 Allocated memory” is, but maybe half of that RAM is used by JVMs or some other process, I guess. Are you running multiple huge JVMs? The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time. "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms). That is probably causing your outages. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 7, 2020, at 5:18 AM, vishal patel > wrote: > > Any one is looking my issue? Please guide me. > > Regards, > Vishal Patel > > > > From: vishal patel > Sent: Monday, July 6, 2020 7:11 PM > To: solr-user@lucene.apache.org > Subject: Replica goes into recovery mode in Solr 6.1.0 > > I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have > 2 shards and each shard has 1 replica. We have 3 collection. > We do not use any cache and also disable in Solr config.xml. Search and > Update requests are coming frequently in our live platform. > > *Our commit configuration in solr.config are below > > 60 > 2 > false > > > ${solr.autoSoftCommit.maxTime:-1} > > > *We used Near Real Time Searching So we did below configuration in solr.in.cmd > set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100 > > *Our collections details are below: > > Collection Shard1 Shard1 Replica Shard2 Shard2 Replica > Number of Documents Size(GB)Number of Documents Size(GB) > Number of Documents Size(GB)Number of Documents Size(GB) > collection1 26913364201 26913379202 26913380 > 198 26913379198 > collection2 13934360310 13934367310 13934368 > 219 13934367219 > collection3 351539689 73.5351540040 73.5351540136 > 75.2351539722 75.2 > > *My server configurations are below: > >Server1 Server2 > CPU Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 > Logical Processor(s)Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 > Mhz, 10 Core(s), 20 Logical Processor(s) > HardDisk(GB)3845 ( 3.84 TB) 3485 GB (3.48 TB) > Total memory(GB)320 320 > Shard1 Allocated memory(GB) 55 > Shard2 Replica Allocated memory(GB) 55 > Shard2 Allocated memory(GB) 55 > Shard1 Replica Allocated memory(GB) 55 > Other Applications Allocated Memory(GB) 60 22 > Other Number Of Applications11 7 > > > Sometimes, any one replica goes into recovery mode. Why replica goes into > recovery? Due to heavy search OR heavy update/insert OR long GC pause time? > If any one of them then what should we do in configuration? > Should we increase the shard for recovery issue? > > Regards, > Vishal Patel >
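A side note on the 100 ms soft commit: the interval can be raised either through the -Dsolr.autoSoftCommit.maxTime property already in use, or at runtime through the Config API, which exposes updateHandler.autoSoftCommit.maxTime as an editable property. A sketch only, assuming collection1 and a 10-second interval; pick whatever your NRT requirement can actually tolerate:

curl -s -X POST 'http://localhost:8983/solr/collection1/config' \
  -H 'Content-Type: application/json' \
  -d '{"set-property": {"updateHandler.autoSoftCommit.maxTime": 10000}}'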
Solr Query
Hi,

I have a URL that I want to break down and run in the admin console, but I am not sure what ++ and - represent in the query.

select?q=(StartPublish%3a%5b*+TO+-12-31T23%3a59%3a59.999Z%5d++-Content%3a(Birthdays%5c%2fAnniversaries))++-FriendlyUrl%3a(*%2farchive%2f*))++((Title_NGram%3a(swetha))%5e500+OR+(MetaTitle_NGram%3a(swetha))%5e400+OR+(MetaKeywords_NGram%3a(swetha))%5e300+OR+(MetaDescription_NGram%3a(swetha))%5e200+OR+(Content_NGram%3a(swetha))%5e1))++(ACL%3a((Everyone)+OR+(MIDCO410%5c%5cMidco%5c-AllEmployees)+OR+(MIDCO410%5c%5cMidco%5c-DotNetDevelopers)+OR+(MIDCO410%5c%5cMidco%5c-WebAdmins)+OR+(MIDCO410%5c%5cMidco%5c-Source%5c-Admin)&start=0&rows=1&wt=xml&version=2.2

Thank You,
Swetha.
Re: replica deleted but directory remains
Hi Erick,

I also have an issue where collections or replicas are deleted but the data remains in the directory. They no longer show in the admin UI, but the data is still in the folder and the disk space is not reclaimed. I am not seeing any specific error message; could you please advise on other possible reasons and how to fix this?

Regards,
Chien
Re: Null pointer exception in QueryComponent.MergeDds method
Still not clear regarding fl param. Does request enabled timeAllowed param? Anyway debugQuery true should give a clue why "sort_values" are absent in shard response, note they should be supplied at QueryComponent.doFieldSortValues(ResponseBuilder, SolrIndexSearcher). On Tue, Jul 7, 2020 at 4:19 PM Jae Joo wrote: > 8.3.1 > > required="true" multiValued="false" docValues="true"/> > required="true" multiValued="false"/> > > the field "id" is for nested document. > > > > > On Mon, Jul 6, 2020 at 4:17 PM Mikhail Khludnev wrote: > > > Hi, > > What's the version? What's uniqueKey? is it stored? what's fl param? > > > > On Mon, Jul 6, 2020 at 5:12 PM Jae Joo wrote: > > > > > I am seeing the nullPointerException in the list below and I am > > > looking for how to fix the exception. > > > > > > Thanks, > > > > > > > > > NamedList sortFieldValues = > > > (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values")); > > > if (sortFieldValues.size()==0 && // we bypass merging this response > > > only if it's partial itself > > > thisResponseIsPartial) { // but not the previous > > one!! > > > continue; //fsv timeout yields empty sort_vlaues > > > } > > > > > > > > > > > > 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]] > > > o.a.s.h.RequestHandlerBase java.lang.NullPointerException > > > at > > > > > > > > > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914) > > > at > > > > > > > > > org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613) > > > at > > > > > > > > > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592) > > > at > > > > > > > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431) > > > at > > > > > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576) > > > at > > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) > > > at > > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578) > > > at > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419) > > > at > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351) > > > at > > > > > > > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > > > at > > > > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > > > at > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > > > at > > > > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > > at > > > > > > > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > > > at > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) > > > at > > > > > > > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) > > > at > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > > > > > > > > > -- > > Sincerely yours > > Mikhail Khludnev > > > -- Sincerely yours Mikhail Khludnev
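A sketch of the kind of request that would show this, with a hypothetical collection name and sort field; the idea is to re-run the failing distributed query with debugQuery and shards.info so the per-shard sort_values (or their absence under timeAllowed) become visible:

curl -s 'http://localhost:8983/solr/mycollection/select?q=*:*&sort=id+asc&fl=id&timeAllowed=2000&debugQuery=true&shards.info=true&wt=json'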
Re: Solr Query
Hi Swetha,

The given URL is URL-encoded, so you should decode it before analyzing it. The plus character encodes whitespace in a URL, and the minus sign marks a negative (excluding) clause in a Solr query.

Kind Regards,
Furkan KAMACI

On Tue, Jul 7, 2020 at 9:16 PM swetha vemula wrote:
> Hi,
>
> I have a URL that I want to break down and run in the admin console, but I am
> not sure what ++ and - represent in the query.
>
> select?q=(StartPublish%3a%5b*+TO+-12-31T23%3a59%3a59.999Z%5d++-Content%3a(Birthdays%5c%2fAnniversaries))++-FriendlyUrl%3a(*%2farchive%2f*))++((Title_NGram%3a(swetha))%5e500+OR+(MetaTitle_NGram%3a(swetha))%5e400+OR+(MetaKeywords_NGram%3a(swetha))%5e300+OR+(MetaDescription_NGram%3a(swetha))%5e200+OR+(Content_NGram%3a(swetha))%5e1))++(ACL%3a((Everyone)+OR+(MIDCO410%5c%5cMidco%5c-AllEmployees)+OR+(MIDCO410%5c%5cMidco%5c-DotNetDevelopers)+OR+(MIDCO410%5c%5cMidco%5c-WebAdmins)+OR+(MIDCO410%5c%5cMidco%5c-Source%5c-Admin)&start=0&rows=1&wt=xml&version=2.2
>
> Thank You,
> Swetha.
>
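A quick way to decode such a URL before pasting it into the admin console (just a sketch using Python's standard library; any URL decoder will do, and the '...' stands for the rest of the encoded string):

echo '(StartPublish%3a%5b*+TO+-12-31T23%3a59%3a59.999Z%5d++-Content%3a(Birthdays%5c%2fAnniversaries))...' \
  | python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote_plus(sys.stdin.read()))'

After decoding, %3a becomes ':', %5b and %5d become '[' and ']', %5c is the '\' escape, %2f is '/', %5e is the '^' boost, and each '+' is a space.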
Re: Null pointer exception in QueryComponent.MergeDds method
Yes, we have timeAllowed=2 sec. On Tue, Jul 7, 2020 at 2:20 PM Mikhail Khludnev wrote: > Still not clear regarding fl param. Does request enabled timeAllowed param? > Anyway debugQuery true should give a clue why "sort_values" are absent in > shard response, note they should be supplied at > QueryComponent.doFieldSortValues(ResponseBuilder, SolrIndexSearcher). > > On Tue, Jul 7, 2020 at 4:19 PM Jae Joo wrote: > > > 8.3.1 > > > > > required="true" multiValued="false" docValues="true"/> > > > required="true" multiValued="false"/> > > > > the field "id" is for nested document. > > > > > > > > > > On Mon, Jul 6, 2020 at 4:17 PM Mikhail Khludnev wrote: > > > > > Hi, > > > What's the version? What's uniqueKey? is it stored? what's fl param? > > > > > > On Mon, Jul 6, 2020 at 5:12 PM Jae Joo wrote: > > > > > > > I am seeing the nullPointerException in the list below and I am > > > > looking for how to fix the exception. > > > > > > > > Thanks, > > > > > > > > > > > > NamedList sortFieldValues = > > > > (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values")); > > > > if (sortFieldValues.size()==0 && // we bypass merging this response > > > > only if it's partial itself > > > > thisResponseIsPartial) { // but not the previous > > > one!! > > > > continue; //fsv timeout yields empty sort_vlaues > > > > } > > > > > > > > > > > > > > > > 2020-07-06 12:45:47.001 ERROR (qtp745962066-636182) [c:]] > > > > o.a.s.h.RequestHandlerBase java.lang.NullPointerException > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:914) > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:613) > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:592) > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:431) > > > > at > > > > > > > > > > > > > > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:198) > > > > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2576) > > > > at > > > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799) > > > > at > > > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578) > > > > at > > > > > > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419) > > > > at > > > > > > > > > > > > > > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602) > > > > at > > > > > > > > > > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146) > > > > at > > > > > > > > > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711) > > > > at > > > > > > > > > > > > > > org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) > > > > > 
> > > > > > > > -- > > > Sincerely yours > > > Mikhail Khludnev > > > > > > > > -- > Sincerely yours > Mikhail Khludnev >
Re: Max number of documents in update request
Thanks. This was useful, really appreciate it! :) On Tue, Jul 7, 2020, 8:07 PM Walter Underwood wrote: > Agreed, I do something between 20 and 1000. If the master node is not > handling any search traffic, use twice as many client threads as there are > CPUs in the node. That should get you close to 100% CPU utilization. > One thread will be waiting while a batch is being processed and another > thread will be sending the next batch so there is no pause in processing. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > On Jul 7, 2020, at 6:12 AM, Erick Erickson > wrote: > > > > As many as you can send before blowing up. > > > > Really, the question is not answerable. 1K docs? 1G docs? 1 field or 500? > > > > And I don’t think it’s a good use of time to pursue much. See: > > > > https://lucidworks.com/post/really-batch-updates-solr-2/ > > > > If you’re looking at trying to maximize throughput, adding > > client threads that send Solr documents is a better approach. > > > > All that said, I usually just pick 1,000 and don’t worry about it. > > > > Best, > > Erick > > > >> On Jul 7, 2020, at 8:59 AM, Sidharth Negi > wrote: > >> > >> Hi, > >> > >> Could someone help me with the best way to go about determining the > maximum > >> number of docs I can send in a single update call to Solr in a master / > >> slave architecture. > >> > >> Thanks! > > > >
Suggester.count parameter for different dictionaries
Hello,

we’re using different dictionaries with the suggester component for autocomplete, with a setup similar to the following in the request handler defaults: suggest = true, suggest.count = 10, suggest.dictionary = titles, with the suggester search component enabled.

Is there a way to specify different count options for different dictionaries? For example, I’d like to get suggestions for all authors (say up to 1000) but only 10 for titles and just one for abstracts. The reason for the 1000 authors is to present the number to the user, saying ‘your search matches xxx authors, click here to show all’, while at the same time showing the 10 most relevant titles and just one abstract.

Thanks!
—
Ing. Andrea Vettori
Responsabile Sistemi Informativi
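As far as I know, suggest.count applies to the whole request rather than to an individual dictionary, so one workaround is to issue one suggest request per dictionary, each with its own count. A sketch with made-up core and dictionary names matching the description above:

curl -s 'http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.dictionary=authors&suggest.count=1000&suggest.q=jo'
curl -s 'http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.dictionary=titles&suggest.count=10&suggest.q=jo'
curl -s 'http://localhost:8983/solr/mycore/suggest?suggest=true&suggest.dictionary=abstracts&suggest.count=1&suggest.q=jo'

Another option along the same lines is to define a separate request handler per dictionary, each with its own suggest.count default.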
solr query to return matched text to regex with default schema
Hi,

I want to search Solr for server names in a set of Microsoft Word documents, PDFs, and image files like jpg/gif. Server names are given by the regular expressions (regex):

INFP[a-zA-Z0-9]{3,9}
TRKP[a-zA-Z0-9]{3,9}
PLCP[a-zA-Z0-9]{3,9}
SQRP[a-zA-Z0-9]{3,9}

Problem
===
I want to get the text in the documents matching the regex, e.g. INFPWSV01, PLCPLDB01.

I've indexed the files using Solr/Tika/Tesseract with the default schema. I've used the highlight search tool with hl ticked and hl.usePhraseHighlighter ticked. Solr only returns the metadata (presumably), like the filename of the file containing the pattern(s).

Questions
=
1. Would I have to modify the managed schema?
2. If so, would I have to store the file content in the schema?
3. If so, is this the way to do it:

a. solrconfig.xml <- inside my "core"
true ignored_ _text_ ...

b. Remove the ignored_ line as I want the metadata

c. Change the managed schema so that _text_ is stored ("true"):

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "replace-field":{
    "name":"_text_",
    "type":"text_general",
    "multiValued":true,
    "indexed":true,
    "stored":true }
}' http://localhost:8983/api/cores/gettingstarted/schema
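For what it's worth, highlighting can only return matched text from a stored field, so _text_ (or a copy of the content) does need stored="true" and the documents re-indexed. Once that is done, the standard query parser accepts regular expressions between forward slashes. A rough sketch against the default _text_ field of the "gettingstarted" core used in the curl above; note the regex is matched against individual indexed terms, so a server name must survive tokenization as a single token, and the default text_general analysis lowercases terms, hence the lowercase pattern:

curl -sG 'http://localhost:8983/solr/gettingstarted/select' \
  --data-urlencode 'q=_text_:/infp[a-z0-9]{3,9}/' \
  --data-urlencode 'hl=true' \
  --data-urlencode 'hl.fl=_text_' \
  --data-urlencode 'rows=10'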
Re: Replica goes into recovery mode in Solr 6.1.0
Thanks for your reply.

One server has 320GB RAM in total. It runs 2 Solr nodes: one hosts shard1 and the other hosts the shard2 replica. Each Solr node has 55GB of memory allocated. shard1 has 585GB of data and the shard2 replica has 492GB, which means almost 1TB of data on this server. The server also runs other applications, which have 60GB of memory allocated, so about 150GB of memory is left. Properly formatted details: https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view

Are you running multiple huge JVMs?
>> Not huge, but 60GB of memory is allocated to our 11 applications. 150GB of memory is still free.

The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.
>> Is there a chance of a replica going into recovery mode when IO reads/writes increase or get blocked?

"-Dsolr.autoSoftCommit.maxTime=100" is way too short (100 ms).
>> Our requirement is NRT, so we keep the time short.

Regards,
Vishal Patel

From: Walter Underwood
Sent: Tuesday, July 7, 2020 8:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Replica goes into recovery mode in Solr 6.1.0

This isn’t a support list, so nobody looks at issues. We do try to help.

It looks like you have 1 TB of index on a system with 320 GB of RAM. I don’t know what "Shard1 Allocated memory” is, but maybe half of that RAM is used by JVMs or some other process, I guess. Are you running multiple huge JVMs?

The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time.

"-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms). That is probably causing your outages.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jul 7, 2020, at 5:18 AM, vishal patel > wrote:
>
> Any one is looking my issue? Please guide me.
>
> Regards,
> Vishal Patel
>
>
>
> From: vishal patel
> Sent: Monday, July 6, 2020 7:11 PM
> To: solr-user@lucene.apache.org
> Subject: Replica goes into recovery mode in Solr 6.1.0
>
> I am using Solr version 6.1.0, Java 8 version and G1GC on production. We have
> 2 shards and each shard has 1 replica. We have 3 collection.
> We do not use any cache and also disable in Solr config.xml. Search and
> Update requests are coming frequently in our live platform.
> > *Our commit configuration in solr.config are below > > 60 > 2 > false > > > ${solr.autoSoftCommit.maxTime:-1} > > > *We used Near Real Time Searching So we did below configuration in solr.in.cmd > set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100 > > *Our collections details are below: > > Collection Shard1 Shard1 Replica Shard2 Shard2 Replica > Number of Documents Size(GB)Number of Documents Size(GB) > Number of Documents Size(GB)Number of Documents Size(GB) > collection1 26913364201 26913379202 26913380 > 198 26913379198 > collection2 13934360310 13934367310 13934368 > 219 13934367219 > collection3 351539689 73.5351540040 73.5351540136 > 75.2351539722 75.2 > > *My server configurations are below: > >Server1 Server2 > CPU Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 Mhz, 10 Core(s), 20 > Logical Processor(s)Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, 2301 > Mhz, 10 Core(s), 20 Logical Processor(s) > HardDisk(GB)3845 ( 3.84 TB) 3485 GB (3.48 TB) > Total memory(GB)320 320 > Shard1 Allocated memory(GB) 55 > Shard2 Replica Allocated memory(GB) 55 > Shard2 Allocated memory(GB) 55 > Shard1 Replica Allocated memory(GB) 55 > Other Applications Allocated Memory(GB) 60 22 > Other Number Of Applications11 7 > > > Sometimes, any one replica goes into recovery mode. Why replica goes into > recovery? Due to heavy search OR heavy update/insert OR long GC pause time? > If any one of them then what should we do in configuration? > Should we increase the shard for recovery issue? > > Regards, > Vishal Patel >
Solr multi word search across multiple fields with mm
Hi,

We observed that multi-word queries spanning multiple fields with mm create a problem. Any help would be appreciated.

Current problem: a search on words spanning different fields with minimum match (mm) and sow=false generates a field-centric query with per-field mm, rather than a term-centric query with mm applied across fields, when a field undergoes different query-time analysis (like multi-word synonyms, stop words, etc.).

Below are sample term-centric and field-centric queries:

*term centric query with the query string "amul cheese slice" (none of the terms has synonyms):*

"parsedquery_toString": "+description:amul)^6.0 | description_l2:amul | (description_l1:amul)^4.0 | (brand_name_h:amul)^8.0 | (manual_tags:amul)^3.0) ((description:cheese)^6.0 | description_l2:cheese | (description_l1:cheese)^4.0 | (brand_name_h:cheese)^8.0 | (manual_tags:cheese)^3.0) ((description:slice)^6.0 | description_l2:slice | (description_l1:slice)^4.0 | (brand_name_h:slice)^8.0 | (manual_tags:slice)^3.0))~2)",

*field centric query with the query string "amul cheese cake" (cake has a synonym of plum cake):*

"parsedquery_toString": "+(((description:amul description:cheese description:cake)~2)^6.0 | ((description_l2:amul description_l2:cheese (description_l2:cupcak description_l2:pastri (+description_l2:plum +description_l2:cake) description_l2:cake))~2) | ((description_l1:amul description_l1:cheese description_l1:cake)~2)^4.0 | ((brand_name_h:amul brand_name_h:cheese brand_name_h:cake)~2)^8.0 | ((manual_tags:amul manual_tags:cheese manual_tags:cake)~2)^3.0)",

Referring to multiple blogs, we tried the following:
1. autoGeneratePhraseQueries
2. per-field mm:
q=({!edismax qf=brand_name description v=$qx mm=2}^10 OR {!edismax qf=description_l1 manual_tags_l1 v=$qx mm=2} OR {!edismax qf=description_l2 v=$qx mm=2} )&qx=amul cheese cake

But we observed that these are still converted to field-centric queries with per-field mm, resulting in no match when the words span multiple fields.