Disaster Scenario Question Regarding Tlog+Pull SolrCloud Setup

2020-03-04 Thread Sandeep Dharembra
Hi,

My question is about the SolrCloud cluster we are trying to set up. We have a
collection with Tlog and pull type replicas. We intend to keep all the
Tlog replicas on one node and use that node for writing, with pull replicas
distributed across the remaining nodes.

What we have noticed is that when the Tlog node goes down and, say, the
disk is also lost, then when the node is brought back up (since it was the only
node holding Tlog replicas for all shards, it is not deleted from state.json):

1) There is no way to remove the node from the cluster, since the remaining
pull replicas cannot become leaders

2) Since the disk is blank (like a new node), the Tlog replicas remain down
when Solr comes up. If we create the core folders with only core.properties,
the Tlog replicas become active without any data, but in this case the pull
replicas sync and become blank as well

Is there a way to temporarily stop this sync of pull replicas in SolrCloud
mode while we repopulate the index on the Tlog replicas? We can do this in
legacy mode.

We understand that we could have multiple Tlog replicas to solve this, but we
wanted to know how we can stop this replication.
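
For reference, in legacy mode we do this with the replication handler's poll
commands; a sketch of the equivalent per-core calls (host, port, and core
names below are placeholders):

curl "http://pull-node:8985/solr/mycoll_shard1_replica_p1/replication?command=disablepoll"
# ... rebuild the index on the Tlog node ...
curl "http://pull-node:8985/solr/mycoll_shard1_replica_p1/replication?command=enablepoll"

Whether these commands are honored for PULL replicas in SolrCloud mode is
exactly what we are unsure about.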

Thanks,
Sandeep


Fwd: Issue With Autoscaling

2020-03-04 Thread Sandeep Dharembra
Hi,

*Here are the details of what we are trying to do -*

1) Set up a SolrCloud cluster using Solr version 8.4.1
2) Replica type - Tlog + Pull setup
3) All Tlog replicas to be placed on one node
4) All pull replicas need to be placed on the remaining nodes
5) External Zookeeper to be used
6) On a node going down, we want the node to be deleted from the cluster
7) On a node joining back the cluster, replicas for all shards should be
added back appropriately
8) Number of shards is not yet decided as of now
9) The node hosting Tlog replicas will have Solr running on port 8984, and the
nodes hosting pull replicas will have Solr running on port 8985

*Steps -*

1) Set up Solr as a system service on 3 nodes - 1 Tlog and 2 Pull (a ZooKeeper
chroot was created)
2) Uploaded a dummy config on zookeeper
3) Start solr process on all three nodes
4) Add policy/preferences to the cluster (copied at the end of the mail)
5) Use the below commands to create the collection


/solr/admin/collections?action=CREATE&name=&numShards=4&maxShardsPerNode=8&tlogReplicas=1&collection.configName=



/solr/admin/collections?action=MODIFYCOLLECTION&collection=&replicationFactor=3

/solr/admin/collections?action=ADDREPLICA&collection=&shard=shard1&type=pull&pullReplicas=2
/solr/admin/collections?action=ADDREPLICA&collection=&shard=shard2&type=pull&pullReplicas=2
/solr/admin/collections?action=ADDREPLICA&collection=&shard=shard3&type=pull&pullReplicas=2
/solr/admin/collections?action=ADDREPLICA&collection=&shard=shard4&type=pull&pullReplicas=2

6) Stop Solr on one of the servers hosting pull replicas and wait for 2
minutes, as per the trigger's waitFor

Node gets removed from the cluster - no issues here

7) Restart solr process on the same node and wait another 2 minutes

*ISSUE *

Solr starts adding pull replicas only for shard4 on the just-added node. It
keeps doing so until the node goes OOM.

If we change the policy from {"replica": "#ALL", "shard": "#EACH",
"type": "PULL", "nodeset": {"port":"8985"}} to {"replica": ">3", "shard":
"#EACH", "type": "PULL", "nodeset": {"port":"8985"}}, we have no problems
and pull replicas of all shards get added to the just-added node.

We would like to avoid hardcoding numbers or percentages. We want to restrict
Tlog replicas to certain nodes that would not serve read requests, by
specifying preferred replicas for serving requests (more like the master/slave
architecture of Solr 4.x that we currently use).
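
For the read-routing part, something like Solr's shards.preference query
parameter may be what we need, e.g. (collection name is a placeholder):

/solr/mycoll/select?q=*:*&shards.preference=replica.type:PULL

This asks each shard to serve the query from PULL replicas when available.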

Any help would be appreciated

Thanks,
Sandeep



*autoscaling.json for reference*

{
  "responseHeader":{
"status":0,
"QTime":20},
  "cluster-preferences":[{
  "minimize":"cores",
  "precision":1}
,{
  "maximize":"freedisk",
  "precision":10}
,{
  "minimize":"sysLoadAvg"}],
  "cluster-policy":[{
  "replica":"#ALL",
  "shard":"#EACH",
  "type":"PULL",
  "nodeset":{"port":"8985"}}
,{
  "replica":"#ALL",
  "type":"TLOG",
  "nodeset":{"port":"8984"}}
,{
  "replica":0,
  "type":"PULL",
  "nodeset":{"port":"8984"}}],
  "triggers":{
".scheduled_maintenance":{
  "name":".scheduled_maintenance",
  "event":"scheduled",
  "startTime":"NOW",
  "every":"+1DAY",
  "enabled":true,
  "actions":[{
  "name":"inactive_shard_plan",
  "class":"solr.InactiveShardPlanAction"},
{
  "name":"inactive_markers_plan",
  "class":"solr.InactiveMarkersPlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]},
"node_lost_trigger":{
  "event":"nodeLost",
  "waitFor":120,
  "preferredOperation":"DELETENODE",
  "actions":[{
  "name":"compute_plan",
  "class":"solr.ComputePlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]},
"node_added_trigger":{
  "event":"nodeAdded",
  "waitFor":120,
  "preferredOperation":"ADDREPLICA",
  "replicaType":"PULL",
  "actions":[{
  "name":"compute_plan",
  "class":"solr.ComputePlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]},
".auto_add_replicas":{
  "name":".auto_add_replicas",
  "event":"nodeLost",
  "waitFor":120,
  "enabled":true,
  "actions":[{
  "name":"auto_add_replicas_plan",
  "class":"solr.AutoAddReplicasPlanAction"},
{
  "name":"execute_plan",
  "class":"solr.ExecutePlanAction"}]}},
  "listeners":{
".scheduled_maintenance.system":{
  "beforeAction":[],
  "afterAction":[],
  "stage":["STARTED",
"ABORTED",
"SUCCEEDED",
"FAILED",
"BEFORE_ACTION",
"AFTER_ACTION",
"IGNORED"],
  "trigger":".scheduled_maintenance",
  "class":"org.apache.solr.cloud.autoscaling.SystemLogListener"},
"node_added_trigger.system":{
  "beforeAction":[],
  "afterAction":[],
  "stage":["STARTED",
"ABORT

SolrCloud - Underlying core creation failed while creating collection with new configset

2020-03-04 Thread Vignan Malyala
Hi
I created a new config set as mentioned in Solr Cloud documentation using
upload zip.
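
A sketch of the equivalent API calls (configset and collection names here are
placeholders):

curl -X POST -H "Content-Type: application/octet-stream" --data-binary @myconfig.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myconfig"
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=test5&numShards=1&collection.configName=myconfig"
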
I get this error when I try to create a collection using my new configset.

Error from shard: http://X.X.X.X:8983/solr

OverseerCollectionMessageHandler Cleaning up collection [test5].

Collection: test5 operation: create
failed:org.apache.solr.common.SolrException: Underlying core creation
failed while creating collection: test5
at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:303)
at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:263)
at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)



Please help me out with this.

Regards,

Sai Vignan


Re: Custom update processor and race condition with concurrent requests

2020-03-04 Thread Sachin Divekar
So, the idea was that Solr serializes writes while appending records to
tlogs. Combining that with real-time gets, to decide how to modify the
request, would (I hoped) achieve the result.

> I do wonder if it’s possible to ensure that a given doc is always updated
from the same thread?  I’m assuming that the root of your issue is that
you’re pushing updates in parallel and the same doc is being updated from
two different places.

Yes, you are right. I am trying to find a solution to this issue. But I
guess, since Solr is not a transactional database, this is not easily possible.
I was trying to do something by using the fact that Solr serializes writes
to tlogs.
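
To make the setup concrete, here is a stripped-down sketch of the kind of
processor I mean. The class and helper names are mine, and the exact
RealTimeGetComponent.getInputDocument() signature varies across Solr versions;
the point is the unguarded gap between the read and the write:

import java.io.IOException;
import java.util.Collections;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.handler.component.RealTimeGetComponent;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class StateTransitionProcessor extends UpdateRequestProcessor {
  private final SolrQueryRequest req;

  public StateTransitionProcessor(SolrQueryRequest req, UpdateRequestProcessor next) {
    super(next);
    this.req = req;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    // time-of-check: read the latest (possibly uncommitted) copy via real-time get
    BytesRef id = cmd.getIndexedId();
    SolrInputDocument current = RealTimeGetComponent.getInputDocument(
        req.getCore(), id, RealTimeGetComponent.Resolution.DOC); // signature varies by version
    Object oldState = (current == null) ? null : current.getFieldValue("f1");

    // apply the state-transition rules and rewrite the request as an atomic
    // update; another thread can read the same old state in this window
    Object newState = decideNewState(oldState, cmd.getSolrInputDocument());
    cmd.getSolrInputDocument().setField("f1", Collections.singletonMap("set", newState));

    // time-of-update: the append happens further down the chain
    super.processAdd(cmd);
  }

  // hypothetical stand-in for the real transition rules
  private Object decideNewState(Object oldState, SolrInputDocument incoming) {
    return incoming.getFieldValue("f1");
  }
}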

On Wed, Mar 4, 2020 at 2:31 AM Erick Erickson 
wrote:

> I guess I’m missing something. Assuming that S1 and S2 are sent in
> different batches from different threads from your client, there are any
> number of ways they could arrive out of order. Network delays, client
> delays, etc. So I don’t see any way to serialize them reliably.
>
> If they’re sent either in the same batch or by the same thread, then they
> should be sequential and I’d look again at your custom processor to see
> what’s happening there.
>
> I do wonder if it’s possible to ensure that a given doc is always updated
> from the same thread? I’m assuming that the root of your issue is that
> you’re pushing updates in parallel and the same doc is being updated from
> two different places.
>
> Best,
> Erick
>
> > On Mar 3, 2020, at 11:23, Sachin Divekar  wrote:
> >
> > Thanks, Erick.
> >
> > I think I was not clear enough. With the custom update processor, I'm not
> > using optimistic concurrency at all. The update processor just modifies the
> > incoming document with updated field values and atomic update instructions.
> > It then forwards the modified request further in the chain. So, just to be
> > clear, in this test setup optimistic concurrency is not in the picture.
> >
> > However, it looks like if I want to run concurrent update requests I will
> > have to use optimistic concurrency, be it in the update processor or in the
> > client. I was wondering if I can avoid that by serializing requests at the
> > update processor level.
> >
> >> Hmmm, _where_ is your custom update processor running? And is this
> >> SolrCloud?
> > Currently, it's a single-node Solr but eventually it will be SolrCloud. I
> > am just testing the idea of doing something like this. Right now I am
> > running the custom update processor before DistributedProcessor in the
> > chain.
> >
> >> If you run it _after_ the update is distributed (i.e. ensure it’ll run on
> >> the leader) _and_ you can ensure that your custom update processor is
> >> smart enough to know which version of the document is the “right” one, I
> >> should think you can get this to work.
> > I think that's the exact problem. My update processor fetches the
> > document, updates the request object and forwards it in the chain. The two
> > concurrent instances (S1 and S2) of the update processor can fetch the
> > document, get value 'x' of field 'f1' at the same time and process it,
> > whereas ideally S2 should see the value updated by S1.
> >
> > S1: fetches id1 -> gets f1: x -> sets f1: y -> Solr appends it to the tlog
> > S2: fetches id1 -> gets f1: x .. ideally it should get 'y'
> >
> > Is that possible with an UpdateProcessor? I am using real-time get (
> > RealTimeGetComponent.getInputDocument()) in the update processor to fetch
> > the document.
> >
> >> You’ll have to use “real time get”, which fetches the most current
> >> version of the document even if it hasn’t been committed and reject the
> >> update if it’s too old. Anything in this path requires that the desired
> >> update doesn’t depend on the value already having been changed by the
> >> first update...
> >
> > In the case of multiple concurrent instances of the update processor, are
> > RealTimeGetComponent.getInputDocument() calls serialized?
> >
> > thank you
> > Sachin
>


Re: SolrCloud - Underlying core creation failed while creating collection with new configset

2020-03-04 Thread Erick Erickson
You need to look at the solr logs on the machine where the attempt was made to 
create the replica...

Best,
Erick

> On Mar 4, 2020, at 03:24, Vignan Malyala  wrote:
> 
> Hi
> I created a new config set as mentioned in Solr Cloud documentation using
> upload zip.
> I get this error when I try to create a collection using my new configset.
> 
> Error from shard: http://X.X.X.X:8983/solr
> 
> OverseerCollectionMessageHandler Cleaning up collection [test5].
> 
> Collection: test5 operation: create
> failed:org.apache.solr.common.SolrException: Underlying core creation
> failed while creating collection: test5
>   at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:303)
>   at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:263)
>   at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 
> 
> 
> Please help me out with this.
> 
> Regards,
> 
> Sai Vignan


Re: Custom update processor and race condition with concurrent requests

2020-03-04 Thread Walter Underwood
This really, really looks like something that should be done with a
database, not with Solr. This assumes a transactional model, which
Solr doesn’t have. 

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 3, 2020, at 7:56 PM, Sachin Divekar  wrote:
> 
> Thanks for the reply, Chris. Sure, I will start from the beginning and
> explain the problem I'm trying to solve.
> 
> We have objects which we index in Solr. They go through state transitions
> based on various events in their life. But the events can arrive out of
> sequence. So, to maintain consistency, we need to implement rules while
> updating the document state in Solr, e.g. if the old state is X and the new
> one is Y then update the status field; if the old state is Y and the new one
> is X then do not update the status field, etc.
> 
> This is a distributed system, and events for the same object can be
> produced on different nodes. They are pushed to Solr from the same node.
> This is a SolrCloud setup, so these updates can be received by different
> Solr nodes.
> 
> We have already implemented it by using optimistic concurrency and realtime
> get. The client program runs on each node where the events are produced.
> Summary of the processing the client does as follows:
> 
> - the client batches multiple events
> - it uses _version_ to /update the records
> - based on various conflicts it modifies the records for which update failed
> - it /updates the modified records
> 
> That works fine but there is a lot of to and fro between the client and
> Solr and the implementation is complex.
> 
> So, I thought it can be simplified by moving these state transitions and
> processing logic into Solr by writing a custom update processor. The idea
> occurred to me when I was thinking about Solr serializing multiple
> concurrent requests for a document on the leader replica. So, my thought
> process was if I am getting this serialization for free I can implement the
> entire processing inside Solr and a dumb client to push records to Solr
> would be sufficient. But, that's not working. Perhaps the point I missed is
> that even though this processing is moved inside Solr I still have a race
> condition because of time-of-check to time-of-update gap.
> 
> While writing this it just occurred to me that I'm running my custom update
> processor before DistributedProcessor. I'm committing the same XY crime
> again but if I run it after DistributedProcessor can this race condition be
> avoided?
> 
> My secondary purpose in doing this exercise is to understand how Solr and
> distributed databases in general work. And that's the reason I am coming up
> with these hypotheses and try to validate them.
> 
> thanks
> Sachin
> 
> On Wed, Mar 4, 2020 at 12:09 AM Chris Hostetter 
> wrote:
> 
>> 
>> It sounds like fundamentally the problem you have is that you want solr to
>> "block" all updates to docId=X ... at the update processor chain level ...
>> until an existing update is done.
>> 
>> but solr has no way to know that you want to block at that level.
>> 
>> ie: you asked...
>> 
>> : In the case of multiple concurrent instances of the update processor, are
>> : RealTimeGetComponent.getInputDocument() calls serialized?
>> 
>> ...but the answer to that question isn't really relevant, because
>> regardless of the answer, there is no guarantee at the java thread
>> scheduling level that the operations your custom code performs on the
>> results will happen in any particular order -- even if
>> RealTimeGetComponent.getInputDocument(42) were to block other concurrent
>> calls to RealTimeGetComponent.getInputDocument(42) that wouldn't ensure
>> that the custom code you have in Thread1 that calls that method will
>> finish its modifications to the SolrInputDocument *before* the same
>> custom code in Thread2 calls RealTimeGetComponent.getInputDocument(42).
>> 
>> The only way to do something like this would be to add locking in your
>> custom code itself -- based on the uniqueKey of the document -- to say
>> "don't allow another thread to modify this document until I'm done" and
>> keep that lock held until the delegated processAdd call finishes (so you
>> know that the other update processors, including RunUpdateProcessor, have
>> finished) ... but that would only work (easily) in a single node
>> situation; in a multinode situation you'd have to first check the state of
>> the request and ensure that your processor (and its locking logic) only
>> runs on the "leader" for that document, and deal with things at a
>> distributed level ... and you've got a whole host of new headaches.
>> 
>> I would really suggest you take a step back and re-think your objective,
>> and share with us the "end goal" you're trying to achieve with this custom
>> update processor, because it seems you may have headed down an
>> unnecessarily complex route.
>> 
>> what exactly is it you're trying to achieve?
>> 
>> https://people.apache.org/~hossman/#xyprobl

Re: Solr Search cause high CPU with **

2020-03-04 Thread Shreyas Kothiya
Thank you for the quick response.

We do use wildcard searches. I am sorry if I have not asked my question
correctly.

In Solr, a search with consecutive asterisks [**] causes high CPU and does
not return any results.

q= N*W* 154 ** underpass was converted to
key&q=((text:N*W*+OR+text:154+OR+text:**+OR+text:underpass))

After testing in our lower environment, it appears the spike was caused by the
consecutive asterisks in the search term above.
I was able to reproduce it by just searching q=**
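
(Per the escaping suggestion below, the term would be sent with each asterisk
backslash-escaped, e.g. q=N\*W\* 154 \*\* underpass, so the asterisks are
treated as literal characters rather than wildcards.)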


> On Mar 3, 2020, at 10:50 PM, em...@yeikel.com wrote:
> 
> According to the documentation, the standard query parser uses asterisks to
> do wild card searches[1]. If you do not need to do wildcard queries and what
> you are trying to do is to use the asterisks as a search term, you should
> escape it[2]
> 
> [1]
> https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-WildcardSearches
> [2]
> https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-EscapingSpecialCharacters
> 
> -Original Message-
> From: Shreyas Kothiya  
> Sent: Tuesday, March 3, 2020 11:43 AM
> To: solr-user@lucene.apache.org
> Subject: Solr Search cause high CPU with **
> 
> Hello
> 
> We are using Solr 6.6. We recently noticed an unusual CPU spike while one of
> our customers did a search with ** in the query string.
> 
> We had a customer who searched for the following term:
> 
> q = N*W* 154 ** underpass
> 
> It caused CPU to go above 80%; our normal range of CPU is around 20%.
> I wanted to know a few things.
> 
> 1. What does ** mean in a Solr search?
> 2. Is there a bug already filed for this issue?
> 
> 
> Please let me know if you need more information.
> 
> Thanks
> Shreyas Kothiya
> 



Re: Custom update processor and race condition with concurrent requests

2020-03-04 Thread Chris Hostetter


: So, I thought it can be simplified by moving these state transitions and
: processing logic into Solr by writing a custom update processor. The idea
: occurred to me when I was thinking about Solr serializing multiple
: concurrent requests for a document on the leader replica. So, my thought
: process was if I am getting this serialization for free I can implement the
: entire processing inside Solr and a dumb client to push records to Solr
: would be sufficient. But, that's not working. Perhaps the point I missed is
: that even though this processing is moved inside Solr I still have a race
: condition because of time-of-check to time-of-update gap.

Correct.  Solr is (hand wavy) "locking" updates to documents by id on the 
leader node to ensure they are transactional, but that locking happens 
inside DistributedUpdateProcessor; other update processors don't run 
"inside" that lock.

: While writing this it just occurred to me that I'm running my custom update
: processor before DistributedProcessor. I'm committing the same XY crime
: again but if I run it after DistributedProcessor can this race condition be
: avoided?

no.  that would just introduce a whole new host of problems that are a 
much more involved conversation to get into (remember: the processors after 
DUH run on every replica, after the leader has already assigned a 
version and said this update should go through ... so now imagine what 
your error handling logic has to look like?)


Ultimately the goal that you're talking about really feels like "business 
logic that requires synchronizing/blocking updates" but you're trying to 
avoid writing a synchronized client to do that synchronization and error 
handling before forwarding those updates to solr.

I mean -- even with your explanation of your goal, there is a whole host 
of nuance / use case specific logic that has to go into "based on various 
conflicts it modifies the records for which update failed" -- and that 
logic seems like it would affect the locking: if you get a request that 
violates the legal state transition because of another request that 
(blocked it until it) just finished ... now what?  fail? apply some new 
rules?

this seems like logic you should really want in a "middle ware" layer that 
your clients talk to and sends docs to solr.

If you *REALLY* want to try and piggyback this logic into solr, then 
there is _one_ place I can think of where you can "hook in" to the logic 
DistributedUpdateHandler does while "locking" an id on the leader, and 
that would be extending the AtomicUpdateDocumentMerger...

It's marked experimental, and I don't really understand the use cases 
for why it exists, and in order to customize this you would have to 
also subclass DistributedUpdateHandlerFactory to build your custom 
instance and pass it to the DUH constructor, but then -- in theory -- you 
could intercept any document update *after* the RTG, and before it's 
written to the TLOG, and apply some business logic.

But I wouldn't recommend this ... "there be Dragons!"



-Hoss
http://www.lucidworks.com/


Re: Custom update processor and race condition with concurrent requests

2020-03-04 Thread Sachin Divekar
Thanks, Chris.

I think I should stop thinking about doing it in Solr. Anyway, I was just
trying to see how far I can go.

On Wed, Mar 4, 2020 at 11:50 PM Chris Hostetter 
wrote:

>
> : So, I thought it can be simplified by moving these state transitions and
> : processing logic into Solr by writing a custom update processor. The idea
> : occurred to me when I was thinking about Solr serializing multiple
> : concurrent requests for a document on the leader replica. So, my thought
> : process was if I am getting this serialization for free I can implement
> the
> : entire processing inside Solr and a dumb client to push records to Solr
> : would be sufficient. But, that's not working. Perhaps the point I missed
> is
> : that even though this processing is moved inside Solr I still have a race
> : condition because of time-of-check to time-of-update gap.
>
> Correct.  Solr is (hand wavy) "locking" updates to documents by id on the
> leader node to ensure they are transactional, but that locking happens
> inside DistributedUpdateProcessor; other update processors don't run
> "inside" that lock.
>

Understood. I was not thinking clearly about locking.
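
For my own notes, a minimal sketch of the per-id locking you describe (class
name is hypothetical; single node only, and the lock map in this sketch is
never pruned):

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class PerDocLockingProcessor extends UpdateRequestProcessor {
  // one lock per uniqueKey; grows without bound in this sketch
  private static final ConcurrentHashMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

  public PerDocLockingProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    ReentrantLock lock = LOCKS.computeIfAbsent(cmd.getPrintableId(), k -> new ReentrantLock());
    lock.lock();
    try {
      // any fetch-current-state / apply-transition-rules logic would go here,
      // then the delegated processAdd runs while the lock is still held, so
      // updates for one document id run one at a time -- on this node only
      super.processAdd(cmd);
    } finally {
      lock.unlock();
    }
  }
}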


>
> : While writing this it just occurred to me that I'm running my custom
> update
> : processor before DistributedProcessor. I'm committing the same XY crime
> : again but if I run it after DistributedProcessor can this race condition
> be
> : avoided?
>
> no.  that would just introduce a whole new host of problems that are a
> much more involved conversation to get into (remember: the processors after
> DUH run on every replica, after the leader has already assigned a
> version and said this update should go through ... so now imagine what
> your error handling logic has to look like?)
>

I completely missed that post-processors run on every replica. It will be
too convoluted to implement.


>
>
> Ultimately the goal that you're talking about really feels like "business
> logic that requires synchronizing/blocking updates" but you're trying to
> avoid writing a synchronized client to do that synchronization and error
> handling before forwarding those updates to solr.
>
> I mean -- even with your explanation of your goal, there is a whole host
> of nuance / use case specific logic that has to go into "based on various
> conflicts it modifies the records for which update failed" -- and that
> logic seems like it would affect the locking: if you get a request that
> violates the legal state transition because of another request that
> (blocked it until it) just finished ... now what?  fail? apply some new
> rules?
>
> this seems like logic you should really want in a "middle ware" layer that
> your clients talk to and sends docs to solr.
>
> If you *REALLY* want to try and piggyback this logic into solr, then
> there is _one_ place I can think of where you can "hook in" to the logic
> DistributedUpdateHandler does while "locking" an id on the leader, and
> that would be extending the AtomicUpdateDocumentMerger...
>
> It's marked experimental, and I don't really understand the use cases
> for why it exists, and in order to customize this you would have to
> also subclass DistributedUpdateHandlerFactory to build your custom
> instance and pass it to the DUH constructor, but then -- in theory -- you
> could intercept any document update *after* the RTG, and before it's
> written to the TLOG, and apply some business logic.
>
> But I wouldn't recommend this ... "there be Dragons!"
>

Thanks for this explanation. Yes, that's too dangerous and really not worth
the effort.

I think I am concluding this exercise now. I will stick to my older
implementation where I am handling state transitions on the
client side using optimistic locking.
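
For reference, that client-side flow relies on Solr's _version_-based
optimistic concurrency; roughly (collection name and version value are
placeholders):

curl "http://localhost:8983/solr/mycoll/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[{"id": "id1", "f1": {"set": "y"}, "_version_": 1634752802121234432}]'

If the stored _version_ no longer matches, Solr rejects the update with a
version conflict (HTTP 409); the client then re-reads the document via
real-time get (/get?id=id1) and retries with the new version.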

--
Sachin


exactMatchFirst Solr Suggestion Component

2020-03-04 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
 Hi All,

Would anyone be able to help me debug my suggestion component? Right now, our 
config looks like this: 

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">FileDictionaryFactory</str>
    <str name="sourceLocation">./conf/queries_list_with_weights.txt</str>
    <str name="fieldDelimiter">,</str>
    <str name="storeDir">conf</str>
    <str name="suggestAnalyzerFieldType">keywords_w3_en</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

We like the idea of the FuzzyLookupFactory because of how it handles
misspelled prefixes. However, we are finding that the exactMatchFirst
parameter, which is supposed to be true by default in the code, is NOT
putting exact-match prefixes first. I think this is because of the weights we
have with each term. However, the documentation specifically states that
exactMatchFirst is meant to ignore weights
(https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/suggester.html#fuzzylookupfactory).
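
For reference, the parameter can also be set explicitly per lookup; a sketch,
assuming it is added alongside the options above:

    <str name="exactMatchFirst">true</str>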

For the prefix "box" this is what our suggestions list looks like. You can see 
that "bond" is above other results I would expect to be above it, such as 
"box@ibm," etc.:

{
  "responseHeader":{
"status":0,
"QTime":112},
  "command":"build",
  "suggest":{"mySuggester":{
  "box":{
"numFound":8,
"suggestions":[{
"term":"box",
"weight":1799,
"payload":""},
  {
"term":"bond",
"weight":805,
"payload":""},
  {
"term":"box@ibm",
"weight":202,
"payload":""},
  {
"term":"box at ibm",
"weight":54,
"payload":""},
  {
"term":"books",
"weight":45,
"payload":""},
  {
"term":"box drive",
"weight":34,
"payload":""},
  {
"term":"books 24x7",
"weight":31,
"payload":""},
  {
"term":"box sync",
"weight":31,
"payload":""}]

Any help is greatly appreciated!

Best,
Audrey



Re: SolrCloud - Underlying core creation failed while creating collection with new configset

2020-03-04 Thread Vignan Malyala
Hi Erick,
I didn't see any extra error in the solr logs. It's the same error I mentioned
earlier.
I'm using SolrCloud by the way.

On Wed, Mar 4, 2020 at 8:06 PM Erick Erickson 
wrote:

> You need to look at the solr logs on the machine where the attempt was
> made to create the replica...
>
> Best,
> Erick
>
> > On Mar 4, 2020, at 03:24, Vignan Malyala  wrote:
> >
> > Hi
> > I created a new config set as mentioned in Solr Cloud documentation using
> > upload zip.
> > I get this error when I try to create a collection using my new
> configset.
> >
> > Error from shard: http://X.X.X.X:8983/solr
> >
> > OverseerCollectionMessageHandler Cleaning up collection [test5].
> >
> > Collection: test5 operation: create
> > failed:org.apache.solr.common.SolrException: Underlying core creation
> > failed while creating collection: test5
> >   at org.apache.solr.cloud.api.collections.CreateCollectionCmd.call(CreateCollectionCmd.java:303)
> >   at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:263)
> >   at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
> >   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >   at java.lang.Thread.run(Thread.java:748)
> >
> >
> >
> > Please help me out with this.
> >
> > Regards,
> >
> > Sai Vignan
>