Using stream and response with /export Request Handler

2016-10-11 Thread Nkechi Achara
Hi All,

I have an export handler defined as the following:

<requestHandler name="/export" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="rq">{!xport}</str>
    <str name="wt">xsort</str>
    <str name="distrib">false</str>
  </lst>
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>

I am then attempting to use stream-and-response from a SolrCloudServer as
follows:

server.queryAndStreamResponse(query, callback)

I am receiving an error because the output returned by the query is JSON,
whereas the stream would only have been possible with octet-stream (for
serialisation purposes).
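For reference, queryAndStreamResponse parses the response with
StreamingBinaryResponseParser, which expects the binary javabin format
rather than JSON. A minimal sketch of how the call is usually wired up,
assuming SolrJ 6.x (the ZooKeeper address and collection name here are
illustrative, not taken from this mail):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.StreamingResponseCallback;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrDocument;

public class StreamDocs {
    public static void main(String[] args) throws Exception {
        // Illustrative ZooKeeper address; adjust for your cluster.
        try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
            client.setDefaultCollection("test_collection");
            SolrQuery query = new SolrQuery("*:*");

            // Each document is handed to the callback as it arrives,
            // instead of being buffered into one large response.
            client.queryAndStreamResponse(query, new StreamingResponseCallback() {
                @Override
                public void streamDocListInfo(long numFound, long start, Float maxScore) {
                    System.out.println("numFound=" + numFound);
                }

                @Override
                public void streamSolrDocument(SolrDocument doc) {
                    System.out.println(doc);
                }
            });
        }
    }
}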

Is there any way I can alter my export handler to handle this? Can I alter
wt to be something that can be parsed by the StreamingBinaryResponseParser?

Thanks in advance,

Keech


SolrCloud: after restoring a collection, when indexing new documents into the restored collection, the leader does not write to its index.

2016-10-11 Thread Jerome Yang
Hi all,

I'm facing a strange problem.

Here's a SolrCloud setup on a single machine with 2 Solr nodes, version
Solr 6.1.

I create a collection named "test_collection" with 2 shards, a replication
factor of 3, and the default router.
I index some documents and commit. Then I back up this collection.
After that, I restore from the backup and name the restored collection
"restore_test_collection".
Query from "restore_test_collection". It works fine and data is consistent.

Then, I index some new documents and commit.
I find that the documents are all indexed into shard1, and the leader of
shard1 doesn't have these new documents, but the other replicas do.

Anyone have this issue?
Really need your help.

Regards,
Jerome


Re: SolrCloud: after restoring a collection, when indexing new documents into the restored collection, the leader does not write to its index.

2016-10-11 Thread Jerome Yang
I used curl to do some tests:

curl 'http://localhost:8983/solr/restore_test_collection/update?commit=true&wt=json' --data-binary @test.json -H 'Content-type:application/json'

The leader doesn't have the new documents, but the other replicas do.

curl 'http://localhost:8983/solr/restore_test_collection/update?commitWithin=1000&wt=json' --data-binary @test.json -H 'Content-type:application/json'

All replicas in shard1, including the leader, have the new documents, and
all new documents are routed to shard1.

On Tue, Oct 11, 2016 at 5:27 PM, Jerome Yang  wrote:

> Hi all,
>
> I'm facing a strange problem.
>
> Here's a SolrCloud setup on a single machine with 2 Solr nodes, version
> Solr 6.1.
>
> I create a collection named "test_collection" with 2 shards, a replication
> factor of 3, and the default router.
> I index some documents and commit. Then I back up this collection.
> After that, I restore from the backup and name the restored collection
> "restore_test_collection".
> Query from "restore_test_collection". It works fine and data is consistent.
>
> Then, I index some new documents and commit.
> I find that the documents are all indexed into shard1, and the leader of
> shard1 doesn't have these new documents, but the other replicas do.
>
> Anyone have this issue?
> Really need your help.
>
> Regards,
> Jerome
>


Re: SolrCloud: after restoring a collection, when indexing new documents into the restored collection, the leader does not write to its index.

2016-10-11 Thread Jerome Yang
@Mark Miller Please help~

On Tue, Oct 11, 2016 at 5:32 PM, Jerome Yang  wrote:

> I used curl to do some tests:
>
> curl 'http://localhost:8983/solr/restore_test_collection/update?commit=true&wt=json' --data-binary @test.json -H 'Content-type:application/json'
>
> The leader doesn't have the new documents, but the other replicas do.
>
> curl 'http://localhost:8983/solr/restore_test_collection/update?commitWithin=1000&wt=json' --data-binary @test.json -H 'Content-type:application/json'
>
> All replicas in shard1, including the leader, have the new documents, and
> all new documents are routed to shard1.
>
> On Tue, Oct 11, 2016 at 5:27 PM, Jerome Yang  wrote:
>
>> Hi all,
>>
>> I'm facing a strange problem.
>>
>> Here's a SolrCloud setup on a single machine with 2 Solr nodes, version
>> Solr 6.1.
>>
>> I create a collection named "test_collection" with 2 shards, a replication
>> factor of 3, and the default router.
>> I index some documents and commit. Then I back up this collection.
>> After that, I restore from the backup and name the restored collection
>> "restore_test_collection".
>> Query from "restore_test_collection". It works fine and data is
>> consistent.
>>
>> Then, I index some new documents and commit.
>> I find that the documents are all indexed into shard1, and the leader of
>> shard1 doesn't have these new documents, but the other replicas do.
>>
>> Anyone have this issue?
>> Really need your help.
>>
>> Regards,
>> Jerome
>>
>
>


Re: Highlight partial match

2016-10-11 Thread Shawn Heisey
On 10/11/2016 12:15 AM, Juan Fernando Mora wrote:
> Hi, I have been doing some research on highlighting partial matches;
> there is some information on Google but it is far from complete and I
> just can't get it to work. I have highlighting working, but it
> highlights complete words. Example:

I have no experience with highlighting, but I think this happens because
of how the Lucene index (specifically, in this case, the EdgeNGram
filter) stores information.  I put your fieldType into a 6.2 example
index and ran index analysis on "computer".  This is the result:

https://www.dropbox.com/s/ph524b8ij1hk28o/solr-analysis-computer-edgengrams.png?dl=0

Notice how every term has a start value of "0" and an end value of "8";
these are character positions inside the original indexed text.
Every term resolves to the original source text of "computer".
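For illustration, here is a small sketch that reproduces this using the
Lucene 6.x analysis API directly (the minGram/maxGram values are
assumptions on my part, not taken from the poster's fieldType):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class EdgeNGramOffsets {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new KeywordTokenizer();
                // minGram=1, maxGram=10 are illustrative values.
                TokenStream grams = new EdgeNGramTokenFilter(source, 1, 10);
                return new TokenStreamComponents(source, grams);
            }
        };
        try (TokenStream ts = analyzer.tokenStream("f", "computer")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
            ts.reset();
            // Every gram ("c", "co", ..., "computer") prints offsets [0,8],
            // the span of the whole original token.
            while (ts.incrementToken()) {
                System.out.println(term + " [" + offset.startOffset()
                        + "," + offset.endOffset() + "]");
            }
            ts.end();
        }
    }
}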

I believe these start/end values in the index are how highlighting
decides *what* to highlight, though I admit I could have a flawed
understanding of how it works.  If my understanding is correct, then
obtaining what you want would involve an alternate NGram filter that
writes different start/end values.  I don't think an alternative to
EdgeNGram like this exists.

Thanks,
Shawn



Re: SolrCloud: after restoring a collection, when indexing new documents into the restored collection, the leader does not write to its index.

2016-10-11 Thread Shawn Heisey
On 10/11/2016 3:27 AM, Jerome Yang wrote:
> Then, I index some new documents, and commit. I find that the
> documents are all indexed into shard1, and the leader of shard1 doesn't
> have these new documents, but the other replicas do.

I'm not sure why the leader would be missing the documents while the other
replicas have them, but I do have a theory about why they are only in
shard1.  Testing that theory will involve obtaining some information
from your system:

What is the router on the restored collection? You can see this in the
admin UI by going to Cloud->Tree, opening "collections", and clicking on
the collection.  In the right-hand side, there will be some info from
zookeeper, with some JSON below it that should mention the router.  I
suspect that the router on the new collection may have been configured
as implicit, instead of compositeId.
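If the admin UI is inconvenient, the same information can be read
programmatically. A sketch, assuming SolrJ 6.x (the ZooKeeper address is
illustrative):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.DocCollection;

public class CheckRouter {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
            client.connect();
            DocCollection coll = client.getZkStateReader().getClusterState()
                    .getCollection("restore_test_collection");
            // Prints CompositeIdRouter or ImplicitDocRouter, depending on
            // how the restored collection was configured.
            System.out.println(coll.getRouter().getClass().getSimpleName());
        }
    }
}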

Thanks,
Shawn



Lowercase all characters in String

2016-10-11 Thread Zheng Lin Edwin Yeo
Hi,

I would like to find out the best way to lowercase all the text while
preserving all the tokens.

As I need to preserve every character of the text (including symbols and
white space), I'm using the String field type. However, I can't apply
LowerCaseFilterFactory to a String field.

I found that we can use WhitespaceTokenizerFactory followed by
LowerCaseFilterFactory. Although WhitespaceTokenizerFactory preserves
symbols, it still splits on whitespace, which we do not want: we may have
values like 'One' and 'One Way', and if we use WhitespaceTokenizerFactory,
a search for 'One' will return records with 'One Way' too.

Is there another way we can achieve this?

I'm using Solr 6.2.1.

Regards,
Edwin


Re: Lowercase all characters in String

2016-10-11 Thread Ahmet Arslan
Hi,

KeywordTokenizer and LowerCaseFilter should suffice. Optionally you can add 
TrimFilter too.
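As a quick sanity check, the same chain can be built in plain Java with
Lucene's CustomAnalyzer (a sketch assuming Lucene 6.x; the field name and
sample text are illustrative):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizerFactory;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.miscellaneous.TrimFilterFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LowercaseWholeString {
    public static void main(String[] args) throws Exception {
        // KeywordTokenizer keeps the entire input as a single token, so
        // nothing is split on whitespace; LowerCaseFilter then lowercases it.
        Analyzer analyzer = CustomAnalyzer.builder()
                .withTokenizer(KeywordTokenizerFactory.class)
                .addTokenFilter(LowerCaseFilterFactory.class)
                .addTokenFilter(TrimFilterFactory.class)
                .build();
        try (TokenStream ts = analyzer.tokenStream("f", "One Way")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println("[" + term + "]"); // prints [one way]
            }
            ts.end();
        }
    }
}

Because the whole value stays one token, a search for 'one' no longer
matches 'one way'.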

Ahmet


On Tuesday, October 11, 2016 5:24 PM, Zheng Lin Edwin Yeo 
 wrote:
Hi,

I would like to find out the best way to lowercase all the text while
preserving all the tokens.

As I need to preserve every character of the text (including symbols and
white space), I'm using the String field type. However, I can't apply
LowerCaseFilterFactory to a String field.

I found that we can use WhitespaceTokenizerFactory followed by
LowerCaseFilterFactory. Although WhitespaceTokenizerFactory preserves
symbols, it still splits on whitespace, which we do not want: we may have
values like 'One' and 'One Way', and if we use WhitespaceTokenizerFactory,
a search for 'One' will return records with 'One Way' too.

Is there another way we can achieve this?

I'm using Solr 6.2.1.

Regards,
Edwin


Re: Lowercase all characters in String

2016-10-11 Thread Walter Underwood
Like this:

<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 11, 2016, at 7:43 AM, Ahmet Arslan  wrote:
> 
> Hi,
> 
> KeywordTokenizer and LowerCaseFilter should suffice. Optionally you can add 
> TrimFilter too.
> 
> Ahmet
> 
> 
> On Tuesday, October 11, 2016 5:24 PM, Zheng Lin Edwin Yeo 
>  wrote:
> Hi,
> 
> I would like to find out the best way to lowercase all the text while
> preserving all the tokens.
>
> As I need to preserve every character of the text (including symbols and
> white space), I'm using the String field type. However, I can't apply
> LowerCaseFilterFactory to a String field.
>
> I found that we can use WhitespaceTokenizerFactory followed by
> LowerCaseFilterFactory. Although WhitespaceTokenizerFactory preserves
> symbols, it still splits on whitespace, which we do not want: we may have
> values like 'One' and 'One Way', and if we use WhitespaceTokenizerFactory,
> a search for 'One' will return records with 'One Way' too.
>
> Is there another way we can achieve this?
> 
> I'm using Solr 6.2.1.
> 
> Regards,
> Edwin



Query by distance

2016-10-11 Thread marotosg
Hi,

I have a field which contains job positions for people. This field uses a
SynonymFilterFactory.

The field contains the value "Chief Sales Officer", and my synonyms file
has an entry like "Chief Sales Officer, Chief of Sales, Chief Sales
Executive".

For "Chief Sales Officer", my analyzer returns these tokens: "chief chief
chief sales of sales officer sales executive".

I have a query like the one below, which is returning a match for "Chief
Executive Officer"; that is not what I want.

((PositionNSD:(Chief))^3 OR (PositionNSD:Chief*)^1.5)
AND
((PositionNSD:(Executive))^3 OR (PositionNSD:Executive*)^1.5)
AND
((PositionNSD:(Officer))^3 OR (PositionNSD:Officer*)^1.5)

Can anyone suggest a solution that keeps the distance between the terms, or
some way to avoid matching on any token regardless of its position?

Thanks a lot.








VPAT?

2016-10-11 Thread Bill Yosmanovich
Would anyone happen to know if SOLR has a VPAT or where I could obtain any 
Section 508 compliance information?

Thanks!
Bill Yosmanovich


Re: VPAT?

2016-10-11 Thread KRIS MUSSHORN
I'm sure someone will correct me if I am wrong, but Solr is a data-layer
device, so 508 compliance would have to be assured by the presentation-layer
device.

Unless you're talking about 508 compliance for the admin webapp.

K 

- Original Message -

From: "Bill Yosmanovich"  
To: solr-user@lucene.apache.org 
Sent: Tuesday, October 11, 2016 12:26:00 PM 
Subject: VPAT? 

Would anyone happen to know if SOLR has a VPAT or where I could obtain any 
Section 508 compliance information? 

Thanks! 
Bill Yosmanovich 



Re: VPAT?

2016-10-11 Thread Walter Underwood
Solr is not designed for direct customer use, so a VPAT is not needed. Each
implementation builds its own end-user search UI, and that UI needs an
evaluation.

When I did the evaluation for Ultraseek Server, we exempted the entire admin UI 
using the Section 508 “back office” exception. 

https://www.section508.gov/content/glossary#BackOffice_Exception 


At the time, Ultraseek was the only enterprise search engine with an 
evaluation. We made a lot of sales in the quarter when departments were 
required to report to the President on their accessibility.

To be clear, I’m not making jokes about accessibility. Our oldest son is 
disabled, and I take it very seriously.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 11, 2016, at 9:26 AM, Bill Yosmanovich  
> wrote:
> 
> Would anyone happen to know if SOLR has a VPAT or where I could obtain any 
> Section 508 compliance information?
> 
> Thanks!
> Bill Yosmanovich



Re: Highlight partial match

2016-10-11 Thread Juan Fernando Mora
Well, that would explain it.

I hadn't noticed the start and end values. I'm not experienced with
analysis, but this is really interesting; I will look into this.

Thanks a lot Shawn!

On Tue, Oct 11, 2016 at 7:31 AM, Shawn Heisey  wrote:

> On 10/11/2016 12:15 AM, Juan Fernando Mora wrote:
> > Hi, I have been doing some research on highlighting partial matches;
> > there is some information on Google but it is far from complete and I
> > just can't get it to work. I have highlighting working, but it
> > highlights complete words. Example:
>
> I have no experience with highlighting, but I think this happens because
> of how the Lucene index (specifically, in this case, the EdgeNGram
> filter) stores information.  I put your fieldType into a 6.2 example
> index and ran index analysis on "computer".  This is the result:
>
> https://www.dropbox.com/s/ph524b8ij1hk28o/solr-analysis-computer-edgengrams.png?dl=0
>
> Notice how every term has a start value of "0" and an end value of "8";
> these are character positions inside the original indexed text.
> Every term resolves to the original source text of "computer".
>
> I believe these start/end values in the index are how highlighting
> decides *what* to highlight, though I admit I could have a flawed
> understanding of how it works.  If my understanding is correct, then
> obtaining what you want would involve an alternate NGram filter that
> writes different start/end values.  I don't think an alternative to
> EdgeNGram like this exists.
>
> Thanks,
> Shawn
>
>


Unsubscribe from this mailing-list

2016-10-11 Thread prakash reddy
Please remove me from the mailing list.


Re: Lowercase all characters in String

2016-10-11 Thread Zheng Lin Edwin Yeo
Thanks Ahmet and Walter.

It works.

Regards,
Edwin


On 11 October 2016 at 23:36, Walter Underwood  wrote:

> Like this:
>
> <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Oct 11, 2016, at 7:43 AM, Ahmet Arslan 
> wrote:
> >
> > Hi,
> >
> > KeywordTokenizer and LowerCaseFilter should suffice. Optionally you can
> add TrimFilter too.
> >
> > Ahmet
> >
> >
> > On Tuesday, October 11, 2016 5:24 PM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com> wrote:
> > Hi,
> >
> > I would like to find out the best way to lowercase all the text while
> > preserving all the tokens.
> >
> > As I need to preserve every character of the text (including symbols and
> > white space), I'm using the String field type. However, I can't apply
> > LowerCaseFilterFactory to a String field.
> >
> > I found that we can use WhitespaceTokenizerFactory followed by
> > LowerCaseFilterFactory. Although WhitespaceTokenizerFactory preserves
> > symbols, it still splits on whitespace, which we do not want: we may
> > have values like 'One' and 'One Way', and if we use
> > WhitespaceTokenizerFactory, a search for 'One' will return records
> > with 'One Way' too.
> >
> > Is there another way we can achieve this?
> >
> > I'm using Solr 6.2.1.
> >
> > Regards,
> > Edwin
>
>


Re: SolrCloud - Path must not end with / character

2016-10-11 Thread Ona
I am facing the same "Overseer main queue loop .." exception. I removed the
bad node and cleared the version-2 folder from ZooKeeper. Reinstalling Solr
and ZooKeeper using backup copies also fails. It looks like information
about the node is stored somewhere in the server cache. Unfortunately, I
cannot find any patch for Solr 4.3 either.

Any suggestion on how to fix this? Our SolrCloud is used by many
developers, but the continuous exceptions in the log will soon make it
unusable.

Thanks.





Split words with period in between into separate tokens

2016-10-11 Thread Derek Poh

Hi

How can I split words with a period in between into separate tokens?
E.g. "Co.Ltd" => "Co" "Ltd".

I am using StandardTokenizerFactory, and it does not split these:
periods (dots) that are not followed by whitespace are kept as part of the
token, including Internet domain names.
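For what it's worth, one possible approach (a sketch on my part, not a
recommendation from this thread) is to map periods to spaces with a char
filter ahead of the tokenizer; in schema terms this would mean a
PatternReplaceCharFilterFactory before the StandardTokenizerFactory. In
Lucene 6.x Java terms:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.pattern.PatternReplaceCharFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SplitOnPeriod {
    public static void main(String[] args) throws Exception {
        // The char filter rewrites "Co.Ltd" to "Co Ltd" before tokenizing,
        // so StandardTokenizer produces two tokens.
        Analyzer analyzer = CustomAnalyzer.builder()
                .addCharFilter(PatternReplaceCharFilterFactory.class,
                        "pattern", "\\.", "replacement", " ")
                .withTokenizer(StandardTokenizerFactory.class)
                .build();
        try (TokenStream ts = analyzer.tokenStream("f", "Co.Ltd")) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                System.out.println(term); // prints "Co" then "Ltd"
            }
            ts.end();
        }
    }
}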


This is the field definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Solr version is 10.4.10.

Derek


VPAT / 508 compliance information?

2016-10-11 Thread Bill Yosmanovich
Would anyone happen to know if SOLR has a VPAT or where I could obtain any 
Section 508 compliance information?

Thanks!
Bill Yosmanovich



Re: SolrCloud: after restoring a collection, when indexing new documents into the restored collection, the leader does not write to its index.

2016-10-11 Thread Jerome Yang
Hi Shawn,

I just checked the clusterstate.json which was restored for
"restore_test_collection".
The router is "router":{"name":"compositeId"},
not implicit.

So I think it's a very serious bug.
Should this bug go into Jira?

Please help!

Regards,
Jerome


On Tue, Oct 11, 2016 at 8:34 PM, Shawn Heisey  wrote:

> On 10/11/2016 3:27 AM, Jerome Yang wrote:
> > Then, I index some new documents, and commit. I find that the
> > documents are all indexed into shard1, and the leader of shard1 doesn't
> > have these new documents, but the other replicas do.
>
> I'm not sure why the leader would be missing the documents while the other
> replicas have them, but I do have a theory about why they are only in
> shard1.  Testing that theory will involve obtaining some information
> from your system:
>
> What is the router on the restored collection? You can see this in the
> admin UI by going to Cloud->Tree, opening "collections", and clicking on
> the collection.  In the right-hand side, there will be some info from
> zookeeper, with some JSON below it that should mention the router.  I
> suspect that the router on the new collection may have been configured
> as implicit, instead of compositeId.
>
> Thanks,
> Shawn
>
>


Re: SolrCloud: after restoring a collection, when indexing new documents into the restored collection, the leader does not write to its index.

2016-10-11 Thread Jerome Yang
@Erick Please help😂

On Wed, Oct 12, 2016 at 10:21 AM, Jerome Yang  wrote:

> Hi Shawn,
>
> I just checked the clusterstate.json which was restored for
> "restore_test_collection".
> The router is "router":{"name":"compositeId"},
> not implicit.
>
> So I think it's a very serious bug.
> Should this bug go into Jira?
>
> Please help!
>
> Regards,
> Jerome
>
>
> On Tue, Oct 11, 2016 at 8:34 PM, Shawn Heisey  wrote:
>
>> On 10/11/2016 3:27 AM, Jerome Yang wrote:
>> > Then, I index some new documents, and commit. I find that the
>> > documents are all indexed into shard1, and the leader of shard1 doesn't
>> > have these new documents, but the other replicas do.
>>
>> I'm not sure why the leader would be missing the documents while the other
>> replicas have them, but I do have a theory about why they are only in
>> shard1.  Testing that theory will involve obtaining some information
>> from your system:
>>
>> What is the router on the restored collection? You can see this in the
>> admin UI by going to Cloud->Tree, opening "collections", and clicking on
>> the collection.  In the right-hand side, there will be some info from
>> zookeeper, with some JSON below it that should mention the router.  I
>> suspect that the router on the new collection may have been configured
>> as implicit, instead of compositeId.
>>
>> Thanks,
>> Shawn
>>
>>
>


Predicting query execution time.

2016-10-11 Thread Modassar Ather
Hi,

We see queries executing in less than a second, and also queries taking
minutes to execute. We need to predict the approximate time a query might
take to execute.
We need your help in finding the factors to be considered and in
calculating an approximate execution time.

Thanks,
Modassar


Re: Predicting query execution time.

2016-10-11 Thread Shawn Heisey
On 10/11/2016 11:46 PM, Modassar Ather wrote:
> We see queries executing in less than a second, and also queries taking
> minutes to execute. We need to predict the approximate time a query might
> take to execute. We need your help in finding the factors to be
> considered and in calculating an approximate execution time.

If a query or filter query has been cached successfully by Solr, then
running it again while on the same searcher will happen very quickly.

I think you can look at the number of fields in the query and the number
of terms being searched in each of those fields as a general guide for
query complexity.  Wildcard queries can expand to a large number of
terms, so those can be quite slow.

Deep paging (setting the start parameter to a high number) can make
queries slow, particularly on multi-shard indexes.

Using complex queries as filters (fq parameter) will tend to run slower
than the same thing in the q parameter, at least the first time the
filter is executed on an index searcher.  Once a given filter query is
in the filterCache, it typically will execute VERY quickly.  Be aware
that using NOW in date queries without rounding will change the query on
every execution, so it cannot be cached.
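To illustrate that point with a sketch (SolrJ syntax; the field name
"timestamp" is an assumption, not from the original mail):

import org.apache.solr.client.solrj.SolrQuery;

public class NowRounding {
    public static void main(String[] args) {
        // Bare NOW changes every millisecond, so each request produces a
        // different filter string and the filterCache never gets a hit.
        SolrQuery uncacheable = new SolrQuery("*:*");
        uncacheable.addFilterQuery("timestamp:[NOW-7DAYS TO NOW]");

        // Rounding to the day keeps the filter string stable for a whole
        // day, so repeated requests can reuse the cached filter.
        SolrQuery cacheable = new SolrQuery("*:*");
        cacheable.addFilterQuery("timestamp:[NOW/DAY-7DAYS TO NOW/DAY]");

        System.out.println(uncacheable);
        System.out.println(cacheable);
    }
}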

Complex features like faceting, grouping, and the stats component will
tend to take longer with fields that have a very large number of terms,
and might also take longer with queries that have many matches.

Other people might have a better idea of what kinds of queries are slow,
but hopefully this is a decent guide.

If you truly cannot see any sort of common qualities in your slow
queries, then you may be running into a situation where you don't have
enough memory for your index to be effectively cached.  This situation
can cause a query that would normally be fast to be very slow, if the
index data the query needs has been pushed out of the operating system
disk cache by other queries.  Information about effective disk caching
is discussed here:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Re: Config for massive inserts into Solr master

2016-10-11 Thread Reinhard Budenstecher
>
> That's considerably larger than you initially indicated.  In just one
> index, you've got almost 300 million docs taking up well over 200GB.
> About half of them have been deleted, but they are still there.  Those
> deleted docs *DO* affect operation and memory usage.
>
> Getting rid of deleted docs would go a long way towards reducing memory
>
> usage.  The only effective way to get rid of them is to optimize the
> index ... but I will warn you that with an index of that size, the time

It really seems to be a matter of size :) We've extended the server's RAM
from 64GB to 128GB and raised heap space from 32GB to 64GB, and now the ETL
processes have been running for 3 days without interruption. That does not
satisfy me, but it's a solution to keep the business running for now.
Is my assumption correct that an OPTIMIZE of the index would block all
inserts, so that all processes would have to pause when I start an
hours-long OPTIMIZE? If so, this would also not be an option for the moment.





Re: Predicting query execution time.

2016-10-11 Thread Modassar Ather
Thanks Shawn for your suggestions.

Best,
Modassar

On Wed, Oct 12, 2016 at 11:44 AM, Shawn Heisey  wrote:

> On 10/11/2016 11:46 PM, Modassar Ather wrote:
> > We see queries executing in less than a second, and also queries taking
> > minutes to execute. We need to predict the approximate time a query might
> > take to execute. We need your help in finding the factors to be
> > considered and in calculating an approximate execution time.
>
> If a query or filter query has been cached successfully by Solr, then
> running it again while on the same searcher will happen very quickly.
>
> I think you can look at the number of fields in the query and the number
> of terms being searched in each of those fields as a general guide for
> query complexity.  Wildcard queries can expand to a large number of
> terms, so those can be quite slow.
>
> Deep paging (setting the start parameter to a high number) can make
> queries slow, particularly on multi-shard indexes.
>
> Using complex queries as filters (fq parameter) will tend to run slower
> than the same thing in the q parameter, at least the first time the
> filter is executed on an index searcher.  Once a given filter query is
> in the filterCache, it typically will execute VERY quickly.  Be aware
> that using NOW in date queries without rounding will change the query on
> every execution, so it cannot be cached.
>
> Complex features like faceting, grouping, and the stats component will
> tend to take longer with fields that have a very large number of terms,
> and might also take longer with queries that have many matches.
>
> Other people might have a better idea of what kinds of queries are slow,
> but hopefully this is a decent guide.
>
> If you truly cannot see any sort of common qualities in your slow
> queries, then you may be running into a situation where you don't have
> enough memory for your index to be effectively cached.  This situation
> can cause a query that would normally be fast to be very slow, if the
> index data the query needs has been pushed out of the operating system
> disk cache by other queries.  Information about effective disk caching
> is discussed here:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
>
>


Re: Config for massive inserts into Solr master

2016-10-11 Thread Shawn Heisey
On 10/12/2016 12:18 AM, Reinhard Budenstecher wrote:
> Is my assumption correct that an OPTIMIZE of the index would block all
> inserts, so that all processes would have to pause when I start an
> hours-long OPTIMIZE? If so, this would also not be an option for the moment.

That is not correct as of version 4.0.

The only kind of update I've run into that cannot proceed at the same
time as an optimize is a deleteByQuery operation.  If you do that, then
it will block until the optimize is done, and I think it will also block
any update you do after it.

Inserts, updates, deletes by ID (uniqueKey field), and commits will all
work perfectly while an optimize is underway.

There is a workaround in USER code for the deleteByQuery problem.
Change all deleteByQuery updates in your indexing program(s) into a
two-step process: do a query with fl set to the uniqueKey field to gather
a complete list of IDs, then do a series of delete-by-ID requests with
those IDs.  If the query matches a large number of docs, using cursorMark
to page through them might be a good idea, as in the sketch below.
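A rough sketch of that two-step delete, assuming SolrJ 6.x, a uniqueKey
named "id", and an illustrative collection name and delete query (none of
these come from the original mail):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class DeleteByQueryWorkaround {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient("localhost:2181")) {
            client.setDefaultCollection("mycollection");

            SolrQuery q = new SolrQuery("category:obsolete"); // illustrative query
            q.setFields("id");
            q.setRows(1000);
            // cursorMark requires a deterministic sort on the uniqueKey.
            q.setSort(SolrQuery.SortClause.asc("id"));

            // Step 1: page through all matches, collecting only the IDs.
            List<String> ids = new ArrayList<>();
            String cursor = CursorMarkParams.CURSOR_MARK_START;
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = client.query(q);
                for (SolrDocument doc : rsp.getResults()) {
                    ids.add((String) doc.getFieldValue("id"));
                }
                String next = rsp.getNextCursorMark();
                if (next.equals(cursor)) {
                    break; // cursor stopped advancing: all pages consumed
                }
                cursor = next;
            }

            // Step 2: delete by ID. Unlike deleteByQuery, these requests
            // do not block while an optimize is running.
            if (!ids.isEmpty()) {
                client.deleteById(ids);
                client.commit();
            }
        }
    }
}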

I have no idea whether the deleteByQuery problem should be considered a
bug or not, because I do not know enough about what's actually happening
when it occurs.

Thanks,
Shawn