[jira] [Created] (LUCENE-9636) Extract AND operation to get a SIMD optimization

2020-12-10 Thread Feng Guo (Jira)
Feng Guo created LUCENE-9636:


 Summary: Extract AND operation to get a SIMD optimization
 Key: LUCENE-9636
 URL: https://issues.apache.org/jira/browse/LUCENE-9636
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Feng Guo


In `decode6()`, `decode7()`, `decode14()`, `decode15()` and `decode24()`, longs are
always `&`-ed with the same mask and then shifted. By printing the generated
assembly, I found that the JIT did not optimize these loops with SIMD
instructions. But when we extract all the `&` operations and perform them first,
the JIT will optimize them with SIMD.
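
For illustration, a minimal sketch of the transformation (hypothetical method, not the actual patch): hoisting the shared mask into its own uniform loop gives the JIT the simple, identical-iteration shape its auto-vectorizer recognizes.

{code:java}
// Hypothetical sketch, not the Lucene patch: hoist the uniform masking pass
// out of the irregular shift/or combining that follows it.
static void maskAll(long[] tmp, int count, long mask) {
  // Every iteration does identical, independent work, so the JIT can emit
  // SIMD instructions (e.g. vpand on x86) for this loop.
  for (int i = 0; i < count; ++i) {
    tmp[i] &= mask;
  }
}
{code}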

 

 





[GitHub] [lucene-solr] gf2121 opened a new pull request #2139: LUCENE-9636: Extract AND operation to get a SIMD optimization

2020-12-10 Thread GitBox


gf2121 opened a new pull request #2139:
URL: https://github.com/apache/lucene-solr/pull/2139


   # Description
   
   In `decode6()`, `decode7()`, `decode14()`, `decode15()` and `decode24()`, longs
are always `&`-ed with the same mask and then shifted. By printing the generated
assembly, I found that the JIT did not optimize these loops with SIMD
instructions. But when we extract all the `&` operations and perform them first,
the JIT will use SIMD to optimize them.
   
   
   # Tests
   
   Java Version: 
   > java version "11.0.6" 2020-01-14 LTS
   > Java(TM) SE Runtime Environment 18.9 (build 11.0.6+8-LTS)
   > Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.6+8-LTS, mixed mode)
   
   Using `decode15` as an example, here is a microbenchmark based on JMH:
   **code**
    ```
    @Benchmark
    @BenchmarkMode({Mode.Throughput})
    @Fork(1)
    @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
    @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
    public void decode15a() {
      for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
        long l0 = (TMP[tmpIdx+0] & MASK16_1) << 14;
        l0 |= (TMP[tmpIdx+1] & MASK16_1) << 13;
        l0 |= (TMP[tmpIdx+2] & MASK16_1) << 12;
        l0 |= (TMP[tmpIdx+3] & MASK16_1) << 11;
        l0 |= (TMP[tmpIdx+4] & MASK16_1) << 10;
        l0 |= (TMP[tmpIdx+5] & MASK16_1) << 9;
        l0 |= (TMP[tmpIdx+6] & MASK16_1) << 8;
        l0 |= (TMP[tmpIdx+7] & MASK16_1) << 7;
        l0 |= (TMP[tmpIdx+8] & MASK16_1) << 6;
        l0 |= (TMP[tmpIdx+9] & MASK16_1) << 5;
        l0 |= (TMP[tmpIdx+10] & MASK16_1) << 4;
        l0 |= (TMP[tmpIdx+11] & MASK16_1) << 3;
        l0 |= (TMP[tmpIdx+12] & MASK16_1) << 2;
        l0 |= (TMP[tmpIdx+13] & MASK16_1) << 1;
        l0 |= (TMP[tmpIdx+14] & MASK16_1) << 0;
        ARR[longsIdx+0] = l0;
      }
    }

    @Benchmark
    @BenchmarkMode({Mode.Throughput})
    @Fork(1)
    @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
    @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
    public void decode15b() {
      shiftLongs(TMP, 30, TMP, 0, 0, MASK16_1);
      for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
        long l0 = TMP[tmpIdx+0] << 14;
        l0 |= TMP[tmpIdx+1] << 13;
        l0 |= TMP[tmpIdx+2] << 12;
        l0 |= TMP[tmpIdx+3] << 11;
        l0 |= TMP[tmpIdx+4] << 10;
        l0 |= TMP[tmpIdx+5] << 9;
        l0 |= TMP[tmpIdx+6] << 8;
        l0 |= TMP[tmpIdx+7] << 7;
        l0 |= TMP[tmpIdx+8] << 6;
        l0 |= TMP[tmpIdx+9] << 5;
        l0 |= TMP[tmpIdx+10] << 4;
        l0 |= TMP[tmpIdx+11] << 3;
        l0 |= TMP[tmpIdx+12] << 2;
        l0 |= TMP[tmpIdx+13] << 1;
        l0 |= TMP[tmpIdx+14] << 0;
        ARR[longsIdx+0] = l0;
      }
    }
    ```
   **Result**
    ```
    Benchmark              Mode  Cnt          Score         Error  Units
    MyBenchmark.decode15a  thrpt  10   65234108.600 ± 1336311.970  ops/s
    MyBenchmark.decode15b  thrpt  10  106840656.363 ±  448026.092  ops/s
    ```
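   
   (The snippet above assumes some benchmark state that is not shown. A plausible sketch of those definitions, following `ForUtil`'s conventions; the array sizes here are assumptions:)
    ```
    // Assumed benchmark fixture (sizes are guesses; MASK16_1 selects the low
    // bit of each 16-bit lane, as in ForUtil).
    private static final long MASK16_1 = 0x0001000100010001L;
    private static final long[] TMP = new long[64];
    private static final long[] ARR = new long[64];

    // Same shape as ForUtil#shiftLongs: bulk (x >>> shift) & mask over a range.
    private static void shiftLongs(long[] a, int count, long[] b, int bi, int shift, long mask) {
      for (int i = 0; i < count; ++i) {
        b[bi + i] = (a[i] >>> shift) & mask;
      }
    }
    ```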
   
   And an end-to-end test based on _wikimedium1m_ also looks positive overall:
    ```
    Fuzzy1                       131.77  (5.4%)   131.75  (4.2%)  -0.0% (  -9% -   10%) 0.990
    MedPhrase                    146.41  (4.5%)   146.44  (4.8%)   0.0% (  -8% -    9%) 0.992
    AndHighMed                   643.10  (5.4%)   643.95  (5.5%)   0.1% ( -10% -   11%) 0.939
    HighSpanNear                 125.99  (5.7%)   126.48  (4.9%)   0.4% (  -9% -   11%) 0.818
    Respell                      164.81  (4.9%)   165.48  (4.5%)   0.4% (  -8% -   10%) 0.783
    HighSloppyPhrase             103.20  (6.2%)   103.65  (5.8%)   0.4% ( -10% -   13%) 0.816
    IntNRQ                       662.80  (5.0%)   665.87  (5.1%)   0.5% (  -9% -   11%) 0.770
    Prefix3                      882.57  (6.8%)   887.18  (8.6%)   0.5% ( -13% -   17%) 0.832
    LowSloppyPhrase               76.17  (5.5%)    76.57  (5.0%)   0.5% (  -9% -   11%) 0.754
    AndHighHigh                  236.71  (5.8%)   237.99  (5.2%)   0.5% (  -9% -   12%) 0.756
    Fuzzy2                       100.40  (5.6%)   101.02  (4.7%)   0.6% (  -9% -   11%) 0.708
    OrHighHigh                   154.05  (5.4%)   155.08  (5.0%)   0.7% (  -9% -   11%) 0.684
    LowPhrase                    327.86  (4.4%)   330.10  (4.9%)   0.7% (  -8% -   10%) 0.641
    BrowseDayOfYearSSDVFacets    120.00  (5.1%)   120.88  (4.5%)   0.7% (  -8% -   10%) 0.627
    MedTerm                     2239.68  (6.3%)  2256.94  (5.9%)
    ```

[jira] [Updated] (LUCENE-9636) Extract AND operation to get a SIMD optimization

2020-12-10 Thread Feng Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Guo updated LUCENE-9636:
-
Description: 
In decode6(), decode7(), decode14(), decode15() and decode24(), longs are always
ANDed with the same mask and then shifted. By printing the generated assembly, I
found that the JIT did not optimize these loops with SIMD instructions. But when
we extract all the `&` operations and perform them first, the JIT will optimize
them with SIMD.

  was:
In `decode6()` `decode7()` `decode14()` `decode15()` `decode24()`, longs are
always `&`-ed with the same mask and then shifted. By printing the generated
assembly, I found that the JIT did not optimize these loops with SIMD
instructions. But when we extract all the `&` operations and perform them first,
the JIT will optimize them with SIMD.


> Extract AND operation to get a SIMD optimization
> --
>
> Key: LUCENE-9636
> URL: https://issues.apache.org/jira/browse/LUCENE-9636
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Feng Guo
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In decode6(), decode7(), decode14(), decode15() and decode24(), longs are
> always ANDed with the same mask and then shifted. By printing the generated
> assembly, I found that the JIT did not optimize these loops with SIMD
> instructions. But when we extract all the `&` operations and perform them
> first, the JIT will optimize them with SIMD.






[GitHub] [lucene-solr] dweiss commented on pull request #2139: LUCENE-9636: Extract AND operation to get a SIMD optimization

2020-12-10 Thread GitBox


dweiss commented on pull request #2139:
URL: https://github.com/apache/lucene-solr/pull/2139#issuecomment-742434844


   This is excellent, thank you!






[jira] [Commented] (SOLR-15039) Error in Solr Cell extract when using multipart upload with some documents

2020-12-10 Thread sam marshall (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247159#comment-17247159
 ] 

sam marshall commented on SOLR-15039:
-

In case it is helpful for reproducing the problem, here is a complete sequence of
commands that will reproduce it starting from a fresh Ubuntu 18.04 installation
(I used a Microsoft Azure VM). It uses a fresh Solr 8.7.0 installation with the
supplied 'techproducts' sample, which has the extract handler enabled and which I
assume is correctly configured.

After creating the new VM, I copied the file b364b24b-public into the home
directory; this is then the full sequence of commands I needed to reproduce it
(it doesn't quite run as a script, you have to press Y or Q at a couple of
points):

{code}
sudo apt install openjdk-11-jdk
wget https://archive.apache.org/dist/lucene/solr/8.7.0/solr-8.7.0.tgz
tar xzf solr-8.7.0.tgz solr-8.7.0/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-8.7.0.tgz
sudo su - solr -c "/opt/solr/bin/solr create -c testcollection -d sample_techproducts_configs"
curl "http://localhost:8983/solr/testcollection/update/extract?extractOnly=true" --data-binary '@b364b24b-public' -H 'Content-type:text/html' > nonmultipart-result.txt
curl "http://localhost:8983/solr/testcollection/update/extract?extractOnly=true" -F 'myfile=@b364b24b-public' -H 'Content-type:text/html' > multipart-result.txt
{code}

After that point you can see the results in the two files, which are clearly of
different sizes:

{code}
sam@solr-test-temp:~$ ls -l
total 212648
-rw-r--r-- 1 sam sam  10323956 Dec 10 10:32 b364b24b-public
-rwxr-xr-x 1 sam sam     12694 Oct 28 09:21 install_solr_service.sh
-rw-rw-r-- 1 sam sam   6589425 Dec 10 10:40 multipart-result.txt
-rw-rw-r-- 1 sam sam      9988 Dec 10 10:39 nonmultipart-result.txt
-rw-rw-r-- 1 sam sam 200805960 Oct 29 19:05 solr-8.7.0.tgz
{code}
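
(For anyone reproducing this from SolrJ instead of curl, a rough sketch; the collection name and file name are taken from the commands above. Note that whether SolrJ sends a single stream as a plain POST body or as multipart is worth verifying as part of the test:)

{code:java}
import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractOnlyRepro {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/testcollection").build()) {
      ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
      req.addFile(new File("b364b24b-public"), "text/html"); // same file as above
      req.setParam("extractOnly", "true");
      System.out.println(client.request(req)); // compare against the curl results
    }
  }
}
{code}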

> Error in Solr Cell extract when using multipart upload with some documents
> --
>
> Key: SOLR-15039
> URL: https://issues.apache.org/jira/browse/SOLR-15039
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 6.6.4, 8.4, 8.6.3, 8.7
>Reporter: sam marshall
>Priority: Major
> Attachments: b364b24b-public
>
>
> (Note: I asked about this in the IRC channel as prompted, but didn't get a 
> response.)
> When uploading particular documents to /update/extract, you get different 
> (wrong) results if you are using multipart file upload compared to the basic 
> encoded upload, even though both methods are shown on the documentation page 
> ([https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html]).
> The first example in the documentation page uses a multipart POST with a 
> field called 'myfile' set to the file content. Some later examples use a 
> standard POST with the raw data provided.
> Here are the two approaches in the commands I used with my example file (I
> have replaced the URL, username, password, and collection name for my Solr,
> which isn't publicly available):
> {code}
> curl --user myuser:mypassword "https://example.org/solr/mycollection/update/extract?extractOnly=true" --data-binary '@c:/temp/b364b24b728b350eac18d6379ede3437fd220829' -H 'Content-type:text/html' > nonmultipart-result.txt
> curl --user myuser:mypassword "https://example.org/solr/mycollection/update/extract?extractOnly=true" -F 'myfile=@c:/temp/b364b24b728b350eac18d6379ede3437fd220829' -H 'Content-type:text/html' > multipart-result.txt
> {code}
> The example file is a ~10MB PowerPoint with a few sentences of English text 
> in it (and some pictures).
> The nonmultipart-result.txt file is 9,871 bytes long and JSON-encoded; it 
> includes an XHTML version of the text content of the PowerPoint, and some 
> metadata.
> The multipart-result.txt is 7,352,348 bytes long and contains mainly a large 
> sequence of Chinese characters, or at least, random data being interpreted as 
> Chinese characters.
> This example was running against Solr 8.4 on a Linux server from our cloud
> Solr supplier. On another Linux (Ubuntu 18) server that I set up myself, I got
> the same results using various other Solr versions. Running against localhost,
> a Windows 10 machine with Solr 8.5, I get slightly different results: the
> non-multipart request works correctly, but multipart-result.txt in that case
> contains a slightly more helpful error 500 message:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>   <lst name="responseHeader">
>     <int name="status">500</int>
>     <int name="QTime">138</int>
>   </lst>
>   <lst name="error">
>     <lst name="metadata">
>       <str name="error-class">org.apache.solr.common.SolrException</str>
>       <str name="root-error-class">java.util.zip.ZipException</str>
>     </lst>
>     <str name="msg">org.apache.tika.exception.TikaException: E

[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-12-10 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247258#comment-17247258
 ] 

David Smiley commented on SOLR-13101:
-

I would like to close this issue as Won't Fix because the substance here, the
feature branch (with linked PRs) pointing to this issue, is dead in the water
(it will not be merged, or further publicly contributed to). However, the issue
title, "Shared storage support" (rather general), is not a won't-fix! So I
propose re-titling the issue to "Shared storage via a new SHARED replica type"
because, in my mind, that's the most stand-out aspect of this PR compared to
other alternatives. WDYT [~ilan]?

That said, do not lose hope for a solution coming into being! I've been
excitedly working on a new plan, which I've been sharing internally, that solves
the contributability problems the SHARED replica type implementation ran into.
If things go well in the coming weeks, there will be a new Jira issue called
"BlobDirectory, a shared storage approach" that will link here.

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas.. a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.






[jira] [Created] (SOLR-15040) Improvements to postlogs timestamp handling

2020-12-10 Thread Joel Bernstein (Jira)
Joel Bernstein created SOLR-15040:
-

 Summary: Improvements to postlogs timestamp handling
 Key: SOLR-15040
 URL: https://issues.apache.org/jira/browse/SOLR-15040
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein


This ticket will make some small improvements to how the bin/postlogs program
handles timestamps. In particular, it will change the format of the datetime
stamp so that it matches the ISO spec more closely. It will also add a few
date-truncated string timestamp fields, which make time series analysis easier.
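
(As a rough illustration of what date-truncated timestamp fields mean for time series work; this is plain java.time, not the patch's code:)

{code:java}
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class TruncateDemo {
  public static void main(String[] args) {
    Instant ts = Instant.parse("2020-12-10T10:32:05.123Z"); // ISO-8601 instant
    // Truncated copies are easy to group on when bucketing a time series:
    System.out.println(ts.truncatedTo(ChronoUnit.MINUTES)); // 2020-12-10T10:32:00Z
    System.out.println(ts.truncatedTo(ChronoUnit.HOURS));   // 2020-12-10T10:00:00Z
    System.out.println(ts.truncatedTo(ChronoUnit.DAYS));    // 2020-12-10T00:00:00Z
  }
}
{code}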






[jira] [Assigned] (SOLR-15040) Improvements to postlogs timestamp handling

2020-12-10 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein reassigned SOLR-15040:
-

Assignee: Joel Bernstein

> Improvements to postlogs timestamp handling
> ---
>
> Key: SOLR-15040
> URL: https://issues.apache.org/jira/browse/SOLR-15040
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
>
> This ticket will make some small improvements to how the bin/postlogs program
> handles timestamps. In particular, it will change the format of the datetime
> stamp so that it matches the ISO spec more closely. It will also add a few
> date-truncated string timestamp fields, which make time series analysis
> easier.






[jira] [Updated] (SOLR-15040) Improvements to postlogs timestamp handling

2020-12-10 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-15040:
--
Attachment: SOLR-15040.patch

> Improvements to postlogs timestamp handling
> ---
>
> Key: SOLR-15040
> URL: https://issues.apache.org/jira/browse/SOLR-15040
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-15040.patch
>
>
> This ticket will make some small improvements to how the bin/postlogs program
> handles timestamps. In particular, it will change the format of the datetime
> stamp so that it matches the ISO spec more closely. It will also add a few
> date-truncated string timestamp fields, which make time series analysis
> easier.






[jira] [Commented] (SOLR-15040) Improvements to postlogs timestamp handling

2020-12-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247398#comment-17247398
 ] 

ASF subversion and git services commented on SOLR-15040:


Commit 04b9a9806013d98b8ad78a33a905d10dadf3129a in lucene-solr's branch 
refs/heads/master from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=04b9a98 ]

SOLR-15040: Improvements to postlogs timestamp handling


> Improvements to postlogs timestamp handling
> ---
>
> Key: SOLR-15040
> URL: https://issues.apache.org/jira/browse/SOLR-15040
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-15040.patch
>
>
> This ticket will make some small improvements to how the bin/postlogs program
> handles timestamps. In particular, it will change the format of the datetime
> stamp so that it matches the ISO spec more closely. It will also add a few
> date-truncated string timestamp fields, which make time series analysis
> easier.






[jira] [Commented] (SOLR-15040) Improvements to postlogs timestamp handling

2020-12-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247430#comment-17247430
 ] 

ASF subversion and git services commented on SOLR-15040:


Commit 3bb4ed24d89e2efab742dde5f666049f7d4fff0c in lucene-solr's branch 
refs/heads/branch_8x from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3bb4ed2 ]

SOLR-15040: Improvements to postlogs timestamp handling


> Improvements to postlogs timestamp handling
> ---
>
> Key: SOLR-15040
> URL: https://issues.apache.org/jira/browse/SOLR-15040
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-15040.patch
>
>
> This ticket will make some small improvements to how the bin/postlogs program
> handles timestamps. In particular, it will change the format of the datetime
> stamp so that it matches the ISO spec more closely. It will also add a few
> date-truncated string timestamp fields, which make time series analysis
> easier.






[jira] [Created] (SOLR-15041) CSV update handler can't handle line breaks/new lines together with field split/separators for multivalued fields

2020-12-10 Thread Matt Hov (Jira)
Matt Hov created SOLR-15041:
---

 Summary: CSV update handler can't handle line breaks/new lines 
together with field split/separators for multivalued fields
 Key: SOLR-15041
 URL: https://issues.apache.org/jira/browse/SOLR-15041
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: update
Affects Versions: 8.4
 Environment: Ubuntu 20.04 8 CPU 60GB+ ram
Reporter: Matt Hov


I've been using the /update/csv handler to bulk-import large amounts of data
with great success, but I believe I've found a corner case in the CSV parsing
when the field is a multi-valued string field with a new-line character in it.

As soon as you specify
{{f.[fieldname].split=true&f.[fieldname].separator=[something]}}, the
multi-field/split parsing stops at the first line break.

My managed schema (the XML field definitions were stripped by the mail archive;
this is a plausible reconstruction from the field names used below, with the
*_strs fields as multi-valued strings):
{code:xml}
<field name="test1_strs" type="strings" multiValued="true"/>
<field name="test2_strs" type="strings" multiValued="true"/>
<field name="test3_str" type="string"/>
{code}
Example POST URL; I'm using ! as the split character for test1_strs and test2_strs:
{code:java}
http://[myserver]/solr/[mycore]/update/csv?commitWithin=1000&f.test1_strs.split=true&f.test1_strs.separator=!&f.test2_strs.split=true&f.test2_strs.separator=!{code}
CSV content (notice the new-lines are included but encapsulated by ""; these
new-lines need to be maintained as-is):
{code:java}
id,title,test1_strs,test2_strs,test3_str
csv_test,title,"first line
with break!second line","first line!second_line","a line
break"
{code}
Resulting Solr doc:
{code:java}
{
  "id":"csv_test",
  "title":"title",
  "_version_":1685718010076069888,
  "test1_strs":["first line "],
  "test2_strs":["first line", "second_line"],
  "test3_str":"a line\r\nbreak"
}
{code}
Note that in the single-valued {{test3_str}} the new-line is appropriately
maintained as \r\n (or just \n when this is done via code instead of manually).

{{test2_strs}} shows that the multi-value split on ! worked correctly.

{{test1_strs}} stops processing at the first value's new-line instead of at the
actual separator after the new-line.

Expected values should look like:
{code:java}
{
  "id":"csv_test",
  "title":"title",
  "_version_":1685718010076069888,
  "test1_strs":["first line\r\nwith break", "second line"],
  "test2_strs":["first line", "second_line"],
  "test3_str":"a line\r\nbreak"
}
{code}
 
I've tried pre-escaping the line breaks, but all that gives me is the escaped
new-line in Solr, which would need to be post-processed on the consuming end to
turn back into \r\n (or \n), and that would be nontrivial to do. Solr handles \n
just fine in all other cases, so I consider this the expected behavior.
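
(To make the expectation concrete, a tiny sketch of the intended split semantics in plain Java: after normal CSV unquoting, the embedded line break is just a character inside the value, and only the separator should split it:)

{code:java}
public class SplitDemo {
  public static void main(String[] args) {
    // The unquoted CSV value from the example above:
    String value = "first line\r\nwith break!second line";
    String[] parts = value.split("!");
    // -> ["first line\r\nwith break", "second line"]
    for (String p : parts) {
      System.out.println("[" + p + "]");
    }
  }
}
{code}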

 

 

 






[GitHub] [lucene-solr] mayya-sharipova commented on pull request #2129: Fix format indent from 4 to 2 spaces

2020-12-10 Thread GitBox


mayya-sharipova commented on pull request #2129:
URL: https://github.com/apache/lucene-solr/pull/2129#issuecomment-742759335


   @msokolov Thanks for your comment. Indeed, having `gradlew precommit` fail
on inconsistent code style would be useful.
   
   Thanks for the feedback, I will merge the PR. 






[GitHub] [lucene-solr] mayya-sharipova merged pull request #2129: Fix format indent from 4 to 2 spaces

2020-12-10 Thread GitBox


mayya-sharipova merged pull request #2129:
URL: https://github.com/apache/lucene-solr/pull/2129


   






[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2020-12-10 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247510#comment-17247510
 ] 

Ilan Ginzburg commented on SOLR-13101:
--

I have no issue with resolving this Jira as "won't fix".

From my perspective, the fundamental problem with this approach is not the
introduction of a new replica type but the need to commit every batch in order
to push segments, and having to wait for the push to complete and succeed
before declaring the indexing itself successful (there are a few possible
optimizations, such as pushing files before the commit happens so they're ready
on blob by then, but the fundamental issues do not go away). That's a major
performance degradation.

So yes, please close it. Thanks.

Looking forward to seeing a different approach that does not have the problems
listed above! (or at least fewer of them :))

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas.. a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.






[GitHub] [lucene-solr] madrob commented on pull request #2120: SOLR-15029 More gracefully give up shard leadership

2020-12-10 Thread GitBox


madrob commented on pull request #2120:
URL: https://github.com/apache/lucene-solr/pull/2120#issuecomment-742815596


   Converting back to draft, as the new asserts that I added in the unit test 
are failing. Further discussion on JIRA.






[jira] [Assigned] (SOLR-15026) MiniSolrCloudCluster can inconsistently get confused about when it's using SSL

2020-12-10 Thread Timothy Potter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter reassigned SOLR-15026:
-

Assignee: (was: Timothy Potter)

> MiniSolrCloudCluster can inconsistently get confused about when it's using SSL
> --
>
> Key: SOLR-15026
> URL: https://issues.apache.org/jira/browse/SOLR-15026
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Chris M. Hostetter
>Priority: Major
>
> A new test added in SOLR-14934 caused the following reproducible failure to 
> pop up on jenkins...
> {noformat}
> hossman@slate:~/lucene/dev [j11] [master] $ ./gradlew -p solr/test-framework/ 
> test --tests MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders 
> -Dtests.seed=806A85748BD81F48 -Dtests.multiplier=2 -Dtests.slow=true 
> -Dtests.locale=ln-CG -Dtests.timezone=Asia/Thimbu -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
> Starting a Gradle Daemon (subsequent builds will be faster)
> > Task :randomizationInfo
> Running tests with randomization seed: tests.seed=806A85748BD81F48
> > Task :solr:test-framework:test
> org.apache.solr.cloud.MiniSolrCloudClusterTest > 
> testSolrHomeAndResourceLoaders FAILED
> org.apache.solr.client.solrj.SolrServerException: IOException occurred 
> when talking to server at: https://127.0.0.1:38681/solr
> at 
> __randomizedtesting.SeedInfo.seed([806A85748BD81F48:37548FA7602CB5FD]:0)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:712)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:269)
> at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
> at 
> org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:390)
> at 
> org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:360)
> at 
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1168)
> at 
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:931)
> at 
> org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:865)
> at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:229)
> at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:246)
> at 
> org.apache.solr.cloud.MiniSolrCloudClusterTest.testSolrHomeAndResourceLoaders(MiniSolrCloudClusterTest.java:125)
> ...
> Caused by:
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
> at 
> java.base/sun.security.ssl.SSLSocketInputRecord.handleUnknownRecord(SSLSocketInputRecord.java:439)
> {noformat}
> The problem seems to be that even though the MiniSolrCloudCluster being
> instantiated isn't _intentionally_ using any SSL randomization (it just uses
> {{JettyConfig.builder().build()}}), the CloudSolrClient returned by
> {{cluster.getSolrClient()}} is evidently picking up the randomized SSL and
> trying to use it to talk to the cluster.
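
(A hedged sketch of the mismatch being described; constructor and builder names come from the test framework, but exact usage in the failing test may differ:)

{code:java}
import java.nio.file.Files;
import org.apache.solr.client.solrj.embedded.JettyConfig;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.cloud.MiniSolrCloudCluster;

public class SslMismatchSketch {
  public static void main(String[] args) throws Exception {
    // The cluster is deliberately built without SSL:
    MiniSolrCloudCluster cluster = new MiniSolrCloudCluster(
        1, Files.createTempDirectory("minicluster"), JettyConfig.builder().build());
    try {
      // ...but under test randomization the returned client may carry an SSL
      // context and talk https to the plain-http Jetty above, producing the
      // "Unsupported or unrecognized SSL message" failure.
      CloudSolrClient client = cluster.getSolrClient();
      System.out.println(client.getClusterStateProvider().getLiveNodes());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}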






[jira] [Assigned] (SOLR-14766) Deprecate ManagedResources from Solr

2020-12-10 Thread Timothy Potter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter reassigned SOLR-14766:
-

Assignee: (was: Timothy Potter)

> Deprecate ManagedResources from Solr
> 
>
> Key: SOLR-14766
> URL: https://issues.apache.org/jira/browse/SOLR-14766
> Project: Solr
>  Issue Type: Task
>Reporter: Noble Paul
>Priority: Major
>  Labels: deprecation
> Attachments: SOLR-14766.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This feature has the following problems:
> * It's insecure because it uses Restlet
> * Nobody knows that code well enough to even remove the Restlet dependency
> * The Restlet dependency in Solr exists only because of this feature
> We should deprecate this in 8.7 and remove it from master






[jira] [Assigned] (SOLR-9008) Investigate feasibilty and impact of using SparseFixedBitSet where Solr is currently using FixedBitSet

2020-12-10 Thread Timothy Potter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-9008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter reassigned SOLR-9008:


Assignee: (was: Timothy Potter)

> Investigate feasibilty and impact of using SparseFixedBitSet where Solr is 
> currently using FixedBitSet
> --
>
> Key: SOLR-9008
> URL: https://issues.apache.org/jira/browse/SOLR-9008
> Project: Solr
>  Issue Type: Improvement
>Reporter: Timothy Potter
>Priority: Major
>
> Found this gem in one of Mike's blog posts:
> {quote}
> But with 5.0.0, Lucene now supports random-writable and advance-able sparse 
> bitsets (RoaringDocIdSet and SparseFixedBitSet), so the heap required is in 
> proportion to how many bits are set, not how many total documents exist in 
> the index. 
> {quote}
> http://blog.mikemccandless.com/2014/11/apache-lucene-500-is-coming.html
> I don't see any uses of either of these classes in Solr code, but from a quick
> look this sounds compelling for saving memory, such as when caching fq's.
> This ticket is for exploring where Solr can leverage these structures and
> whether there's an improvement in performance and/or memory usage.
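
(For reference, a small hedged sketch of the two classes being compared; both live in org.apache.lucene.util and implement Accountable, so the memory difference is directly observable:)

{code:java}
import org.apache.lucene.util.FixedBitSet;
import org.apache.lucene.util.SparseFixedBitSet;

public class BitSetFootprint {
  public static void main(String[] args) {
    int maxDoc = 100_000_000;
    FixedBitSet dense = new FixedBitSet(maxDoc);              // ~maxDoc/8 bytes up front
    SparseFixedBitSet sparse = new SparseFixedBitSet(maxDoc); // allocates lazily
    sparse.set(42);
    sparse.set(99_999_999);
    // For a sparse fq, ramBytesUsed() should differ by orders of magnitude:
    System.out.println("dense:  " + dense.ramBytesUsed());
    System.out.println("sparse: " + sparse.ramBytesUsed());
  }
}
{code}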






[jira] [Assigned] (SOLR-6443) TestManagedResourceStorage fails on Jenkins with SolrCore.getOpenCount()==2

2020-12-10 Thread Timothy Potter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Potter reassigned SOLR-6443:


Assignee: (was: Timothy Potter)

> TestManagedResourceStorage fails on Jenkins with SolrCore.getOpenCount()==2
> ---
>
> Key: SOLR-6443
> URL: https://issues.apache.org/jira/browse/SOLR-6443
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Reporter: Timothy Potter
>Priority: Major
>
> FAILED:  
> junit.framework.TestSuite.org.apache.solr.rest.TestManagedResourceStorage
> Error Message:
> SolrCore.getOpenCount()==2
> Stack Trace:
> java.lang.RuntimeException: SolrCore.getOpenCount()==2
> at __randomizedtesting.SeedInfo.seed([A491D1FD4CEF5EF8]:0)
> at org.apache.solr.util.TestHarness.close(TestHarness.java:332)
> at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:620)
> at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:183)
> at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:484)






[jira] [Commented] (LUCENE-9564) Format code automatically and enforce it

2020-12-10 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247581#comment-17247581
 ] 

Erick Erickson commented on LUCENE-9564:


I've been in 2-hour meetings, early in my career when I was young and unsure of
myself, arguing about whether the curly braces should be at the end of an "if"
(or whatever) statement or on the next line. And then, if on the next line,
should the curly brace be indented or should it be flush with the "if"? And
should the first code line be on the same line as the curly brace? If on the
next line, should it be flush with the curly brace or indented again?

Then I'd have the conversation repeat some time later when the person(s) who
didn't get what they wanted brought it up again. The best guy I ever worked for
had a method of dealing with this: if the topic was brought up again, he'd say
"We decided it this way, end of discussion".

Later in my career I'd have walked out about 30 seconds into that conversation.
So you can see why it's easy to get me to sign on ;)

When we reconcile the reference impl, I can help...

> Format code automatically and enforce it
> 
>
> Key: LUCENE-9564
> URL: https://issues.apache.org/jira/browse/LUCENE-9564
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This is a trivial change but a bold move. And I'm sure it's not for everyone.
> I started using google java format [1] in my projects a while ago and have 
> never looked back since. It is an oracle-style formatter (doesn't allow 
> customizations or deviations from the defined 'ideal') - this takes some 
> getting used to - but it also eliminates *all* the potential differences 
> between IDEs, configs, etc.  And the formatted code typically looks much 
> better than hand-edited code. It is also verifiable on precommit (so you can't 
> commit code that deviates from what you'd get from automated formatting 
> output).
> The biggest benefit I see is that refactorings become such a joy and keep the 
> code neat, everywhere. Before you commit you just reformat everything 
> automatically, no matter how much you messed it up.
> This isn't a change for everyone. I myself love hand-edited, neat code... but 
> the reality is that with IDE support for automated code changes and so many 
> people with different styles working on the same codebase keeping it neat is 
> a big pain. 
> Checkstyle and other tools are fine for ensuring certain rules but they don't 
> take the burden of formatting off your shoulders. This tool does. 
> Like I said - I had *great* reservations about using it at the beginning but 
> over time got so used to it that I almost can't live without it now. It's 
> like magic - you play with the code in any way you like, then run formatting 
> and it's nice and neat.
> The downside is that automated formatting does imply potential merge problems 
> in backward patches (or any currently existing branches).
> Like I said, it is a bold move. Just throwing this for your consideration.
> -I've added a PR that adds spotless but it's not ready; some files would have 
> to be excluded as they currently violate header rules.-
> A more interesting thing is here where the current code is automatically 
> reformatted - this branch is for eyeballing only.
> https://github.com/dweiss/lucene-solr/compare/LUCENE-9564...dweiss:LUCENE-9564-example
> [1] https://google.github.io/styleguide/javaguide.html






[GitHub] [lucene-solr] ErickErickson commented on pull request #2129: Fix format indent from 4 to 2 spaces

2020-12-10 Thread GitBox


ErickErickson commented on pull request #2129:
URL: https://github.com/apache/lucene-solr/pull/2129#issuecomment-742891680


   It already fails on _tabs_ rather than spaces, but failing on too many
spaces isn't checked.
   
   That said, rather than a one-off for indentation, I'd rather see the effort
go toward
   https://issues.apache.org/jira/browse/LUCENE-9564 and SOLR-14920
   than toward a separate precommit check.
   
   BTW, 'gradlew check' does all the precommit tasks as well as running the tests...
   
   > On Dec 10, 2020, at 2:55 PM, Mayya Sharipova  
wrote:
   > 
   > 
   > Merged #2129 into master.
   > 
   > —
   > You are receiving this because you are subscribed to this thread.
   > Reply to this email directly, view it on GitHub, or unsubscribe.
   > 
   
   






[jira] [Updated] (SOLR-13101) Shared storage via a new SHARED replica type

2020-12-10 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-13101:

Description: 
_This issue is closed as Won't-Fix because the particular approach here won't 
be contributed. Linked issues may appear approaching it differently._

Solr should have first-class support for shared storage (blob/object stores 
like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, etc).

The key component will likely be a new replica type for shared storage. It 
would have many of the benefits of the current "pull" replicas (not indexing on 
all replicas, all shards identical with no shards getting out-of-sync, etc), 
but would have additional benefits:
 - Any shard could become leader (the blob store always has the index)
 - Better elasticity scaling down
 - durability not linked to number of replicas.. a single replica could be 
common for write workloads
 - could drop to 0 replicas for a shard when not needed (blob store always has 
index)
 - Allow for higher performance write workloads by skipping the transaction log
 - don't pay for what you don't need
 - a commit will be necessary to flush to stable storage (blob store)
 - A lot of the complexity and failure modes go away

An additional component is a Directory implementation that will work well with 
blob stores. We probably want one that treats local disk as a cache since the 
latency to remote storage is so large. I think there are still some "locking" 
issues to be solved here (ensuring that more than one writer to the same index 
won't corrupt it). This should probably be pulled out into a different JIRA 
issue.

  was:
Solr should have first-class support for shared storage (blob/object stores 
like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, etc).

The key component will likely be a new replica type for shared storage.  It 
would have many of the benefits of the current "pull" replicas (not indexing on 
all replicas, all shards identical with no shards getting out-of-sync, etc), 
but would have additional benefits:
 - Any shard could become leader (the blob store always has the index)
 - Better elasticity scaling down
   - durability not linked to number of replicas.. a single replica could be 
common for write workloads
   - could drop to 0 replicas for a shard when not needed (blob store always 
has index)
 - Allow for higher performance write workloads by skipping the transaction log
   - don't pay for what you don't need
   - a commit will be necessary to flush to stable storage (blob store)
 - A lot of the complexity and failure modes go away

An additional component is a Directory implementation that will work well with 
blob stores.  We probably want one that treats local disk as a cache since the 
latency to remote storage is so large.  I think there are still some "locking" 
issues to be solved here (ensuring that more than one writer to the same index 
won't corrupt it).  This should probably be pulled out into a different JIRA 
issue.


Summary: Shared storage via a new SHARED replica type  (was: Shared 
storage support in SolrCloud)

> Shared storage via a new SHARED replica type
> 
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> _This issue is closed as Won't-Fix because the particular approach here won't 
> be contributed. Linked issues may appear approaching it differently._
> 
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage. It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>  - durability not linked to number of replicas.. a single replica could be 
> common for write workloads
>  - could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>  - don't pay for what you don't need
>  - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores. We probably want one that treats local disk as a cache since the 
> latency to remote storage 

[jira] [Resolved] (SOLR-13101) Shared storage via a new SHARED replica type

2020-12-10 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-13101.
-
Resolution: Won't Fix

> Shared storage via a new SHARED replica type
> 
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> _This issue is closed as Won't-Fix because the particular approach here won't 
> be contributed. Linked issues may appear approaching it differently._
> 
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage. It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>  - durability not linked to number of replicas.. a single replica could be 
> common for write workloads
>  - could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>  - don't pay for what you don't need
>  - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with 
> blob stores. We probably want one that treats local disk as a cache since the 
> latency to remote storage is so large. I think there are still some "locking" 
> issues to be solved here (ensuring that more than one writer to the same 
> index won't corrupt it). This should probably be pulled out into a different 
> JIRA issue.






[GitHub] [lucene-solr] zacharymorn opened a new pull request #2140: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

2020-12-10 Thread GitBox


zacharymorn opened a new pull request #2140:
URL: https://github.com/apache/lucene-solr/pull/2140


   # Description
   Support minimumNumberShouldMatch in WANDScorer
   
   Currently has a few `nocommit` comments to keep track of questions.
   
   # Solution
   Similar to `MinShouldMatchSumScorer`, the logic here keeps track of the
number of matched scorers for each candidate doc and compares it with
`minShouldMatch` to decide whether the minimum number of optional clauses has
been matched.
   
   # Tests
   Passed existing tests (especially those in `TestBooleanMinShouldMatch` and 
`TestWANDScorer`), and updated some that check for scores.
   
   `./gradlew check` passed with the `nocommit` rule commented out for now.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] zacharymorn closed pull request #2140: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

2020-12-10 Thread GitBox


zacharymorn closed pull request #2140:
URL: https://github.com/apache/lucene-solr/pull/2140


   






[GitHub] [lucene-solr] zacharymorn opened a new pull request #2141: LUCENE-9346: Support minimumNumberShouldMatch in WANDScorer

2020-12-10 Thread GitBox


zacharymorn opened a new pull request #2141:
URL: https://github.com/apache/lucene-solr/pull/2141


   # Description
   Support minimumNumberShouldMatch in WANDScorer
   
   Currently has a few `nocommit` comments to keep track of questions.
   
   # Solution
   Similar to `MinShouldMatchSumScorer`, the logic here keeps track of the
number of matched scorers for each candidate doc and compares it with
`minShouldMatch` to decide whether the minimum number of optional clauses has
been matched.
   
   # Tests
   Passed existing tests (especially those in `TestBooleanMinShouldMatch` and 
`TestWANDScorer`), and updated some that check for scores.
   
   `./gradlew check` passed with the `nocommit` rule commented out for now.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [x] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[GitHub] [lucene-solr] noblepaul merged pull request #1963: SOLR-14827: Refactor schema loading to not use XPath

2020-12-10 Thread GitBox


noblepaul merged pull request #1963:
URL: https://github.com/apache/lucene-solr/pull/1963


   






[jira] [Commented] (SOLR-14827) Refactor schema loading to not use XPath

2020-12-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247625#comment-17247625
 ] 

ASF subversion and git services commented on SOLR-14827:


Commit a95ce0d4224539094dc602ba8afa1ff796009a2b in lucene-solr's branch 
refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a95ce0d ]

SOLR-14827: Refactor schema loading to not use XPath (#1963)



> Refactor schema loading to not use XPath
> 
>
> Key: SOLR-14827
> URL: https://issues.apache.org/jira/browse/SOLR-14827
> Project: Solr
>  Issue Type: Task
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Labels: perfomance
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> XPath is slower than direct DOM traversal. 
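
(A toy illustration of the difference, not the Solr refactor itself: both read the same DOM, but the XPath route goes through a generic expression engine while refactored code can walk child nodes directly:)

{code:java}
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomVsXPath {
  // XPath: compile and evaluate a generic expression each time.
  static int countFieldsXPath(Document doc) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xpath.evaluate("/schema/field", doc, XPathConstants.NODESET);
    return nodes.getLength();
  }

  // DOM: a plain walk over child nodes, no expression engine involved.
  static int countFieldsDom(Document doc) {
    int count = 0;
    NodeList children = doc.getDocumentElement().getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node n = children.item(i);
      if (n.getNodeType() == Node.ELEMENT_NODE && "field".equals(n.getNodeName())) {
        count++;
      }
    }
    return count;
  }
}
{code}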






[jira] [Commented] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2020-12-10 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247640#comment-17247640
 ] 

Zach Chen commented on LUCENE-9346:
---

Hi [~jpountz], I spent some time looking into this and studying the algorithms
in *MinShouldMatchSumScorer* and *WANDScorer*, and just finished some initial
changes and opened a draft PR. I think I went in a different direction from
what you suggested above, mainly by keeping track of the number of scorers
matched without changing the *WANDScorer* algorithm (I'm not sure I understand
it well enough to make a correct change either :D), and comparing that count
with the *minShouldMatch* parameter after *minCompetitiveScore* has been
reached. Could you please take a look and let me know if that approach works as
well?

In the PR, I also put in some nocommit to keep track of some questions I have 
(all the tests are now passing without the nocommit comments btw):
 # Currently, *WANDScorer* will only be used for *ScoreMode.TOP_SCORES*. Should 
it be used for other score modes as well once *MinShouldMatchSumScorer* gets 
deprecated? Running *WANDScorer* with other ScodeMode now would fail some tests 
I think.
 # For now inside *WANDScorer*'s constructor, *WANDScorer.cost* is calculated 
as sum of the cost of its individual scorer.  But from 
*MinShouldMatchSumScorer*'s side, the cost is calculated also taking into 
account the *minShouldMatch* parameter as it impacts the tail capacity. Should 
*minShouldMatch* be taken into account in the calculation for *WANDScorer.cost* 
as well**, especially when the current solution in the PR doesn't change the 
tail capacity of *WANDScorer?* 
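
For what it's worth, here is a rough, self-contained sketch of what a 
minShouldMatch-aware cost bound could look like (my illustration, not 
Lucene's actual code): since any hit must match at least minShouldMatch 
clauses, it must match at least one of the (numClauses - minShouldMatch + 1) 
cheapest clauses, so summing those costs gives an upper bound.

```
import java.util.Arrays;

public class MinShouldMatchCost {

  // Upper bound on the cost of a disjunction requiring at least
  // minShouldMatch clauses: sum the costs of the cheapest
  // (numClauses - minShouldMatch + 1) clauses, since every hit must
  // match at least one of them.
  static long cost(long[] clauseCosts, int minShouldMatch) {
    long[] sorted = clauseCosts.clone();
    Arrays.sort(sorted); // cheapest first
    long bound = 0;
    for (int i = 0; i < sorted.length - minShouldMatch + 1; i++) {
      bound += sorted[i];
    }
    return bound;
  }

  public static void main(String[] args) {
    long[] costs = {10, 100, 1000};
    System.out.println(cost(costs, 1)); // 1110: the plain sum
    System.out.println(cost(costs, 2)); // 110: the costliest clause drops out
  }
}
```

With minShouldMatch = 1 this degenerates to the plain sum of clause costs, so 
it would be a strict refinement of the current calculation.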

> WANDScorer should support minimumNumberShouldMatch
> --
>
> Key: LUCENE-9346
> URL: https://issues.apache.org/jira/browse/LUCENE-9346
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we deoptimize when a minimumNumberShouldMatch is provided and fall 
> back to a scorer that doesn't dynamically prune hits based on scores.
> Given how similar WANDScorer and MinShouldMatchSumScorer are, I wonder if we 
> could remove MinShouldMatchSumScorer once WANDScorer supports 
> minimumNumberShouldMatch. Then any improvements we bring to WANDScorer, like 
> two-phase support (LUCENE-8806), would automatically cover more queries.






[jira] [Comment Edited] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2020-12-10 Thread Zach Chen (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247640#comment-17247640
 ] 

Zach Chen edited comment on LUCENE-9346 at 12/11/20, 5:18 AM:
--

hi [~jpountz], I spent some time looking into this and studying the algorithms 
in *MinShouldMatchSumScorer* and *WANDScorer*, and just finished some initial 
changes and opened a draft PR. I went in a different direction from what you 
suggested above: I mainly keep track of the number of scorers matched, without 
changing the *WANDScorer* algorithm (I'm not sure I understand it well enough 
to change it correctly either :D ), and compare that count with the 
*minShouldMatch* parameter after *minCompetitiveScore* has been reached. Could 
you please take a look and let me know if that approach works as well?

In the PR, I also put in some nocommit comments to keep track of questions I 
have (all the tests are passing without the nocommit comments, btw):
 # Currently, *WANDScorer* is only used for *ScoreMode.TOP_SCORES*. Should it 
be used for other score modes as well once *MinShouldMatchSumScorer* gets 
deprecated? I think running *WANDScorer* with other *ScoreMode* values would 
currently fail some tests.
 # Inside *WANDScorer*'s constructor, *WANDScorer.cost* is calculated as the 
sum of the costs of its individual scorers. But on *MinShouldMatchSumScorer*'s 
side, the cost calculation also takes the *minShouldMatch* parameter into 
account, since it impacts the tail capacity. Should *minShouldMatch* be taken 
into account in the calculation of *WANDScorer.cost* as well, especially since 
the current solution in the PR doesn't change the tail capacity of 
*WANDScorer*?



> WANDScorer should support minimumNumberShouldMatch
> --
>
> Key: LUCENE-9346
> URL: https://issues.apache.org/jira/browse/LUCENE-9346
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently we deoptimize when a minimumNumberShouldMatch is provided and fall 
> back to a scorer that doesn't dynamically prune hits based on scores.
> Given how similar WANDScorer and MinShouldMatchSumScorer are, I wonder if we 
> could remove MinShouldMatchSumScorer once WANDScorer supports 
> minimumNumberShouldMatch. Then any improvements we bring to WANDScorer, like 
> two-phase support (LUCENE-8806), would automatically cover more queries.






[jira] [Commented] (SOLR-15029) Allow Shard Leader to give up leadership gracefully via shard terms

2020-12-10 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247646#comment-17247646
 ] 

Mike Drob commented on SOLR-15029:
--

I think this can be done a lot more simply than what I was trying to accomplish 
at first. If we simply trigger a leader election, the current leader goes to 
the end of the queue and a new leader comes in. If indexing errors continue on 
the given node, the new leader will increase the terms and the previous leader 
will fall behind.

> Allow Shard Leader to give up leadership gracefully via shard terms
> ---
>
> Key: SOLR-15029
> URL: https://issues.apache.org/jira/browse/SOLR-15029
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently (via SOLR-12412), when a leader sees an index writing error during 
> an update, it will give up leadership by deleting the replica and adding a 
> new replica. One stated benefit of this was that because we are using the 
> overseer and a known code path, this is done asynchronously and very 
> efficiently.
> I would argue that this approach is too heavy-handed.
> In the case of a corrupt index exception, it makes some sense to completely 
> delete the index dir and attempt to sync from a good peer. Even in this case, 
> however, it might be better to let fingerprinting and other index delta 
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case where the index error arises due to a disconnected file 
> system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) 
> and the required solution is some kind of reconnect, this approach has 
> several shortcomings: the core delete and creation are going to fail, 
> leaving dangling replicas. Further, the data is still present, so there is 
> no need to make so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the 
> existing shard terms language. I believe we would be able to set all 
> replicas currently equal to leader term T to T+1, and then trigger a new 
> leader election. The current leader would know it is ineligible, while the 
> other replicas that were current before the failed update would be eligible. 
> This improvement would entail adding an additional possible operation to the 
> terms state machine.
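
To make the proposed state change concrete, here is a toy model of the term 
bump (illustrative names only, not Solr's actual shard-terms API):

```
import java.util.HashMap;
import java.util.Map;

public class ShardTermsSketch {
  private final Map<String, Long> terms = new HashMap<>();

  ShardTermsSketch(Map<String, Long> initial) {
    terms.putAll(initial);
  }

  // Bump every other replica currently at the leader's term T to T + 1,
  // leaving the failing leader behind at T.
  void giveUpLeadership(String leader) {
    long leaderTerm = terms.get(leader);
    terms.replaceAll((replica, term) ->
        !replica.equals(leader) && term == leaderTerm ? term + 1 : term);
  }

  // Only replicas holding the highest term are eligible to lead.
  boolean isEligible(String replica) {
    long max = terms.values().stream().mapToLong(Long::longValue).max().orElse(0L);
    return terms.get(replica) == max;
  }

  public static void main(String[] args) {
    Map<String, Long> initial = new HashMap<>();
    initial.put("leader", 3L);
    initial.put("replica1", 3L);
    initial.put("replica2", 2L); // already behind before the failed update
    ShardTermsSketch sketch = new ShardTermsSketch(initial);
    sketch.giveUpLeadership("leader");
    System.out.println(sketch.isEligible("leader"));   // false: stuck at term 3
    System.out.println(sketch.isEligible("replica1")); // true: bumped to term 4
  }
}
```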






[jira] [Updated] (SOLR-15029) More gracefully allow Shard Leader to give up leadership

2020-12-10 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated SOLR-15029:
-
Summary: More gracefully allow Shard Leader to give up leadership  (was: 
Allow Shard Leader to give up leadership gracefully via shard terms)

> More gracefully allow Shard Leader to give up leadership
> 
>
> Key: SOLR-15029
> URL: https://issues.apache.org/jira/browse/SOLR-15029
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently (via SOLR-12412), when a leader sees an index writing error during 
> an update, it will give up leadership by deleting the replica and adding a 
> new replica. One stated benefit of this was that because we are using the 
> overseer and a known code path, this is done asynchronously and very 
> efficiently.
> I would argue that this approach is too heavy-handed.
> In the case of a corrupt index exception, it makes some sense to completely 
> delete the index dir and attempt to sync from a good peer. Even in this case, 
> however, it might be better to let fingerprinting and other index delta 
> mechanisms take over and allow for a more efficient data transfer.
> In an alternate case where the index error arises due to a disconnected file 
> system (possible with shared file systems, e.g. S3, HDFS, some k8s systems) 
> and the required solution is some kind of reconnect, this approach has 
> several shortcomings: the core delete and creation are going to fail, 
> leaving dangling replicas. Further, the data is still present, so there is 
> no need to make so many extra copies.
> I propose that we bring in a mechanism to give up leadership via the 
> existing shard terms language. I believe we would be able to set all 
> replicas currently equal to leader term T to T+1, and then trigger a new 
> leader election. The current leader would know it is ineligible, while the 
> other replicas that were current before the failed update would be eligible. 
> This improvement would entail adding an additional possible operation to the 
> terms state machine.






[GitHub] [lucene-solr] madrob commented on a change in pull request #1992: SOLR-14939: JSON range faceting to support cache=false parameter

2020-12-10 Thread GitBox


madrob commented on a change in pull request #1992:
URL: https://github.com/apache/lucene-solr/pull/1992#discussion_r540707440



File path: solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java

@@ -531,7 +533,20 @@ private SimpleOrderedMap getRangeCountsIndexed() throws IOException {
   private Query[] filters;
   private DocSet[] intersections;
   private void rangeStats(Range range, int slot) throws IOException {
-    Query rangeQ = sf.getType().getRangeQuery(null, sf, range.low == null ? null : calc.formatValue(range.low), range.high==null ? null : calc.formatValue(range.high), range.includeLower, range.includeUpper);
+    final Query rangeQ;
+    {
+      final Query rangeQuery = sf.getType().getRangeQuery(null, sf, range.low == null ? null : calc.formatValue(range.low), range.high==null ? null : calc.formatValue(range.high), range.includeLower, range.includeUpper);
+      if (fcontext.cache) {
+        rangeQ = rangeQuery;
+      } else if (rangeQuery instanceof ExtendedQuery) {
+        ((ExtendedQuery) rangeQuery).setCache(fcontext.cache);

Review comment:
   Here (and in the else) I think I would explicitly do `setCache(false)` 
as it feels more readable to me, but I don't have strong opinions on that.
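
For illustration, the spelled-out version the reviewer seems to have in mind 
could look like this (a sketch continuing the hunk above; the trailing 
assignment is an assumption, since the quoted hunk is truncated there):

```
      } else if (rangeQuery instanceof ExtendedQuery) {
        // On this branch fcontext.cache is known to be false, so passing
        // the literal makes the intent obvious at the call site.
        ((ExtendedQuery) rangeQuery).setCache(false);
        rangeQ = rangeQuery; // assumed continuation of the truncated hunk
      }
```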

File path: solr/core/src/test/org/apache/solr/search/facet/TestJsonRangeFacets.java

@@ -41,6 +42,7 @@ public static void beforeTests() throws Exception {
     if (Boolean.getBoolean(NUMERIC_POINTS_SYSPROP)) System.setProperty(NUMERIC_DOCVALUES_SYSPROP,"true");
 
     initCore("solrconfig-tlog.xml","schema_latest.xml");
+    cache = random().nextBoolean();

Review comment:
   Might as well store the string directly?








[jira] [Commented] (LUCENE-9564) Format code automatically and enforce it

2020-12-10 Thread Houston Putman (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247684#comment-17247684
 ] 

Houston Putman commented on LUCENE-9564:


+1, I think this is a terrific idea!

Golang has a formatter built into the language, so many Go projects require 
the formatting to be correct in order to merge. I have many gripes with the 
language, but this is something they got 100% right. It is so nice to have 
consistent code and not have to worry about maintaining it.

> Format code automatically and enforce it
> 
>
> Key: LUCENE-9564
> URL: https://issues.apache.org/jira/browse/LUCENE-9564
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Trivial
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This is a trivial change but a bold move. And I'm sure it's not for everyone.
> I started using google java format [1] in my projects a while ago and have 
> never looked back since. It is an oracle-style formatter (doesn't allow 
> customizations or deviations from the defined 'ideal') - this takes some 
> getting used to - but it also eliminates *all* the potential differences 
> between IDEs, configs, etc. And the formatted code typically looks much 
> better than a hand-edited one. It is also verifiable on precommit (so you can't 
> commit code that deviates from what you'd get from automated formatting 
> output).
> The biggest benefit I see is that refactorings become such a joy and keep the 
> code neat, everywhere. Before you commit you just reformat everything 
> automatically, no matter how much you messed it up.
> This isn't a change for everyone. I myself love hand-edited, neat code... but 
> the reality is that with IDE support for automated code changes and so many 
> people with different styles working on the same codebase keeping it neat is 
> a big pain. 
> Checkstyle and other tools are fine for ensuring certain rules but they don't 
> take the burden of formatting off your shoulders. This tool does. 
> Like I said - I had *great* reservations about using it at the beginning but 
> over time got so used to it that I almost can't live without it now. It's 
> like magic - you play with the code in any way you like, then run formatting 
> and it's nice and neat.
> The downside is that automated formatting does imply potential merge problems 
> in backward patches (or any currently existing branches).
> Like I said, it is a bold move. Just throwing this for your consideration.
> -I've added a PR that adds spotless but it's not ready; some files would have 
> to be excluded as they currently violate header rules.-
> A more interesting thing is here where the current code is automatically 
> reformatted - this branch is for eyeballing only.
> https://github.com/dweiss/lucene-solr/compare/LUCENE-9564...dweiss:LUCENE-9564-example
> [1] https://google.github.io/styleguide/javaguide.html






[jira] [Updated] (SOLR-14788) Solr: The Next Big Thing

2020-12-10 Thread Mark Robert Miller (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Robert Miller updated SOLR-14788:
--
Description: 
h3. 
[!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The
 Policeman is {color:#de350b}NOW{color} {color:#de350b}OFF{color} duty!*{color}
{quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and 
have some fun. Try to make some progress. Don't stress too much about the 
impact of your changes or maintaining stability and performance and correctness 
so much. Until the end of phase 1, I've got your back. I have a variety of 
tools and contraptions I have been building over the years and I will continue 
training them on this branch. I will review your changes and peer out across 
the land and course correct where needed. As Mike D will be thinking, "Sounds 
like a bottleneck Mark." And indeed it will be to some extent. Which is why 
once stage one is completed, I will flip The Policeman to off duty. When off 
duty, I'm always* *occasionally*{color} *down for some vigilante justice, but I 
won't be walking the beat, all that stuff about sit back and relax goes out the 
window.*_
{quote}
 

I have stolen this title from Ishan or Noble and Ishan.

This issue is meant to capture the work of a small team that is forming to push 
Solr and SolrCloud to the next phase.

I have kicked off the work with an effort to create a very fast and solid base. 
That work is not 100% done, but it's ready to join the fight.

Tim Potter has started giving me a tremendous hand in finishing up. Ishan and 
Noble have already contributed support and testing and have plans for 
additional work to shore up some of our current shortcomings.

Others have expressed an interest in helping and hopefully they will pop up 
here as well.

Let's organize and discuss our efforts here and in various sub issues.

  was:
h3. 
[!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The
 Policeman is on duty!*{color}
{quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and 
have some fun. Try to make some progress. Don't stress too much about the 
impact of your changes or maintaining stability and performance and correctness 
so much. Until the end of phase 1, I've got your back. I have a variety of 
tools and contraptions I have been building over the years and I will continue 
training them on this branch. I will review your changes and peer out across 
the land and course correct where needed. As Mike D will be thinking, "Sounds 
like a bottleneck Mark." And indeed it will be to some extent. Which is why 
once stage one is completed, I will flip The Policeman to off duty. When off 
duty, I'm always* {color:#de350b}*occasionally*{color} *down for some vigilante 
justice, but I won't be walking the beat, all that stuff about sit back and 
relax goes out the window.*{color}_
{quote}
 

I have stolen this title from Ishan or Noble and Ishan.

This issue is meant to capture the work of a small team that is forming to push 
Solr and SolrCloud to the next phase.

I have kicked off the work with an effort to create a very fast and solid base. 
That work is not 100% done, but it's ready to join the fight.

Tim Potter has started giving me a tremendous hand in finishing up. Ishan and 
Noble have already contributed support and testing and have plans for 
additional work to shore up some of our current shortcomings.

Others have expressed an interest in helping and hopefully they will pop up 
here as well.

Let's organize and discuss our efforts here and in various sub issues.


> Solr: The Next Big Thing
> 
>
> Key: SOLR-14788
> URL: https://issues.apache.org/jira/browse/SOLR-14788
> Project: Solr
>  Issue Type: Task
>Reporter: Mark Robert Miller
>Assignee: Mark Robert Miller
>Priority: Critical
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> h3. 
> [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The
>  Policeman is {color:#de350b}NOW{color} {color:#de350b}OFF{color} 
> duty!*{color}
> {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and 
> have some fun. Try to make some progress. Don't stress too much about the 
> impact of your changes or maintaining stability and performance and 
> correctness so much. Until the end of phase 1, I've got your back. I have a 
> variety of tools and contraptions I have been building over the years and I 
> will continue training them on this branch. I will review your changes and 
> peer out across the land and course correct where needed. As Mike D will be 
> thinking,