[GitHub] [lucene] jpountz commented on issue #11770: Optimization for time series data

2022-09-15 Thread GitBox


jpountz commented on issue #11770:
URL: https://github.com/apache/lucene/issues/11770#issuecomment-1247796727

   > it seems that the core idea in this paper is similar to 
IndexSortSortedNumericDocValuesRangeQuery
   
   This is my understanding as well, though it says it uses the BKD tree to 
figure out the range of doc IDs, not doc values, which seems to be the idea 
that is proposed at https://github.com/apache/lucene/pull/687 (which I just 
realized I had completely forgotten about :grimacing:).
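
A minimal sketch of why an index sort helps here, assuming documents are sorted ascending by the queried field so that matching documents form one contiguous doc ID range. The class and method names below are hypothetical and are not Lucene APIs; the array stands in for per-document values.

```java
/** Hypothetical illustration only; not Lucene code. */
final class SortedDocRange {

  /**
   * Returns {firstDoc, endDocExclusive} for all docs whose value lies in [lower, upper].
   * valuesByDoc[docID] must be in ascending order (the index-sort assumption).
   */
  static int[] docRange(long[] valuesByDoc, long lower, long upper) {
    int from = lowerBound(valuesByDoc, lower); // first doc with value >= lower
    int to =
        upper == Long.MAX_VALUE
            ? valuesByDoc.length // avoid overflow of upper + 1
            : lowerBound(valuesByDoc, upper + 1); // first doc with value > upper
    return new int[] {from, Math.max(from, to)};
  }

  /** Classic binary search for the first index whose value is >= key. */
  static int lowerBound(long[] a, long key) {
    int lo = 0;
    int hi = a.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (a[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

Two binary searches replace a scan over every document, which is the property both approaches in this thread rely on; they differ in which structure supplies the boundaries.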
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz merged pull request #1068: LUCENE-10674: Update subiterators when BitSetConjDISI exhausts

2022-09-15 Thread GitBox


jpountz merged PR #1068:
URL: https://github.com/apache/lucene/pull/1068





[GitHub] [lucene] shaie opened a new pull request, #11775: Minor refactoring and cleanup to taxonomy index code

2022-09-15 Thread GitBox


shaie opened a new pull request, #11775:
URL: https://github.com/apache/lucene/pull/11775

   ### Description
   
   Aside from some cleanups (typos, improved comments), this PR addresses a few 
issues:
   
   1. `DirTaxoWriter.nextID` is declared `volatile`, but `nextID++` is not an 
atomic operation. Switched to `AtomicInteger`.
   2. The protected `DirTaxoReader` constructor couldn't really be used by 
subclasses, since `TaxonomyIndexArrays` is package-private and isn't exported by 
the module. Therefore I think it's safe to change the constructor to 
package-private too.
   3. Changed the [Double-Checked 
Locking](https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html) 
pattern implementation to assign the `volatile` field to a local variable, so 
that we do a volatile read only once if the reference isn't null.
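
Items 1 and 3 can be sketched roughly as follows. This is an illustration under assumptions, not the taxonomy code from the PR: `IdAllocator`, `allocate`, and `cache` are hypothetical names, and the cached array is a placeholder for whatever expensive structure is built lazily.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the two patterns; not the actual taxonomy classes. */
final class IdAllocator {
  // Replaces a `volatile int nextID` whose `nextID++` is a non-atomic read-modify-write.
  private final AtomicInteger nextId = new AtomicInteger(1);

  // Lazily initialized and published through a volatile field.
  private volatile int[] cache;

  /** Returns a unique ID even when called from several threads at once. */
  int allocate() {
    return nextId.getAndIncrement();
  }

  /** Double-checked locking with a single volatile read on the fast path. */
  int[] cache() {
    int[] local = cache; // the only volatile read when already initialized
    if (local == null) {
      synchronized (this) {
        local = cache; // re-check under the lock
        if (local == null) {
          local = new int[] {1, 2, 3}; // placeholder for an expensive computation
          cache = local; // volatile write publishes the fully built array
        }
      }
    }
    return local;
  }
}
```

Reading the field into a local also means the method returns exactly the reference it checked for null, rather than reading the field a second time.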
   





[GitHub] [lucene] dweiss commented on pull request #11774: GH-11172: remove WindowsDirectory and native subproject.

2022-09-15 Thread GitBox


dweiss commented on PR #11774:
URL: https://github.com/apache/lucene/pull/11774#issuecomment-1247974260

   Ah. missed the bull's eye, didn't I.





[GitHub] [lucene] rmuir commented on issue #11772: remove WindowsDirectory

2022-09-15 Thread GitBox


rmuir commented on issue #11772:
URL: https://github.com/apache/lucene/issues/11772#issuecomment-1248043771

   +1





[GitHub] [lucene] uschindler commented on issue #11772: remove WindowsDirectory

2022-09-15 Thread GitBox


uschindler commented on issue #11772:
URL: https://github.com/apache/lucene/issues/11772#issuecomment-1248061894

   Let's remove it. Actually, the code is not tested at all. The removed 
test case extends LuceneTestCase and not BaseDirectoryTestCase. The only thing 
it does is instantiate a Directory and an IndexOutput (!!), which does not even 
trigger the custom code.





[GitHub] [lucene] dweiss merged pull request #11774: GH-11172: remove WindowsDirectory and native subproject.

2022-09-15 Thread GitBox


dweiss merged PR #11774:
URL: https://github.com/apache/lucene/pull/11774





[GitHub] [lucene] dweiss commented on issue #11772: remove WindowsDirectory

2022-09-15 Thread GitBox


dweiss commented on issue #11772:
URL: https://github.com/apache/lucene/issues/11772#issuecomment-1248177490

   Applied on 9x and main.





[GitHub] [lucene] dweiss closed issue #11772: remove WindowsDirectory

2022-09-15 Thread GitBox


dweiss closed issue #11772: remove WindowsDirectory
URL: https://github.com/apache/lucene/issues/11772





[GitHub] [lucene] uschindler commented on pull request #11774: GH-11172: remove WindowsDirectory and native subproject.

2022-09-15 Thread GitBox


uschindler commented on PR #11774:
URL: https://github.com/apache/lucene/pull/11774#issuecomment-1248218763

   Thanks. I was just wondering: why the strange PR title with "GH-"? I 
would just put the issue number in the usual # notation. This one does not get 
highlighted at all.





[GitHub] [lucene] llermaly opened a new issue, #11776: Non self intersecting polygons can't be indexed

2022-09-15 Thread GitBox


llermaly opened a new issue, #11776:
URL: https://github.com/apache/lucene/issues/11776

   ### Description
   
   The following polygons are valid, but Lucene considers them 
self-intersecting:
   
   ```
   POLYGON ((8.8970989818779 54.4134906575883, 8.90042774485873 
54.4146874897743, 8.90594809529893 54.4171621281855, 8.91004327482905 
54.4202335124536, 8.9093605425 54.421660818, 8.923427357 
54.429292006, 8.892597461 54.412569037, 8.8870137444 
54.412826828, 8.8802484759 54.411739775, 8.8704911837 
54.40738926, 8.8572578773 54.406537888, 8.832464316 
54.410877071, 8.83028999859022 54.410779813, 8.8301056348 
54.409738029, 8.83542087096422 54.4081201758963, 8.8434158599249 
54.4059310703591, 8.8498879933749 54.4038371457592, 8.85426620240666 
54.4029805394939, 8.85731191163137 54.4032660762063, 8.86483100908713 
54.4043130389967, 8.87230838608615 54.4060266046257, 8.88148723601366 
54.4091671397265, 8.88577026584612 54.4101189229804, 8.89195686439317 
54.4116417778086, 8.8970989818779 54.4134906575883))
   
   POLYGON ((7.89437024403906 47.5862590252318, 7.89312177803361 
47.5869704801294, 7.89281574806746 47.5870946189537, 7.89525097569983 
47.5857177586665, 7.89806367361792 47.5841274339808, 7.90068804512661 
47.5825559862467, 7.89998956367071 47.5830121477752, 7.89515885167079 
47.585809621127, 7.89437024403906 47.5862590252318))
   
   POLYGON ((11.077430168 54.298432536, 11.0827805841396 
54.2829539912519, 11.0830386027471 54.2818703102967, 11.0832788032709 
54.2797565892852, 11.083278269 54.279818029, 11.0830099985918 
54.282292149, 11.077430168 54.298432536))
   ```
   
   @craigtaverner did some research and noted the following:
   
   _all have a feature in common, a very narrow constriction, causing one side 
of the polygon to touch (almost) the other side. This is likely the source of 
the issue, and in-line with your theory regarding numerical errors._
   
   _I've just written tests for these polygons in the latest version of lucene, 
and all three fail triangulation (a necessary step for indexing the polygons). 
So this is, I believe, the lucene tessellator (triangulator) code not being 
able to handle these points being so close to the other edge of the polygon._ 
   
   _I think it is the indexing that is failing, since triangulation is only 
needed for indexing. So until you import it to the index, it is fine. I was 
reading the code in this area, and found an interesting comment about it not 
including some special floating point tricks to be more accurate. So it could 
be that we just have to implement those tricks. I saw a link to this page which 
might cover those needs. https://www.cs.cmu.edu/~quake/robust.html_
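
To illustrate the kind of predicate those floating point tricks protect, here is a hedged sketch of a 2D orientation test computed twice: once in plain `double` arithmetic and once exactly with `BigDecimal`. This is not the Tessellator's code and not the adaptive-precision technique from the linked page; exact `BigDecimal` arithmetic is swapped in as a simpler, slower stand-in, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;

/** Hypothetical illustration of an orientation predicate; not Lucene code. */
final class Orientation {

  /**
   * Sign of the cross product (b - a) x (c - a) using doubles.
   * 1 means a, b, c turn counter-clockwise, -1 clockwise, 0 collinear.
   * Rounding can yield the wrong sign when the three points are nearly collinear.
   */
  static int fast(double ax, double ay, double bx, double by, double cx, double cy) {
    double det = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    return (int) Math.signum(det);
  }

  /** Same determinant evaluated exactly, so the sign is always correct for finite inputs. */
  static int exact(double ax, double ay, double bx, double by, double cx, double cy) {
    BigDecimal abx = bd(bx).subtract(bd(ax));
    BigDecimal aby = bd(by).subtract(bd(ay));
    BigDecimal acx = bd(cx).subtract(bd(ax));
    BigDecimal acy = bd(cy).subtract(bd(ay));
    return abx.multiply(acy).subtract(aby.multiply(acx)).signum();
  }

  // new BigDecimal(double) converts the binary value exactly, without rounding.
  private static BigDecimal bd(double v) {
    return new BigDecimal(v);
  }
}
```

A triangulator that needs a reliable answer for a vertex lying almost on the opposite edge could fall back to the exact form only when the fast result is too close to zero to trust.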
   
   
   ### Version and environment details
   
   _No response_





[GitHub] [lucene] jpountz commented on issue #11765: Query optimizer statistics

2022-09-15 Thread GitBox


jpountz commented on issue #11765:
URL: https://github.com/apache/lucene/issues/11765#issuecomment-1248281846

   Lucene has a `QueryProfilerIndexSearcher` that lets you capture some of 
this information for a given search, but it adds a lot of overhead. The way 
that Lucene interleaves evaluation of queries doesn't allow tracking statistics 
in a cheap way.





[GitHub] [lucene] jpountz closed issue #11765: Query optimizer statistics

2022-09-15 Thread GitBox


jpountz closed issue #11765: Query optimizer statistics
URL: https://github.com/apache/lucene/issues/11765





[GitHub] [lucene] jpountz commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-15 Thread GitBox


jpountz commented on issue #11761:
URL: https://github.com/apache/lucene/issues/11761#issuecomment-1248288919

   Historically this was not configurable and Lucene would allow up to 50% 
deleted documents. When we introduced an option, we made sure to introduce a 
lower bound on the value because a value of zero would essentially require 
Lucene to rewrite every segment that has a deletion after every update 
operation, which is certainly undesirable. Allowing users to go from 50% to 20% 
felt like a significant improvement already.
   
   We could discuss lowering the limit if we feel like this could lead to 
merging patterns that are still acceptable. E.g. I used 
`BaseMergePolicyTestCase#doTestSimulateUpdates` in the past to get a sense of 
how this option would influence write amplification.





[GitHub] [lucene] LuXugang commented on issue #11770: Optimization for time series data

2022-09-15 Thread GitBox


LuXugang commented on issue #11770:
URL: https://github.com/apache/lucene/issues/11770#issuecomment-1248298615

   > Could you tell me which lucene's files should I read, so I could implement 
that algorithm?
   
   I think you could first read  `IndexSortSortedNumericDocValuesRangeQuery`, 
then you would understand more about that paper.  I would also be more than 
happy to learn from each other about Lucene with 
[WeChat](https://www.amazingkoala.com.cn/Lucene/2018/1204/22.html).





[GitHub] [lucene] mdmarshmallow commented on issue #11761: Expand TieredMergePolicy deletePctAllowed limits

2022-09-15 Thread GitBox


mdmarshmallow commented on issue #11761:
URL: https://github.com/apache/lucene/issues/11761#issuecomment-1248348122

   Hi, thanks for the response! Your explanation of 0% not being allowed makes 
complete sense. For some context though, using our own forked version of 
`TieredMergePolicy`, we have tested with down to 2% allowable deletion and 
still see that behavior is desirable for us (more specifically much lower index 
sizes than at 20% deletes).
   
   Maybe if we want to maintain those limits, we could create a 
`setDeletesPctAllowedUnsafe()` or something like that which has no limits on it 
(or at least drops the lower bound to being > 0 instead of > 20)?
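
A rough sketch of what that proposal could look like, using a self-contained hypothetical class rather than the real `TieredMergePolicy`. The 20% and 50% bounds are taken from the discussion above; the class name, field, and exception messages are assumptions made for illustration.

```java
/** Hypothetical settings holder; it only mirrors the bounds discussed in this thread. */
final class DeletesPctSetting {
  // Starts at the historical ceiling mentioned above (up to 50% deleted documents).
  private double deletesPctAllowed = 50.0;

  /** Checked setter: keeps the existing lower bound of 20%. */
  void setDeletesPctAllowed(double pct) {
    if (pct < 20.0 || pct > 50.0) {
      throw new IllegalArgumentException("deletesPctAllowed must be in [20, 50], got " + pct);
    }
    deletesPctAllowed = pct;
  }

  /** Proposed escape hatch: any value above 0%, still capped at 50%. */
  void setDeletesPctAllowedUnsafe(double pct) {
    if (!(pct > 0.0 && pct <= 50.0)) {
      throw new IllegalArgumentException("deletesPctAllowed must be in (0, 50], got " + pct);
    }
    deletesPctAllowed = pct;
  }

  double getDeletesPctAllowed() {
    return deletesPctAllowed;
  }
}
```

Keeping 0% excluded even in the unchecked setter follows the earlier point that a value of zero would force a rewrite of every segment with a deletion after every update.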





[GitHub] [lucene] llermaly opened a new issue, #11777: Unusually slow indexing polygons

2022-09-15 Thread GitBox


llermaly opened a new issue, #11777:
URL: https://github.com/apache/lucene/issues/11777

   ### Description
   
   Some polygons take a long time to index (13MB, 15 minutes), while some 
much larger ones (50MB+) take just a couple of minutes. 
   
   Two of these polygons are attached.
   
   
[FE-2456.txt](https://github.com/apache/lucene/files/9577391/FE-2456.txt)
   
[ORG-24132378.txt](https://github.com/apache/lucene/files/9577398/ORG-24132378.txt)
   





[GitHub] [lucene] llermaly commented on issue #11767: Does the method #cureLocalIntersections in the Tessellator make any sense?

2022-09-15 Thread GitBox


llermaly commented on issue #11767:
URL: https://github.com/apache/lucene/issues/11767#issuecomment-1248367379

   Hi @iverase, it would be nice if you could go to 
https://github.com/apache/lucene/issues/11777 and test with those polygons as 
well. We are having Elastic Cloud timeouts because of the time it takes to 
triangulate.





[GitHub] [lucene] nknize commented on issue #11767: Does the method #cureLocalIntersections in the Tessellator make any sense?

2022-09-15 Thread GitBox


nknize commented on issue #11767:
URL: https://github.com/apache/lucene/issues/11767#issuecomment-1248382179

   > My proposal is to remove the method completely or at least not call this 
method if the Tessellator has been called with the flag 
`checkSelfIntersections` set to true.
   > 
   > @nknize introduced this method on the first version of the Tessellator, he 
might have more background about the need of this method. what do you think?
   
   I was actually looking into this before turning my attention to the shape 
doc values and I was just getting ready to come back to it. The method was 
originally introduced to postpone self intersection removal unless absolutely 
necessary (e.g., tessellation failed). Essentially it was a lazy cleaning 
approach. I believe, though this needs thorough evaluation, some of the 
improvements made to [filter 
points](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/geo/Tessellator.java#L1326)
 rendered much of this logic obsolete. My last test (random and explicit) never 
actually exercised this logic, even with self intersections. Furthermore, the 
follow up SPLIT logic was not exercised either so I was exploring the 
possibility of removing both of these phases.  
   
   > if you could go to https://github.com/apache/lucene/issues/11777 and test 
with those polygons as well. We are having Elastic Cloud timeouts because of 
the time it takes to triangulate.
   
   These adversarial cases are important to capture in our tests. Our 
randomized polygon generator doesn't inject any self intersections so we really 
have a gap in testing the logic coverage. 





[GitHub] [lucene] patelprateek commented on issue #11765: Query optimizer statistics

2022-09-15 Thread GitBox


patelprateek commented on issue #11765:
URL: https://github.com/apache/lucene/issues/11765#issuecomment-1248413173

   @jpountz: I read that after a query runs, Lucene uses a filter cache where 
it encodes the posting list using compressed bitmaps (roaring). Is there any 
API to retrieve these compressed bitmaps rather than iterating over the actual 
document IDs?
   
   My use case is that some filters can have a large number of hits (>10 
million), and in such scenarios the compressed bitmaps could help downstream 
logic. Any recommendations or pointers to other approaches? 
   For a query, is it possible to do a quick dry run to get the estimated number 
of documents it will return?





[GitHub] [lucene] llermaly commented on issue #11767: Does the method #cureLocalIntersections in the Tessellator make any sense?

2022-09-15 Thread GitBox


llermaly commented on issue #11767:
URL: https://github.com/apache/lucene/issues/11767#issuecomment-1248415566

   Here are some valid polygons that are rejected as self-intersecting, in 
case they are useful for your tests: 
   
   https://github.com/apache/lucene/issues/11776





[GitHub] [lucene] dweiss commented on pull request #11774: GH-11172: remove WindowsDirectory and native subproject.

2022-09-15 Thread GitBox


dweiss commented on PR #11774:
URL: https://github.com/apache/lucene/pull/11774#issuecomment-1248464122

   This is an alternative notation for issue numbers that GitHub actually 
understands; see commit links, for example:
   
![image](https://user-images.githubusercontent.com/199470/190483253-15d25128-57f1-4357-8a93-4d11771d37ba.png)
   I tend to prefer it to hash+number because hash+number is treated as a 
commented line if you edit any previous commit (rebase interactive, amend, 
etc.)... There are ways to work around it, but it requires some manual tweaks - 
I prefer to just use GH-xxx.
   
   
https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/autolinked-references-and-urls#issues-and-pull-requests
   





[GitHub] [lucene] danmuzi commented on pull request #11774: GH-11172: remove WindowsDirectory and native subproject.

2022-09-15 Thread GitBox


danmuzi commented on PR #11774:
URL: https://github.com/apache/lucene/pull/11774#issuecomment-1248479475

   I think the issue number for this patch is wrong again.
   It needs to be changed from #11172 to #11772.





[GitHub] [lucene] danmuzi opened a new issue, #11778: add detailed part-of-speech tag for particle and ending on Nori

2022-09-15 Thread GitBox


danmuzi opened a new issue, #11778:
URL: https://github.com/apache/lucene/issues/11778

   ### Description
   
   There are several tag types for **particle**(조사) and **ending**(어미) in 
mecab-ko-dic.
   
(https://docs.google.com/spreadsheets/d/1-9blXKjtjeKZqsf4NzHeYJCrr49-nXeRF6D80udfcwY)
   But Nori only tags them as **J (particle)** and **E (ending)**.
   
   When using a Korean morpheme analyzer, detailed part-of-speech information 
is often required (e.g., for debugging misanalysis).
   There are also cases where a user wants to remove a specific POS tag.
   For this case, Lucene currently provides `KoreanPartOfSpeechStopFilter`.
   With the current structure, however, it is impossible to remove a specific 
tag for particles and endings
   (e.g., removing only the "sentence-closing ending" POS tag).
   
   To solve that, detailed POS tag information for particles and endings is 
needed.





[GitHub] [lucene] danmuzi opened a new pull request, #11779: GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori

2022-09-15 Thread GitBox


danmuzi opened a new pull request, #11779:
URL: https://github.com/apache/lucene/pull/11779

   Adds detailed part-of-speech tags for particles and endings in Nori.
   The part-of-speech names were set based on the **Korean-English Learners' 
Dictionary** of the [National Institute of the Korean 
Language](https://korean.go.kr/front_eng/main.do).
   (https://krdict.korean.go.kr/eng/)
   
   Closes #11778 





[GitHub] [lucene] janhoy commented on issue #10269: Lucene web site broken links [LUCENE-9229]

2022-09-15 Thread GitBox


janhoy commented on issue #10269:
URL: https://github.com/apache/lucene/issues/10269#issuecomment-1248570742

   I'll close this old issue. Anyone discovering any new broken links on the 
site can fix those in new PRs.





[GitHub] [lucene] janhoy closed issue #10269: Lucene web site broken links [LUCENE-9229]

2022-09-15 Thread GitBox


janhoy closed issue #10269: Lucene web site broken links [LUCENE-9229]
URL: https://github.com/apache/lucene/issues/10269





[GitHub] [lucene] janhoy commented on pull request #591: LUCENE-10365 Wizard changes contributed from Solr

2022-09-15 Thread GitBox


janhoy commented on PR #591:
URL: https://github.com/apache/lucene/pull/591#issuecomment-1248760591

   @msokolov This has been hanging for a while, and I'll now merge it into main 
and then to branch_9x. 
   
   Just thought I'd alert you as 9.4.0 RM, although I don't anticipate any 
issues with the ongoing release, as these are mostly bug fixes and improvements 
related to release signing (ability to sign with gradle plugin instead of GPG). 
I'll let you make the call whether you merge it into branch_9_4 for use with 
this release.





[GitHub] [lucene] iverase commented on issue #11767: Does the method #cureLocalIntersections in the Tessellator make any sense?

2022-09-15 Thread GitBox


iverase commented on issue #11767:
URL: https://github.com/apache/lucene/issues/11767#issuecomment-1248975086

   >The method was originally introduced to postpone self intersection removal 
   
   I don't understand this. We are claiming in the javadocs that polygons 
should not be self-intersecting, and we do not introduce self-intersections in 
our code, so why do we want to remove them?
   
   ```
    * <p>Requires valid polygons:
    *
    * <ul>
    *   <li>No self intersections
    *   <li>Holes may only touch at one vertex
    *   <li>Polygon must have an area (e.g., no "line" boxes)
    *   <li>sensitive to overflow (e.g, subatomic values such as E-200 can cause unexpected
    *       behavior)
    * </ul>
   ```
   Looking at the original code that inspired the tessellator, the 
method was introduced to handle some OSM polygons that contain 
self-intersections and are hence not valid: https://github.com/mapbox/earcut/issues/8
   As we claim we only support valid polygons, I think it is safe to remove the 
method entirely.
   
   @llermaly I found this issue by looking into one of your polygons so we 
should expect nice performance improvements.
   
   





[GitHub] [lucene] nknize commented on issue #11767: Does the method #cureLocalIntersections in the Tessellator make any sense?

2022-09-15 Thread GitBox


nknize commented on issue #11767:
URL: https://github.com/apache/lucene/issues/11767#issuecomment-124898

   > I don't understand this. We are claiming in the javadocs that polygons 
should not be self-intersecting, and we do not introduce self-intersections in 
our code, so why do we want to remove them?
   
   
   Because real-life geo data doesn't care what our javadocs say. Small self 
intersections are a reality that rears its head every now and then in real data, 
and the performance hit to "best effort" detect and clean in the tessellator's 
cure phase at index time was worth more than directing users to a third-party 
cleaning utility before indexing. 
   
   Our blind polygon class does nothing to enforce those javadocs, so maybe 
before removing it completely we might consider flagging that phase as an 
optional validation step disabled by default? (Think `ignore_malformed`) 

