[jira] [Commented] (LUCENE-9450) Taxonomy index should use DocValues not StoredFields

2021-07-18 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382822#comment-17382822
 ] 

Mayya Sharipova commented on LUCENE-9450:
-

[~gworah] That's indeed a concern. The workaround would be to add a binary 
doc values field in version 8.x and force merge to a single segment, so that the 
FieldInfo for $full_path$ contains doc values as well, and then upgrade to 
9.0. For older indices, we don't check data-structure consistency on 
individual docs, only at the segment level.

Do you think it is a viable workaround?

 

 

> Taxonomy index should use DocValues not StoredFields
> 
>
> Key: LUCENE-9450
> URL: https://issues.apache.org/jira/browse/LUCENE-9450
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Affects Versions: 8.5.2
>Reporter: Gautam Worah
>Priority: Minor
>  Labels: performance
> Fix For: main (9.0)
>
> Attachments: LUCENE-9450-localrun.py-v1, wip_taxonomy_patch
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The taxonomy index that maps binning labels to ordinals was created before 
> Lucene added BinaryDocValues.
> I've attached a WIP patch (does not pass tests currently)
> Issue suggested by [~mikemccand]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mocobeta commented on pull request #215: LUCENE-10028: Add git pre-commit hook that runs precommit task.

2021-07-18 Thread GitBox


mocobeta commented on pull request #215:
URL: https://github.com/apache/lucene/pull/215#issuecomment-882058333


   It surely won't fit everyone's (especially git experts') use cases, and it's 
a local setup anyway; devs should be able to set up a `pre-commit` or `pre-push` 
hook without the help of Gradle if they'd like. I added this because I want 
to make sure all linters are run locally before pushing changes to the 
remote repo (I almost always forget to), but I don't intend to force 
or recommend others to do so.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [lucene] dsmiley opened a new pull request #216: Introduce DocTermVectors in lieu of Fields.

2021-07-18 Thread GitBox


dsmiley opened a new pull request #216:
URL: https://github.com/apache/lucene/pull/216


   https://issues.apache.org/jira/browse/LUCENE-10018
   Let's not use the Fields class anymore for TermVectors.  In this PR, we 
introduce a new class "DocTermVectors" in its stead.





[GitHub] [lucene] dsmiley commented on pull request #216: Introduce DocTermVectors in lieu of Fields.

2021-07-18 Thread GitBox


dsmiley commented on pull request #216:
URL: https://github.com/apache/lucene/pull/216#issuecomment-882247189


   In the first commit of this PR, I introduce "DocTermVectors" subclassing 
Fields.  Another commit can inline Fields.
   
   What do we think of the name?





[jira] [Resolved] (LUCENE-9935) Bulk merges for stored fields when index sorting is enabled

2021-07-18 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9935.
--
Resolution: Fixed

It looks like this issue has been fully merged, so marking fixed.

> Bulk merges for stored fields when index sorting is enabled
> ---
>
> Key: LUCENE-9935
> URL: https://issues.apache.org/jira/browse/LUCENE-9935
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Nhat Nguyen
>Priority: Minor
> Fix For: 9.0, 8.10
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Today stored fields disable bulk merges entirely when index sorting is 
> enabled. However, when sorting by low-cardinality fields or when the index 
> sort is correlated with the order in which documents get indexed, we could 
> likely still have efficient bulk merges.
> For instance, if you are merging two segments that are sorted on a field that 
> can only take 2 values, one could bulk merge the first half of the first 
> segment, then the first half of the second segment, then the second half of 
> the first segment, and finally the second half of the second segment.
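
The two-value example in the description above can be sketched as a toy model. This is not Lucene's implementation: each segment is reduced to an array of already-sorted keys, the class and method names (`RunMergeSketch`, `planBulkCopies`) are hypothetical, and breaking ties in favor of the first segment is an assumption made for illustration. The sketch only plans which contiguous doc ranges could be copied in bulk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Lucene code): a segment is an ascending array of sort keys. */
class RunMergeSketch {

  /**
   * Plans the merge of two sorted segments as a list of contiguous bulk copies.
   * Each entry is {segmentIndex, fromDocInclusive, toDocExclusive}.
   * Assumption: on equal keys, docs from the first segment come first.
   */
  static List<int[]> planBulkCopies(int[] first, int[] second) {
    List<int[]> copies = new ArrayList<>();
    int i = 0; // next uncopied doc in the first segment
    int j = 0; // next uncopied doc in the second segment
    while (i < first.length || j < second.length) {
      boolean takeFirst =
          j == second.length || (i < first.length && first[i] <= second[j]);
      if (takeFirst) {
        // Extend the run while the first segment's key does not exceed the second's head.
        int end = i;
        while (end < first.length && (j == second.length || first[end] <= second[j])) {
          end++;
        }
        copies.add(new int[] {0, i, end});
        i = end;
      } else {
        // Extend the run while the second segment's key is strictly below the first's head.
        int end = j;
        while (end < second.length && (i == first.length || second[end] < first[i])) {
          end++;
        }
        copies.add(new int[] {1, j, end});
        j = end;
      }
    }
    return copies;
  }

  public static void main(String[] args) {
    // Two segments sorted on a field that can only take the values 0 and 1.
    int[] first = {0, 0, 1, 1};
    int[] second = {0, 0, 1, 1};
    for (int[] c : planBulkCopies(first, second)) {
      System.out.println("segment " + c[0] + ": docs [" + c[1] + ", " + c[2] + ")");
    }
  }
}
```

With the inputs in `main`, the plan consists of four ranges: the first half of the first segment, the first half of the second, the second half of the first, and the second half of the second. The number of copies grows with the number of key runs rather than the number of documents, which is why low-cardinality sorts look promising here.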


