[GitHub] [lucene] mocobeta commented on issue #11851: Luke web interface

2022-11-06 Thread GitBox


mocobeta commented on issue #11851:
URL: https://github.com/apache/lucene/issues/11851#issuecomment-1304733837

   I recognize there have been requests for a web-based Luke-like app, but I 
would prefer to develop and maintain such an application outside Lucene, using 
a web application framework. It's a stateful application by nature, so I think 
it'd be much easier to implement with a decent framework that supports HTTP 
sessions.
   
   If you are not in a hurry, I'll try to create a web application based on 
Spring MVC and Thymeleaf (100% Java, without fancy JavaScript, for 
sustainability) that partially emulates the Luke desktop GUI. It can be 
independent of the Lucene libraries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jfboeuf opened a new pull request, #11900: Reduce bloom filter size by using the optimal count for hash functions.

2022-11-06 Thread GitBox


jfboeuf opened a new pull request, #11900:
URL: https://github.com/apache/lucene/pull/11900

   BloomFilteringPostingsFormat currently relies on a bloom filter with one 
hash function (k=1). For a target false positive probability (fpp) of 10%, 1 is 
never the optimal value for k. Using the optimal value for k would either:
   * achieve a much better fpp with the same bitset size, or
   * achieve the same fpp with a reduced size (half of what is currently used).
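
   For reference, both options follow from the standard asymptotic estimate 
for a Bloom filter with m bits, n keys and k hash functions (this assumes 
idealized, independent hashes; the actual sizing code may round differently):

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2, \qquad
\left.\frac{m}{n}\right|_{k = k_{\mathrm{opt}}} = \frac{-\ln p}{(\ln 2)^{2}}
```

   With k = 1 and p = 0.1, m/n = -1/ln(0.9) ≈ 9.49 bits per key. At that size, 
k = 7 (the integer nearest 9.49 ln 2 ≈ 6.58) gives p ≈ (1 - e^(-7/9.49))^7, 
roughly 1%, whereas keeping p = 0.1 with the optimal k ≈ 3.3 needs only 
-ln(0.1)/(ln 2)^2 ≈ 4.79 bits per key, about half.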
   
   From tests:
   * targeting a better false positive probability doesn't improve performance 
enough to justify the increased size;
   * targeting a smaller size by degrading the false positive probability comes 
with a significant performance hit.
   
   As a consequence, a target false positive probability of about 10% seems to 
be a good trade-off. I slightly raised this value (to 0.1023f) so that newly 
allocated bloom filters are always half the size they used to be. The effective 
false positive probability ranges from significantly better in most cases to 
slightly worse in rare cases. [This graph](https://drive.google.com/file/d/1RgofprJ0GyYaDQUZD59Pdp_b2gfmJASA/view?usp=sharing) 
compares both the size and the effective false positive probability of the 
current and proposed implementations. Overall performance remains comparable 
(slightly, but not significantly, better): the reduced size and the improved 
false positive probability compensate for the cost of computing additional 
hashes. The BloomBench class I used to check performance is in the 
bloomPerfBench branch.
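
   A quick numeric check of the 0.1023f choice, using the textbook asymptotic 
Bloom filter formulas (m/n = -1/ln(1 - p) bits per key for k = 1, and 
m/n = -ln p/(ln 2)^2 for the optimal k). The class and method names are 
illustrative only and are not part of the PR; under these idealized formulas, 
the optimal-k size at p = 0.1023 comes out at almost exactly half the k = 1 
size at p = 0.10, which is consistent with the halving described above.

```java
/**
 * Illustrative sizing check (hypothetical class, not Lucene code) based on the
 * textbook asymptotic Bloom filter formulas.
 */
final class BloomSizing {
  private static final double LN2 = Math.log(2);

  private BloomSizing() {}

  /** Bits per key m/n for a single hash function, from p = 1 - e^(-n/m). */
  static double bitsPerKeyOneHash(double fpp) {
    return -1.0 / Math.log(1.0 - fpp);
  }

  /** Bits per key m/n when k is optimal, from m/n = -ln(p) / (ln 2)^2. */
  static double bitsPerKeyOptimal(double fpp) {
    return -Math.log(fpp) / (LN2 * LN2);
  }

  /** Integer k nearest to the optimum (m/n) ln 2, never below 1. */
  static int optimalK(double bitsPerKey) {
    return Math.max(1, (int) Math.round(bitsPerKey * LN2));
  }

  public static void main(String[] args) {
    double current = bitsPerKeyOneHash(0.10);    // k = 1 at 10% fpp
    double proposed = bitsPerKeyOptimal(0.1023); // optimal k at 10.23% fpp
    System.out.printf("k=1 @ p=0.10   : %.3f bits/key%n", current);
    System.out.printf("k=%d @ p=0.1023 : %.3f bits/key%n", optimalK(proposed), proposed);
    System.out.printf("size ratio     : %.4f%n", proposed / current);
  }
}
```

   Running main should report roughly 9.49 versus 4.75 bits per key, with k = 3 
and a size ratio of about 0.5.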
   
   In addition, the bitset implementation is backed by a long array, so 
choosing a size smaller than 64 bits is pointless.
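
   To illustrate the point about long arrays: in a long[]-backed bitset, bit i 
lives in word i >>> 6 at position i & 63, so storage is allocated in whole 
64-bit words and any size from 1 to 64 bits costs exactly one long. The class 
below is a hypothetical minimal sketch, not Lucene's actual bitset class.

```java
import java.util.Objects;

/**
 * Hypothetical minimal long[]-backed bitset (not Lucene's implementation):
 * bit i is stored in words[i >>> 6] at bit position (i & 63).
 */
final class LongArrayBitSet {
  private final long[] words;
  private final int numBits;

  LongArrayBitSet(int numBits) {
    if (numBits <= 0) {
      throw new IllegalArgumentException("numBits must be positive: " + numBits);
    }
    this.numBits = numBits;
    // Round up to whole 64-bit words: any size from 1 to 64 bits needs one long.
    this.words = new long[((numBits - 1) >>> 6) + 1];
  }

  void set(int index) {
    Objects.checkIndex(index, numBits);
    // For long shifts Java uses only the low 6 bits of index, i.e. index & 63.
    words[index >>> 6] |= 1L << index;
  }

  boolean get(int index) {
    Objects.checkIndex(index, numBits);
    return (words[index >>> 6] & (1L << index)) != 0;
  }

  /** Bits of storage actually allocated; always a multiple of 64. */
  int allocatedBits() {
    return words.length * Long.SIZE;
  }
}
```

   Requesting 10 bits or 64 bits allocates the same single long, which is why 
sizes below 64 bits save nothing.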
   
   API change:
   * HashFunction.hash(BytesRef) returns a long: a 64-bit hash is more accurate 
and is useful for deriving additional hashes from the original one.
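
   One common way to derive k probe positions from a single 64-bit hash is 
Kirsch-Mitzenmacher double hashing, g_i = h1 + i * h2, where h1 and h2 are the 
low and high 32 bits of the hash. The sketch below assumes that scheme purely 
for illustration; the description does not say which derivation the PR uses, 
and the class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of Kirsch-Mitzenmacher double hashing: derive k bit
 * positions g_i = h1 + i * h2 from a single 64-bit hash, where h1 and h2
 * are its low and high 32 bits.
 */
final class DoubleHashing {
  private DoubleHashing() {}

  static int[] positions(long hash64, int k, int numBits) {
    if (k < 1 || numBits < 1) {
      throw new IllegalArgumentException("k and numBits must be positive");
    }
    int h1 = (int) hash64;          // low 32 bits
    int h2 = (int) (hash64 >>> 32); // high 32 bits
    int[] out = new int[k];
    for (int i = 0; i < k; i++) {
      int combined = h1 + i * h2;   // int overflow simply wraps around
      // floorMod keeps the position in [0, numBits) even if combined is negative
      out[i] = Math.floorMod(combined, numBits);
    }
    return out;
  }
}
```

   Because both halves come from one 64-bit value, each term is hashed only 
once regardless of k.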
   
   The proposed implementation remains compatible with existing/persisted bloom 
filters.
   
   





[GitHub] [lucene] mocobeta commented on pull request #11852: Luke Webapp

2022-11-06 Thread GitBox


mocobeta commented on PR #11852:
URL: https://github.com/apache/lucene/pull/11852#issuecomment-1304828029

   I'm late to the party. Do we really want to have and maintain a web 
application under Lucene? An HTTP server alone would not be sufficient to 
develop a stateful web app; you would need to write an application server from 
scratch to interact with users. If you create a separate OSS project for it, you 
can use any standard web technology, such as the Servlet API.
   
   





[GitHub] [lucene] mdmarshmallow opened a new pull request, #11901: Github#11869: Add RangeOnRangeFacetCounts

2022-11-06 Thread GitBox


mdmarshmallow opened a new pull request, #11901:
URL: https://github.com/apache/lucene/pull/11901

   ### Description
   
   
   Issue: https://github.com/apache/lucene/issues/11869
   
   Adds `RangeOnRangeFacetCounts`, which supports double ranges and long ranges 
to mirror `RangeFacetCounts`. Currently, it does not support multi-valued fields 
and uses basic linear scanning to count the facets.
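
   As a rough illustration of what basic linear scanning over range-valued 
documents can look like, the sketch below counts, for each requested facet 
range, how many documents' [min, max] ranges intersect it. The intersects 
relation, the single-valued input arrays, and all names here are assumptions 
for illustration only; they are not the PR's API, and the PR may define the 
matching relation differently. Only long ranges are shown.

```java
/**
 * Hypothetical sketch of counting range-valued documents against facet ranges
 * by linear scanning, assuming an "intersects" relation and single-valued docs.
 */
final class RangeOnRangeScan {
  private RangeOnRangeScan() {}

  /** Inclusive [min, max] range of longs. */
  static final class LongRange {
    final long min;
    final long max;

    LongRange(long min, long max) {
      if (min > max) {
        throw new IllegalArgumentException("min > max: " + min + " > " + max);
      }
      this.min = min;
      this.max = max;
    }

    boolean intersects(LongRange other) {
      return min <= other.max && other.min <= max;
    }
  }

  /**
   * Returns, for each facet range, the number of documents whose range
   * intersects it. Cost is O(docs * facetRanges): every pair is checked.
   */
  static int[] count(LongRange[] docRanges, LongRange[] facetRanges) {
    int[] counts = new int[facetRanges.length];
    for (LongRange doc : docRanges) {
      for (int r = 0; r < facetRanges.length; r++) {
        if (doc.intersects(facetRanges[r])) {
          counts[r]++;
        }
      }
    }
    return counts;
  }
}
```

   A sorted or tree-based structure could avoid checking every pair; the 
description notes that the PR currently uses the simple scan.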

