> does FST provide a predecessor lookup function
Yes [1]. seekCeil, seekFloor, seekExact are the methods provided for
seeking.
1.
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/util/fst/BytesRefFSTEnum.java
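For readers unfamiliar with the floor/ceil terminology, the semantics can be sketched with a plain sorted map. This is an analogy only: it uses java.util.TreeMap as a stand-in, not Lucene's FST classes, and the class name, the sample tokens, and the offsets are all made up for illustration. A floor seek returns the largest key less than or equal to the target (the predecessor lookup asked about), and a ceil seek returns the smallest key greater than or equal to it.

```java
import java.util.TreeMap;

// Illustration only: a TreeMap stands in for the FST enum to show what
// "floor" and "ceil" seeks mean. None of these names come from Lucene.
class SeekSemanticsSketch {

    // Hypothetical sample data: token -> file offset.
    static TreeMap<Long, Long> sample() {
        TreeMap<Long, Long> index = new TreeMap<>();
        index.put(10L, 0L);
        index.put(20L, 100L);
        index.put(30L, 200L);
        return index;
    }

    // Largest key <= target, or null if every key is greater (predecessor lookup).
    static Long floor(TreeMap<Long, Long> index, long target) {
        return index.floorKey(target);
    }

    // Smallest key >= target, or null if every key is smaller.
    static Long ceil(TreeMap<Long, Long> index, long target) {
        return index.ceilingKey(target);
    }
}
```

With the sample data, a floor seek for 25 lands on 20 and a ceil seek for 25 lands on 30; an exact seek would succeed only for 10, 20, or 30.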
On Fri, Jun 8, 2012 at 4:03 PM, Pavel Yaskevich
On 8.6.2012 21:19, Jason Rutherglen wrote:
Ok looks like the IndexSummary encapsulates everything, I can start with
hacking that.
Do the memory part first. I want to test it on existing serialized index data.
Yeah, that is why I wrote "if possible" :) Also, does FST provide a predecessor
lookup function? That wasn't clear from the blog post.
On Friday 8 June 2012 at 22:53, Jason Rutherglen wrote:
> Yeah that's fine, however if there isn't a Java implementation that's a lot
> of extra work. The FST approac
Yeah that's fine, however if there isn't a Java implementation that's a lot
of extra work. The FST approach should be a clear quick and easy win. The
current system of in heap keys and a binary search is what the FST replaced
in Lucene. There are plenty of references to the improvement.
On Fri,
I would vote, if possible, to compare it with a y-fast trie [1] (there doesn't
seem to be a Java implementation available, unfortunately) in terms of memory
efficiency and lookup performance. As we use big integer tokens, the main
benefit from that trie could be O(log log M) predecessor lookup and compa
Ok looks like the IndexSummary encapsulates everything, I can start with
hacking that.
On Fri, Jun 8, 2012 at 11:50 AM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
> The Cassandra integration is probably beyond the time I have available.
> If the locations in the code that need to be re
The Cassandra integration is probably beyond the time I have available. If
the locations in the code that need to be rewritten to use the FST are
known, and a patch simply 'plugs-in' the FST, that would be much easier.
For example, I don't know how Cassandra stores the current key index...
Bas
If you are interested I can help, I used the FST on a Hadoop project
to implement a fast map side range join.
Create a JIRA item with the patch attached; I will test it.
Seems like a good place to try out Lucene's FST [1] data structure
which would enable more keys to be loaded into RAM (for more granular
seeks), along with their positions. Lucene uses this for the terms
dictionary, and its use made for nice gains in the efficiency of the
terms dictionary. The ef
Implementation is in IndexSummary.java; the core is
private final ArrayList positions;
private final ArrayList keys;
So no, nothing fancy like prefix compression.
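To make the two-list layout concrete, here is a minimal sketch of a sampled index held as parallel sorted lists and searched with a binary search. It is not Cassandra's IndexSummary code: the class and method names are hypothetical, and keys are simplified to Long tokens purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;

// Hypothetical sketch of a sampled key index kept as two parallel lists.
// Keys are simplified to Long tokens; this is not the real IndexSummary.
class SampledIndexSketch {
    private final ArrayList<Long> keys = new ArrayList<>();
    private final ArrayList<Long> positions = new ArrayList<>();

    // Samples must be added in ascending key order so binary search works.
    void add(long key, long position) {
        keys.add(key);
        positions.add(position);
    }

    // File position of the last sampled key <= target,
    // or -1 if the target sorts before every sample.
    long floorPosition(long target) {
        int i = Collections.binarySearch(keys, target);
        if (i < 0) {
            // A miss returns -(insertionPoint) - 1; the predecessor sits
            // at insertionPoint - 1, which is -i - 2.
            i = -i - 2;
        }
        return i < 0 ? -1L : positions.get(i);
    }
}
```

Every sampled key is stored in full on the heap here, which is the cost that prefix sharing in an FST is meant to reduce.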
On Wed, Jun 6, 2012 at 11:00 AM, Jason Rutherglen
wrote:
> I am wondering how this is currently implemented? Is there prefi