: My data are library call numbers, normalized to be comparable, resulting in
: (maximum) 21-character strings of the form "RK 052180H359~999~999"
:
: Now, these are fine -- they work for sorting and ranges and the whole thing,
: but right now I can't use them because I've got two or three for each of my
: 6M documents and on a 32-bit machine I run out of heap.
:
: Another option would be to turn them into longs (using roughly 56 bits of
: the 64 bit space) and use a trie type. Is there any sort of a win involved
: there?

I don't think Trie fields can be used for sorting (because they result in multiple terms per doc), but I could be wrong about that; smarter people than me may have done something cool with TrieField that I'm not aware of.

As a general rule: if you have character data that fits a rigid enough set of constraints that you can encode any legal value into a single numeric value (either int or long) such that the encoded values still sort properly, sorting on those encoded values is going to be more memory efficient than (and probably just as fast as) sorting on the string values.

-Hoss
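
A minimal sketch of the general rule above, written in Python for brevity. The field layout is an assumption made up for illustration: it covers only the first 13 characters of the sample key (three space-padded class letters, six digits, then one cutter letter or "~" followed by three digits). The real 21-character normalization would need its own per-position alphabets and its own bit budget. The names LETTERS, DIGITS, CUTTER, LAYOUT, CAPACITY and encode are hypothetical.

```python
import math

# Per-position alphabets, each listed in ascending ASCII order so that
# numeric order of the encoded value matches string order of the key.
LETTERS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"      # space sorts before A-Z
DIGITS = "0123456789"
CUTTER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ~"       # "~" sorts after Z

# ASSUMED layout (illustration only): 3 letters, 6 digits, 1 cutter char, 3 digits.
LAYOUT = [LETTERS] * 3 + [DIGITS] * 6 + [CUTTER] + [DIGITS] * 3

# Number of distinct keys the layout can represent; must fit in a signed 64-bit long.
CAPACITY = math.prod(len(alphabet) for alphabet in LAYOUT)
assert CAPACITY <= 2 ** 63


def encode(key, layout=LAYOUT):
    """Pack a fixed-width key into one integer using mixed-radix positional encoding."""
    if len(key) != len(layout):
        raise ValueError(f"expected {len(layout)} characters, got {len(key)}")
    value = 0
    for ch, alphabet in zip(key, layout):
        # str.index raises ValueError for a character outside the position's alphabet.
        value = value * len(alphabet) + alphabet.index(ch)
    return value


if __name__ == "__main__":
    keys = ["RK 052180H359", "RK 052180~999", "QA 007600A100", "R  012345B000"]
    # Sorting by the encoded integer gives the same order as sorting the strings.
    print(sorted(keys, key=encode) == sorted(keys))
    print(CAPACITY.bit_length(), "bits needed for this layout")
```

Because each position only pays for the symbols it can actually hold, the packed value is far smaller than the raw string, and comparing two encoded values is a single integer comparison. The encoding stays order-preserving only as long as every alphabet is listed in the same order the strings themselves would sort in.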