Take a look at the scoring algorithm on the Wiki; it already takes field length into account, albeit modified by how many times the term is mentioned in the field. So a field with 5 terms and one match will score higher than one with 10 terms and one match. Where it lands with 10 terms and 2 matches I leave as an exercise for the reader.
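For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the classic Lucene DefaultSimilarity formulas (tf = sqrt(frequency), lengthNorm = 1/sqrt(number of terms in the field)) and ignores idf, boosts, coord and queryNorm. The helper name field_factor is made up for illustration and is not a Lucene API.

```python
import math


def tf(freq):
    # Assumed classic DefaultSimilarity: square root of the term frequency.
    return math.sqrt(freq)


def length_norm(num_terms):
    # Assumed classic DefaultSimilarity: 1 / sqrt(number of terms in the field).
    return 1.0 / math.sqrt(num_terms)


def field_factor(freq, num_terms):
    # Hypothetical helper: only the tf * lengthNorm part of the score.
    return tf(freq) * length_norm(num_terms)


# (matches, terms in field) for the three cases discussed above.
cases = [(1, 5), (1, 10), (2, 10)]
for freq, num_terms in cases:
    print(f"{freq} match(es) in {num_terms} terms: {field_factor(freq, num_terms):.4f}")
```

Under these assumptions, 2 matches in 10 terms comes out the same as 1 match in 5 terms, since sqrt(2)/sqrt(10) equals 1/sqrt(5). Real scores can differ slightly because Lucene stores the norm in a single byte, which loses precision.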
I really think you're reinventing the wheel here, and looking at the
default scoring mechanism would be a good use of your time.

Best
Erick

On Wed, May 26, 2010 at 4:04 AM, Sascha Szott <sz...@zib.de> wrote:

> Hi Erick,
>
> Erick Erickson wrote:
>
>> Ah, I may have misunderstood, I somehow got it in my mind
>> you were talking about the length of each term (as in string length).
>>
>> But if you're looking at the field length as the count of terms, that's
>> another question, sorry for the confusion...
>>
>> I have to ask, though, why you want to sort this way? The relevance
>> calculations already factor in both term frequency and field length.
>> What's the use-case for sorting by field length given the above?
>>
> It's not a real world use-case -- I just want to get a better understanding
> of the data I'm indexing (therefore, performance is negligible). In my
> current use case, you can think of the field length as an indicator of data
> quality (i.e., the longer the field content, the worse the quality is).
> Being able to sort the field data in order of decreasing length would allow
> me to investigate "exceptional" data items that are not appropriately
> handled by my curation process.
>
> Best,
> Sascha
>
>> Best
>> Erick
>>
>> On Tue, May 25, 2010 at 3:40 AM, Sascha Szott <sz...@zib.de> wrote:
>>
>>> Hi Erick,
>>>
>>> Erick Erickson wrote:
>>>
>>>> Are you sure you want to recompute the length when sorting?
>>>> It's the classic time/space tradeoff, but I'd suggest that when
>>>> your index is big enough to make taking up some more space
>>>> a problem, it's far too big to spend the cycles calculating each
>>>> term length for sorting purposes considering you may be
>>>> sorting all the terms in your index worst-case.
>>>>
>>> Good point, thank you for the clarification. I "thought" that Lucene
>>> internally stores the field length (e.g., in order to compute the
>>> relevance) and getting this information at query time requires only
>>> a simple lookup.
>>>
>>> -Sascha
>>>
>>>> But you could consider payloads for storing the length, although
>>>> that would still be redundant...
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Mon, May 24, 2010 at 8:30 AM, Sascha Szott <sz...@zib.de> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> is it possible to sort by field length without having to (redundantly)
>>>>> save the length information in a separate index field? At first, I
>>>>> thought to accomplish this using a function query, but I couldn't
>>>>> find an appropriate one.
>>>>>
>>>>> Thanks in advance,
>>>>> Sascha