You might be better off starting with the Lucene CheckIndex program.
It walks all of the Lucene index data structures. I have done
forensics by fiddling with the CheckIndex code.
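For reference, CheckIndex ships inside the Lucene core jar and is run from the command line against an index directory. A minimal wrapper sketch (jar name and index path below are placeholders -- adjust to your Lucene version and data directory):

```python
import subprocess

# Hypothetical paths -- substitute your own lucene-core jar and index dir.
LUCENE_CORE_JAR = "lucene-core.jar"
INDEX_DIR = "/var/solr/data/index"

# CheckIndex walks every segment and prints per-segment statistics
# (doc counts, deletions, term/posting checks).
cmd = ["java", "-cp", LUCENE_CORE_JAR,
       "org.apache.lucene.index.CheckIndex", INDEX_DIR]

print(" ".join(cmd))
# Note: the "-fix" flag drops corrupt segments permanently, so only ever
# use it on a backup copy of the index.
# subprocess.run(cmd, check=True)  # uncomment to actually run it
```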

On Thu, Aug 26, 2010 at 9:11 AM, Shawn Heisey <s...@elyograg.org> wrote:
>  On 5/24/2010 6:30 AM, Sascha Szott wrote:
>>
>> Hi folks,
>>
>> is it possible to sort by field length without having to (redundantly)
>> save the length information in a separate index field? At first, I thought
>> to accomplish this using a function query, but I couldn't find an
>> appropriate one.
>>
>
> I have a slightly different need related to this, though it may turn out
> that what Sascha wants is similar.  I would like to understand my data
> better so I can improve my schema.  I need to do some data mining that is
> (to my knowledge) difficult or impossible with the source database.
>  Performance is irrelevant, as long as it finishes eventually.  Completing
> in less than an hour would be nice.
>
> I would do this on a test system with much lower performance and memory
> (4GB) than my production servers, as a single index instead of multiple
> shards.  When it finishes building, the entire test index is likely to be
> about 75GB.
>
> What I'm after is an output that would look very much like faceting, but I
> want it to show document counts associated with field length (for a simple
> string) and number of terms (for a tokenized field) instead of field value.
>  Can Solr do that, and if so, what do I need to have enabled in the schema
> to get it?  Would branch_3x be enough, or would trunk be better?
>
> Thanks,
> Shawn
>
>
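The histogram Shawn describes can also be computed outside Solr once the fields are stored and exported (e.g. via a CSV or JSON response). A rough sketch, assuming exported docs as dicts -- the field name and the whitespace tokenizer below are stand-ins, not your real schema or analyzer:

```python
from collections import Counter

def length_histogram(docs, field):
    """Doc counts keyed by the stored string length of `field`."""
    return Counter(len(d[field]) for d in docs if field in d)

def term_count_histogram(docs, field, tokenize=str.split):
    """Doc counts keyed by number of terms; whitespace split is only
    a stand-in for the field's real analyzer chain."""
    return Counter(len(tokenize(d[field])) for d in docs if field in d)

# Toy data standing in for an exported result set.
docs = [
    {"id": "1", "title": "solr rocks"},
    {"id": "2", "title": "lucene"},
    {"id": "3", "title": "faceting on field length"},
]

print(length_histogram(docs, "title"))      # counts of docs per string length
print(term_count_histogram(docs, "title"))  # counts of docs per term count
```

This is the brute-force version of the facet-style output Shawn wants; it finishes in one pass over the exported data, so the "performance is irrelevant" constraint is easy to meet.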



-- 
Lance Norskog
goks...@gmail.com
