On 5/24/2010 6:30 AM, Sascha Szott wrote:
Hi folks,

is it possible to sort by field length without having to (redundantly) save the length information in a separate index field? At first, I thought to accomplish this using a function query, but I couldn't find an appropriate one.


I have a slightly different need related to this, though it may turn out that what Sascha wants is similar. I would like to understand my data better so I can improve my schema. I need to do some data mining that is (to my knowledge) difficult or impossible with the source database. Performance is irrelevant, as long as it finishes eventually. Completing in less than an hour would be nice.

I would do this on a test system with much lower performance and memory (4GB) than my production servers, as a single index instead of multiple shards. When it finishes building, the entire test index is likely to be about 75GB.

What I'm after is an output that would look very much like faceting, but I want it to show document counts associated with field length (for a simple string) and number of terms (for a tokenized field) instead of field value. Can Solr do that, and if so, what do I need to have enabled in the schema to get it? Would branch_3x be enough, or would trunk be better?
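
To make the desired output concrete, here is a rough Python sketch of the kind of histogram I mean. It runs client-side over field values that have already been pulled out of the index; it is not something Solr provides. The function names are made up for illustration, and splitting on whitespace is only an assumption standing in for whatever analyzer the tokenized field really uses.

```python
from collections import Counter


def length_histogram(values):
    """Map character length -> number of documents, for a simple string field."""
    return Counter(len(v) for v in values)


def term_count_histogram(values):
    """Map number of terms -> number of documents.

    Assumption: terms are whitespace-separated, which only approximates
    a real analyzer chain.
    """
    return Counter(len(v.split()) for v in values)


if __name__ == "__main__":
    # Hypothetical sample values standing in for one field across four documents.
    sample = ["solr", "lucene index", "sort by field length", "solr"]
    for length, docs in sorted(length_histogram(sample).items()):
        print(f"length={length} docs={docs}")
    for terms, docs in sorted(term_count_histogram(sample).items()):
        print(f"terms={terms} docs={docs}")
```

The output has the same shape as a facet list, with the length or term count taking the place of the field value.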

Thanks,
Shawn
