Hello,

I'm trying to implement automatic document classification and store
the classified attributes as an additional field in the Solr document.
Searches then run against that field, e.g.
q=classified_category:xyz. The document classification is currently
implemented as an UpdateRequestProcessor and works quite well. The
only problem: for each change in the classification algorithm, every
document has to be re-indexed, which of course makes testing and
experimentation difficult and ties up resources (other than Solr) for
several hours.

So my idea is to store the classified attributes in a meta index
and search over the main and meta indexes simultaneously. For example:
the main index has fields like color, and the meta index has
classified_category. The query "q=classified_category:xyz AND
color:black" should then be split across the main and meta indexes.
This way, the classification could run in Solr over the main index and
store the classified fields in the meta index, so that only Solr
resources are tied up.
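
To make the splitting idea concrete, here is a minimal sketch in
Python of what I have in mind. It is not Solr code: the field-to-index
mapping and the names split_query and intersect_ids are hypothetical,
it only handles a flat "a:x AND b:y" conjunction, and it assumes both
indexes share the same document id so the per-index hits can be
intersected.

```python
# Hypothetical mapping from field name to the index that holds it.
FIELD_TO_INDEX = {
    "color": "main",
    "classified_category": "meta",
}


def split_query(query, field_to_index=FIELD_TO_INDEX):
    """Split a flat 'a:x AND b:y' query into one sub-query per index."""
    per_index = {}
    for clause in query.split(" AND "):
        clause = clause.strip()
        field, _, _value = clause.partition(":")
        index = field_to_index[field]  # raises KeyError for unknown fields
        per_index.setdefault(index, []).append(clause)
    return {index: " AND ".join(clauses)
            for index, clauses in per_index.items()}


def intersect_ids(id_sets):
    """A document matches only if every index returned its id."""
    id_sets = [set(ids) for ids in id_sets]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


parts = split_query("classified_category:xyz AND color:black")
# parts maps "meta" to "classified_category:xyz"
# and "main" to "color:black"
```

Each sub-query would then be sent to its own index, and the returned
id sets combined with intersect_ids. OR clauses, nesting and scoring
are deliberately left out of this sketch.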

Has anybody already done something like that? It's a little bit like
sharding, but different in that each "shard" would process only its
part of the query and all of them would live in the same Solr instance.

Regards,
Valeriy
