Dear List,
I have been studying Solr to build an index of musical incipits, encoded
as strings inside bibliographic records, in order to retrofit this kind of
search into an existing database.
Basically we store the incipit data (filtered through a custom TokenFilter)
as a multivalued field (one value for each incipit) inside a doc
(which is my bibliographic record).
Searching is very good and precise, but I have a problem: I cannot figure
out which of the values in my multivalued field generated the hit! I only
get a reference to the whole doc (and when a doc has 40 incipits this is a
bit of a problem). I thought I could just loop over all the incipits in all
the matching docs and match them by hand, but this is not easy since the
indexed values are mangled by the TokenFilter.
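To make the manual approach concrete, here is a rough client-side sketch in Python of what I had in mind. It assumes the TokenFilter's transformation can be reproduced outside Solr, which is exactly the hard part in my case. The normalize function below (lowercasing and dropping whitespace) is only a hypothetical stand-in for the real filter, and matching_values is a made-up helper name, not anything Solr provides.

```python
def normalize(incipit: str) -> str:
    """Hypothetical stand-in for the custom TokenFilter:
    lowercase the incipit and drop all whitespace."""
    return "".join(incipit.lower().split())


def matching_values(stored_values, query_prefix):
    """Return (index, original value) pairs for every stored value whose
    normalized form starts with the normalized query prefix."""
    wanted = normalize(query_prefix)
    return [
        (i, value)
        for i, value in enumerate(stored_values)
        if normalize(value).startswith(wanted)
    ]


# Example: the stored values of one multivalued field in a matching doc.
incipits = ["C D E F", "G A B", "c e g"]
hits = matching_values(incipits, "C")
```

This only handles prefix queries and would have to be run for every doc on the current results page, so it is a workaround rather than a real answer.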
I saw some references suggesting that a solution to this problem is to use
a Highlighter and then extract the matching value. In principle this works
(I used FastVectorHighlighter), but I have an additional problem: when
running very broad queries (like all the incipits which start with a C) I
obviously get a big list of results (65k), but the highlighter matches
only a very small subset of them (2 or 3), whilst the query itself returns
all the correct (paginated) results. Attaching a debugger and tracing the
highlighter code, I found that the FieldQuery rewrites only the first 1024
queries, and hence for results beyond the first 1024 it is very likely that
my tokens will not get highlighted (and I cannot retrieve my value in the
multivalued field).
Can anyone help me out here? Is there something very obvious I am missing?
Is there an easy mechanism to just get the value that matched a query in a
multivalued field?
Thanks!
Rodolfo
