Chris Hostetter wrote:
: However, I'd like to hear a comment on the approach of doing the parsing
: using Lucene and then constructing a SolrQuery from a Lucene Query:
I believe you are asking about doing this in the client code? Using the
Lucene QueryParser to parse a string using an analyzer, then toString'ing
that and sending it across the wire to Solr?
Yes.
i would strongly advise against it.
Thank you.
Query.toString() is intended purely as a debugging tool, not as a
serialization mechanism. It's very possible for the toString() value of
a query to not be useful in attempting to recreate the query --
particularly if the analyzer being used by Solr for the "re-parse" doesn't
know to expect terms that have already been stemmed, or modified in the
various ways the client may have done so (and if you have to go to all
that work to make solr know about what you've pre-analyzed, why not just
let solr do it for you?)
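
To make the "re-parse" problem concrete, here is a minimal sketch of what can
happen when a term is analyzed on the client and then analyzed again on the
server. The toyStem method is a deliberately naive stand-in invented for this
illustration; it is not a Lucene or Solr class and does not reproduce any real
stemmer.

```java
import java.util.Locale;

// Toy illustration only: toyStem is a hypothetical, deliberately naive
// suffix stripper standing in for a stemming analyzer.
class DoubleAnalysisDemo {

    // Lowercases the term, then strips a trailing "es" or "s".
    static String toyStem(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        if (t.endsWith("es") && t.length() > 3) {
            return t.substring(0, t.length() - 2);
        }
        if (t.endsWith("s") && t.length() > 2) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        // Index side: the term is analyzed exactly once.
        String indexed = toyStem("buses");

        // Query side: the client analyzes the term, sends the result over
        // the wire, and the server analyzes it a second time.
        String clientSide = toyStem("buses");
        String serverSide = toyStem(clientSide);

        System.out.println("indexed term:    " + indexed);
        System.out.println("re-parsed query: " + serverSide);
        System.out.println("match: " + indexed.equals(serverSide));
    }
}
```

With this toy rule the second pass changes the term again, so the query term no
longer equals the indexed term. Real analyzers differ in the details, but any
analysis step that is not idempotent has the same failure mode.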
Is there a (better) way to construct a Solr's SolrQuery object from a
Lucene's Query object?
: Similarly, at indexing time:
...
: What are the drawbacks of this approach?
Hmmm... well besides the drawback of doing all the hard work solr will do
for you, i suppose that as long as you are extremely careful to manage
both the indexing side and the query side externally from Solr then there
is nothing wrong with this approach -- you would essentially just have a
single field type in your schema.xml that would use a whitespace tokenizer
-- but again, this would make you lose out on a lot of solr's features
(notably: the stored values in your index would be the post-analysis
tokens, you would be forced to trust the clients 100% to send you clean
data at index and query time instead of being able to configure it
centrally, etc...)
The rationale for wanting to do all the analysis (both query time and
indexing time) client side is that I have an application which is using
Lucene and it is already doing that and it has some "unusual"
requirements (i.e. almost all fields are dynamicFields with
custom/configurable analyzers per field).
I completely agree with everything you said and with the "dangers" of
doing the analysis client side and then letting Solr re-analyze
server side. However, as you suggested, a simple whitespace tokenizer
on Solr should be relatively safe.
Definitely, your previous suggestion of using dynamicFields for each
of the possible analyzer configurations and transparently mapping field
names with "prefixes"|"postfixes" to select the right dynamicField
"type" is a better option.
In short: i don't see any advantages, but i see a lot of room for error.
Yep. Got it.
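
For reference, the dynamicField mapping mentioned above could be sketched
roughly as follows on the client side. Everything here is an assumption made
for illustration: the class name, the analyzer configuration names, and the
suffixes "_ws" and "_stem_en" are hypothetical, and they would have to match
dynamicField patterns (e.g. "*_ws") actually declared in schema.xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps an application-level field name plus the name of
// its analyzer configuration to a Solr field name carrying a suffix that a
// dynamicField pattern can match. Names and suffixes are illustrative only.
class DynamicFieldNames {

    private final Map<String, String> suffixByAnalyzer = new LinkedHashMap<>();

    // Associates an analyzer configuration name with a field-name suffix.
    DynamicFieldNames register(String analyzerConfig, String suffix) {
        suffixByAnalyzer.put(analyzerConfig, suffix);
        return this;
    }

    // "title" + "english-stemmed" becomes "title_stem_en" if so registered.
    String toSolrField(String logicalName, String analyzerConfig) {
        String suffix = suffixByAnalyzer.get(analyzerConfig);
        if (suffix == null) {
            throw new IllegalArgumentException(
                    "No suffix registered for analyzer config: " + analyzerConfig);
        }
        return logicalName + suffix;
    }

    // Strips the longest registered suffix so results can be shown to the
    // application under the original field name; unknown names pass through.
    String toLogicalName(String solrField) {
        String best = null;
        for (String suffix : suffixByAnalyzer.values()) {
            boolean matches = solrField.endsWith(suffix)
                    && solrField.length() > suffix.length();
            if (matches && (best == null || suffix.length() > best.length())) {
                best = suffix;
            }
        }
        return best == null
                ? solrField
                : solrField.substring(0, solrField.length() - best.length());
    }

    public static void main(String[] args) {
        DynamicFieldNames names = new DynamicFieldNames()
                .register("whitespace", "_ws")
                .register("english-stemmed", "_stem_en");
        System.out.println(names.toSolrField("title", "english-stemmed"));
        System.out.println(names.toLogicalName("title_stem_en"));
    }
}
```

The point of keeping the mapping in one place is that the application keeps
using its own field names, while the analysis itself stays configured
centrally in Solr, one dynamicField type per analyzer configuration.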
Paolo
-Hoss