It depends. Okay, the source contains "4 harv. l. rev. 45".
Do you want a user-entered "harv." to ALSO match "harv" (without the
period) in the source, and vice versa? Or do you require that it NOT
match? Or do you not care?
The default analysis chain will index "4 harv. l. rev. 45"
essentially as 4;harv;l;rev;45. A phrase search for
"4 harv. l. rev. 45" will match it, but so will a phrase search for
"4 harv l rev 45", and in fact so will a phrase search for
"4 harv. l. rev45". That could be good, or it could be bad.
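To make that concrete, here is a toy Python sketch of the behavior described above. It is not Solr code: the analyze() and phrase_match() helpers are hypothetical stand-ins I made up for illustration, and the assumption is that the chain lowercases, drops punctuation, and splits letter runs from digit runs.

```python
import re

def analyze(text):
    """Toy stand-in for an analysis chain (an assumption, not Solr itself):
    lowercase, drop punctuation, and emit letter runs and digit runs
    as separate tokens."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())

def phrase_match(query, source):
    """True if the query's tokens occur consecutively, in order, in the source."""
    q, s = analyze(query), analyze(source)
    return any(s[i:i + len(q)] == q for i in range(len(s) - len(q) + 1))

source = "4 harv. l. rev. 45"
print(analyze(source))
for query in ["4 harv. l. rev. 45", "4 harv l rev 45", "4 harv. l. rev45"]:
    print(repr(query), phrase_match(query, source))
```

Under those assumptions all three queries reduce to the same token sequence as the source, which is why all three match.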
The point of the Solr analysis chain is to apply tokenization and
transformation at both index time and query time, so queries will match
source in the way you want. You can customize this analysis chain
however you want, in extreme cases even writing your own analyzers in
Java. If the out-of-the-box default isn't doing what you want, you'll
have to spend some time thinking about how an inverted index like Lucene
works, and what you want. You would need to provide a lot more
specifications/details for someone else to figure out what analysis
chain will do what you want, but I bet you can figure it out yourself
after reading up a bit and thinking a bit.
See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
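As a rough illustration of how the choice of chain changes what matches, here is a toy Python comparison. It is not Solr code: loose_chain, strict_chain, and phrase_match are hypothetical names for this sketch. The strict chain only lowercases and splits on whitespace, roughly the idea behind pairing a whitespace tokenizer with a lowercase filter. Whichever chain is chosen is applied to both the source and the query.

```python
import re

def loose_chain(text):
    # Assumed "loose" chain: lowercase, drop punctuation,
    # split letter runs from digit runs.
    return re.findall(r"[a-z]+|[0-9]+", text.lower())

def strict_chain(text):
    # Assumed "strict" chain: lowercase and split on whitespace only,
    # so punctuation stays inside the token ("harv." != "harv").
    return text.lower().split()

def phrase_match(query, source, chain):
    # The same chain runs on both sides, as at index time and query time.
    q, s = chain(query), chain(source)
    return any(s[i:i + len(q)] == q for i in range(len(s) - len(q) + 1))

source = "4 harv. l. rev. 45"
for query in ["4 harv. l. rev. 45", "4 harv l rev 45", "4 harv. l. rev45"]:
    print(repr(query),
          "loose:", phrase_match(query, source, loose_chain),
          "strict:", phrase_match(query, source, strict_chain))
```

With the strict chain only the query typed exactly like the source matches. With the loose chain all three do. Which one is right depends on the answers to the questions above.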
On 8/4/2011 4:30 PM, dhastings wrote:
I have decided to use Solr for indexing as well.
The types of searches I'm doing are professional/academic,
so, for example, I need to match
all of the following exactly from my source data:
"3.56",
"4 harv. l. rev. 45",
"187-532",
"3 llm 56",
"5 unts 8",
"6 u.n.t.s. 78",
"father's obligation"
I seem to keep running into issues getting this to work. The searching is
being done on a text field that is not stored.