It depends. Okay, the source contains "4 harv. l. rev. 45".

Do you want a user-entered "harv." to ALSO match "harv" (without the period) in the source, and vice versa? Or do you require that it NOT match? Or do you not care?

The default analysis chain will index "4 harv. l. rev. 45" essentially as 4;harv;l;rev;45. A phrase search for "4 harv. l. rev. 45" will match it, but so will a phrase search for "4 harv l rev 45", and in fact so will a phrase search for "4 harv. l. rev45".

That could be good, or it could be bad.
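
To make the effect concrete, here is a rough Python sketch of that kind of tokenization. It is only a toy model: toy_analyze is a hypothetical helper, not Lucene's or Solr's actual code, and it assumes the chain lowercases, drops punctuation, and splits at letter/digit boundaries.

```python
import re

def toy_analyze(text):
    """Toy stand-in for an analysis chain (NOT Lucene's implementation).

    Assumption: split on whitespace, lowercase, drop punctuation, and
    split each chunk into runs of ASCII letters or runs of digits.
    """
    tokens = []
    for chunk in text.split():
        tokens.extend(re.findall(r"[a-z]+|[0-9]+", chunk.lower()))
    return tokens

source = "4 harv. l. rev. 45"
queries = ["4 harv. l. rev. 45", "4 harv l rev 45", "4 harv. l. rev45"]

for query in queries:
    # Same token sequence as the source means a phrase search would match.
    same = toy_analyze(query) == toy_analyze(source)
    print(query, "->", ";".join(toy_analyze(query)), "| matches source:", same)
```

Under these assumptions all three queries reduce to the same token sequence as the source, which is why they all match as phrases.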

The point of the Solr analysis chain is to apply tokenization and transformation at both index time and query time, so that queries match the source in the way you want. You can customize this analysis chain however you like, in extreme cases even writing your own analyzers in Java. If the out-of-the-box default isn't doing what you want, you'll have to spend some time thinking about how an inverted index like Lucene works, and about what you actually want to match. Someone else would need much more detailed specifications from you to work out which analysis chain will do what you want, but I bet you can figure it out yourself after a bit of reading and a bit of thinking.
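
If it helps to think it through, the idea can be sketched as a toy positional inverted index in Python. Everything here is hypothetical and for illustration only (ToyIndex, keep_punctuation and split_punctuation are made-up names, and none of this is how Lucene is actually implemented); the one point it is meant to show is that the same analyzer runs at index time and at query time, and that the choice of analyzer decides what matches.

```python
import re
from collections import defaultdict

def keep_punctuation(text):
    # Hypothetical strict analyzer: lowercase and split on whitespace only.
    return text.lower().split()

def split_punctuation(text):
    # Hypothetical loose analyzer: lowercase, drop punctuation,
    # split into runs of ASCII letters or runs of digits.
    return re.findall(r"[a-z]+|[0-9]+", text.lower())

class ToyIndex:
    """Minimal positional inverted index: token -> list of (doc_id, position)."""

    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.postings = defaultdict(list)

    def add(self, doc_id, text):
        # Index time: analyze the source text and record token positions.
        for pos, token in enumerate(self.analyzer(text)):
            self.postings[token].append((doc_id, pos))

    def phrase_search(self, query):
        # Query time: analyze the query with the SAME analyzer, then look
        # for the tokens at consecutive positions in one document.
        tokens = self.analyzer(query)
        hits = set()
        if not tokens:
            return hits
        for doc_id, start in self.postings.get(tokens[0], []):
            if all((doc_id, start + i) in self.postings.get(tok, [])
                   for i, tok in enumerate(tokens)):
                hits.add(doc_id)
        return hits

strict = ToyIndex(keep_punctuation)
loose = ToyIndex(split_punctuation)
for index in (strict, loose):
    index.add(1, "4 harv. l. rev. 45")

print(strict.phrase_search("4 harv. l. rev. 45"))  # exact text matches
print(strict.phrase_search("4 harv l rev 45"))     # no match: "harv" != "harv."
print(loose.phrase_search("4 harv l rev 45"))      # matches: punctuation dropped
```

With the strict analyzer only the exact punctuation matches; with the loose one the periods stop mattering. Which of those you want is the question to settle before picking an analysis chain.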

See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

 On 8/4/2011 4:30 PM, dhastings wrote:
I have decided to use solr for indexing as well.

the types of searches im doing are professional/academic.
so for example, i need to match
all of the following exactly from my source data:
      "3.56",
      "4 harv. l. rev. 45",
      "187-532",
      "3 llm 56",
      "5 unts 8",
      "6 u.n.t.s. 78",
      "father's obligation"


i seem to keep running into issues getting this to work.  the searching is
being done on a text field that is not stored.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3226611.html
Sent from the Solr - User mailing list archive at Nabble.com.
