On 3-May-08, at 10:44 PM, JLIST wrote:
Hello Otis,
Do you mean that if I index the URL as a "text" field, I'll
be able to do * for a given prefix because the text will be
tokenized at the "/" and should suffice for my need?
I'm not sure what your needs are, but I use the following to index URLs:
<fieldType name="reverse_domain" class="solr.TextField">
<analyzer>
<tokenizer class="solr.PatternTokenizerFactory" pattern="\."/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
(This field stores the _reversed domain_; for example, www.example.com is stored as "com.example.www".)
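If it helps, here is a rough client-side sketch of computing that reversed value before sending it to Solr, plus an approximation of what the analyzer above does to it. The function names (reverse_domain, reverse_domain_from_url, analyze) are hypothetical, and analyze only imitates PatternTokenizerFactory splitting on "\." followed by LowerCaseFilterFactory; it is not the Lucene code.

```python
import re
from urllib.parse import urlsplit


def reverse_domain(host):
    """Reverse the dot-separated labels of a hostname: www.example.com -> com.example.www."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(reversed(labels))


def reverse_domain_from_url(url):
    """Pull the host out of a URL (urlsplit lowercases it) and reverse it."""
    host = urlsplit(url).hostname or ""
    return reverse_domain(host)


def analyze(value):
    """Rough imitation of the reverse_domain analyzer: split on '.', drop empties, lowercase."""
    return [tok.lower() for tok in re.split(r"\.", value) if tok]


rev = reverse_domain_from_url("http://www.Example.com/a/b?x=1")
print(rev)           # com.example.www
print(analyze(rev))  # ['com', 'example', 'www']
```

Reversing puts the most general label first, which is what makes "everything under com.example" expressible as a prefix.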
I also index the URL as a textTight field (see the example schema). If you want
to do prefix matching on the URL, I recommend also indexing it untokenized
in a separate field (or with minimal analysis, such as lowercasing).
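To illustrate why the untokenized field matters, here is a toy model in which a prefix query succeeds only if some single indexed term starts with the prefix. It is a simplification: text_terms is a crude stand-in for a tokenizing field (it is not textTight's actual analysis chain), and all names here are hypothetical.

```python
import re


def keyword_terms(url):
    # Untokenized field with only lowercasing: the whole URL is one term.
    return [url.lower()]


def text_terms(url):
    # Crude stand-in for a tokenizing text field: split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", url.lower()) if t]


def prefix_match(terms, prefix):
    # A prefix query is compared against individual indexed terms,
    # so the prefix has to fit inside one term to match.
    p = prefix.lower()
    return any(t.startswith(p) for t in terms)


url = "http://www.example.com/docs/solr/index.html"
prefix = "http://www.example.com/docs/"
print(prefix_match(keyword_terms(url), prefix))  # True
print(prefix_match(text_terms(url), prefix))     # False: no single term spans the "/"
```

Once the URL is broken up at "/" and ".", no single term contains a multi-segment prefix, which is why a separate untokenized field is the safer place for URL prefix matching.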
If, like me, you want to restrict documents to a certain domain and its
subdomains, you have to be careful with your query:
reverse_domain:com.example reverse_domain:com.example.*
If you just do reverse_domain:com.example*, you will also match
www.example-foo.com (reversed: com.example-foo.www), which you don't want.
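A small sketch of the difference between the two queries, under the simplifying assumption that each query behaves like a plain string comparison on the whole reversed domain. It does not reproduce how Solr analyzes these queries against the dot-tokenized field, and it assumes the default OR operator between the two clauses. The helper names are hypothetical.

```python
def reverse_domain(host):
    return ".".join(reversed(host.lower().split(".")))


def in_domain(rev_host, rev_domain):
    # Models: reverse_domain:com.example reverse_domain:com.example.*
    # (exact domain OR anything under "<domain>.")
    return rev_host == rev_domain or rev_host.startswith(rev_domain + ".")


def naive_prefix(rev_host, rev_domain):
    # Models: reverse_domain:com.example*
    return rev_host.startswith(rev_domain)


def domain_query(rev_domain, field="reverse_domain"):
    # Build the two-clause query string suggested above.
    return f"{field}:{rev_domain} {field}:{rev_domain}.*"


target = reverse_domain("example.com")
print(domain_query(target))  # reverse_domain:com.example reverse_domain:com.example.*
for host in ["example.com", "www.example.com", "a.b.example.com", "www.example-foo.com"]:
    rev = reverse_domain(host)
    print(host, in_domain(rev, target), naive_prefix(rev, target))
# www.example-foo.com is the only host where the two disagree: False vs True
```

Anchoring the second clause on "com.example." (with the trailing dot) is what excludes hosts that merely share the leading characters.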
-Mike