On 3-May-08, at 10:44 PM, JLIST wrote:

> Hello Otis,
>
> Do you mean that if I index the URL as a "text" field, I'll
> be able to do * for a given prefix because the text will be
> tokenized at the "/" and should suffice for my need?

I'm not sure what your needs are, but I use the following to index URLs:

    <fieldType name="reverse_domain" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="\."/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

(This field stores the _reversed domain_, e.g. "com.example.www" for www.example.com.)
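
For illustration, here is a minimal Python sketch of how the reversed-domain value could be computed before indexing. The function name `reverse_domain` is hypothetical and not part of Solr; it only assumes the URL carries a scheme so the host can be parsed out.

```python
from urllib.parse import urlsplit


def reverse_domain(url):
    """Return the host of `url` with its dot-separated labels reversed."""
    # .hostname is lowercased and has any port or credentials stripped.
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host found in URL: {url!r}")
    return ".".join(reversed(host.split(".")))
```

Under these assumptions, `reverse_domain("http://www.example.com/a/b")` gives "com.example.www", which is the value that would go into the reverse_domain field.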

I also store the URL in a textTight field (see the example schema). If you want to do prefix matching on the URL, I recommend storing it untokenized in another field (or with minimal analysis, such as lowercasing).

If, like me, you want to restrict documents to a certain domain and its subdomains, you have to be careful with your query:

reverse_domain:com.example reverse_domain:com.example.*

If you just do reverse_domain:com.example*, you will also match domains such as www.example-foo.com (reversed: com.example-foo.www), which you don't want.
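
The difference between the two query forms can be sketched in Python on plain strings. This is only an illustration of the prefix logic, not of Solr's actual analysis or query parsing; the helper names and the host www.example-foo.com are hypothetical, and the two-clause query assumes the default query operator is OR.

```python
def domain_query(reversed_domain, field="reverse_domain"):
    """Build the two-clause query: the domain itself, or any subdomain."""
    return f"{field}:{reversed_domain} {field}:{reversed_domain}.*"


def naive_match(candidate, reversed_domain):
    # Mimics field:com.example* (bare prefix, no label boundary).
    return candidate.startswith(reversed_domain)


def careful_match(candidate, reversed_domain):
    # Mimics field:com.example OR field:com.example.*
    return (candidate == reversed_domain
            or candidate.startswith(reversed_domain + "."))


base = "com.example"
unrelated = "com.example-foo.www"   # www.example-foo.com, a different domain
subdomain = "com.example.www"       # www.example.com

print(domain_query(base))
print(naive_match(unrelated, base), careful_match(unrelated, base))
print(naive_match(subdomain, base), careful_match(subdomain, base))
```

The bare prefix accepts the unrelated domain because nothing forces a "." after "example"; requiring either an exact match or the prefix followed by "." closes that gap.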

-Mike
