1. The standard way to do this is to use ngrams. The index is larger, but it gives you much quicker searches than trying to pre-and-postfix wildcards.
2. Use a fieldType with KeywordTokenizerFactory + (probably) LowerCaseFilterFactory + TrimFilterFactory. And, in your case, NGramFilterFactory (a field type takes only one tokenizer, so the ngrams come from a filter; I'd start with bigrams, i.e. min=2 and max=2).

3. No. The destination field has its own field type and that's how the input stream is analyzed. There's no good way to say "don't analyze input from field X when copied to field Y". Probably best not to copy it there at all.

Best,
Erick

On Tue, Jul 24, 2018 at 9:05 AM, Christopher Schultz <ch...@christopherschultz.net> wrote:
> All,
>
> We are using Solr as a user index, and users have email addresses.
>
> Our old search behavior used a SQL substring match for any search
> terms entered, and so users are used to being able to search for e.g.
> "chr" and finding my email address ("ch...@christopherschultz.net").
>
> By default, Solr doesn't perform substring matches, and it might be
> difficult to re-train users to use *chr* to find email addresses by
> substring.
>
> Is there a way to define the field such that searches are always done
> as a substring? While we are at it, I'd like to define the field to
> avoid tokenization because it's never useful to search for
> "m...@gmail.com" and find a few million search results because many
> users use @gmail.com email addresses.
>
> Here is the current field definition from our create-schema script:
>
> "add-field":{
>   "name":"email_address",
>   "type":"text_general",
>   "multiValued" : false,
>   "stored":true },
>
> Later, we add the email address to the "all" field (which aggregates
> everything from all useful fields into the field used as the
> default-field):
>
> "add-copy-field":{
>   "source":"email_address",
>   "dest":"all" },
>
> Is there a way to define these fields such that:
>
> 1. The email_address field is always searched using a substring
> 2. The email_address field is not tokenized
> 3. The copied-email-address is not tokenized in the "all" field
>
> Thanks,
> -chris
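
For reference, the field type suggested in point 2 might look roughly like this in the same create-schema script format. This is only a sketch under a few assumptions: the type name "email_substring" is made up for illustration, the ngrams are produced by solr.NGramFilterFactory after the keyword tokenizer (an analyzer has a single tokenizer), and the same analyzer is used at index and query time, which should be checked against real queries before relying on it. Per point 3, no copy-field rule to "all" is defined for this field.

```json
{
  "add-field-type": {
    "name": "email_substring",
    "class": "solr.TextField",
    "analyzer": {
      "tokenizer": { "class": "solr.KeywordTokenizerFactory" },
      "filters": [
        { "class": "solr.TrimFilterFactory" },
        { "class": "solr.LowerCaseFilterFactory" },
        {
          "class": "solr.NGramFilterFactory",
          "minGramSize": "2",
          "maxGramSize": "2"
        }
      ]
    }
  },
  "add-field": {
    "name": "email_address",
    "type": "email_substring",
    "multiValued": false,
    "stored": true
  }
}
```

The keyword tokenizer keeps the whole address as one token, so "@gmail.com" is never split off as a separate term; the ngram filter then breaks that single token into two-character grams. The Analysis screen in the Solr admin UI is a quick way to see what a given address and a given search term are turned into.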