-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Chris,

On 7/24/18 4:46 PM, Chris Hostetter wrote:
> 
> : We are using Solr as a user index, and users have email addresses.
> :
> : Our old search behavior used a SQL substring match for any search
> : terms entered, and so users are used to being able to search for e.g.
> : "chr" and finding my email address ("ch...@christopherschultz.net").
> :
> : By default, Solr doesn't perform substring matches, and it might be
> : difficult to re-train users to use *chr* to find email addresses by
> : substring.
> 
> In the past, were you really doing arbitrary substring matching, or
> just prefix matching? I.e., would a search for "sto" match
> "ch...@christopherschultz.net"?

Yes. Searching for "sto" would result in a SQL query with a " WHERE
... LIKE '%sto%'" clause. So it was slow as hell, of course.

> Personally, if you know you have an email field, I would suggest
> using a custom tokenizer that splits on "@" and "." (and maybe
> other punctuation characters like "-") and then take your raw user
> input and feed it to the prefix parser (instead of requiring your
> users to add the "*")...
> 
> q={!prefix f=email v=$user_input}&user_input=chr
> 
> ...which would match ch...@gmail.com, f...@chris.com, f...@bar.chr
> etc.
> 
> (this wouldn't help you though if you *really* want arbitrary
> substring matching -- as Erick suggested, ngrams are pretty much
> your best bet for something like that)
> 
> Bear in mind, you can combine that "forced prefix" query against
> the (tokenized) email field with other queries that could parse
> your input in other ways...
> 
> user_input=...
> q=({!prefix f=email v=$user_input} OR {!dismax qf="first_name last_name" ..etc.. v=$user_input})
> 
> so if your user input is "chris" you'll get term matches on the 
> first_name field, or the last_name field as well as prefix matches
> on the email field.
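
To make sure I understand the tokenize-then-prefix idea, here is a rough
Python sketch of the matching logic. It is only an illustration, not how
Solr implements it: the function names and the example addresses are made
up, and I'm assuming the tokenizer splits on "@", "." and "-".

```python
import re


def email_tokens(email):
    """Lowercase an address and split it on '@', '.', and '-' (assumed separators)."""
    return [tok for tok in re.split(r"[@.\-]", email.lower()) if tok]


def prefix_match(user_input, email):
    """Return True if any token of the address starts with the user's input."""
    needle = user_input.lower()
    return any(tok.startswith(needle) for tok in email_tokens(email))


# Hypothetical addresses, for illustration only.
emails = [
    "chris@example.com",
    "bob@chris.example",
    "bob@example.chr",
    "sexylover42@example.com",
]

print([e for e in emails if prefix_match("chr", e)])
print([e for e in emails if prefix_match("lover", e)])
```

The second search finds nothing, because "lover" is not at the start of
any token of "sexylover42@example.com". That is the case I need to cover.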

The problem is that our users (admins) sometimes need to locate users
by their email address, and people often forget the exact spelling. So
they'll call and say "I can't get in" and we have to search for "chris
schultz" and then "chris" and then it turns out that their email
address was actually sexylove...@yahoo.com, so they often have to try
a bunch of searches before finding the right user record. Having to
search for "sexylover42", a complete-match word, isn't going to work
for their use-case. They need to be able to search for "lover" and
have it work. I think n-grams sounds like the only way to get this
done. I'll have to play around with it a little bit to see how it
behaves.
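
For my own understanding, here is a small Python sketch of what an n-gram
index buys us. It is a toy model, not Solr's implementation: the gram
sizes (3 to 5), the function names, and the addresses are all assumptions
I picked for illustration.

```python
def ngrams(text, min_n=3, max_n=5):
    """Return every substring of text whose length is between min_n and max_n."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    }


def build_index(emails, min_n=3, max_n=5):
    """Map each n-gram to the set of addresses that contain it."""
    index = {}
    for email in emails:
        for gram in ngrams(email, min_n, max_n):
            index.setdefault(gram, set()).add(email)
    return index


def search(index, user_input, min_n=3, max_n=5):
    """Find addresses containing user_input as a substring.

    Inputs shorter than min_n cannot match, since no gram that short is indexed.
    """
    term = user_input.lower()
    if len(term) < min_n:
        return set()
    # Every gram of the query must appear in a candidate address.
    grams = ngrams(term, min_n, min(max_n, len(term)))
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Shared grams do not guarantee a contiguous match, so verify it.
    return {email for email in candidates if term in email.lower()}


# Hypothetical addresses, for illustration only.
index = build_index(["sexylover42@example.com", "chris@example.com"])
print(search(index, "lover"))
```

The trade-offs I expect are a much larger index (every address expands
into many grams) and no matches for inputs shorter than the minimum gram
size.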

Thanks,
- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIzBAEBCAAdFiEEMmKgYcQvxMe7tcJcHPApP6U8pFgFAltYedQACgkQHPApP6U8
pFjzgQ/9GW7kI9Lefnmj7zH8JsqZfW1Y/PrF4YA1RjbliNWRn2dRPz7Q7C2ITO/n
Ys73uUII3qPz8M/H6d0LN57Un96BGAjIhf6WZSiIRAQcvenhGaS/lROciq6I8iN8
hB+1X2GixTG8fbq6Q6Q3jRG22S0GpW+OL2mJcu3wCkQ2dzyBWObWxjF1ag5O4pT+
AP0lqAgpUTsWAeMPPd6dkuStOhXraJQc+1WwwEw36gohwaZwLMftcOl2ohnys/DM
pdyqQEQ6fOldJLBHLU8PyNVHxJA5qZjVTwu3S7zv7w+2N+V8bHOl6y5ir3krOEs0
OIvFX+Do+pbsg+QQ5VY8LDxbPBCjgDiWTpplh3Ym0raaVMoMQ6GfFfsOPF9jYhxS
gb0eMwVTJFWM0xvMaH4xSXLR/Dh6upT/0do1sTr7kKjhIlwc3pfR/vIwqsVer1HJ
Qsj6Pc+ZJckOrPGGIYCZEWZwlS8ONinAx4fh23/C1GltU19kHtRvGTQLzRT+9sus
2stvkD44Lv7zuc49/Y07NISxcUceTlbZHKC5ebzAtKNDS2p+qYLJlbdTZQIofMsb
zmncdP+s5cSYgiCZZS19E2GxP7Yw2rmSn2zsSF6yJMgMy9logJi5HS1UQ54IWvn7
eAzvM+TcV6i+8Hf9kijNcg4/OZPv67DZt6HDcXO2K+a/AMyQElE=
=4Y/b
-----END PGP SIGNATURE-----
