Chris,

On 7/24/18 4:46 PM, Chris Hostetter wrote:
>
> : We are using Solr as a user index, and users have email addresses.
> :
> : Our old search behavior used a SQL substring match for any search
> : terms entered, and so users are used to being able to search for e.g.
> : "chr" and finding my email address ("ch...@christopherschultz.net").
> :
> : By default, Solr doesn't perform substring matches, and it might be
> : difficult to re-train users to use *chr* to find email addresses by
> : substring.
>
> In the past, were you really doing arbitrary substring matching, or
> just prefix matching? ie would a search for "sto" match
> "ch...@christopherschultz.net"

Yes. Searching for "sto" would result in a SQL query with a
"WHERE ... LIKE '%sto%'" clause. So it was slow as hell, of course.

> Personally, if you know you have an email field, would suggest
> using a custom tokenizer that splits on "@" and "." (and maybe
> other punctuation characters like "-") and then take your raw user
> input and feed it to the prefix parser (instead of requiring your
> users to add the "*")...
>
>   q={!prefix f=email v=$user_input}&user_input=chr
>
> ...which would match ch...@gmail.com, f...@chris.com, f...@bar.chr
> etc.
>
> (this wouldn't help you though if you *really* want arbitrary
> substring matching -- as erick suggested ngrams is pretty much your
> best bet for something like that)
>
> Bear in mind, you can combine that "forced prefix" query against
> the (tokenized) email field with other queries that could parse
> your input in other ways...
>
>   user_input=...
>   q=({!prefix f=email v=$user_input} OR
>      {!dismax qf="first_name last_name" ..etc.. v=$user_input})
>
> so if your user input is "chris" you'll get term matches on the
> first_name field, or the last_name field as well as prefix matches
> on the email field.

The problem is that our users (admins) sometimes need to locate users
by their email address, and people often forget the exact spelling.
So they'll call and say "I can't get in" and we have to search for
"chris schultz" and then "chris" and then it turns out that their
email address was actually sexylove...@yahoo.com, so the admins often
have to try a bunch of searches before finding the right user record.

Having to search for "sexylover42", a complete-match word, isn't going
to work for their use-case. They need to be able to search for "lover"
and have it work. A prefix query on a tokenized email field wouldn't
help there either, because "lover" sits in the middle of a token.

I think n-grams sounds like the only way to get this done. I'll have
to play around with it a little bit to see how it behaves.

Thanks,
-chris
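To make the difference between the two approaches concrete, here is a rough sketch in plain Python. It does not use Solr at all. It only mimics the two matching strategies discussed above: prefix matching on tokens split at "@", "." and "-", and matching against character n-grams. The function names, the gram sizes (3 to 5) and the address "sexylover42@example.com" are assumptions made up for illustration. In Solr itself the n-gram part would normally be handled by an n-gram filter in the field's index-time analysis chain, and how long queries behave depends on the query-time analyzer, which this sketch simplifies away.

```python
import re


def tokens(email):
    """Split an address on '@', '.' and '-' (the suggested custom tokenizer)."""
    return [t for t in re.split(r"[@.\-]", email.lower()) if t]


def prefix_match(email, user_input):
    """True if any token starts with the input ("forced prefix" behavior)."""
    q = user_input.lower()
    return any(t.startswith(q) for t in tokens(email))


def ngrams(text, min_n=3, max_n=5):
    """Return every character n-gram of text with length min_n..max_n."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(text) - n + 1)
    }


def ngram_match(email, user_input, min_n=3, max_n=5):
    """True if the input is one of the indexed n-grams.

    Simplification: inputs shorter than min_n or longer than max_n
    never match, because no gram of that length was indexed.
    """
    q = user_input.lower()
    return q in ngrams(email, min_n, max_n)


if __name__ == "__main__":
    address = "sexylover42@example.com"  # hypothetical address
    print(prefix_match(address, "sexy"))   # prefix of a token
    print(prefix_match(address, "lover"))  # middle of a token
    print(ngram_match(address, "lover"))   # indexed as a 5-gram
```

The trade-off is index size: every address is stored as many overlapping grams instead of a handful of tokens, so the gram size range is worth tuning against the shortest and longest inputs the admins actually type.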