Re: Searching for credit card numbers

2020-07-28 Thread Walter Underwood
If you reindex, I’ve become a big fan of adding a date field with an index timestamp. That will allow you to check whether everything has been reindexed. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 28, 2020, at 2:11 PM, Jörn Franke wrot
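
One way to use such a timestamp is to query for docs that the reindex has not touched yet. The sketch below is illustrative only: the field name `indexed_at` and the helper `not_reindexed_query` are assumptions, not names from this thread. It builds a Solr query string matching docs that either lack the field or carry a timestamp older than the start of the reindex.

```python
def not_reindexed_query(reindex_start, field="indexed_at"):
    """Build a Solr query for docs not yet touched by the current reindex.

    reindex_start: ISO-8601 UTC timestamp of when the reindex began,
                   e.g. "2020-07-28T00:00:00Z".
    field:         hypothetical date field holding the index timestamp.
    """
    # Docs that have no timestamp at all (indexed before the field existed).
    missing = f"(*:* -{field}:[* TO *])"
    # Docs whose timestamp is strictly before the reindex start
    # ("}" makes the upper bound exclusive).
    stale = f"{field}:[* TO {reindex_start}}}"
    return f"{missing} OR {stale}"


if __name__ == "__main__":
    print(not_reindexed_query("2020-07-28T00:00:00Z"))
```

If that query returns zero hits, every doc has been reindexed since the given start time.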

Re: Searching for credit card numbers

2020-07-28 Thread Jörn Franke
A regex search at query time would leave room for attacks (e.g., a regex can easily be designed to block the Solr server forever). If the field is stored, you can also use a cursor to go through all entries and reindex each doc based on the field: https://lucene.apache.org/solr/
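
A rough sketch of the cursor walk described above, assuming Solr's `cursorMark` paging (start at `*`, sort on the unique key, stop when `nextCursorMark` no longer changes). The function name `iter_all_docs` and the `fetch_page` callable are hypothetical: `fetch_page` stands in for whatever HTTP client sends the parameters to the `/select` handler and returns the parsed JSON response.

```python
def iter_all_docs(fetch_page, query="*:*", unique_key="id", rows=500):
    """Yield every doc matching `query` by following Solr cursor marks.

    fetch_page: hypothetical callable taking a dict of request parameters
                and returning the parsed JSON response as a dict.
    unique_key: the collection's uniqueKey field; cursor paging requires
                it in the sort clause.
    """
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        response = fetch_page({
            "q": query,
            "sort": f"{unique_key} asc",
            "rows": rows,
            "cursorMark": cursor,
        })
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same mark once there is nothing left.
            break
        cursor = next_cursor


if __name__ == "__main__":
    # Tiny in-memory stand-in for a Solr server, for illustration only.
    pages = {
        "*": {"response": {"docs": [{"id": "1"}]}, "nextCursorMark": "A"},
        "A": {"response": {"docs": []}, "nextCursorMark": "A"},
    }
    for doc in iter_all_docs(lambda params: pages[params["cursorMark"]]):
        print(doc)
```

Each yielded doc can then be checked and sent back to Solr as an update; that only works if the fields needed to rebuild the doc are stored.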

Re: Searching for credit card numbers

2020-07-28 Thread lstusr 5u93n4
Possible... yes. Agreed that this is the right approach. But if we already have a big index that we're searching through? Any way to "hack it"? On Tue, 28 Jul 2020 at 14:55, Walter Underwood wrote: > I’d do that at index time. Add an update request processor script that > does the regex and adds

Re: Searching for credit card numbers

2020-07-28 Thread Walter Underwood
I’d do that at index time. Add an update request processor script that does the regex and adds a field has_credit_card_number:true. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4 wrote: > > Let's say I have
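
The index-time idea above can be sketched as follows. This is Python applied to a doc before it is sent to Solr, not an actual update request processor script, and it only shows the logic. The field name `has_credit_card_number` comes from the message; the source field name `text`, the helper `flag_credit_card`, and the pattern itself (13 to 19 digits, optionally separated by single spaces or hyphens) are assumptions. The pattern will also match other long digit runs, so treat it as a starting point.

```python
import re

# Assumed pattern: 13 to 19 digits, each optionally followed by one
# space or hyphen, bounded by word boundaries.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def flag_credit_card(doc, text_field="text"):
    """Return a copy of `doc` with has_credit_card_number set.

    text_field: hypothetical name of the field holding the raw text.
    """
    text = doc.get(text_field) or ""
    flagged = dict(doc)  # leave the caller's dict untouched
    flagged["has_credit_card_number"] = bool(CARD_PATTERN.search(text))
    return flagged


if __name__ == "__main__":
    print(flag_credit_card({"id": "1", "text": "card 4111 1111 1111 1111 thanks"}))
    print(flag_credit_card({"id": "2", "text": "order 12345 shipped"}))
```

Once the flag is indexed, finding the affected docs is a plain field query on `has_credit_card_number:true`, with no regex at query time.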

Searching for credit card numbers

2020-07-28 Thread lstusr 5u93n4
Let's say I have a text field that's been indexed with the standard tokenizer, and I want to match the docs that have credit card numbers in them (this is for altruistic purposes, not nefarious ones!). What's the best way to build a search that will do this? Searching for " " see