It's a bit of a privacy-through-obscurity measure, unfortunately. The
problem is that American courts do a lousy job of removing social
security numbers from cases that I put on my site. I do anonymization
before sending the cases to Solr, but if you're clever (and the
stopwords weren't in plac
Mike -
Indeed users won't be able to *search* for things removed by the stop filter at
index time (the terms literally aren't in the index then). But be careful with
the stored value. Analysis does not affect stored content.
Are you anonymizing before sending to Solr (if so, why stop-word blo
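
To illustrate the index-versus-stored distinction above, here is a minimal toy sketch in plain Python. It is not Solr code: the ToyIndex class, the analyze function, and the blocked word "secretterm" are all hypothetical stand-ins. It only mimics the idea that a stop filter keeps terms out of the searchable index while the stored copy of the document is left as it was submitted.

```python
# Toy model only: names below are hypothetical, not part of Solr or Lucene.
STOPWORDS = {"secretterm"}  # assumed example of a word users must not search for


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stop words (stand-in for a stop filter)."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


class ToyIndex:
    def __init__(self):
        self.postings = {}  # term -> set of doc ids (what search sees)
        self.stored = {}    # doc id -> original text (untouched by analysis)

    def add(self, doc_id, text):
        self.stored[doc_id] = text  # stored value keeps every word
        for term in analyze(text):  # only analyzed terms reach the index
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query):
        terms = analyze(query)      # the same filter applied at query time
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


index = ToyIndex()
index.add(1, "the case mentions secretterm once")

print(index.search("secretterm"))        # no hits: the term never reached the index
print(index.search("case"))              # the document is still findable by other terms
print("secretterm" in index.stored[1])   # the stored text still contains the word
```

In this sketch the blocked word cannot be searched, but anything that returns the stored field would still expose it, which is why scrubbing the text before indexing matters.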
I've got them configured at index and query time, so it sounds like I'm
all set.
I'm doing anonymization of social security numbers, converting them to
xxx-xx-. I don't *think* users can find a way of identifying these
docs if the stopwords-based block works.
Thank you both for the confirma
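
For readers wondering what the anonymization step before indexing might look like, here is a rough sketch. It assumes social security numbers appear in the dashed NNN-NN-NNNN form. The function name mask_ssns and the default mask string are placeholders for illustration, not the code or the exact replacement used on the site.

```python
import re

# Assumption: SSNs are written as three digits, two digits, four digits, dash-separated.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssns(text, mask="xxx-xx-xxxx"):
    """Replace anything shaped like an SSN with a fixed mask (the mask is a placeholder)."""
    return SSN_PATTERN.sub(mask, text)


sample = "Plaintiff's SSN 123-45-6789 appears in the record."
print(mask_ssns(sample))
```

A pattern like this misses numbers written without dashes or with spaces, so it would be a first pass rather than a guarantee.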
On Mon, Jan 9, 2012 at 5:03 AM, Michael Lissner
wrote:
> I have a unique use case where I have words in my corpus that users
> shouldn't ever be allowed to search for. My theory is that if I add these to
> the stopwords list, that should do the trick.
Yes, that should work. Are you including the
On Sun, Jan 8, 2012 at 3:33 PM, Michael Lissner <
mliss...@michaeljaylissner.com> wrote:
> I have a unique use case where I have words in my corpus that users
> shouldn't ever be allowed to search for. My theory is that if I add these
> to the stopwords list, that should do the trick.
>
That shou
I have a unique use case where I have words in my corpus that users
shouldn't ever be allowed to search for. My theory is that if I add
these to the stopwords list, that should do the trick.
I'm using the edismax parser and it seems to be working in my dev
environment. Is there any risk to thi