I have an interesting query filter problem.

The application that I am working on contains a directory of user
profiles that acts a lot like a social networking site. Each user
profile is composed of a set of fields (first_name, last_name, bio,
phone_number, etc). Every user profile field is associated with a
privacy setting that optionally hides the data inside of the field
from other users. The privacy settings allow people to show the field
to nobody, only their contacts on the site, all logged in users, or
anyone.

This presents a problem while designing a search interface for the
profiles. All of the filtering options I have seen allow for
per-document filtering, but that is not sufficient. Since users have
the option of selectively displaying portions of their profile to
different users, we need to be able to remove individual fields of
specific documents from consideration on a per-query basis.

The only idea we have had for resolving this is to construct an
elaborate filter query to restrict the set of documents that the
actual search is performed upon, but it has problems:

We created a series of multivalue fields to store the user profile information:

first_name_contact
last_name_contact
etc...

In these fields we stored the privacy preference: anonymous,
logged_in, or the set of contact ids that were allowed access. Then we
created a query filter that was built dynamically from the identity of
the logged in user. For a query for the term secret, it looked
something like this:

(first_name: secret AND (first_name_contact:anonymous OR
first_name_contact:member)) OR
(last_name: secret AND (last_name_contact:anonymous OR
last_name_contact:member)) OR....
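
For illustration, a filter of this shape could be generated roughly as follows. This is only a sketch: the function name `build_privacy_filter` and its parameters are made up for this example, and it does no escaping of the search term, which a real implementation would need.

```python
def build_privacy_filter(term, fields, viewer_tokens):
    """Build one clause per profile field: the term must match in the
    field AND the field's *_contact companion must list a token that
    the viewer holds (e.g. 'anonymous', 'member', or a contact id)."""
    clauses = []
    for field in fields:
        # Any one of the viewer's tokens grants access to this field.
        access = " OR ".join(f"{field}_contact:{token}" for token in viewer_tokens)
        clauses.append(f"({field}:{term} AND ({access}))")
    # The document passes the filter if any field clause matches.
    return " OR ".join(clauses)


# Hypothetical field list and viewer tokens, mirroring the example above.
filter_query = build_privacy_filter(
    "secret",
    ["first_name", "last_name"],
    ["anonymous", "member"],
)
print(filter_query)
```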

This was used as a filter query with a standard query for the search
term secret performed on the resulting filtered set of documents. This
worked great if the search was a single word. However, if the user's
search query contained multiple terms (for instance, 'my secret'),
results might be inappropriately revealed. This is because matches
might occur for one term in a public field while the other term might
only exist in fields that are private to the user making the query.
Because documents would be allowed into the filtered set of potential
results in that case, they would be matched by the actual query. By
executing a set of queries, a user could infer the contents of a
protected document field even though they would be unable to view its
contents.
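
To make the leak concrete, here is a small in-memory model of the filter-then-query approach. It is not Lucene or Solr code. The document layout and function names are invented for the example, and it assumes that the filter admits a document when any one search term matches in a field the viewer may see, and that the main query requires every term to occur somewhere in the document.

```python
def passes_filter(doc, terms, viewer_tokens):
    """Filter step: at least one term occurs in a field visible to the viewer."""
    return any(
        term in doc["fields"][field].split()
        and doc["access"][field] & viewer_tokens
        for term in terms
        for field in doc["fields"]
    )


def matches_query(doc, terms):
    """Main query step: every term occurs in some field, ignoring privacy."""
    words = {w for value in doc["fields"].values() for w in value.split()}
    return all(term in words for term in terms)


def search(docs, terms, viewer_tokens):
    return [
        doc["id"]
        for doc in docs
        if passes_filter(doc, terms, viewer_tokens) and matches_query(doc, terms)
    ]


# One profile: first_name is public, bio is visible to a single contact only.
docs = [{
    "id": 1,
    "fields": {"first_name": "my", "bio": "secret"},
    "access": {"first_name": {"anonymous"}, "bio": {"contact_42"}},
}]
viewer = {"anonymous"}

print(search(docs, ["secret"], viewer))        # hidden term alone: no hit
print(search(docs, ["my", "secret"], viewer))  # public term lets the doc through
print(search(docs, ["my", "other"], viewer))   # control query: no hit
```

In this model the single-term search returns nothing, but 'my secret' returns the profile while 'my other' does not, so the anonymous viewer can conclude that the hidden bio contains the word secret.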

We have been unable to think of a way to construct a query that
overcomes this issue. Looking briefly at Lucene, there does not seem
to be an obvious way to do the sort of field-based filtering that
varies on a per-query basis that we need to do, even if we were
willing to dig deeper and write some custom code. Does anyone know of
any tricks that we might use? Is it even possible to do this given how
the low level architecture of Lucene may or may not work?

Any help would be greatly appreciated.

Thanks,

Nathan Woodhull
