I have an interesting query filter problem. The application that I am working on contains a directory of user profiles that acts a lot like a social networking site. Each user profile is composed of a set of fields (first_name, last_name, bio, phone_number, etc.). Every user profile field is associated with a privacy setting that optionally hides the data in that field from other users. The privacy settings allow people to show the field to nobody, only to their contacts on the site, to all logged-in users, or to anyone.
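To make the four privacy levels concrete, here is a minimal sketch in plain Python. The names `Privacy` and `can_view` are hypothetical and are not part of our application or of Lucene; the sketch also assumes an anonymous visitor is represented by a viewer id of `None`.

```python
from enum import Enum


class Privacy(Enum):
    # The four settings described above (hypothetical names).
    NOBODY = "nobody"
    CONTACTS = "contacts"
    LOGGED_IN = "logged_in"
    ANYONE = "anyone"


def can_view(setting, viewer_id, owner_contacts):
    """Return True if the viewer may see a field with this setting.

    viewer_id is None for an anonymous visitor (an assumption of this
    sketch); owner_contacts is the set of ids of the owner's contacts.
    """
    if setting is Privacy.ANYONE:
        return True
    if setting is Privacy.LOGGED_IN:
        return viewer_id is not None
    if setting is Privacy.CONTACTS:
        return viewer_id is not None and viewer_id in owner_contacts
    return False  # Privacy.NOBODY hides the field from every other user
```

For example, `can_view(Privacy.CONTACTS, 7, {7, 9})` is true, while the same call with a viewer id of `None` is false.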
This presents a problem when designing a search interface for the profiles. All of the filtering options I have seen allow for per-document filtering, but that is not sufficient. Since users can selectively display portions of their profile to different users, we need to be able to remove individual fields of specific documents from consideration on a per-query basis.

The only idea we have had for resolving this is to construct an elaborate filter query that restricts the set of documents the actual search is performed upon, but it has problems. We created a series of multivalued fields alongside the user profile information: first_name_contact, last_name_contact, etc. In these fields we stored the privacy preference: anonymous, logged_in, or the set of contact ids that were allowed access. Then we created a filter query that varied with the identity of the logged-in user. For a search on the term secret it looked something like this:

    (first_name: secret AND (first_name_contact:anonymous OR first_name_contact:member)) OR (last_name: secret AND (last_name_contact:anonymous OR last_name_contact:member)) OR ....

This was used as a filter query, with a standard query for the search term secret performed on the resulting filtered set of documents.

This worked great if the search was a single word. However, if the user's search query contained multiple terms, for instance 'my secret', results might be inappropriately revealed. A match might occur for one term in a public field while the other term exists only in fields that are hidden from the user making the query. Because the public match is enough to let the document into the filtered set of potential results, the actual query, which searches every field, then matches it on the hidden term as well. By executing a series of such queries, a user could infer the contents of a protected field even though they would be unable to view it.
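The leak can be reproduced with a small model in plain Python. This is only a sketch, not Lucene or Solr code. It assumes that the filter admits a document when any query term matches a field visible to the viewer, and that the main query requires all terms to match somewhere in the document. The sample profile, the helper names, and the `strict_search` function (which shows the per-field behaviour we are after) are all hypothetical.

```python
import re

# Hypothetical miniature index: each field has a "<field>_contact" list
# holding the access tokens allowed to see it, as in the scheme above.
DOCS = {
    "alice": {
        "last_name": "Secret",
        "last_name_contact": ["42"],        # visible to contact 42 only
        "bio": "welcome to my page",
        "bio_contact": ["anonymous"],       # visible to anyone
    },
}
FIELDS = ("last_name", "bio")


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def viewer_tokens(viewer_id):
    # Assumption: None means an anonymous visitor.
    if viewer_id is None:
        return {"anonymous"}
    return {"anonymous", "member", str(viewer_id)}


def visible(doc, field, viewer_id):
    return bool(viewer_tokens(viewer_id) & set(doc[field + "_contact"]))


def filtered_search(query, viewer_id):
    """Filter on visible matches, then run the query over all fields."""
    terms = tokens(query)
    hits = []
    for doc_id, doc in DOCS.items():
        # Filter query: some term matches some field the viewer may see.
        passes = any(
            visible(doc, f, viewer_id) and bool(terms & tokens(doc[f]))
            for f in FIELDS
        )
        # Main query: every term matches somewhere, ignoring privacy.
        all_text = set().union(*(tokens(doc[f]) for f in FIELDS))
        if passes and terms <= all_text:
            hits.append(doc_id)
    return hits


def strict_search(query, viewer_id):
    """Desired behaviour: only visible fields take part in matching."""
    terms = tokens(query)
    hits = []
    for doc_id, doc in DOCS.items():
        visible_text = set()
        for f in FIELDS:
            if visible(doc, f, viewer_id):
                visible_text |= tokens(doc[f])
        if terms <= visible_text:
            hits.append(doc_id)
    return hits
```

In this model an anonymous search for secret returns nothing, but an anonymous search for my secret returns alice through `filtered_search`, which reveals that one of her hidden fields contains secret. `strict_search` returns nothing for the same query.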
We have been unable to think of a way to construct a query that overcomes this issue. Looking briefly at Lucene, there does not seem to be an obvious way to do the kind of field-based filtering, varying on a per-query basis, that we need, even if we were willing to dig deeper and write some custom code. Does anyone know of any tricks that we might use? Is this even possible given Lucene's low-level architecture? Any help would be greatly appreciated.

Thanks,
Nathan Woodhull