Re: Searching with access controls

2006-08-11 Thread Chris Hostetter

: I was just reading about the limit on boolean operators in a query (it
: seems to default to 1024 in Solr).
:
: Using option 2 would mean that a user can't be in any more than 1024
: communities (assuming no other boolean logic in the query).

that limit applies to boolean query clauses which are used in scoring, and
can be changed in the solrconfig (see , it's really
just a Lucene setting that helps to save you from yourself) ... but for
things like access control you don't care about scoring -- just set
membership, so you can use and combine Filters which can be cached
independently.

Reading up on Lucene Filters is definitely the next best step to get a
sense of how you can achieve your goal -- just don't get confused between
Filters used in searching and "TokenFilters" used when analyzing text --
they have regrettably similar names.

searching the general Lucene user groups for "access control",
"permissions" and "security" should turn up quite a few suggestions on how
to approach this problem with Lucene indexes in general, all of which can
be done in Solr as well -- and many of which can be done more efficiently
and much more easily in Solr because Solr takes care of the Query->Filter
conversions for you on the fly when you don't care about scoring, and
because Solr manages (and can autowarm when changes occur) your caches for
you.
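
As a rough illustration of what "Query->Filter conversion" plus cache autowarming means, the sketch below runs a non-scoring query once, keeps only the resulting set of document ids, and re-runs the cached filters when the index changes. It is plain Python and not Solr's implementation: the FilterCache class, its methods, and the toy index layout (document id mapped to a set of terms) are assumptions made for the example.

```python
class FilterCache:
    """Toy cache mapping a filter term to the set of matching document ids."""

    def __init__(self, index):
        self.index = index   # doc id -> set of terms in that document
        self.entries = {}    # filter term -> frozenset of doc ids

    def _execute(self, term):
        # "Query -> Filter": run the query once and keep only the id set.
        return frozenset(doc for doc, terms in self.index.items() if term in terms)

    def get(self, term):
        """Return the cached id set for a term, computing it on first use."""
        if term not in self.entries:
            self.entries[term] = self._execute(term)
        return self.entries[term]

    def autowarm(self, new_index):
        """Build a cache for new_index by re-running the filters cached here."""
        warmed = FilterCache(new_index)
        for term in self.entries:
            warmed.get(term)
        return warmed
```

After an index change, calling autowarm on the old cache yields a new cache whose entries are already populated, so the first search against the new index does not pay the cost of rebuilding each filter.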



-Hoss



Highlighting problem - mutivalue field

2006-08-11 Thread Andrew May

Hi,

I'm afraid I've found another slightly odd thing with Highlighting, in this case in a 
multi-valued field I'm using for author names.


The author names are typically Surname, initials (e.g. May, A.D.), and these are the kind 
of results I'm getting:


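authors:Buxton

Stored values of the authors field in the two matching documents:

  Document 1: "Duncan, W.I." and "Buxton, N.W.K."
  Document 2: "Buxton, M.W.N." and "Pedley, H.M."

Highlighted snippets returned:

  Document 1: ".Buxton, N.W.K"
  Document 2: "Buxton, M.W.N"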


So in the first case, where the second author name was matched, the final period has 
disappeared, and there's a stray period at the start. In the second case where the first 
author name was matched, the final period is also missing, but there's no extra period at 
the start.


This pattern is the same for other author searches, which suggests that it's picking up 
the last character from the previous value and returning that at the start, and losing 
the last character.


However, some searches on keywords (also multi-valued) seem to suggest that it's not that 
simple:


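keywords:rock (with maxSnippets=100)

Stored values of the keywords field in the two matching documents:

  Document 1: "fracture (rock)", "porosity (rock)", "permeability (rock)",
              "nuclear magnetic resonance"
  Document 2: "United Kingdom", "Carboniferous", "clastie rocks",
              "coal seams", "sedimentary rocks"

Highlighted snippets returned:

  Document 1: "fracture (rock", ")porosity (rock", ")permeability (rock"
  Document 2: "clastie rocks", "sedimentary rocks"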


The first document seems to show the same behaviour as the author searches, but in the 
second one, where there's no punctuation, there are no missing/moved characters (as far 
as I can tell this is true whether the highlight is at the start/end of the value, or 
in the middle).
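
One mechanism that would produce exactly these outputs is a highlighter that emits text only up to the end of the last token in each value, with the values concatenated and any trailing non-token characters carried over to the start of the next value. The sketch below models that in plain Python. It is not the Lucene Highlighter code; the tokenizer (runs of word characters), the function name emitted_text, and the assumption that values are concatenated without a separator are all hypothetical, and it emits text for every value rather than only for those that match the query.

```python
import re

TOKEN = re.compile(r"\w+")  # assumed tokenizer: runs of word characters


def emitted_text(values):
    """Model the text attributed to each value of a multi-valued field.

    Values are treated as one concatenated string. For each value, the text
    from the end of the previously emitted token up to the end of this
    value's last token is attributed to this value.
    """
    text = "".join(values)
    out = []
    last_end = 0  # end offset of the last token emitted so far
    start = 0     # offset at which the current value begins
    for value in values:
        end = start + len(value)
        tokens = list(TOKEN.finditer(text, start, end))
        if tokens:
            new_end = tokens[-1].end()
            out.append(text[last_end:new_end])
            last_end = new_end
        else:
            out.append("")  # no tokens in this value, nothing emitted
        start = end
    return out


authors = emitted_text(["Duncan, W.I.", "Buxton, N.W.K."])
keywords = emitted_text(["United Kingdom", "Carboniferous", "clastie rocks"])
```

Under these assumptions, a value ending in punctuation loses its final character and donates it to the next value, while values ending in a token come back unchanged, which matches both the authors and the keywords results.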


Any thoughts? Let me know if I should open a JIRA issue.

Thanks,

Andrew



Re: Highlighting problem - mutivalue field

2006-08-11 Thread Mike Klaas

On 8/11/06, Andrew May <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm afraid I've found another slightly odd thing with Highlighting, in this
> case in a multi-valued field I'm using for author names.


Thanks for the report.  This is a known Lucene Highlighter issue; see
http://issues.apache.org/jira/browse/LUCENE-645.

The issue contains a patch which you may want to apply to your local
code, though there are some cases which could cause relatively severe
problems (namely with very large fields, which you may not care
about).

regards,
-Mike