Modelling Access Control
Hi

My domain model is made of users that have access to projects, which are
composed of items. I'm hoping to use Solr and would like to make sure that
searches only return results for items that users have access to.

I've looked over some of the older posts on this mailing list about access
control and saw a suggestion along the lines of

  acl:<user id> AND (actual query)

While this obviously works, there are a couple of niggles. Every item must
have a list of valid user ids (typically fewer than 100 in my case). Every
time a collaborator is added to or removed from a project, I need to update
every item in that project. This will typically be fewer than 1000 items, so
I guess it's no big deal.

I wondered if the following might be a reasonable alternative, assuming the
number of projects to which a user has access is lower than a certain bound:

  (acl:<project id> OR acl:<project id> OR ...) AND (actual query)

When the numbers are small - e.g. each user has access to ~20 projects and
each project has ~20 collaborators - is one approach preferable over the
other? And when outliers exist - e.g. a project with 2000 collaborators, or a
user with access to 2000 projects - is one approach more liable to fail than
the other?

Many thanks

Paul
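For concreteness, the two alternatives might look like this as raw queries -
a sketch only, where the field name "acl" and the ids u42/p17/p23/p99 are
invented for illustration:

  acl:u42 AND (actual query)
  (acl:p17 OR acl:p23 OR acl:p99) AND (actual query)

The first ties every item to the ids of the users who may see it; the second
expands the requesting user's project list into each query.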
Re: A bug in ComplexPhraseQuery ?
iorixxx wrote:
>>> <queryParser ... class="org.apache.solr.search.ComplexPhraseQParserPlugin">
>>>   <bool name="inOrder">false</bool>
>>> </queryParser>
>
> I added this change to SOLR-1604, can you test it and give us feedback?
>
> Many thanks.

I'll test this quite soon and let you know.

J-Michel
Re: xpath processing
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true" Shouldn't this be fileName="*.xml"? Ben On Oct 22, 2010, at 10:52 PM, pghorp...@ucla.edu wrote: > > > > > > processor="FileListEntityProcessor" fileName=".*xml" recursive="true" > baseDir="C:\data\sample_records\mods\starr"> > url="${f.fileAbsolutePath}" stream="false" forEach="/mods" > transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer"> > > > > > > > > > > /> > > > > > > Quoting Ken Stanley : > >> Parinita, >> >> In its simplest form, what does your entity definition for DIH look like; >> also, what does one record from your xml look like? We need more information >> before we can really be of any help. :) >> >> - Ken >> >> It looked like something resembling white marble, which was >> probably what it was: something resembling white marble. >>-- Douglas Adams, "The Hitchhikers Guide to the Galaxy" >> >> >> On Fri, Oct 22, 2010 at 8:00 PM, wrote: >> >>> Quoting pghorp...@ucla.edu: >>> Can someone help me please? >>> >>> I am trying to import mods xml data in solr using the xml/http datasource This does not work with XPathEntityProcessor of the data import handler xpath="/mods/name/namepa...@type = 'date']" I actually have 143 records with type attribute as 'date' for element namePart. Thank you Parinita >>> >>> >> > >
Re: Spatial
On Oct 20, 2010, at 12:14 PM, Pradeep Singh wrote:

> Thanks for your response Grant.
>
> I already have the bounding box based implementation in place. And on a
> document base of around 350K it is super fast.
>
> What about a document base of millions of documents? While a tier based
> approach will narrow down the document space significantly, this concern
> might be misplaced because there are other numeric range queries I am
> going to run anyway which don't have anything to do with the spatial
> query. But the keyword here is numeric range query based on NumericField,
> which is going to be significantly faster than regular number based
> queries. I see that the dynamic field type _latLon is of type double and
> not tdouble by default. Can I have your input about that decision?

It's just an example. There shouldn't be any problem with using tdouble (or
tfloat if you don't need the precision).

> On Tue, Oct 19, 2010 at 6:10 PM, Grant Ingersoll wrote:
>
>> On Oct 19, 2010, at 6:23 PM, Pradeep Singh wrote:
>>
>>> https://issues.apache.org/jira/browse/LUCENE-2519
>>>
>>> If I change my code as per 2519 to have this:
>>>
>>>   public double[] coords(double latitude, double longitude) {
>>>     double rlat = Math.toRadians(latitude);
>>>     double rlong = Math.toRadians(longitude);
>>>     double nlat = rlong * Math.cos(rlat);
>>>     return new double[]{nlat, rlong};
>>>   }
>>>
>>> i.e. return this:
>>>
>>>   x = (gamma - gamma[0]) cos(phi)
>>>   y = phi
>>>
>>> would it make it give correct results? Correct projections, tier ids?
>>
>> I'm not sure. I have a lot of doubt around that code. After making that
>> correction, I spent several days trying to get the tests to pass and
>> ultimately gave up. Does that mean it is wrong? I don't know. I just
>> don't have enough confidence to recommend it, given that the tests I was
>> asking it to do I could verify through other tools. Personally, I would
>> recommend seeing if one of the non-tier based approaches suffices for
>> your situation and use that.
>>
>> -Grant

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search
Re: Import From MYSQL database
What I know is: you define your fields in the schema.xml file, and build a
database_conf.xml file which contains the configuration for your database.
Finally, you should define the DataImportHandler in the solrconfig.xml file.

I put a sample of what you should do in the first post in this topic; you
can check it. If I learn any additional information I will tell you.

Good luck
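A minimal sketch of those pieces - with the driver, URL, credentials, table
and column names all assumed for illustration - might look like this.

In solrconfig.xml:

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">database_conf.xml</str>
    </lst>
  </requestHandler>

In database_conf.xml:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
    <document>
      <entity name="item" query="SELECT id, name FROM items">
        <field column="id" name="id"/>
        <field column="name" name="name"/>
      </entity>
    </document>
  </dataConfig>

The fields "id" and "name" would also need matching declarations in
schema.xml, and a full import is then triggered with
/dataimport?command=full-import.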
Re: Import From MYSQL database
I found these files, but I can't find any useful info inside them. What I
found is the GET command in the HTTP request.
Re: How to index long words with StandardTokenizerFactory?
Here are all the files: http://rghost.net/3016862

1) StandardAnalyzer.java, StandardTokenizer.java - patched files from
   lucene-2.9.3
2) I patch these files and build lucene by typing "ant"
3) I replace lucene-core-2.9.3.jar in solr/lib/ with the
   lucene-core-2.9.3-dev.jar that I'd just compiled
4) then I do "ant compile" and "ant dist" in the solr folder
5) after that I recompile solr/example/webapps/solr.war with my new
   solr and lucene-core jars
6) I put my schema.xml in solr/example/solr/conf/
7) then I do "java -jar start.jar" in solr/example
8) index big_post.xml
9) try to find this document with
   curl "http://localhost:8983/solr/select?q=body:big*"
   (big_post.xml contains a long word biga...)
10) solr returns nothing

On 23 October 2010 02:43, Steven A Rowe wrote:
> Hi Sergey,
>
> What does your ~34kb field value look like? Does StandardTokenizer think
> it's just one token?
>
> What doesn't work? What happens?
>
> Steve
>
>> -----Original Message-----
>> From: Sergey Bartunov [mailto:sbos@gmail.com]
>> Sent: Friday, October 22, 2010 3:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to index long words with StandardTokenizerFactory?
>>
>> I'm using Solr 1.4.1. Now I've succeeded in replacing the lucene-core
>> jar, but maxTokenValue seems to be used in a very strange way. Currently
>> for me it's set to 1024*1024, but I couldn't index a field with a size of
>> just ~34kb. I understand that it's a little weird to index such big data,
>> but I just want to know why it doesn't work.
>>
>> On 22 October 2010 20:36, Steven A Rowe wrote:
>> > Hi Sergey,
>> >
>> > I've opened an issue to add a maxTokenLength param to the
>> > StandardTokenizerFactory configuration:
>> >
>> > https://issues.apache.org/jira/browse/SOLR-2188
>> >
>> > I'll work on it this weekend.
>> >
>> > Are you using Solr 1.4.1? I ask because of your mention of Lucene
>> > 2.9.3. I'm not sure there will ever be a Solr 1.4.2 release. I plan on
>> > targeting Solr 3.1 and 4.0 for the SOLR-2188 fix.
>> >
>> > I'm not sure why you didn't get the results you wanted with your
>> > Lucene hack - is it possible you have other Lucene jars in your Solr
>> > classpath?
>> >
>> > Steve
>> >
>> >> -----Original Message-----
>> >> From: Sergey Bartunov [mailto:sbos@gmail.com]
>> >> Sent: Friday, October 22, 2010 12:08 PM
>> >> To: solr-user@lucene.apache.org
>> >> Subject: How to index long words with StandardTokenizerFactory?
>> >>
>> >> I'm trying to force solr to index words whose length is more than 255
>> >> symbols (this constant is DEFAULT_MAX_TOKEN_LENGTH in lucene
>> >> StandardAnalyzer.java) using StandardTokenizerFactory as the 'filter'
>> >> tag in the schema configuration XML. Specifying the maxTokenLength
>> >> attribute won't work.
>> >>
>> >> I'd tried to make the dirty hack: I downloaded the lucene-core-2.9.3
>> >> src and changed the DEFAULT_MAX_TOKEN_LENGTH to 100, built it to a
>> >> jar and replaced the original lucene-core jar in solr /lib. But it
>> >> seems that it had no effect.
Re: Solr Javascript+JSON not optimized for SEO
Unfortunately it's not online yet, but is there anything I can clarify in
more detail?

Thanks!
Re: How to index long words with StandardTokenizerFactory?
Did you delete the folder Jetty_0_0_0_0_8983_solr.war_** under
apache-solr-1.4.1\example\work?

--- On Sat, 10/23/10, Sergey Bartunov wrote:
> Subject: Re: How to index long words with StandardTokenizerFactory?
> Here are all the files: http://rghost.net/3016862
> [...]
Re: Modelling Access Control
Hi Paul,

Regardless of how you implement it, I would recommend you use filter queries
for the permissions check rather than making it part of the main query.

On Sat, Oct 23, 2010 at 4:03 AM, Paul Carey wrote:
> My domain model is made of users that have access to projects which
> are composed of items. I'm hoping to use Solr and would like to make
> sure that searches only return results for items that users have
> access to.
> [...]

--
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/
Re: How to index long words with StandardTokenizerFactory?
Yes. I did. Won't help.

On 23 October 2010 17:45, Ahmet Arslan wrote:
> Did you delete the folder Jetty_0_0_0_0_8983_solr.war_** under
> apache-solr-1.4.1\example\work?
> [...]
Re: How to index long words with StandardTokenizerFactory?
I think you should replace your new lucene-core-2.9.3-dev.jar in
\apache-solr-1.4.1\lib and then create a new solr.war under
\apache-solr-1.4.1\dist. And copy this new solr.war to
solr/example/webapps/solr.war.

--- On Sat, 10/23/10, Sergey Bartunov wrote:
> Yes. I did. Won't help.
> [...]
Re: How to index long words with StandardTokenizerFactory?
On Fri, Oct 22, 2010 at 12:07 PM, Sergey Bartunov wrote:
> I'm trying to force solr to index words which length is more than 255

If the field is not a text field, Solr's default analyzer is used, which
currently limits the token to 256 bytes.

Out of curiosity, what's your use case that you really need a single 34KB
token?

-Yonik
http://www.lucidimagination.com
Re: How to index long words with StandardTokenizerFactory?
Look at the schema.xml that I provided. I use my own "text_block" type, which
is derived from "TextField", and I force the use of StandardTokenizerFactory
via the tokenizer tag. If I use the StrField type there are no problems with
indexing big data. The problem is in the tokenizer.

On 23 October 2010 18:55, Yonik Seeley wrote:
> On Fri, Oct 22, 2010 at 12:07 PM, Sergey Bartunov wrote:
>> I'm trying to force solr to index words which length is more than 255
>
> If the field is not a text field, Solr's default analyzer is used, which
> currently limits the token to 256 bytes.
> Out of curiosity, what's your use case that you really need a single 34KB
> token?
>
> -Yonik
Re: How to index long words with StandardTokenizerFactory?
This is exactly what I did. Look:

>> >> 3) I replace lucene-core-2.9.3.jar in solr/lib/ with the
>> >>    lucene-core-2.9.3-dev.jar that I'd just compiled
>> >> 4) then I do "ant compile" and "ant dist" in the solr folder
>> >> 5) after that I recompile solr/example/webapps/solr.war

On 23 October 2010 18:53, Ahmet Arslan wrote:
> I think you should replace your new lucene-core-2.9.3-dev.jar in
> \apache-solr-1.4.1\lib and then create a new solr.war under
> \apache-solr-1.4.1\dist. And copy this new solr.war to
> solr/example/webapps/solr.war.
> [...]
Re: xpath processing
On Fri, Oct 22, 2010 at 11:52 PM, <pghorp...@ucla.edu> wrote:
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
>   baseDir="C:\data\sample_records\mods\starr"
> url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
>   transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer"
> [...]

The documentation says you don't need a dataSource for your
XPathEntityProcessor entity; in my configuration, I have mine set to the name
of the top-level FileListEntityProcessor. Everything else looks fine. Can you
provide one record from your data? Also, are you getting any errors in your
log?

- Ken
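For reference, a data-config along the lines of the one quoted above might
look like the following - a sketch only: the entity names and the field
column are assumed, and the xpath comes from the original question:

  <dataConfig>
    <dataSource type="FileDataSource"/>
    <document>
      <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
              recursive="true" baseDir="C:\data\sample_records\mods\starr">
        <entity name="mods" processor="XPathEntityProcessor"
                url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
                transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
          <field column="namePart_date"
                 xpath="/mods/name/namePart[@type = 'date']"/>
        </entity>
      </entity>
    </document>
  </dataConfig>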
Re: Modelling Access Control
Two things will lessen the Solr administrative load:

1/ Follow the example of databases and *nix OSes. Give each user their own
group, or set up groups that don't have regular users as OWNERS but can have
users assigned to them to grant particular permissions - i.e. roles, like
publishers, reviewers, friends, etc.

2/ Put your ACL outside of Solr, using your server-side/command-line
language's object-oriented properties. Force all searches to come from a
single location in code (not sure how to do that), and make that piece of
code check authentication and authorization.

This is what my research shows about how others do it, and how I plan to do
it. ANY insight others have on this, I really want to hear.

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others' mistakes, so you do not have to make them
yourself. from
'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.

--- On Sat, 10/23/10, Paul Carey wrote:
> My domain model is made of users that have access to projects which
> are composed of items. I'm hoping to use Solr and would like to make
> sure that searches only return results for items that users have
> access to.
> [...]
Re: Modelling Access Control
Why use filter queries?

Wouldn't reducing the set headed into the filters by putting it in the main
query be faster? (A question to learn, since I do NOT know :-)

Dennis Gearon

--- On Sat, 10/23/10, Israel Ekpo wrote:
> Regardless of how you implement it, I would recommend you use filter
> queries for the permissions check rather than making it part of the main
> query.
> [...]
Re: Modelling Access Control
Forgot to add:

3/ The external application code selects the GROUPS that the user has
permission to read (Solr will only serve up what is to be read?) and then
searches on those groups.

Dennis Gearon

--- On Sat, 10/23/10, Dennis Gearon wrote:
> Two things will lessen the Solr administrative load:
> [...]
Re: Multiple indexes inside a single core
Ah, I should have read more carefully...

I remember this being discussed on the dev list, and I thought there might be
a Jira attached, but I sure can't find it.

If you're willing to work on it, you might hop over to the solr dev list and
start a discussion, maybe ask for a place to start. I'm sure some of the devs
have thought about this... If nobody on the dev list says "There's already a
JIRA on it", then you should open one. The Jira issues are generally
preferred when you start getting into design because the comments are
preserved for the next person who tries the idea or makes changes, etc.

Best
Erick

On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess wrote:
> Thanks Erick. The problem with multiple cores is that the documents are
> scored independently in each core. I would like to be able to search
> across both cores and have the scores 'normalized' in a way that's similar
> to what Lucene's MultiSearcher would do. As far as I understand, multiple
> cores would likely result in seriously skewed scores in my case since the
> documents are not distributed evenly or randomly. I could have one
> core/index with 20 million docs and another with 200.
>
> I've poked around in the code and this feature doesn't seem to exist. I
> would be happy with finding a decent place to try to add it. I'm not sure
> if there is a clean place for it.
>
> Ben
>
> On Oct 20, 2010, at 8:36 PM, Erick Erickson wrote:
>
>> It seems to me that multiple cores are along the lines you need: a
>> single instance of Solr that can search across multiple sub-indexes that
>> do not necessarily share schemas, and are independently maintainable.
>>
>> This might be a good place to start:
>> http://wiki.apache.org/solr/CoreAdmin
>>
>> HTH
>> Erick
>>
>> On Wed, Oct 20, 2010 at 3:23 PM, ben boggess wrote:
>>
>>> We are trying to convert a Lucene-based search solution to a
>>> Solr/Lucene-based solution. The problem we have is that we currently
>>> have our data split into many indexes, and Solr expects things to be in
>>> a single index unless you're sharding. In addition to this, our indexes
>>> wouldn't work well using the distributed search functionality in Solr
>>> because the documents are not evenly or randomly distributed. We are
>>> currently using Lucene's MultiSearcher to search over subsets of these
>>> indexes.
>>>
>>> I know this has been brought up a number of times in previous posts and
>>> the typical response is that the best thing to do is to convert
>>> everything into a single index. One of the major reasons for having the
>>> indexes split up the way we do is because different types of data need
>>> to be indexed at different intervals. You may need one index to be
>>> updated every 20 minutes and another updated only every week. If we
>>> move to a single index, then we will constantly be warming and
>>> replacing searchers for the entire dataset, and will essentially render
>>> the searcher caches useless. If we were able to have multiple indexes,
>>> each would have its own searcher and updates would be isolated to a
>>> subset of the data.
>>>
>>> The other problem is that we will likely need to shard this large
>>> single index, and there isn't a clean way to shard randomly and evenly
>>> across the data. We would, however, like to shard a single data type.
>>> If we could use multiple indexes, we would likely also be sharding a
>>> small subset of them.
>>>
>>> Thanks in advance,
>>>
>>> Ben
Re: FieldCache
Why do you want to? Basically, the caches are there to improve #searching#.
To search something, you must index it. Retrieving it is usually a rare
enough operation that caching is irrelevant.

This smells like an XY problem; see:
http://people.apache.org/~hossman/#xyproblem

If this seems like gibberish, could you explain your problem a little more?

Best
Erick

On Thu, Oct 21, 2010 at 10:20 AM, Mathias Walter wrote:
> Hi,
>
> does a field which should be cached need to be indexed?
>
> I have a binary field which is just stored. Retrieving it via
> FieldCache.DEFAULT.getTerms returns empty ByteRefs.
>
> Then I found the following post:
> http://www.mail-archive.com/d...@lucene.apache.org/msg05403.html
>
> How can I use the FieldCache with a binary field?
>
> --
> Kind regards,
> Mathias
Re: How to index long words with StandardTokenizerFactory?
Oops, I am sorry - I thought that solr/lib refers to solrhome/lib. I just
tested this, and it seems that you have successfully increased the max token
length. You can verify this on the analysis.jsp page.

Despite analysis.jsp's output, though, it seems that some other mechanism is
preventing this huge token from being indexed. The response of
http://localhost:8983/solr/terms?terms.fl=body does not have that huge token.

If you are interested only in prefix queries, as a workaround you can use
EdgeNGramFilterFactory at index time. Then the query (without the star)
solr/select?q=body:big will return that document.

By the way, for this particular task you don't need to edit the lucene/solr
distro. You can use the following class with the standard pre-compiled
solr.war, by putting its jar into the SolrHome/lib directory:

  package foo.solr.analysis;

  import org.apache.lucene.analysis.standard.StandardTokenizer;
  import org.apache.solr.analysis.BaseTokenizerFactory;

  import java.io.Reader;

  public class CustomStandardTokenizerFactory extends BaseTokenizerFactory {
    public StandardTokenizer create(Reader input) {
      final StandardTokenizer tokenizer = new StandardTokenizer(input);
      // lift the 255-character default so very long tokens survive indexing
      tokenizer.setMaxTokenLength(Integer.MAX_VALUE);
      return tokenizer;
    }
  }

--- On Sat, 10/23/10, Sergey Bartunov wrote:
> This is exactly what I did. Look:
> [...]
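Wiring that factory into the schema might then look like the following
sketch, reusing the "text_block" type name mentioned elsewhere in this
thread:

  <fieldType name="text_block" class="solr.TextField">
    <analyzer>
      <tokenizer class="foo.solr.analysis.CustomStandardTokenizerFactory"/>
    </analyzer>
  </fieldType>

Any field declared with type="text_block" would then tokenize without the
255-character cap.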
Re: Solr sorting problem
In general, the behavior when sorting is not predictable on a tokenized
field, which "text" is. What would it mean to sort on a field with "erick"
and "Moazzam" as tokens in a single document? Should it be in the "e"s or the
"m"s?

That said, you probably want to watch out for case.

Best
Erick

On Fri, Oct 22, 2010 at 10:02 AM, Moazzam Khan wrote:
> For anyone who faced the same problem, changing the field to string
> from text worked!
>
> -Moazzam
>
> On Fri, Oct 22, 2010 at 8:50 AM, Moazzam Khan wrote:
>> The field type of the first name and last name is text. Could that be
>> why it's not sorting properly? I just changed it to string and started
>> a full-import. Hopefully that will work.
>>
>> Thanks,
>> Moazzam
>>
>> On Thu, Oct 21, 2010 at 7:42 PM, Jayendra Patil wrote:
>>> We need additional information.
>>> Sorting is easy in Solr - just pass the sort parameter.
>>>
>>> However, when it comes to text sorting it depends on how you analyze
>>> and tokenize your fields. Sorting does not work on fields with
>>> multiple tokens.
>>> http://wiki.apache.org/solr/FAQ#Why_Isn.27t_Sorting_Working_on_my_Text_Fields.3F
>>>
>>> On Thu, Oct 21, 2010 at 7:24 PM, Moazzam Khan wrote:
>>>
>>>> Hey guys,
>>>>
>>>> I have a list of people indexed in Solr. I am trying to sort by their
>>>> first names but I keep getting results that are not alphabetically
>>>> sorted (I see the names starting with W before the names starting
>>>> with A). I have a feeling that the results are first being sorted by
>>>> relevancy then sorted by first name.
>>>>
>>>> Is there a way I can get the results to be sorted alphabetically?
>>>>
>>>> Thanks,
>>>> Moazzam
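A common pattern is to search on the tokenized field but sort on an
untokenized copy of it. A sketch, with the field names assumed:

  <field name="firstName" type="text" indexed="true" stored="true"/>
  <field name="firstName_sort" type="string" indexed="true" stored="false"/>
  <copyField source="firstName" dest="firstName_sort"/>

and then sort with sort=firstName_sort asc. To handle the case issue
mentioned above, the example schema's alphaOnlySort type (KeywordTokenizer
plus LowerCaseFilter) can be used in place of "string".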
Re: MoreLikeThis explanation?
Hi Darren,

Usually patches are written for the latest trunk branch at the time. I've
just updated the patch. Try it against the current trunk if you prefer.

Koji
--
http://www.rondhuit.com/en/

(10/10/22 19:10), Darren Govoni wrote:
> Hi Koji,
>
> I tried to apply your patch to the 1.4.0 tagged branch, but it didn't take
> completely. What branch does it work for?
>
> Darren
>
> On Thu, 2010-10-21 at 23:03 +0900, Koji Sekiguchi wrote:
>> (10/10/21 20:33), dar...@ontrenet.com wrote:
>>> Hi,
>>> Does the latest Solr provide an explanation for results returned by
>>> MLT?
>>
>> No, but there is an open issue:
>> https://issues.apache.org/jira/browse/SOLR-860
>>
>> Koji
Re: Modelling Access Control
Pushing ACL logic outside Solr sounds like a prudent choice indeed, as, in my
opinion, all of the business rules/conceptual logic should reside only within
the code boundaries. This way your domain will be easier to model and your
code easier to read, understand and maintain.

More information on filter queries - when they should be used and how they
affect performance - can be found here:
http://wiki.apache.org/solr/FilterQueryGuidance

On 23 October 2010 20:00, Dennis Gearon wrote:
> Forgot to add:
> 3/ The external application code selects the GROUPS that the user has
> permission to read (Solr will only serve up what is to be read?) and then
> searches on those groups.
> [...]
Re: pf parameter in edismax (SOLR-1553)
Answering my own question: the "pf" feature only kicks in with a multi-term
"q" param. In my case I used a field tokenized by KeywordTokenizer, hence pf
never kicked in.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 14. okt. 2010, at 13.29, Jan Høydahl / Cominvent wrote:

> Hi,
>
> Have applied SOLR-1553 to 1.4.2 and it works great.
> However, I can't get the pf param to work. Example:
>   q=foo bar&qf=title^2.0 body^0.5&pf=title^50.0
>
> Shouldn't I see the phrase query boost in debugQuery? Currently I see no
> trace of pf being used.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
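Illustrative requests, with field names assumed:

  q=foo&qf=title&pf=title^50.0      -> one term; no phrase to build, pf ignored
  q=foo bar&qf=title&pf=title^50.0  -> adds a boosted phrase query title:"foo bar"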
Re: Modelling Access Control
Hi All,

I think using filter queries is a good option to consider, for the following
reasons:

* The filter query does not affect the score of the items in the result set.
  If the ACL logic is part of the main query, it could influence the scores
  of the items in the result set.

* Using a filter query could lead to better performance on complex queries,
  because the results of the query specified with fq are cached
  independently of the main query. Since the result of a filter query is
  cached, it will be used to filter the primary query result using set
  intersection, without having to fetch the ids of the documents from the fq
  a second time. I think this is useful because we can assume that the ACL
  portion in the fq is relatively constant, since the permissions for each
  user do not change frequently.

http://wiki.apache.org/solr/FilterQueryGuidance

On Sat, Oct 23, 2010 at 2:58 PM, Dennis Gearon wrote:
> Why use filter queries?
>
> Wouldn't reducing the set headed into the filters by putting it in the
> main query be faster? (A question to learn, since I do NOT know :-)
> [...]

--
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/
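As a sketch of what that looks like on the request (field name and ids
assumed, as earlier in the thread), the ACL check moves out of q and into fq:

  q=(actual query)&fq=acl:(p17 OR p23 OR p99)

The fq result is cached as a document set and intersected with each main
query, so repeated searches by the same user do not re-evaluate the ACL
clause.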
Re: How to delete a SOLR document if that particular data doesnt exist in DB?
Thanks a lot for all your replies. I finally wrote a program which fetches
and stores all the UIDs from the source (DB) in one list, and all the UIDs
from the Solr index in another list.

Next, using the binarySearch method of Collections, I filtered out the UIDs
that are present in the Solr list but not in the DB list, and passed those
UIDs for deletion using deleteByQuery.

It took under 7 minutes to compare the two lists (over 3 million records in
each) and delete the orphan documents from the Solr index.

Again, thanks a lot for all your replies.

Thanks,
Barani
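A minimal sketch of such a program, assuming SolrJ and that the two UID lists
are loaded elsewhere (the field name "uid" and all other identifiers are
assumptions, not from the original post):

  import java.util.Collections;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrServer;

  public class OrphanCleaner {
      // dbUids: every UID in the database; solrUids: every UID in the index.
      public static void deleteOrphans(SolrServer solr, List<String> dbUids,
                                       List<String> solrUids) throws Exception {
          Collections.sort(dbUids); // binarySearch requires a sorted list
          for (String uid : solrUids) {
              // present in Solr but missing from the DB -> orphan, delete it
              if (Collections.binarySearch(dbUids, uid) < 0) {
                  solr.deleteByQuery("uid:" + uid); // field name is assumed
              }
          }
          solr.commit();
      }
  }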