RE: Computing an md5 of a text field.
XML escaping is probably the best approach. Either surround the whole thing with "<![CDATA[" and "]]>", or use one of the many libraries out there that will escape the string for you.

While MD5 is designed to be a cryptographically secure one-way function, it is NOT guaranteed to be a one-to-one (invertible) function. You could theoretically have two distinct URLs that have the same MD5.

> -----Original Message-----
> From: Nuno Leitao [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 23, 2007 5:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Computing an md5 of a text field.
>
> Thanks Yonik,
>
> Basically, I am indexing a number of items where the unique ID is a URL. Because URL's can contain invalid XML characters, and I will be doing some XSLT postprocessing, I was thinking that a good way to solve the problem would be to store these unique ID's as md5's instead.
>
> I think I found another alternative - it follows the pre-processing avenue you suggested.
>
> Best Regards.
>
> --Nuno
>
> On 23 Jul 2007, at 18:25, Yonik Seeley wrote:
>
> > On 7/23/07, Nuno Leitao <[EMAIL PROTECTED]> wrote:
> >> I would like to be able to compute and store the MD5 sum for a given text in a field (in my case, I am talking about a URL string). For example, if I have a field called 'url' the following would happen:
> >>
> >> 'http://wiki.apache.org' -> 'cb4f7e6ca1a0c00b146894b75d9f98dc'
> >
> > First, what are you trying to achieve by this? If you give people the higher level problem, they might be able to suggest a better way.
> >
> > Since you construct the XML document to send to Solr, simply compute the MD5 and add that also:
> >
> > http://wiki.apache.org
> > cb4f7e6ca1a0c00b146894b75d9f98dc
> >
> > Or did you want to store the MD5 instead of the URL? Did you want it searchable somehow?
> >
> > -Yonik
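For what it's worth, here is a minimal sketch of both options discussed in this thread: escaping the URL so it is safe inside XML, and computing its MD5 as a hex string. The class and method names (UrlIdSketch, escapeXml, md5Hex) and the sample URL are made up for illustration; only java.security.MessageDigest and the other JDK classes are real APIs. The escaping covers the five XML special characters only, so it assumes the URL contains no control characters that XML forbids outright.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class UrlIdSketch {

    // Replace the five XML special characters with their predefined entities.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // MD5 of the UTF-8 bytes of s, as 32 lowercase hex digits.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros so leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        String url = "http://example.org/search?q=a&b=<c>"; // hypothetical URL
        System.out.println(escapeXml(url));
        System.out.println(md5Hex(url));
    }
}
```

If the MD5 is used as the unique ID, keeping the escaped URL in a second field means the original value is still available, since the hash cannot be reversed.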
RE: Filtering using data only available at query time
Can you add some fields that let you set a filter or query that weeds out the results that the user doesn't have access to?

If it's as simple as Admin versus User, you could have a boolean field called AdminOnly, and when a User is querying, add fq=[* TO *] -AdminOnly:true

You could get more specific if you need to: just provide the information that you would use to determine the availability of the record to any given user, and then construct the filter based on the current user.

> -----Original Message-----
> From: Jonathan Woods [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 27, 2007 10:00 AM
> To: solr-user@lucene.apache.org
> Subject: Filtering using data only available at query time
>
> I've got a Lucene-based search implementation which searches over documents in a CMS and weeds out those hits which aren't accessible to the user carrying out the search. The raw search results are returned as an iterator, and I wrap another iterator around this to silently consume the inaccessible hits. (Yes, I know... wasteful!) The search is therefore based on data (user permissions) which can't be known at indexing time.
>
> I'm now porting the search implementation over to Solr. I took a look at FunctionQuery, and wondered if there was some way I could use it to do this kind of filtering - but as far as I can tell, it's only about scoring a hit - ValueSource can't signal 'don't include this at all'. Is there a case for introducing some kind of boolean include/exclude factor somewhere along the API? Or is there another obvious way to do this? I guess I could implement my own Query subclass and use it as a filter [query] in the search, but I wonder if it would be still be useful in FunctionQuery.
>
> Jon
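A rough sketch of the "construct the filter based on the current user" idea, assuming the front-end knows the user's type at query time. The Role enum and the class and method names are hypothetical; the AdminOnly field and the filter string are copied from the message above, and whether that exact filter parses will depend on your Solr version and default field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PermissionFilterSketch {

    // Hypothetical user types; a real application would take these from its own user model.
    enum Role { ADMIN, USER }

    // Filter query for the given role, or null when no restriction applies.
    static String filterFor(Role role) {
        // Filter string as suggested in the thread: everything except AdminOnly documents.
        return role == Role.ADMIN ? null : "[* TO *] -AdminOnly:true";
    }

    // URL-encode a parameter value (form encoding: spaces become '+').
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Build the query-string parameters for a search request.
    static String selectParams(String userQuery, Role role) {
        StringBuilder params = new StringBuilder("q=").append(encode(userQuery));
        String fq = filterFor(role);
        if (fq != null) {
            params.append("&fq=").append(encode(fq));
        }
        return params.toString();
    }

    public static void main(String[] args) {
        System.out.println(selectParams("annual report", Role.USER));
        System.out.println(selectParams("annual report", Role.ADMIN));
    }
}
```

The point of the design is that the role is looked up when the query is built, so a change in a user's permissions needs no reindexing; only the data that the rule depends on (here, AdminOnly) lives in the index.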
RE: Filtering using data only available at query time
I think you're missing my point.

Don't index which users have permission; index which type of user has permission. Then _filter_ based on that.

> -----Original Message-----
> From: Jonathan Woods [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 27, 2007 10:26 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Filtering using data only available at query time
>
> I know what you mean, and maybe I'm just being obstinate. But in the general case, it isn't possible to know these things ahead of time. The indexing machinery isn't told about changes in user permissions (e.g. demotion from administrative to ordinary user), and even if it were I'd hate to have to reindex everything just to reflect that change.
>
> Jon
RE: Filtering using data only available at query time
Okay, but you can put the [permission affecting data] into your index, and add a filter for the [current access permission]. In other words, your front-end handles the current business rules to create the appropriate filter query, and passes that to the Solr query handler.

> -----Original Message-----
> From: Jonathan Woods [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 27, 2007 12:02 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Filtering using data only available at query time
>
> But [the type of user] which has permission can change too.
RE: Forced Top Document
I'm going to be doing something similar, and I don't think I'll be sorting by score (although that might be feasible). In my use case, though, we don't want to include something unless it is already matched by our filters. I'll probably end up just making two search requests, but it would be nice if Solr could handle it for us.

> -----Original Message-----
> From: Charlie Jackson [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 24, 2007 10:57 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Forced Top Document
>
> Yes, this will only work if the results are sorted by score (the default).
>
> One thing I thought of after I sent this out was that this will include the specified document even if it doesn't match your search criteria, which may not be what you want.
>
>
> -----Original Message-----
> From: mark angelillo [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 24, 2007 12:44 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Forced Top Document
>
> Charlie,
>
> That's interesting. I did try something like this. Did you try your query with a sorting parameter?
>
> What I've read suggests that all the results are returned based on the query specified, but then resorted as specified. Boosting (which modifies the document's score) should not change the order unless the results are sorted by score.
>
> Mark
>
> On Oct 24, 2007, at 1:05 PM, Charlie Jackson wrote:
>
> > Do you know which document you want at the top? If so, I believe you could just add an "OR" clause to your query to boost that document very high, such as
> >
> > ?q=foo OR id:bar^1000
> >
> > Tried this on my installation and it did, indeed push the document specified to the top.
> >
> > -----Original Message-----
> > From: Matthew Runo [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 24, 2007 10:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Forced Top Document
> >
> > I'd love to know this, as I just got a development request for this very feature. I'd rather not spend time on it if it already exists.
> >
> > ++
> > | Matthew Runo
> > | Zappos Development
> > | [EMAIL PROTECTED]
> > | 702-943-7833
> > ++
> >
> > On Oct 23, 2007, at 10:12 PM, mark angelillo wrote:
> >
> >> Hi all,
> >>
> >> Is there a way to get a specific document to appear on top of search results even if a sorting parameter would push it further down?
> >>
> >> Thanks in advance,
> >> Mark
> >>
> >> mark angelillo
> >> snooth inc.
> >> o: 646.723.4328
> >> c: 484.437.9915
> >> [EMAIL PROTECTED]
> >> snooth -- 1.8 million ratings and counting...
>
> mark angelillo
> snooth inc.
> o: 646.723.4328
> c: 484.437.9915
> [EMAIL PROTECTED]
> snooth -- 1.8 million ratings and counting...
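A small sketch of the boosted "OR" clause Charlie describes, in case it helps anyone building the query in code. The class and method names are invented for illustration, and the parentheses around the user's query are my addition so that a multi-term query stays grouped. As noted in the thread, this only pushes the document to the top when results are sorted by score, and it will include the document even if it doesn't otherwise match. It also assumes the ID contains no characters that are special to the query parser.

```java
public class PinnedDocQuerySketch {

    // Build "(userQuery) OR idField:docId^boost", following the thread's "foo OR id:bar^1000".
    static String pinToTop(String userQuery, String idField, String docId, int boost) {
        return "(" + userQuery + ") OR " + idField + ":" + docId + "^" + boost;
    }

    public static void main(String[] args) {
        // Values taken from the example in the thread.
        System.out.println(pinToTop("foo", "id", "bar", 1000));
    }
}
```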
RE: Solr logo poll
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
> Sent: Friday, April 06, 2007 10:52 AM
> To: solr-user@lucene.apache.org
> Subject: Solr logo poll
>
> Quick poll... Solr 2.1 release planning is underway, and a new logo may be a part of that.
> What "form" of logo do you prefer, A or B? There may be further tweaks to these pictures, but I'd like to get a sense of what the user community likes.
>
> A)
> http://issues.apache.org/jira/secure/attachment/12349897/logo-solr-d.jpg
>
> B)
> http://issues.apache.org/jira/secure/attachment/12353535/12353535_solr-nick.gif
>
> Just respond to this thread with your preference.
>

I like A a bit better than B. To me, B looks like it's out of a '90s kids' show. (No offense to the creator of that image.)
RE: Solr Query Language
It looks (from the exception) like you missed a space. Perhaps your actual query was constructed like:

String query = "width:[" + lowWidth + " TO" + highWidth + "]";

Where you *wanted*:

String query = "width:[" + lowWidth + " TO " + highWidth + "]";

> -----Original Message-----
> From: Jack L [mailto:[EMAIL PROTECTED]
> Sent: Sunday, April 15, 2007 10:32 PM
> To: solr-user@lucene.apache.org
> Subject: Solr Query Language
>
>
> Is the lucene query syntax available in solr? I saw this page about lucene query syntax:
> http://lucene.apache.org/java/docs/queryparsersyntax.html
>
> I tried "width:[0 TO 500]" and got an exception:
> java.lang.NumberFormatException: For input string: "TO500"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> at java.lang.Integer.parseInt(Integer.java:447)
> at java.lang.Integer.parseInt(Integer.java:497)
>
> If solr query language is different from that of Lucene, is there a page that documents this?
>
> --
> Best regards,
> Jack
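One way to make this kind of spacing slip harder to repeat is to build the range with a format string, so the spaces around TO sit in a single literal. This is only a sketch; the class and method names are made up, and lowWidth/highWidth are assumed to be ints as in the snippet above.

```java
import java.util.Locale;

public class RangeQuerySketch {

    // Build an inclusive range clause such as "width:[0 TO 500]".
    static String range(String field, int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high: " + low + " > " + high);
        }
        // Locale.ROOT keeps the digits plain ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s:[%d TO %d]", field, low, high);
    }

    public static void main(String[] args) {
        int lowWidth = 0;
        int highWidth = 500;
        System.out.println(range("width", lowWidth, highWidth));
    }
}
```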
RE: Heap Out of Memory Error
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 22, 2007 12:31 PM
>
> Hi,
>
> I am running Solr within the Jetty using start.jar. I am indexing about 200,000 documents. Sometimes out of the blue, the Solr instance cannot process any more requests and returns "heap out of memory" error.
>
> This happens more often when I issue queries against the index that is being updated.
>
> Is there some configuration setting I need to change?
>
> Also, the server itself has plenty of RAM even when this error appears. So it appears Java is running out of its heap memory while there is still enough of RAM available for other processes.
>
> Thanks,
> Av

It seems like the Java heap size is set too low. Wherever you start the JVM, you can pass in -Xmx512m -Xms256m, adjusting the values as necessary.
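If you want to confirm which limits the JVM actually picked up, a tiny check like the one below can help. It is just a sketch with a made-up class name, using the standard Runtime methods; run it with the same flags you give the Solr JVM and compare the numbers.

```java
public class HeapCheck {

    private static final long MIB = 1024L * 1024L;

    // Upper bound the JVM will try to use for the heap (governed by -Xmx), in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Example invocation (flags from the message above): java -Xmx512m -Xms256m HeapCheck
        System.out.println("max heap (MiB):   " + maxHeapMiB());
        System.out.println("total heap (MiB): " + rt.totalMemory() / MIB);
        System.out.println("free heap (MiB):  " + rt.freeMemory() / MIB);
    }
}
```

This also explains why the server can have plenty of free RAM while Java reports it is out of heap: the JVM will not grow the heap past its configured maximum, no matter how much memory the machine has.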