Re: Implementing PhraseQuery and MoreLikeThis Query in one app
Otis,

Here are the logs: the method calls along with their outputs (sorry for the bulk data :) ). I compared 3 runs.

1) GetMethod
   a) url=http://localhost:8080/solr/mlt
   b) query=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:
INFO MLT2SearchRequestProcessor:87 - In method sendGetCommand(): url=http://localhost:8080/solr/mlt ; queryString=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
INFO MLT2SearchRequestProcessor:76 - 002.098612S.G.SG_Book0.28923997O. HenryS.G.Four Million, The0.08667877Katherine MosbyThe Season of Lillian Dawes0.07947738Jerome K. JeromeThree Men in a Boat0.047219563Charles OliverS.G.ABC's of Science1.01.01.01.01.0

2) GetMethod
   a) url=http://localhost:8080/solr/select
   b) query=q=id:10&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:
INFO MLT2SearchRequestProcessor:87 - In method sendGetCommand(): url=http://localhost:8080/solr/select; queryString=q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
INFO MLT2SearchRequestProcessor:76 - 015title author scorecontent_mltid:10truedetails5< /lst>2.098612S.G.SG_Book0.24578805O. HenryS.G.Four Million, The0.22171465Jerome K. JeromeThree Men in a Boat0.22018899Katherine MosbyThe Season of Lillian Dawes0.098666154 Charles OliverS.G.ABC's of Scienceid:10id:10id:10id:10
2.098612 = (MATCH) weight(id:10 in 3), product of:
  0.9994 = queryWeight(id:10), product of:
    2.0986123 = idf(docFreq=1, numDocs=5)
    0.47650534 = queryNorm
  2.0986123 = (MATCH) fieldWeight(id:10 in 3), product of:
    1.0 = tf(termFreq(id:10)=1)
    2.0986123 = idf(docFreq=1, numDocs=5)
    1.0 = fieldNorm(field=id, doc=3)
OldLuceneQParser15.00.00.00.00.00.00.015.00.00.015.00.00.0

3) SolrJ call
   a) url=http://localhost:8080/solr
   b) query=q=id:10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score

Output:
INFO MLTSearchRequestProcessor:45 - SolrServer url: http://localhost:8080/solr
INFO MLTSearchRequestProcessor:51 - id = 10
INFO MLTSearchRequestProcessor:53 - constructedQuery> id:10
INFO MLTSearchRequestProcessor:63 - solrQuery> q=id%3A10&mlt=true&mlt.fl=content_mlt&mlt.maxqt=5&mlt.interestingTerms=details&fl=title+author+score
INFO MLTSearchRequestProcessor:69 - Number of docs found = 1
INFO MLTSearchRequestProcessor:73 - title = SG_Book; score = 2.098612

One can see that the results of the 2 runs with GetMethod are almost identical: docs found and their weights are the same. The values themselves look doubtful, though. For example, the response contains the original doc, although it wasn't supposed to be in the returned list of "more like this" docs. Its weight also seems to show that id=10 was found in three other docs, which shouldn't be the case. (Or is it just a rare coincidence that 10 is among the most important terms of this doc and other docs happen to contain it? That looks very unlikely. Or do I simply misinterpret it?) Plus, the individual weights for "interestingTerms" are all the same (1.0), and that's also questionable.

And the 3rd run (SolrJ call) returned just the original doc (with the same weight as in the first two calls). Maybe the problem lurks somewhere in solrconfig.xml?

Right now I don't have the slightest idea where to look for a hint. Anyway, it's a holiday today. (Hopefully my message doesn't interrupt it. :) )

Have a great 4th of July!

Sergey


Otis Gospodnetic wrote:
>
> Sergey,
>
> I think I confused you. The comment about the fields listed in the "fl"
> parameter has nothing to do with the SolrJ calls not working.
>
> For SolrJ calls not working my suggestion is to look at the logs and
> compare the GetMethod call with the SolrJ call. Paste them if you want
> more people to look at them.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: SergeyG
>> To: solr-user@lucene.apache.org
>> Sent: Friday, July 3, 2009 4:08:37 AM
>> Subject: Re: Implementing PhraseQuery and MoreLikeThis Query in one app
>>
>> Otis,
>>
>> Thanks a lot. I'd certainly follow your advice and check the logs.
>> Although, I must say that I've already tried all possible variations of
>> the string for the "fl" parameter (spaces, commas, plus signs). More than
>> that - the query still doesn't want to fetch any docs (other than the one
>> with the id specified in the query) even when the line
>> solrQuery.setParam("fl", "title author score"); is commented out. So I
>> suspect that the problem is that the request with the url
>> "http://localhost:8080/solr/select?q=id:1&mlt=true&mlt.fl=content&..."
>> due to some reason doesn't work properly. And when I use the
>> GetMethod(url) approach and send url directly in the form
>> "http://localhost:8080/solr/mlt?q=id:1&mlt.fl=content&...", Solr picks up
>> the mlt component. (At lea
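For anyone comparing the runs above side by side, here is a minimal sketch (plain JDK, no SolrJ) that rebuilds the two GET URLs from the logged parameters. The class and method names are made up for illustration; the base URL, handler paths and parameter values are copied from the logs, and the only difference between the two requests is that the /select variant carries mlt=true.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
public class MltUrlSketch {

    // URL-encodes one key or value; UTF-8 is always available on a JVM.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Joins the parameters into a form-encoded query string, keeping insertion order.
    static String queryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // Parameters taken from the logged runs; mlt=true is only added for /select.
    static Map<String, String> params(String id, boolean viaSelect) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("q", "id:" + id);
        if (viaSelect) {
            p.put("mlt", "true");
        }
        p.put("mlt.fl", "content_mlt");
        p.put("mlt.maxqt", "5");
        p.put("mlt.interestingTerms", "details");
        p.put("fl", "title author score");
        return p;
    }

    // Request against the dedicated /mlt handler (run 1).
    static String mltHandlerUrl(String base, String id) {
        return base + "/mlt?" + queryString(params(id, false));
    }

    // Request against /select with the MLT parameters switched on (runs 2 and 3).
    static String selectUrl(String base, String id) {
        return base + "/select?" + queryString(params(id, true));
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/solr";
        System.out.println(mltHandlerUrl(base, "10"));
        System.out.println(selectUrl(base, "10"));
    }
}
```

Printing both URLs and pasting them into a browser is a cheap way to check whether the difference in results comes from the handler or from the client code.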
Re: Trie vs long string for sorting
: My data are library call numbers, normalized to be comparable, resulting in
: (maximum) 21-character strings of the form "RK 052180H359~999~999"
:
: Now, these are fine -- they work for sorting and ranges and the whole thing,
: but right now I can't use them because I've got two or three for each of my
: 6M documents and on a 32-bit machine I run out of heap.
:
: Another option would be to turn them into longs (using roughly 56 bits of
: the 64 bit space) and use a trie type. Is there any sort of a win involved
: there?

I don't think Trie fields can be used for sorting (because they result in multiple terms per doc) but i could be wrong about that, smarter people than me may have done something cool with the TrieField that i'm not aware of.

As a general rule: if you have character data that fits a rigid enough set of constraints that you can encode any legal value into a single numeric value (either int or long) such that the values still sort properly, sorting on those encoded values is going to be more memory efficient than (and probably just as fast as) sorting on the string values.

-Hoss
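To make the "encode into a single numeric value such that they still sort properly" idea concrete, here is a minimal sketch. It is a generic fixed-width base-39 packing, not the original poster's scheme: the alphabet (space, digits, upper-case letters, tilde) is an assumption read off the sample call number, and the class and method names are invented. Generic packing like this only fits 11 characters into a long; the full 21-character strings would need well over 64 bits, so squeezing them into roughly 56 bits would have to exploit the per-position structure of the call numbers.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: order-preserving packing of short strings into a long.
public class SortableKeySketch {

    // Assumed alphabet, listed in ASCII order so digit order matches String.compareTo.
    static final String ALPHABET = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ~";

    // Digit 0 is reserved for "no character", so a prefix sorts before its extensions.
    static final int RADIX = ALPHABET.length() + 1;

    // 39^11 is below 2^63, 39^12 is not, so 11 is the widest key that fits.
    static final int MAX_LEN = 11;

    static long encode(String s) {
        if (s.length() > MAX_LEN) {
            throw new IllegalArgumentException("too long for a single long: " + s);
        }
        long value = 0;
        for (int i = 0; i < MAX_LEN; i++) {
            int digit = 0; // padding for positions past the end of the string
            if (i < s.length()) {
                int idx = ALPHABET.indexOf(s.charAt(i));
                if (idx < 0) {
                    throw new IllegalArgumentException("character outside alphabet: " + s.charAt(i));
                }
                digit = idx + 1;
            }
            value = value * RADIX + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        String[] byString = {"RK 052180H", "RK 052180", "RK 0521~99", "QA 76", "RK"};
        String[] byLong = byString.clone();

        Arrays.sort(byString); // plain string order
        Arrays.sort(byLong, new Comparator<String>() { // order of the encoded longs
            public int compare(String a, String b) {
                return Long.compare(encode(a), encode(b));
            }
        });

        System.out.println("string order: " + Arrays.toString(byString));
        System.out.println("long order:   " + Arrays.toString(byLong));
        System.out.println("same order:   " + Arrays.equals(byString, byLong));
    }
}
```

The memory argument is that a sort cache over longs holds 8 bytes per document, whereas a string sort has to keep the distinct string values around as well.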
Re: Delete, Commit, Add Interaction
: :collection:foo : : : : : . : : ...
: Finally, here's the behavior we're seeing. In some cases, usually when
: the index is starting to get larger (approaching 500,000 documents),
: the above procedure will fail to add anything to the index. That is, none
: of the commands return an error code, there is no indication of a problem
: in the log files and the process DOES take some amount of time to

That really shouldn't happen. if you were using embedded solr, or some crazy UpdateProcessor, i can imagine encountering a code path where your adds got processed before your delete -- but not if you are using HTTP to send XML like that each in a separate HTTP Connection as you describe.

: If this is happening, how can I know when the delete has been processed
: before initiating the add process?

When the command after the delete returns a 200 status code, the delete is done. *DONE* Done, completely done, over and done, nothing funky going on under the covers done.

can you post some of your log messages from one of these problematic instances? I'm particularly interested in the INFO level messages from LogUpdateProcessor.finish and SolrCore.execute that say things like...

Jul 4, 2009 12:38:43 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 0
Jul 4, 2009 12:38:43 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=0 QTime=0

...that was a delete (not sure why the msg from LogUpdateProcessor is empty) then something like this from the commit...

Jul 4, 2009 12:39:55 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Jul 4, 2009 12:39:55 PM org.apache.solr.search.SolrIndexSearcher
INFO: Opening searc...@15ccfb1 main
< ... snip a bunch of logging about autowarming various caches ... >
Jul 4, 2009 12:39:55 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 50
Jul 4, 2009 12:39:55 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=0 QTime=50

...and then a bunch of adds...

Jul 4, 2009 12:41:37 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[SP2514N, 6H500F0]} 0 24
Jul 4, 2009 12:41:37 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=0 QTime=24
Jul 4, 2009 12:41:37 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[F8V7067-APL-KIT, IW-02]} 0 9

...which should be followed by another commit getting logged.

These log messages are all from the example running in jetty, your log format may vary. What I'm particularly interested in is the timestamps on these log messages, so if you can turn on millisecond time resolution that would be best ... i want to see when exactly the delete/commit/add/commit commands are getting executed.

-Hoss
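A client-side counterpart to those server logs might look like the sketch below: each update message goes out on its own HTTP connection, the client blocks until the status comes back, and it prints millisecond timestamps that can be lined up against the Solr log. This is only an illustration under assumptions: the class and method names are invented, the update URL has to be supplied by the caller, and the delete query collection:foo is taken from the quoted message.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical client sketch; not taken from the original poster's code.
public class SequentialUpdateSketch {

    // Escapes the characters that are special inside XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String deleteByQueryXml(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    static String commitXml() {
        return "<commit/>";
    }

    // Posts one update message and returns only after Solr has answered with 200.
    static void post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(xml.getBytes("UTF-8"));
        } finally {
            out.close();
        }
        int status = conn.getResponseCode(); // blocks until the response arrives
        conn.disconnect();
        if (status != 200) {
            throw new IOException("HTTP " + status + " for " + xml);
        }
        System.out.println(System.currentTimeMillis() + " ms: 200 for " + xml);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: SequentialUpdateSketch <update-url>");
            return;
        }
        String updateUrl = args[0]; // e.g. the /update path of your own Solr webapp
        post(updateUrl, deleteByQueryXml("collection:foo"));
        post(updateUrl, commitXml());
        // The <add> messages would be posted here in the same way,
        // followed by one more commitXml() so the new docs become visible.
    }
}
```

If the client timestamps show the adds going out only after the delete and commit were acknowledged, while the server log shows a different order, that would point at something between the client and Solr.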
Re: Delete, Commit, Add Interaction
: Jul 4, 2009 12:38:43 PM org.apache.solr.update.processor.LogUpdateProcessor finish
: INFO: {} 0 0
: Jul 4, 2009 12:38:43 PM org.apache.solr.core.SolrCore execute
: INFO: [] webapp=/solr path=/update params={} status=0 QTime=0
:
: ...that was a delete (not sure why the msg from LogUpdateProcessor is
: empty) then something like this from the commit...

...it was fat finger user error on my part, the log msg from delete should look like...

Jul 4, 2009 12:46:30 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {deleteByQuery=name:foo} 0 12
Jul 4, 2009 12:46:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=0 QTime=12

-Hoss
Re: Trie vs long string for sorting
Trie has a custom parser that can load the FieldCache for sorting. It's basically a built-in type now that supports FieldCache, sorting, stored fields, etc.

On Sat, Jul 4, 2009 at 3:27 PM, Chris Hostetter wrote:

>
> : My data are library call numbers, normalized to be comparable, resulting in
> : (maximum) 21-character strings of the form "RK 052180H359~999~999"
> :
> : Now, these are fine -- they work for sorting and ranges and the whole thing,
> : but right now I can't use them because I've got two or three for each of my
> : 6M documents and on a 32-bit machine I run out of heap.
> :
> : Another option would be to turn them into longs (using roughly 56 bits of
> : the 64 bit space) and use a trie type. Is there any sort of a win involved
> : there?
>
> I don't think Trie fields can be used for sorting (because they result in
> multiple terms per doc) but i could be wrong about that, smarter people
> than me may have done something cool with the TrieField that i'm not aware
> of.
>
> As a general rule: if you have character data that fits a rigid enough set
> of constraints that you can encode any legal value into a single
> numeric value (either int, or long) such that they still sort properly,
> sorting on those encoded values is going to be more memory efficient (and
> probably just as fast) as sorting on the string values.
>
>
> -Hoss
>
>

--
- Mark

http://www.lucidimagination.com
Problem in parsing non-string dynamic field by using IndexReader
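I have a task to parse all documents in a Solr index. I use Lucene's IndexReader to read the index and go through each field of every document. However, for float or int dynamic fields, the stringValue() call always returns some special characters. I tried tokenStreamValue, byteValue and readerValue, and they all return null. The following is my method to parse the Solr index. My question is: how can I get the values from non-string dynamic fields properly?

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Fieldable;
    import org.apache.lucene.index.IndexReader;

    // The class name is arbitrary; the post only showed the main method.
    public class DumpIndex {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/my/index/directory");
            int total = reader.numDocs();
            System.out.println("Total documents: " + total);
            // Only document 0 is inspected here. To walk the whole index, loop up to
            // reader.maxDoc() and skip ids for which reader.isDeleted(i) is true.
            for (int i = 0; i < 1; i++) {
                Document d = reader.document(i);
                // Iterate as Object and cast, so this compiles whether getFields()
                // is declared as a raw List or as a List<Fieldable>.
                for (Object o : d.getFields()) {
                    Fieldable f = (Fieldable) o;
                    String name = f.name();
                    String val = f.stringValue();
                    System.out.println("get field / value: [" + name + "=" + val + "]");
                }
            }
            reader.close();
        }
    }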