soft commits in EmbeddedSolrServer
Hi all, I'm checking how to do soft commits with the new version of Solr. I'm using EmbeddedSolrServer to add documents to my index. How can I perform a soft commit using this class? Is it possible? Or should I use the trunk? http://wiki.apache.org/solr/NearRealtimeSearch http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html Thanks in advance, Raimon Bosch.
Time Stats
Hi,
Today I was playing with StatsComponent to extract some statistics from my index. I'm using a Solr index to store user searches: I aggregate data from the access log into the index, so now I can see the average bounce rate for a group of user searches and see which ones perform better in Google.
Now I would like to see how these stats evolve over time. For that I would need a field that holds different values over time, e.g.
"flats for rent new york" at 1/12/2011 => bounce_rate=48.6%
"flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
"flats for rent new york" at 1/2/2012 => bounce_rate=46.4%
Is there any Solr field type that would fit this?
Thanks in advance,
Raimon Bosch.
Re: Time Stats
Anyone up to provide an answer?
The idea is to have a kind of CustomInteger made up of an array of timestamps. The value shown in this field would depend on the date range you send with the query.
The biggest problem is that this field would be present in every document of the Solr index, so the number would have to be calculated in real time.
Re: Time Stats
The answer is quite easy: I just need to create an index with one document per visit. That way I can use date faceting to build time statistics.
"flats for rent new york" at 1/12/2011 => bounce_rate=48.6%
"flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
"flats for rent new york" at 1/2/2012 => bounce_rate=46.4%
date:[1/12/2011 - 1/1/2012]
"flats for rent new york" at 1/12/2011 => bounce_rate=48.6%
"flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
mean=49.15%
date:[1/1/2012 - 1/2/2012]
"flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
"flats for rent new york" at 1/2/2012 => bounce_rate=46.4%
mean=49.05%
My initial approach would save some disk and memory space, so I'm still wondering whether it is possible.
Re: Time Stats
Correction: the second mean is 48.05%, not 49.05%.
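The two window means discussed in this thread can be checked with a few lines of plain Java. This is only a sketch of the arithmetic, not of Solr's date faceting or StatsComponent: the class and method names are made up for illustration, and the dates are assumed to be written as day/month/year. With the three data points from the thread, the first window averages 48.6 and 49.7 to 49.15, and the second averages 49.7 and 46.4 to 48.05.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class BounceRateWindows {

    /** One observation: bounce rate (in percent) for a query on a given date. */
    static final class Sample {
        final LocalDate date;
        final double bounceRate;

        Sample(LocalDate date, double bounceRate) {
            this.date = date;
            this.bounceRate = bounceRate;
        }
    }

    /** Mean bounce rate of the samples whose date lies in [from, to], both ends inclusive. */
    static double meanInRange(List<Sample> samples, LocalDate from, LocalDate to) {
        double sum = 0.0;
        int count = 0;
        for (Sample s : samples) {
            if (!s.date.isBefore(from) && !s.date.isAfter(to)) {
                sum += s.bounceRate;
                count++;
            }
        }
        // NaN signals "no samples in this window"
        return count == 0 ? Double.NaN : sum / count;
    }

    /** The three data points from the thread (dates assumed to be day/month/year). */
    static List<Sample> threadSamples() {
        List<Sample> samples = new ArrayList<>();
        samples.add(new Sample(LocalDate.of(2011, 12, 1), 48.6));
        samples.add(new Sample(LocalDate.of(2012, 1, 1), 49.7));
        samples.add(new Sample(LocalDate.of(2012, 2, 1), 46.4));
        return samples;
    }

    public static void main(String[] args) {
        List<Sample> samples = threadSamples();
        System.out.println(meanInRange(samples, LocalDate.of(2011, 12, 1), LocalDate.of(2012, 1, 1)));
        System.out.println(meanInRange(samples, LocalDate.of(2012, 1, 1), LocalDate.of(2012, 2, 1)));
    }
}
```

Storing one sample per visit (or per day) keeps the window computation this simple, at the cost of the extra disk and memory mentioned in the thread.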
Re: soft commits in EmbeddedSolrServer
Old question but I'm still wondering if this is possible. I'm using Solr 4.0. Can I use the EmbeddedSolrServer to perform soft commits?
Re: soft commits in EmbeddedSolrServer
Yes, this worked for me:
    // Solr server initialization
    System.setProperty("solr.solr.home", solrHome);
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    coreContainer = initializer.initialize();
    server = new EmbeddedSolrServer(coreContainer, "your_corename");
    // Create your SolrInputDocument doc
    ...
    // Soft commit: the arguments are (action, waitFlush, waitSearcher, softCommit)
    UpdateRequest req = new UpdateRequest();
    req.setAction(ACTION.COMMIT, false, false, true);
    req.add(doc);
    UpdateResponse rsp = req.process(server);
Regards,
Raimon Bosch.
2012/6/26 Mark Miller
> Yes - just pass the param same as you would if not using embedded
> - Mark Miller
> lucidimagination.com
More Like this without a document?
Hi,
I'm designing a K-nearest neighbors classifier for Solr. I am taking information from IMDB and creating a set of documents with the description of each movie and the categories assigned to it.
To check whether the classification is correct I'm using cross-validation, so the documents I want to classify are not included in the index.
If I want to use the MoreLikeThis algorithm, do I need to add these documents to the index? Will MoreLikeThis work with soft commits? Is there a way to run MoreLikeThis without adding the document to the index?
Thanks,
Raimon Bosch.
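To make the question concrete, here is a minimal in-memory sketch of the k-nearest-neighbors vote itself, assuming raw term frequencies and cosine similarity. It is not Solr's MoreLikeThis and uses no Solr API; the class name, the tokenization and the tie-breaking are all assumptions made for illustration. What it shows is that the document being classified only has to be compared against the training set; it never has to be part of it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class KnnTextClassifier {

    /** A labelled training example: term frequencies plus its category. */
    private static final class Example {
        final Map<String, Integer> terms;
        final String category;

        Example(Map<String, Integer> terms, String category) {
            this.terms = terms;
            this.category = category;
        }
    }

    private final List<Example> examples = new ArrayList<>();

    /** Lower-cases the text and counts tokens split on anything that is not a letter or digit. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine similarity between two term-frequency vectors; 0.0 if either one is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Adds a labelled document to the training set. */
    void train(String text, String category) {
        examples.add(new Example(termFrequencies(text), category));
    }

    /** Majority vote among the k most similar training documents; null if nothing was trained. */
    String classify(String text, int k) {
        Map<String, Integer> query = termFrequencies(text);
        List<Example> sorted = new ArrayList<>(examples);
        // Most similar first; List.sort is stable, so ties keep insertion order.
        sorted.sort((x, y) -> Double.compare(cosine(query, y.terms), cosine(query, x.terms)));
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (Example e : sorted.subList(0, Math.min(k, sorted.size()))) {
            int v = votes.merge(e.category, 1, Integer::sum);
            if (best == null || v > votes.get(best)) {
                best = e.category;
            }
        }
        return best;
    }
}
```

In a cross-validation run, each held-out movie description would be passed to classify() and the predicted category compared with the known one.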
Using relevance scores for pseudo-random-probabilistic ordering
Hi,
I've just implemented my PseudoRandomFieldComparator (migrated from PseudoRandomComparatorSource). The problem is that I no longer have access to the relevance scores as I did in the deprecated PseudoRandomComparatorSource. I'm trying to fill in the scores from my PseudoRandomComponent (in the process() method).
I don't know whether to use a PseudoRandomComparator that extends QueryComponent and then repeat the query (or something similar, like reordering my doclist), or to use two different components, QueryComponent and PseudoRandomComponent (extending SearchComponent), and look for a good combination.
How can I get the relevance scores in my PseudoRandomFieldComparator? Any ideas?
Regards,
Raimon Bosch.
-- View this message in context: http://www.nabble.com/Using-relevance-scores-for-psuedo-random-probabilistic-ordenation-tp24392432p24392432.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Using relevance scores for pseudo-random-probabilistic ordering
It worked for me after changing
    public void setScorer(Scorer scorer) {
        this.scorer = new ScoreCachingWrappingScorer(scorer);
    }
to
    public void setScorer(Scorer scorer) {
        this.scorer = scorer;
    }
in my PseudoRandomFieldComparator.
Regards,
Raimon Bosch.
Is it possible to exclude results from other languages?
Hi,
In our indexes we sometimes have documents written in a language other than the index's most common language. Is there any way to give these documents a lower boost?
Thanks in advance,
Raimon Bosch.
Re: Is it possible to exclude results from other languages?
Yes, it's true that we could do it at index time if we had a way to know the language. I was thinking of a search-time solution, maybe measuring the percentage of stopwords in each document. Normally, a document written in another language will contain few or none of the stopwords of the index's main language.
If you know of any external software that detects the language of a source text, that would be useful too.
Thanks,
Raimon Bosch.
Ahmet Arslan wrote:
> If you are aware of those documents, at index time you can boost those
> documents with a value less than 1.0:
>
> // document written in other languages
> ...
> ...
>
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_on_.22doc.22
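The stopword-percentage idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the very short English stopword list and the threshold parameter are hypothetical, and nothing here is a Solr or Lucene API. It computes the fraction of a text's tokens that appear in the main language's stopword list, so that a low ratio hints that the text is written in another language.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class StopwordRatio {

    // Tiny illustrative English stopword list (an assumption; a real list would be much longer).
    static final Set<String> ENGLISH_STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "in", "is", "to", "for", "on", "with", "that", "it"));

    /** Fraction of tokens in the text that are stopwords; 0.0 for a text without tokens. */
    static double ratio(String text, Set<String> stopwords) {
        int total = 0;
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            total++;
            if (stopwords.contains(token)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    /** True when the stopword ratio reaches the given (hypothetical) threshold. */
    static boolean looksLikeMainLanguage(String text, Set<String> stopwords, double threshold) {
        return ratio(text, stopwords) >= threshold;
    }
}
```

The ratio could be computed once at index time and stored in a field, which would then be available for boosting; very short documents give unreliable ratios, so the threshold would need tuning on real data.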
some scores to 0 using omitNorms=false
Hi,
We did some tests with omitNorms=false. We have seen that on the last page of results some scores are set to 0.0. These zero scores are problematic for our sorters.
Could it be some kind of bug?
Regards,
Raimon Bosch.
Re: some scores to 0 using omitNorms=false
I am not an expert on the Lucene scoring formula, but omitNorms=false makes scoring a little more complex because index-time boosts for fields and documents are taken into account. If I'm not wrong (please correct me if I am), with omitNorms=false the norm(t,d) factor is used in the formula:
    score(q,d) = coord(q,d) · queryNorm(q) · ∑ ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )
where the sum runs over the terms t in q. See http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html and http://old.nabble.com/scores-are-the-same-for-many-diferent-documents-td27623039.html#a27623039
The multiValued option is used to create fields with multiple values. We use it in one of our indexes by modifying schema.xml to add a new field:
    ...
    ...
This field is filled by a specific UpdateRequestProcessorFactory (written by us) from a comma-separated field called 's_similar_names':
    ...
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        String v = (String) doc.getFieldValue("s_similar_names");
        if (v != null) {
            // one value of the multiValued field per comma-separated entry
            String s_similar_names[] = v.split(",");
            for (String s_similar_name : s_similar_names) {
                if (!s_similar_name.equals(""))
                    doc.addField("s_similar_name", s_similar_name);
            }
        }
        // pass it up the chain
        super.processAdd(cmd);
    }
    ...
The processor factory is declared in solrconfig.xml:
    ...
    # # # # #
    ...
and the chain is added to the XmlUpdateRequestHandler in solrconfig.xml:
    ...
    # # #mychain # #
    ...
termVector is used to store more information about the terms of a document in the index and to save computation time in functions like MoreLikeThis. See http://wiki.apache.org/solr/TermVectorComponent. We don't use it.
adeelmahmood wrote:
> I was gonna ask a question about this but you seem like you might have the
> answer for me .. what exactly does the omitNorms field do (or is expected to
> do) .. also if you could please help me understand what termVectors and
> multiValued options do ??
> Thanks for ur help
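As a rough worked example of the scoring formula quoted in the message above, the sketch below evaluates it for hand-picked numbers. Everything here is hypothetical: the class name, the method names and all input values are made up, and this is not how Lucene computes or encodes these factors internally. The only point it illustrates is arithmetic: each term's contribution is a product, so a norm(t,d) of 0 zeroes that term, and a document whose matching terms all have zero contributions ends up with a score of 0.

```java
public class ScoreFormulaSketch {

    /** Contribution of one query term: tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d). */
    static double termWeight(double tf, double idf, double boost, double norm) {
        return tf * idf * idf * boost * norm;
    }

    /** score(q,d) = coord(q,d) · queryNorm(q) · sum of the per-term contributions. */
    static double score(double coord, double queryNorm, double[] termWeights) {
        double sum = 0.0;
        for (double w : termWeights) {
            sum += w;
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // Hypothetical values, chosen only to make the arithmetic easy to follow.
        double withNorm = termWeight(2.0, 3.0, 1.0, 0.5);
        double zeroNorm = termWeight(2.0, 3.0, 1.0, 0.0);
        System.out.println(score(1.0, 0.1, new double[] {withNorm}));
        System.out.println(score(1.0, 0.1, new double[] {zeroNorm}));
    }
}
```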
Re: some scores to 0 using omitNorms=false
We have just tested it with the latest version of Solr and we still get scores of 0.