soft commits in EmbeddedSolrServer

2011-09-16 Thread Raimon Bosch
Hi all,

I'm checking how to do soft commits with the new version of Solr. I'm using
EmbeddedSolrServer to add documents to my index. How can I perform a soft
commit using this class? Is it possible? Or should I use the trunk?

http://wiki.apache.org/solr/NearRealtimeSearch
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html

Thanks in advance,
Raimon Bosch.


Time Stats

2012-02-26 Thread Raimon Bosch
Hi,

Today I was playing with StatsComponent to extract some statistics from my
index. I'm using a Solr index to store user searches: basically, I aggregate
data from the access log into the index. So now I can see the average bounce
rate for a group of user searches and see which ones are performing better
in Google.

Now I would like to see how these stats evolve over time. For that I would
need a field that holds different values over time, e.g.

"flats for rent new york" at 1/12/2011 => bounce_rate=48.6%
"flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
"flats for rent new york" at 1/2/2012 => bounce_rate=46.4%

Is there any Solr field type that would fit this?

Thanks in advance,
Raimon Bosch.


Re: Time Stats

2012-02-27 Thread Raimon Bosch
Can anyone provide an answer?

The idea is to have a kind of CustomInteger made up of an array of
timestamped values. The value shown in this field would depend on the date
range sent with the query.

The biggest problem is that this field would be present in every document
in the Solr index, so the number would have to be calculated in real time.



Re: Time Stats

2012-03-09 Thread Raimon Bosch
The answer turned out to be simple: index each visit as its own document.
That way I can use date faceting to build the time statistics.

"flats for rent new york" at 1/12/2011 => bounce_rate=48.6%
"flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
"flats for rent new york" at 1/2/2012 => bounce_rate=46.4%

date:[1/12/2011 - 1/1/2012]
"flats for rent new york" at 1/12/2011 => bounce_rate=48.6%
"flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
mean=49.15%

date:[1/1/2012 - 1/2/2012]
"flats for rent new york" at 1/1/2012 => bounce_rate=49.7%
"flats for rent new york" at 1/2/2012 => bounce_rate=46.4%
mean=49.05%

My initial approach would save some disk and memory space, though. I'm still
wondering whether it is possible.
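
As a rough illustration of the per-range means listed above, here is a
minimal in-memory Java sketch. It does not use Solr: the Entry class, the
meanBounceRate helper and the day/month/year reading of the dates are
assumptions made for illustration, standing in for a date range filter
combined with a mean computed by StatsComponent.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

final class BounceRateSketch {
    /** One indexed entry: a search phrase, the date it refers to, and its bounce rate in percent. */
    static final class Entry {
        final String query;
        final LocalDate date;
        final double bounceRate;

        Entry(String query, LocalDate date, double bounceRate) {
            this.query = query;
            this.date = date;
            this.bounceRate = bounceRate;
        }
    }

    /** Mean bounce rate of the entries for the given query whose date lies in [from, to], both inclusive. */
    static OptionalDouble meanBounceRate(List<Entry> entries, String query,
                                         LocalDate from, LocalDate to) {
        return entries.stream()
                .filter(e -> e.query.equals(query))
                .filter(e -> !e.date.isBefore(from) && !e.date.isAfter(to))
                .mapToDouble(e -> e.bounceRate)
                .average();
    }

    /** The three sample rows from the message, reading the dates as day/month/year (an assumption). */
    static List<Entry> sampleEntries() {
        String q = "flats for rent new york";
        return Arrays.asList(
                new Entry(q, LocalDate.of(2011, 12, 1), 48.6),
                new Entry(q, LocalDate.of(2012, 1, 1), 49.7),
                new Entry(q, LocalDate.of(2012, 2, 1), 46.4));
    }

    public static void main(String[] args) {
        List<Entry> entries = sampleEntries();
        String q = "flats for rent new york";
        // Print the mean for each of the two date ranges, rounded to two decimals.
        System.out.printf("%.2f%n", meanBounceRate(entries, q,
                LocalDate.of(2011, 12, 1), LocalDate.of(2012, 1, 1)).getAsDouble());
        System.out.printf("%.2f%n", meanBounceRate(entries, q,
                LocalDate.of(2012, 1, 1), LocalDate.of(2012, 2, 1)).getAsDouble());
    }
}
```

An empty OptionalDouble is returned when no entry falls inside the range, so
callers can tell "no data" apart from a mean of zero.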



Re: Time Stats

2012-03-09 Thread Raimon Bosch
Correction: the second mean is 48.05%.



Re: soft commits in EmbeddedSolrServer

2012-06-25 Thread Raimon Bosch
Old question but I'm still wondering if this is possible. I'm using Solr
4.0.

Can I use the EmbeddedSolrServer to perform soft commits?



Re: soft commits in EmbeddedSolrServer

2012-06-28 Thread Raimon Bosch
Yes,

This worked for me:

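// Imports needed by the snippet below (SolrJ and Solr core)
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.core.CoreContainer;

// Solr server initialization
System.setProperty("solr.solr.home", solrHome);
CoreContainer.Initializer initializer = new CoreContainer.Initializer();
coreContainer = initializer.initialize();
server = new EmbeddedSolrServer(coreContainer, "your_corename");

// Create your SolrInputDocument doc
...

// Soft commit: setAction(action, waitFlush, waitSearcher, softCommit)
UpdateRequest req = new UpdateRequest();
req.setAction(ACTION.COMMIT, false, false, true);
req.add( doc );
UpdateResponse rsp = req.process( server );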

Regards,
Raimon Bosch.

2012/6/26 Mark Miller 

> Yes - just pass the param same as you would if not using embedded
>
> - Mark Miller
> lucidimagination.com


More Like this without a document?

2012-11-05 Thread Raimon Bosch
Hi,

I'm designing a k-nearest neighbors classifier for Solr. I am taking
information from IMDB and creating a set of documents, each with the
description of a movie and the categories assigned to it.

To validate the classification I'm using cross-validation, so the documents
whose categories I want to guess are not included in the index.

If I want to use the MoreLikeThis algorithm, do I need to add these documents
to the index? Will MoreLikeThis work with soft commits? Is there a way to
run MoreLikeThis without adding the document to the index?
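
For context, the voting step of such a classifier could look roughly like
the sketch below. It is plain Java and assumes the k most similar documents
have already been retrieved by some means (MoreLikeThis or otherwise); the
class and method names are hypothetical, and breaking ties alphabetically is
an arbitrary choice made only to keep the result deterministic.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class KnnVoteSketch {
    /**
     * Returns the category that occurs most often across the neighbours'
     * category lists, or null when there are no categories at all.
     * Ties go to the alphabetically first category.
     */
    static String majorityCategory(List<List<String>> neighbourCategories) {
        // TreeMap iterates keys in alphabetical order, which gives the tie-break.
        Map<String, Integer> votes = new TreeMap<>();
        for (List<String> categories : neighbourCategories) {
            for (String category : categories) {
                votes.merge(category, 1, Integer::sum);
            }
        }
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best;
    }
}
```

In a cross-validation run, the predicted category for each held-out movie
would then be compared with the categories it actually has.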

Thanks,
Raimon Bosch.


Using relevance scores for pseudo-random-probabilistic ordering

2009-07-08 Thread Raimon Bosch


Hi,

I've just implemented my PseudoRandomFieldComparator (migrated from
PseudoRandomComparatorSource). The problem is that I no longer have access
to the relevance scores the way I did in the deprecated
PseudoRandomComparatorSource. I'm trying to fill in the scores from my
PseudoRandomComponent (in the process() method).

I don't know whether to use a PseudoRandomComparator that extends
QueryComponent and then repeat the query (or do something similar, such as
reordering my doclist), or to use two different components, QueryComponent
and PseudoRandomComponent (extending SearchComponent), and look for a good
combination.

How can I get the relevance scores in my PseudoRandomFieldComparator? Any
ideas?


Regards,
Raimon Bosch.
-- 
View this message in context: 
http://www.nabble.com/Using-relevance-scores-for-psuedo-random-probabilistic-ordenation-tp24392432p24392432.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using relevance scores for pseudo-random-probabilistic ordering

2009-07-09 Thread Raimon Bosch


It worked for me after changing:

public void setScorer(Scorer scorer) {
  this.scorer = new ScoreCachingWrappingScorer(scorer);
}

to

public void setScorer(Scorer scorer) {
  this.scorer = scorer;
}

in my PseudoRandomFieldComparator.

Regards,
Raimon Bosch.






Is it possible to exclude results from other languages?

2010-02-04 Thread Raimon Bosch


Hi,

In our indexes we sometimes have documents written in a language other than
the index's main language. Is there any way to give these documents a lower
boost?

Thanks in advance,
Raimon Bosch.



Re: Is it possible to exclude results from other languages?

2010-02-04 Thread Raimon Bosch


Yes, it's true that we could do it at index time if we had a way to know the
language. I was thinking of a solution at search time, maybe measuring the
percentage of stopwords in each document. Normally, a document in another
language will contain hardly any stopwords of the index's main language.

If you know of any external software that detects the language of a text,
that would be useful too.
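
The stopword idea could be prototyped roughly as follows. This is a plain
Java sketch, not a Solr component: the class name, the tiny English stopword
list and the threshold passed to looksLikeMainLanguage are all assumptions
for illustration, and a real list would be much longer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class StopwordRatioSketch {
    // Tiny illustrative English stopword list (an assumption, not a real resource).
    static final Set<String> ENGLISH_STOPWORDS = new HashSet<>(Arrays.asList(
            "the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"));

    /** Fraction of the tokens in text that are stopwords; 0.0 when the text has no tokens. */
    static double stopwordRatio(String text, Set<String> stopwords) {
        // Lower-case the text and split on anything that is not a letter.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        int total = 0;
        int hits = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            total++;
            if (stopwords.contains(token)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    /** True when the stopword ratio reaches the given threshold. */
    static boolean looksLikeMainLanguage(String text, Set<String> stopwords, double threshold) {
        return stopwordRatio(text, stopwords) >= threshold;
    }
}
```

Very short documents give unreliable ratios, so any threshold would need to
be tuned on real data, and computing the ratio once at index time and storing
it in a field would avoid repeating the work for every search.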

Thanks,
Raimon Bosch.



Ahmet Arslan wrote:
> 
> 
>> In our indexes, sometimes we have some documents written in
>> other languages
>> different to the most common index's language. Is there any
>> way to give less
>> boosting to this documents?
> 
> If you are aware of those documents, at index time you can boost those
> documents with a value less than 1.0:
> 
> <add>
>   <doc boost="...">
>     // document written in other languages
>     ...
>     ...
>   </doc>
> </add>
> 
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_on_.22doc.22




some scores to 0 using omitNorms=false

2010-02-18 Thread Raimon Bosch


Hi,

We ran some tests with omitNorms=false and saw that on the last page of
results some scores are 0.0. These zero scores are a problem for our
sorters.

Could this be some kind of bug?

Regards,
Raimon Bosch.



Re: some scores to 0 using omitNorms=false

2010-02-18 Thread Raimon Bosch


I am not an expert in the Lucene scoring formula, but omitNorms=false makes
scoring a little more complex because it takes the boosts for fields and
documents into account. If I'm not wrong (if I am, please correct me), with
omitNorms=false the norm(t,d) factor, which carries those index-time boosts
and the field length normalization, is applied in the formula:

score(q,d) = coord(q,d) · queryNorm(q) · ∑_{t in q} ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )

See
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html,
and
http://old.nabble.com/scores-are-the-same-for-many-diferent-documents-td27623039.html#a27623039
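
To make the formula concrete, here is a small Java sketch that evaluates it
for made-up inputs. It is not Lucene code: the class, the Term holder and all
the numbers are assumptions for illustration. It only shows that a term whose
norm(t,d) is 0 contributes nothing, which is one way a score of 0.0 can
arise; it does not establish that this is what happens in our index.

```java
import java.util.Arrays;
import java.util.List;

final class ScoreFormulaSketch {
    /** Per-term inputs of the formula; every value used here is made up. */
    static final class Term {
        final double tf;
        final double idf;
        final double boost;
        final double norm;

        Term(double tf, double idf, double boost, double norm) {
            this.tf = tf;
            this.idf = idf;
            this.boost = boost;
            this.norm = norm;
        }
    }

    /** coord * queryNorm * sum over terms of (tf * idf^2 * boost * norm). */
    static double score(double coord, double queryNorm, List<Term> terms) {
        double sum = 0.0;
        for (Term t : terms) {
            sum += t.tf * t.idf * t.idf * t.boost * t.norm;
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        List<Term> normal = Arrays.asList(new Term(2.0, 3.0, 1.0, 0.5));
        List<Term> zeroNorm = Arrays.asList(new Term(2.0, 3.0, 1.0, 0.0));
        System.out.println(score(1.0, 0.5, normal));   // prints 4.5
        System.out.println(score(1.0, 0.5, zeroNorm)); // prints 0.0
    }
}
```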

multiValued option is used to create fields with multiple values.

We use it in one of our indexes; we modified schema.xml to add a new field:

...

...

This field is filled in by a custom UpdateRequestProcessorFactory (written
by us) from a comma-separated field called 's_similar_names':
...
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();

    // split the comma-separated source field into individual values
    String v = (String) doc.getFieldValue("s_similar_names");
    if (v != null) {
      String[] s_similar_names = v.split(",");
      for (String s_similar_name : s_similar_names) {
        // skip empty entries produced by consecutive commas
        if (!s_similar_name.equals("")) {
          doc.addField("s_similar_name", s_similar_name);
        }
      }
    }

    // pass it up the chain
    super.processAdd(cmd);
  }
...
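
The splitting step on its own, without any Solr classes, can be tried out
with the sketch below. The class and method names are hypothetical; the
behavior is meant to mirror the loop above, including the fact that values
are not trimmed.

```java
import java.util.ArrayList;
import java.util.List;

final class SimilarNamesSketch {
    /** Splits a comma-separated value into its non-empty parts; null yields an empty list. */
    static List<String> splitNames(String raw) {
        List<String> names = new ArrayList<>();
        if (raw == null) {
            return names;
        }
        for (String name : raw.split(",")) {
            // Consecutive or leading commas produce empty strings, which are skipped.
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Each returned element corresponds to one value of the multiValued field.
        System.out.println(splitNames("a,,b,"));
    }
}
```

Because nothing is trimmed, "alpha, beta" yields a second value with a
leading space; calling trim() on each part would be a natural refinement if
that is not wanted.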

The processor factory is registered in solrconfig.xml:

...
# 
#   
#   
#   
#   
...

and this chain is added to the XmlUpdateRequestHandler in solrconfig.xml:

...
#   
#   
#mychain  
#
#   
...

termVectors stores additional information about a document's terms in the
index, which saves computation time in features like MoreLikeThis. See
http://wiki.apache.org/solr/TermVectorComponent. We don't use it.


adeelmahmood wrote:
> 
> I was gonna ask a question about this but you seem like you might have the
> answer for me .. wat exactly is the omitNorms field do (or is expected to
> do) .. also if you could please help me understand what termVectors and
> multiValued options do ??
> Thanks for ur help
> 




Re: some scores to 0 using omitNorms=false

2010-02-24 Thread Raimon Bosch


We have just tested it with the latest version of Solr and we still get
scores of 0.


